Class Util


  • public final class Util
    extends java.lang.Object
    Static helper methods.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private Util()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      (package private) static <T> int binarySearch​(FST<T> fst, FST.Arc<T> arc, int targetLabel)
      Perform a binary search of Arcs encoded as a packed array
      private static void emitDotState​(java.io.Writer out, java.lang.String name, java.lang.String shape, java.lang.String color, java.lang.String label)
      Emit a single state in the dot language.
      static <T> T get​(FST<T> fst, BytesRef input)
      Looks up the output for this input, or null if the input is not accepted
      static <T> T get​(FST<T> fst, IntsRef input)
      Looks up the output for this input, or null if the input is not accepted.
      private static java.lang.String printableLabel​(int label)
      Ensures an arc's label is indeed printable (dot uses US-ASCII).
      static <T> FST.Arc<T> readCeilArc​(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
      Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise return null.
      static <T> Util.TopResults<T> shortestPaths​(FST<T> fst, FST.Arc<T> fromNode, T startOutput, java.util.Comparator<T> comparator, int topN, boolean allowEmptyString)
      Starting from node, find the top N min cost completions to a final node.
      static BytesRef toBytesRef​(IntsRef input, BytesRefBuilder scratch)
      Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
      static <T> void toDot​(FST<T> fst, java.io.Writer out, boolean sameRank, boolean labelStates)
      Dumps an FST to a GraphViz's dot language description for visualization.
      static IntsRef toIntsRef​(BytesRef input, IntsRefBuilder scratch)
      Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
      static IntsRef toUTF16​(java.lang.CharSequence s, IntsRefBuilder scratch)
      Just maps each UTF16 unit (char) to the ints in an IntsRef.
      static IntsRef toUTF32​(char[] s, int offset, int length, IntsRefBuilder scratch)
      Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
      static IntsRef toUTF32​(java.lang.CharSequence s, IntsRefBuilder scratch)
      Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Util

        private Util()
    • Method Detail

      • get

        public static <T> T get​(FST<T> fst,
                                IntsRef input)
                         throws java.io.IOException
        Looks up the output for this input, or null if the input is not accepted.
        Throws:
        java.io.IOException
      • get

        public static <T> T get​(FST<T> fst,
                                BytesRef input)
                         throws java.io.IOException
        Looks up the output for this input, or null if the input is not accepted
        Throws:
        java.io.IOException
      • shortestPaths

        public static <T> Util.TopResults<T> shortestPaths​(FST<T> fst,
                                                           FST.Arc<T> fromNode,
                                                           T startOutput,
                                                           java.util.Comparator<T> comparator,
                                                           int topN,
                                                           boolean allowEmptyString)
                                                    throws java.io.IOException
        Starting from node, find the top N min cost completions to a final node.
        Throws:
        java.io.IOException
      • toDot

        public static <T> void toDot​(FST<T> fst,
                                     java.io.Writer out,
                                     boolean sameRank,
                                     boolean labelStates)
                              throws java.io.IOException
        Dumps an FST to a GraphViz's dot language description for visualization. Example of use:
         PrintWriter pw = new PrintWriter("out.dot");
         Util.toDot(fst, pw, true, true);
         pw.close();
         
        and then, from command line:
         dot -Tpng -o out.png out.dot
         

        Note: larger FSTs (a few thousand nodes) won't even render, don't bother.

        Parameters:
        sameRank - If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.
        labelStates - If true states will have labels equal to their offsets in their binary format. Expands the graph considerably.
        Throws:
        java.io.IOException
        See Also:
        graphviz project
      • emitDotState

        private static void emitDotState​(java.io.Writer out,
                                         java.lang.String name,
                                         java.lang.String shape,
                                         java.lang.String color,
                                         java.lang.String label)
                                  throws java.io.IOException
        Emit a single state in the dot language.
        Throws:
        java.io.IOException
      • printableLabel

        private static java.lang.String printableLabel​(int label)
        Ensures an arc's label is indeed printable (dot uses US-ASCII).
      • toUTF16

        public static IntsRef toUTF16​(java.lang.CharSequence s,
                                      IntsRefBuilder scratch)
        Just maps each UTF16 unit (char) to the ints in an IntsRef.
      • toUTF32

        public static IntsRef toUTF32​(java.lang.CharSequence s,
                                      IntsRefBuilder scratch)
        Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
      • toUTF32

        public static IntsRef toUTF32​(char[] s,
                                      int offset,
                                      int length,
                                      IntsRefBuilder scratch)
        Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
      • toIntsRef

        public static IntsRef toIntsRef​(BytesRef input,
                                        IntsRefBuilder scratch)
        Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
      • toBytesRef

        public static BytesRef toBytesRef​(IntsRef input,
                                          BytesRefBuilder scratch)
        Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
      • readCeilArc

        public static <T> FST.Arc<T> readCeilArc​(int label,
                                                 FST<T> fst,
                                                 FST.Arc<T> follow,
                                                 FST.Arc<T> arc,
                                                 FST.BytesReader in)
                                          throws java.io.IOException
        Reads the first arc greater or equal than the given label into the provided arc in place and returns it iff found, otherwise return null.
        Parameters:
        label - the label to ceil on
        fst - the fst to operate on
        follow - the arc to follow reading the label from
        arc - the arc to read into in place
        in - the fst's FST.BytesReader
        Throws:
        java.io.IOException
      • binarySearch

        static <T> int binarySearch​(FST<T> fst,
                                    FST.Arc<T> arc,
                                    int targetLabel)
                             throws java.io.IOException
        Perform a binary search of Arcs encoded as a packed array
        Type Parameters:
        T - the output type of the FST
        Parameters:
        fst - the FST from which to read
        arc - the starting arc; sibling arcs greater than this will be searched. Usually the first arc in the array.
        targetLabel - the label to search for
        Returns:
        the index of the Arc having the target label, or if no Arc has the matching label, -1 - idx), where idx is the index of the Arc with the next highest label, or the total number of arcs if the target label exceeds the maximum.
        Throws:
        java.io.IOException - when the FST reader does