Class FST<T>

  • All Implemented Interfaces:
    Accountable

    public final class FST<T>
    extends java.lang.Object
    implements Accountable
    Represents an finite state machine (FST), using a compact byte[] format.

    The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).

    See the package documentation for some simple examples.

    • Field Detail

      • BASE_RAM_BYTES_USED

        private static final long BASE_RAM_BYTES_USED
      • BIT_ARC_HAS_OUTPUT

        public static final int BIT_ARC_HAS_OUTPUT
        This flag is set if the arc has an output.
        See Also:
        Constant Field Values
      • ARCS_FOR_BINARY_SEARCH

        public static final byte ARCS_FOR_BINARY_SEARCH
        Value of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.
        See Also:
        Constant Field Values
      • ARCS_FOR_DIRECT_ADDRESSING

        static final byte ARCS_FOR_DIRECT_ADDRESSING
        Value of the arc flags to declare a node with fixed length dense arcs and bit table designed for direct addressing.
        See Also:
        Constant Field Values
      • ARCS_FOR_CONTINUOUS

        static final byte ARCS_FOR_CONTINUOUS
        Value of the arc flags to declare a node with continuous arcs designed for pos the arc directly with labelToPos - firstLabel. like ARCS_FOR_BINARY_SEARCH we use flag combinations that will not occur at the same time.
        See Also:
        Constant Field Values
      • FILE_FORMAT_NAME

        private static final java.lang.String FILE_FORMAT_NAME
        See Also:
        Constant Field Values
      • VERSION_START

        public static final int VERSION_START
        First supported version, this is the version that was used when releasing Lucene 7.0.
        See Also:
        Constant Field Values
      • VERSION_CONTINUOUS_ARCS

        public static final int VERSION_CONTINUOUS_ARCS
        Version that started storing continuous arcs.
        See Also:
        Constant Field Values
      • VERSION_CURRENT

        public static final int VERSION_CURRENT
        Current version.
        See Also:
        Constant Field Values
      • VERSION_90

        public static final int VERSION_90
        Version that was used when releasing Lucene 9.0.
        See Also:
        Constant Field Values
      • END_LABEL

        public static final int END_LABEL
        If arc has this label then that arc is final/accepted
        See Also:
        Constant Field Values
      • fstReader

        private final FSTReader fstReader
        A BytesStore, used during building, or during reading when the FST is very large (more than 1 GB). If the FST is less than 1 GB then bytesArray is set instead.
      • outputs

        public final Outputs<T> outputs
      • DEFAULT_MAX_BLOCK_BITS

        private static final int DEFAULT_MAX_BLOCK_BITS
    • Constructor Detail

      • FST

        public FST​(FST.FSTMetadata<T> metadata,
                   DataInput in,
                   FSTStore fstStore)
            throws java.io.IOException
        Load a previously saved FST with a metdata object and a FSTStore. If using OnHeapFSTStore, setting maxBlockBits allows you to control the size of the byte[] pages used to hold the FST bytes.
        Throws:
        java.io.IOException
    • Method Detail

      • flag

        private static boolean flag​(int flags,
                                    int bit)
      • readMetadata

        public static <T> FST.FSTMetadata<T> readMetadata​(DataInput metaIn,
                                                          Outputs<T> outputs)
                                                   throws java.io.IOException
        Read the FST metadata from DataInput
        Type Parameters:
        T - the output type
        Parameters:
        metaIn - the DataInput of the metadata
        outputs - the FST outputs
        Returns:
        the FST metadata
        Throws:
        java.io.IOException - if exception occurred during parsing
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • numBytes

        public long numBytes()
      • getEmptyOutput

        public T getEmptyOutput()
      • save

        public void save​(DataOutput metaOut,
                         DataOutput out)
                  throws java.io.IOException
        Throws:
        java.io.IOException
      • saveMetadata

        public void saveMetadata​(DataOutput metaOut)
                          throws java.io.IOException
        Save the metadata to a DataOutput
        Parameters:
        metaOut - the DataOutput to save
        Throws:
        java.io.IOException
      • save

        public void save​(java.nio.file.Path path)
                  throws java.io.IOException
        Writes an automaton to a file.
        Throws:
        java.io.IOException
      • read

        public static <T> FST<T> read​(java.nio.file.Path path,
                                      Outputs<T> outputs)
                               throws java.io.IOException
        Reads an automaton from a file.
        Throws:
        java.io.IOException
      • readLabel

        public int readLabel​(DataInput in)
                      throws java.io.IOException
        Reads one BYTE1/2/4 label from the provided DataInput.
        Throws:
        java.io.IOException
      • targetHasArcs

        public static <T> boolean targetHasArcs​(FST.Arc<T> arc)
        returns true if the node at this address has any outgoing arcs
      • getNumPresenceBytes

        static int getNumPresenceBytes​(int labelRange)
        Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.
      • readPresenceBytes

        private void readPresenceBytes​(FST.Arc<T> arc,
                                       FST.BytesReader in)
                                throws java.io.IOException
        Reads the presence bits of a direct-addressing node. Actually we don't read them here, we just keep the pointer to the bit-table start and we skip them.
        Throws:
        java.io.IOException
      • getFirstArc

        public FST.Arc<T> getFirstArc​(FST.Arc<T> arc)
        Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
      • readLastTargetArc

        FST.Arc<T> readLastTargetArc​(FST.Arc<T> follow,
                                     FST.Arc<T> arc,
                                     FST.BytesReader in)
                              throws java.io.IOException
        Follows the follow arc and reads the last arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
        Returns:
        Returns the second argument (arc).
        Throws:
        java.io.IOException
      • readUnpackedNodeTarget

        private long readUnpackedNodeTarget​(FST.BytesReader in)
                                     throws java.io.IOException
        Throws:
        java.io.IOException
      • readFirstTargetArc

        public FST.Arc<T> readFirstTargetArc​(FST.Arc<T> follow,
                                             FST.Arc<T> arc,
                                             FST.BytesReader in)
                                      throws java.io.IOException
        Follow the follow arc and read the first arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
        Returns:
        Returns the second argument (arc).
        Throws:
        java.io.IOException
      • readFirstArcInfo

        private void readFirstArcInfo​(long nodeAddress,
                                      FST.Arc<T> arc,
                                      FST.BytesReader in)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • readFirstRealTargetArc

        public FST.Arc<T> readFirstRealTargetArc​(long nodeAddress,
                                                 FST.Arc<T> arc,
                                                 FST.BytesReader in)
                                          throws java.io.IOException
        Throws:
        java.io.IOException
      • isExpandedTarget

        boolean isExpandedTarget​(FST.Arc<T> follow,
                                 FST.BytesReader in)
                          throws java.io.IOException
        Returns whether arc's target points to a node in expanded format (fixed length arcs).
        Throws:
        java.io.IOException
      • readNextArc

        public FST.Arc<T> readNextArc​(FST.Arc<T> arc,
                                      FST.BytesReader in)
                               throws java.io.IOException
        In-place read; returns the arc.
        Throws:
        java.io.IOException
      • readNextArcLabel

        int readNextArcLabel​(FST.Arc<T> arc,
                             FST.BytesReader in)
                      throws java.io.IOException
        Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!
        Throws:
        java.io.IOException
      • readArcByIndex

        public FST.Arc<T> readArcByIndex​(FST.Arc<T> arc,
                                         FST.BytesReader in,
                                         int idx)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readArcByContinuous

        public FST.Arc<T> readArcByContinuous​(FST.Arc<T> arc,
                                              FST.BytesReader in,
                                              int rangeIndex)
                                       throws java.io.IOException
        Reads a Continuous node arc, with the provided index in the label range.
        Parameters:
        rangeIndex - The index of the arc in the label range. It must be within the label range.
        Throws:
        java.io.IOException
      • readArcByDirectAddressing

        public FST.Arc<T> readArcByDirectAddressing​(FST.Arc<T> arc,
                                                    FST.BytesReader in,
                                                    int rangeIndex)
                                             throws java.io.IOException
        Reads a present direct addressing node arc, with the provided index in the label range.
        Parameters:
        rangeIndex - The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.
        Throws:
        java.io.IOException
      • readArcByDirectAddressing

        private FST.Arc<T> readArcByDirectAddressing​(FST.Arc<T> arc,
                                                     FST.BytesReader in,
                                                     int rangeIndex,
                                                     int presenceIndex)
                                              throws java.io.IOException
        Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).
        Throws:
        java.io.IOException
      • readLastArcByContinuous

        public FST.Arc<T> readLastArcByContinuous​(FST.Arc<T> arc,
                                                  FST.BytesReader in)
                                           throws java.io.IOException
        Reads the last arc of a continuous node.
        Throws:
        java.io.IOException
      • readNextRealArc

        public FST.Arc<T> readNextRealArc​(FST.Arc<T> arc,
                                          FST.BytesReader in)
                                   throws java.io.IOException
        Never returns null, but you should never call this if arc.isLast() is true.
        Throws:
        java.io.IOException
      • readArc

        private FST.Arc<T> readArc​(FST.Arc<T> arc,
                                   FST.BytesReader in)
                            throws java.io.IOException
        Reads an arc.
        Precondition: The arc flags byte has already been read and set; the given BytesReader is positioned just after the arc flags byte.
        Throws:
        java.io.IOException
      • findTargetArc

        public FST.Arc<T> findTargetArc​(int labelToMatch,
                                        FST.Arc<T> follow,
                                        FST.Arc<T> arc,
                                        FST.BytesReader in)
                                 throws java.io.IOException
        Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.
        Throws:
        java.io.IOException
      • seekToNextNode

        private void seekToNextNode​(FST.BytesReader in)
                             throws java.io.IOException
        Throws:
        java.io.IOException