- java.lang.Object
-
- org.apache.lucene.util.fst.FST<T>
-
- All Implemented Interfaces:
Accountable
public final class FST<T> extends java.lang.Object implements Accountable
Represents an finite state machine (FST), using a compact byte[] format.The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).
See the
package documentation
for some simple examples.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
FST.Arc<T>
Represents a single arc.static class
FST.BytesReader
Reads bytes stored in an FST.static class
FST.FSTMetadata<T>
Represent the FST metadatastatic class
FST.INPUT_TYPE
Specifies allowed range of each int input label for this FST.
-
Field Summary
Fields Modifier and Type Field Description static byte
ARCS_FOR_BINARY_SEARCH
Value of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.(package private) static byte
ARCS_FOR_CONTINUOUS
Value of the arc flags to declare a node with continuous arcs designed for pos the arc directly with labelToPos - firstLabel.(package private) static byte
ARCS_FOR_DIRECT_ADDRESSING
Value of the arc flags to declare a node with fixed length dense arcs and bit table designed for direct addressing.private static long
BASE_RAM_BYTES_USED
(package private) static int
BIT_ARC_HAS_FINAL_OUTPUT
static int
BIT_ARC_HAS_OUTPUT
This flag is set if the arc has an output.(package private) static int
BIT_FINAL_ARC
(package private) static int
BIT_LAST_ARC
(package private) static int
BIT_STOP_NODE
(package private) static int
BIT_TARGET_NEXT
private static int
DEFAULT_MAX_BLOCK_BITS
static int
END_LABEL
If arc has this label then that arc is final/acceptedprivate static java.lang.String
FILE_FORMAT_NAME
(package private) static long
FINAL_END_NODE
private FSTReader
fstReader
ABytesStore
, used during building, or during reading when the FST is very large (more than 1 GB).(package private) FST.FSTMetadata<T>
metadata
(package private) static long
NON_FINAL_END_NODE
Outputs<T>
outputs
static int
VERSION_90
Version that was used when releasing Lucene 9.0.static int
VERSION_CONTINUOUS_ARCS
Version that started storing continuous arcs.static int
VERSION_CURRENT
Current version.private static int
VERSION_LITTLE_ENDIAN
static int
VERSION_START
First supported version, this is the version that was used when releasing Lucene 7.0.-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors Constructor Description FST(FST.FSTMetadata<T> metadata, DataInput in)
Load a previously saved FST with a DataInput for metdata using anOnHeapFSTStore
with maxBlockBits set toDEFAULT_MAX_BLOCK_BITS
FST(FST.FSTMetadata<T> metadata, DataInput in, FSTStore fstStore)
Load a previously saved FST with a metdata object and a FSTStore.FST(FST.FSTMetadata<T> metadata, FSTReader fstReader)
Create the FST with a metadata object and a FSTReader.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description FST.Arc<T>
findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Finds an arc leaving the incoming arc, replacing the arc in place.private static boolean
flag(int flags, int bit)
FST.BytesReader
getBytesReader()
Returns aFST.BytesReader
for this FST, positioned at position 0.T
getEmptyOutput()
FST.Arc<T>
getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start nodeFST.FSTMetadata<T>
getMetadata()
(package private) static int
getNumPresenceBytes(int labelRange)
Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.(package private) boolean
isExpandedTarget(FST.Arc<T> follow, FST.BytesReader in)
Returns whetherarc
's target points to a node in expanded format (fixed length arcs).long
numBytes()
long
ramBytesUsed()
Return the memory usage of this object in bytes.static <T> FST<T>
read(java.nio.file.Path path, Outputs<T> outputs)
Reads an automaton from a file.private FST.Arc<T>
readArc(FST.Arc<T> arc, FST.BytesReader in)
Reads an arc.FST.Arc<T>
readArcByContinuous(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex)
Reads a Continuous node arc, with the provided index in the label range.FST.Arc<T>
readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex)
Reads a present direct addressing node arc, with the provided index in the label range.private FST.Arc<T>
readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex)
Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).FST.Arc<T>
readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx)
(package private) static <T> FST.Arc<T>
readEndArc(FST.Arc<T> follow, FST.Arc<T> arc)
private void
readFirstArcInfo(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in)
FST.Arc<T>
readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in)
FST.Arc<T>
readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.int
readLabel(DataInput in)
Reads one BYTE1/2/4 label from the providedDataInput
.FST.Arc<T>
readLastArcByContinuous(FST.Arc<T> arc, FST.BytesReader in)
Reads the last arc of a continuous node.FST.Arc<T>
readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in)
Reads the last arc of a direct addressing node.(package private) FST.Arc<T>
readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Follows thefollow
arc and reads the last arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.static <T> FST.FSTMetadata<T>
readMetadata(DataInput metaIn, Outputs<T> outputs)
Read the FST metadata from DataInputFST.Arc<T>
readNextArc(FST.Arc<T> arc, FST.BytesReader in)
In-place read; returns the arc.(package private) int
readNextArcLabel(FST.Arc<T> arc, FST.BytesReader in)
Peeks at next arc's label; does not alter arc.FST.Arc<T>
readNextRealArc(FST.Arc<T> arc, FST.BytesReader in)
Never returns null, but you should never call this if arc.isLast() is true.private void
readPresenceBytes(FST.Arc<T> arc, FST.BytesReader in)
Reads the presence bits of a direct-addressing node.private long
readUnpackedNodeTarget(FST.BytesReader in)
void
save(java.nio.file.Path path)
Writes an automaton to a file.void
save(DataOutput metaOut, DataOutput out)
void
saveMetadata(DataOutput metaOut)
Save the metadata to a DataOutputprivate void
seekToNextNode(FST.BytesReader in)
static <T> boolean
targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcsjava.lang.String
toString()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
-
-
-
Field Detail
-
metadata
final FST.FSTMetadata<T> metadata
-
BASE_RAM_BYTES_USED
private static final long BASE_RAM_BYTES_USED
-
BIT_FINAL_ARC
static final int BIT_FINAL_ARC
- See Also:
- Constant Field Values
-
BIT_LAST_ARC
static final int BIT_LAST_ARC
- See Also:
- Constant Field Values
-
BIT_TARGET_NEXT
static final int BIT_TARGET_NEXT
- See Also:
- Constant Field Values
-
BIT_STOP_NODE
static final int BIT_STOP_NODE
- See Also:
- Constant Field Values
-
BIT_ARC_HAS_OUTPUT
public static final int BIT_ARC_HAS_OUTPUT
This flag is set if the arc has an output.- See Also:
- Constant Field Values
-
BIT_ARC_HAS_FINAL_OUTPUT
static final int BIT_ARC_HAS_FINAL_OUTPUT
- See Also:
- Constant Field Values
-
ARCS_FOR_BINARY_SEARCH
public static final byte ARCS_FOR_BINARY_SEARCH
Value of the arc flags to declare a node with fixed length (sparse) arcs designed for binary search.- See Also:
- Constant Field Values
-
ARCS_FOR_DIRECT_ADDRESSING
static final byte ARCS_FOR_DIRECT_ADDRESSING
Value of the arc flags to declare a node with fixed length dense arcs and bit table designed for direct addressing.- See Also:
- Constant Field Values
-
ARCS_FOR_CONTINUOUS
static final byte ARCS_FOR_CONTINUOUS
Value of the arc flags to declare a node with continuous arcs designed for pos the arc directly with labelToPos - firstLabel. likeARCS_FOR_BINARY_SEARCH
we use flag combinations that will not occur at the same time.- See Also:
- Constant Field Values
-
FILE_FORMAT_NAME
private static final java.lang.String FILE_FORMAT_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
First supported version, this is the version that was used when releasing Lucene 7.0.- See Also:
- Constant Field Values
-
VERSION_LITTLE_ENDIAN
private static final int VERSION_LITTLE_ENDIAN
- See Also:
- Constant Field Values
-
VERSION_CONTINUOUS_ARCS
public static final int VERSION_CONTINUOUS_ARCS
Version that started storing continuous arcs.- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
Current version.- See Also:
- Constant Field Values
-
VERSION_90
public static final int VERSION_90
Version that was used when releasing Lucene 9.0.- See Also:
- Constant Field Values
-
FINAL_END_NODE
static final long FINAL_END_NODE
- See Also:
- Constant Field Values
-
NON_FINAL_END_NODE
static final long NON_FINAL_END_NODE
- See Also:
- Constant Field Values
-
END_LABEL
public static final int END_LABEL
If arc has this label then that arc is final/accepted- See Also:
- Constant Field Values
-
fstReader
private final FSTReader fstReader
ABytesStore
, used during building, or during reading when the FST is very large (more than 1 GB). If the FST is less than 1 GB then bytesArray is set instead.
-
DEFAULT_MAX_BLOCK_BITS
private static final int DEFAULT_MAX_BLOCK_BITS
-
-
Constructor Detail
-
FST
public FST(FST.FSTMetadata<T> metadata, DataInput in) throws java.io.IOException
Load a previously saved FST with a DataInput for metdata using anOnHeapFSTStore
with maxBlockBits set toDEFAULT_MAX_BLOCK_BITS
- Throws:
java.io.IOException
-
FST
public FST(FST.FSTMetadata<T> metadata, DataInput in, FSTStore fstStore) throws java.io.IOException
Load a previously saved FST with a metdata object and a FSTStore. If usingOnHeapFSTStore
, setting maxBlockBits allows you to control the size of the byte[] pages used to hold the FST bytes.- Throws:
java.io.IOException
-
FST
FST(FST.FSTMetadata<T> metadata, FSTReader fstReader)
Create the FST with a metadata object and a FSTReader.
-
-
Method Detail
-
flag
private static boolean flag(int flags, int bit)
-
readMetadata
public static <T> FST.FSTMetadata<T> readMetadata(DataInput metaIn, Outputs<T> outputs) throws java.io.IOException
Read the FST metadata from DataInput- Type Parameters:
T
- the output type- Parameters:
metaIn
- the DataInput of the metadataoutputs
- the FST outputs- Returns:
- the FST metadata
- Throws:
java.io.IOException
- if exception occurred during parsing
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface:Accountable
Return the memory usage of this object in bytes. Negative values are illegal.- Specified by:
ramBytesUsed
in interfaceAccountable
-
toString
public java.lang.String toString()
- Overrides:
toString
in classjava.lang.Object
-
numBytes
public long numBytes()
-
getEmptyOutput
public T getEmptyOutput()
-
getMetadata
public FST.FSTMetadata<T> getMetadata()
-
save
public void save(DataOutput metaOut, DataOutput out) throws java.io.IOException
- Throws:
java.io.IOException
-
saveMetadata
public void saveMetadata(DataOutput metaOut) throws java.io.IOException
Save the metadata to a DataOutput- Parameters:
metaOut
- the DataOutput to save- Throws:
java.io.IOException
-
save
public void save(java.nio.file.Path path) throws java.io.IOException
Writes an automaton to a file.- Throws:
java.io.IOException
-
read
public static <T> FST<T> read(java.nio.file.Path path, Outputs<T> outputs) throws java.io.IOException
Reads an automaton from a file.- Throws:
java.io.IOException
-
readLabel
public int readLabel(DataInput in) throws java.io.IOException
Reads one BYTE1/2/4 label from the providedDataInput
.- Throws:
java.io.IOException
-
targetHasArcs
public static <T> boolean targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcs
-
getNumPresenceBytes
static int getNumPresenceBytes(int labelRange)
Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.
-
readPresenceBytes
private void readPresenceBytes(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads the presence bits of a direct-addressing node. Actually we don't read them here, we just keep the pointer to the bit-table start and we skip them.- Throws:
java.io.IOException
-
getFirstArc
public FST.Arc<T> getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
-
readLastTargetArc
FST.Arc<T> readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Follows thefollow
arc and reads the last arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
java.io.IOException
-
readUnpackedNodeTarget
private long readUnpackedNodeTarget(FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
readFirstTargetArc
public FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
java.io.IOException
-
readFirstArcInfo
private void readFirstArcInfo(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
readFirstRealTargetArc
public FST.Arc<T> readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
isExpandedTarget
boolean isExpandedTarget(FST.Arc<T> follow, FST.BytesReader in) throws java.io.IOException
Returns whetherarc
's target points to a node in expanded format (fixed length arcs).- Throws:
java.io.IOException
-
readNextArc
public FST.Arc<T> readNextArc(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
In-place read; returns the arc.- Throws:
java.io.IOException
-
readNextArcLabel
int readNextArcLabel(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!- Throws:
java.io.IOException
-
readArcByIndex
public FST.Arc<T> readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx) throws java.io.IOException
- Throws:
java.io.IOException
-
readArcByContinuous
public FST.Arc<T> readArcByContinuous(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws java.io.IOException
Reads a Continuous node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be within the label range.- Throws:
java.io.IOException
-
readArcByDirectAddressing
public FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws java.io.IOException
Reads a present direct addressing node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.- Throws:
java.io.IOException
-
readArcByDirectAddressing
private FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex) throws java.io.IOException
Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).- Throws:
java.io.IOException
-
readLastArcByDirectAddressing
public FST.Arc<T> readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads the last arc of a direct addressing node. This method is equivalent to callreadArcByDirectAddressing(Arc, BytesReader, int)
withrangeIndex
equal toarc.numArcs() - 1
, but it is faster.- Throws:
java.io.IOException
-
readLastArcByContinuous
public FST.Arc<T> readLastArcByContinuous(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads the last arc of a continuous node.- Throws:
java.io.IOException
-
readNextRealArc
public FST.Arc<T> readNextRealArc(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Never returns null, but you should never call this if arc.isLast() is true.- Throws:
java.io.IOException
-
readArc
private FST.Arc<T> readArc(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads an arc.
Precondition: The arc flags byte has already been read and set; the given BytesReader is positioned just after the arc flags byte.- Throws:
java.io.IOException
-
findTargetArc
public FST.Arc<T> findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.- Throws:
java.io.IOException
-
seekToNextNode
private void seekToNextNode(FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
getBytesReader
public FST.BytesReader getBytesReader()
Returns aFST.BytesReader
for this FST, positioned at position 0.
-
-