- java.lang.Object
-
- org.apache.lucene.index.TermsEnum
-
- org.apache.lucene.index.FilteredTermsEnum
-
- org.apache.lucene.index.AutomatonTermsEnum
-
- All Implemented Interfaces:
BytesRefIterator
public class AutomatonTermsEnum extends FilteredTermsEnum
A FilteredTermsEnum that enumerates terms based upon what is accepted by a DFA.The algorithm is such:
- As long as matches are successful, keep reading sequentially.
- When a match fails, skip to the next string in lexicographic order that does not enter a reject state.
The algorithm does not attempt to actually skip to the next string that is completely accepted. This is not possible when the language accepted by the FSM is not finite (i.e. * operator).
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.index.FilteredTermsEnum
FilteredTermsEnum.AcceptStatus
-
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
-
-
Field Summary
Fields Modifier and Type Field Description private Automaton
automaton
private BytesRef
commonSuffixRef
private short
curGen
private boolean
finite
private boolean
linear
private BytesRef
linearUpperBound
private ByteRunAutomaton
runAutomaton
private IntsRefBuilder
savedStates
private BytesRefBuilder
seekBytesRef
private Transition
transition
private short[]
visited
-
Fields inherited from class org.apache.lucene.index.FilteredTermsEnum
actualTerm, tenum
-
-
Constructor Summary
Constructors Constructor Description AutomatonTermsEnum(TermsEnum tenum, CompiledAutomaton compiled)
Construct an enumerator based upon an automaton, enumerating the specified field, working on a supplied TermsEnum
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected FilteredTermsEnum.AcceptStatus
accept(BytesRef term)
Returns true if the term matches the automaton.private int
backtrack(int position)
Attempts to backtrack thru the string after encountering a dead end at some given position.private boolean
isVisited(int state)
Indicates whether the given state has been visited.protected BytesRef
nextSeekTerm(BytesRef term)
On the first call toFilteredTermsEnum.next()
or ifFilteredTermsEnum.accept(org.apache.lucene.util.BytesRef)
returnsFilteredTermsEnum.AcceptStatus.YES_AND_SEEK
orFilteredTermsEnum.AcceptStatus.NO_AND_SEEK
, this method will be called to eventually seek the underlying TermsEnum to a new position.private boolean
nextString()
Increments the byte buffer to the next String in binary order after s that will not put the machine into a reject state.private boolean
nextString(int state, int position)
Returns the next String in lexicographic order that will not put the machine into a reject state.private void
setLinear(int position)
Sets the enum to operate in linear fashion, as we have found a looping transition at position: we set an upper bound and act like a TermRangeQuery for this portion of the term space.private void
setVisited(int state)
Records the given state has been visited.-
Methods inherited from class org.apache.lucene.index.FilteredTermsEnum
attributes, docFreq, impacts, next, ord, postings, seekCeil, seekExact, seekExact, seekExact, setInitialSeekTerm, term, termState, totalTermFreq
-
-
-
-
Field Detail
-
runAutomaton
private final ByteRunAutomaton runAutomaton
-
commonSuffixRef
private final BytesRef commonSuffixRef
-
finite
private final boolean finite
-
automaton
private final Automaton automaton
-
visited
private final short[] visited
-
curGen
private short curGen
-
seekBytesRef
private final BytesRefBuilder seekBytesRef
-
linear
private boolean linear
-
linearUpperBound
private final BytesRef linearUpperBound
-
transition
private final Transition transition
-
savedStates
private final IntsRefBuilder savedStates
-
-
Constructor Detail
-
AutomatonTermsEnum
public AutomatonTermsEnum(TermsEnum tenum, CompiledAutomaton compiled)
Construct an enumerator based upon an automaton, enumerating the specified field, working on a supplied TermsEnum- Parameters:
compiled
- CompiledAutomaton
-
-
Method Detail
-
setVisited
private void setVisited(int state)
Records the given state has been visited.
-
isVisited
private boolean isVisited(int state)
Indicates whether the given state has been visited.
-
accept
protected FilteredTermsEnum.AcceptStatus accept(BytesRef term)
Returns true if the term matches the automaton. Also stashes away the term to assist with smart enumeration.- Specified by:
accept
in classFilteredTermsEnum
-
nextSeekTerm
protected BytesRef nextSeekTerm(BytesRef term) throws java.io.IOException
Description copied from class:FilteredTermsEnum
On the first call toFilteredTermsEnum.next()
or ifFilteredTermsEnum.accept(org.apache.lucene.util.BytesRef)
returnsFilteredTermsEnum.AcceptStatus.YES_AND_SEEK
orFilteredTermsEnum.AcceptStatus.NO_AND_SEEK
, this method will be called to eventually seek the underlying TermsEnum to a new position. On the first call,currentTerm
will benull
, later calls will provide the term the underlying enum is positioned at. This method returns per default only one time the initial seek term and thennull
, so no repositioning is ever done.Override this method, if you want a more sophisticated TermsEnum, that repositions the iterator during enumeration. If this method always returns
null
the enum is empty.Please note: This method should always provide a greater term than the last enumerated term, else the behaviour of this enum violates the contract for TermsEnums.
- Overrides:
nextSeekTerm
in classFilteredTermsEnum
- Throws:
java.io.IOException
-
setLinear
private void setLinear(int position)
Sets the enum to operate in linear fashion, as we have found a looping transition at position: we set an upper bound and act like a TermRangeQuery for this portion of the term space.
-
nextString
private boolean nextString()
Increments the byte buffer to the next String in binary order after s that will not put the machine into a reject state. If such a string does not exist, returns false.The correctness of this method depends upon the automaton being deterministic, and having no transitions to dead states.
- Returns:
- true if more possible solutions exist for the DFA
-
nextString
private boolean nextString(int state, int position)
Returns the next String in lexicographic order that will not put the machine into a reject state.This method traverses the DFA from the given position in the String, starting at the given state.
If this cannot satisfy the machine, returns false. This method will walk the minimal path, in lexicographic order, as long as possible.
If this method returns false, then there might still be more solutions, it is necessary to backtrack to find out.
- Parameters:
state
- current non-reject stateposition
- useful portion of the string- Returns:
- true if more possible solutions exist for the DFA from this position
-
backtrack
private int backtrack(int position)
Attempts to backtrack thru the string after encountering a dead end at some given position. Returns false if no more possible strings can match.- Parameters:
position
- current position in the input String- Returns:
position >= 0
if more possible solutions exist for the DFA
-
-