Class FSTCompletionLookup

  • All Implemented Interfaces:
    Accountable

    public class FSTCompletionLookup
    extends Lookup
    An adapter from Lookup API to FSTCompletion.

    This adapter differs from FSTCompletion in that it attempts to discretize any "weights" passed in from InputIterator.weight() to match the number of buckets. For the rationale for bucketing, see FSTCompletion.

    Note: Discretization requires an additional sorting pass.

    The range of weights for bucketing/discretization is determined by sorting the input by weight and then dividing it into equal ranges. Scores within each range are then assigned to that bucket.

    Note that this means that even large differences in weights may be lost during automaton construction, but the overall distinction between "classes" of weights will be preserved regardless of the distribution of weights.
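
    The bucketing described above can be sketched in plain Java. This is an illustration only, not the Lucene implementation: it assumes "equal ranges" means equal-sized slices of the weight-sorted input and that entries with equal weights share a bucket. The class name BucketSketch and the method discretize are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and tie handling are assumptions. */
class BucketSketch {

    /**
     * Sorts terms by weight, splits the sorted sequence into `buckets`
     * slices of (nearly) equal size, and maps each term to its slice index.
     */
    static Map<String, Integer> discretize(Map<String, Long> weights, int buckets) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(weights.entrySet());
        sorted.sort(Map.Entry.comparingByValue()); // the additional sorting pass

        Map<String, Integer> result = new LinkedHashMap<>();
        int n = sorted.size();
        long previousWeight = 0;
        int previousBucket = 0;
        for (int i = 0; i < n; i++) {
            long weight = sorted.get(i).getValue();
            // Assumption: equal weights stay in the same bucket.
            int bucket = (i > 0 && weight == previousWeight)
                    ? previousBucket
                    : (int) ((long) i * buckets / n); // 0 .. buckets - 1
            result.put(sorted.get(i).getKey(), bucket);
            previousWeight = weight;
            previousBucket = bucket;
        }
        return result;
    }
}
```

    With weights a=1, b=2, c=1000, d=1000000 and two buckets, this sketch places a and b in bucket 0 and both c and d in bucket 1. The thousandfold gap between c and d disappears, while the split between the "low" and "high" classes survives.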

    For fine-grained control over which weights are assigned to which buckets, use FSTCompletion directly or TSTLookup, for example.

    See Also:
    FSTCompletion
    • Field Detail

      • sharedTailLength

        private static final int sharedTailLength
        Shared tail length used when conflating suffixes in the created automaton. Larger values (up to Integer.MAX_VALUE) produce smaller (or minimal) automata at the cost of more RAM for the node hash kept while building the FST.

        Empirical pick.

        See Also:
        Constant Field Values
      • tempFileNamePrefix

        private final java.lang.String tempFileNamePrefix
      • buckets

        private int buckets
      • exactMatchFirst

        private boolean exactMatchFirst
      • higherWeightsCompletion

        private FSTCompletion higherWeightsCompletion
        Automaton used for completions with higher weights reordering.
      • normalCompletion

        private FSTCompletion normalCompletion
        Automaton used for normal completions.
      • count

        private volatile long count
        Number of entries the lookup was built with.
    • Constructor Detail

      • FSTCompletionLookup

        public FSTCompletionLookup()
        This constructor should only be used to read a previously saved suggester.
      • FSTCompletionLookup

        public FSTCompletionLookup​(Directory tempDir,
                                   java.lang.String tempFileNamePrefix)
        This constructor prepares for creating a suggester FST using the build(InputIterator) method. The number of weight discretization buckets is set to FSTCompletion.DEFAULT_BUCKETS, and exact matches are promoted to the top of the suggestions list.
      • FSTCompletionLookup

        public FSTCompletionLookup​(Directory tempDir,
                                   java.lang.String tempFileNamePrefix,
                                   int buckets,
                                   boolean exactMatchFirst)
        This constructor prepares for creating a suggester FST using the build(InputIterator) method.
        Parameters:
        buckets - The number of weight discretization buckets (see FSTCompletion for details).
        exactMatchFirst - If true, exact matches are promoted to the top of the suggestions list. Otherwise, they appear in order of discretized weight, and alphabetically within each bucket.
      • FSTCompletionLookup

        public FSTCompletionLookup​(Directory tempDir,
                                   java.lang.String tempFileNamePrefix,
                                   FSTCompletion completion,
                                   boolean exactMatchFirst)
        This constructor takes a pre-built automaton.
        Parameters:
        completion - An instance of FSTCompletion.
        exactMatchFirst - If true, exact matches are promoted to the top of the suggestions list. Otherwise, they appear in order of discretized weight, and alphabetically within each bucket.
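
        The effect of exactMatchFirst on result order can be illustrated with a small stand-alone sketch. It does not use the automaton at all: it filters a plain map by prefix, and it assumes that "order of discretized weight" means higher buckets first. The class OrderingSketch and its order method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not how the FST-based lookup is implemented. */
class OrderingSketch {

    /** Returns the terms starting with `key`, in the order described above. */
    static List<String> order(String key, Map<String, Integer> bucketByTerm,
                              boolean exactMatchFirst) {
        List<String> matches = new ArrayList<>();
        for (String term : bucketByTerm.keySet()) {
            if (term.startsWith(key)) {
                matches.add(term);
            }
        }
        matches.sort((a, b) -> {
            // Assumption: higher bucket first, then alphabetical within a bucket.
            int byBucket = Integer.compare(bucketByTerm.get(b), bucketByTerm.get(a));
            return byBucket != 0 ? byBucket : a.compareTo(b);
        });
        // Promote the exact match, if present, to the head of the list.
        if (exactMatchFirst && matches.remove(key)) {
            matches.add(0, key);
        }
        return matches;
    }
}
```

        For the key "car" with buckets car=0, care=2, cart=2, the sketch yields care, cart, car when exactMatchFirst is false, and car, care, cart when it is true.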
    • Method Detail

      • build

        public void build​(InputIterator iterator)
                   throws java.io.IOException
        Description copied from class: Lookup
        Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
        Specified by:
        build in class Lookup
        Throws:
        java.io.IOException
      • encodeWeight

        private static int encodeWeight​(long value)
        weight -> cost
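
        The description above says only "weight -> cost". One plausible reading, offered here purely as an assumption, is a range-checked narrowing from the long weight to the int return type in the signature. The class name EncodeWeightSketch and the choice of exception are hypothetical.

```java
/** Illustrative sketch only; the actual private method may differ. */
class EncodeWeightSketch {

    /** Narrows a long weight to int, rejecting values that do not fit. */
    static int encodeWeight(long value) {
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("cannot encode value: " + value);
        }
        return (int) value;
    }
}
```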
      • lookup

        public java.util.List<Lookup.LookupResult> lookup​(java.lang.CharSequence key,
                                                          java.util.Set<BytesRef> contexts,
                                                          boolean higherWeightsFirst,
                                                          int num)
        Description copied from class: Lookup
        Look up a key and return possible completion for this key.
        Specified by:
        lookup in class Lookup
        Parameters:
        key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
        contexts - contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
        higherWeightsFirst - return only more popular results
        num - maximum number of results to return
        Returns:
        a list of possible completions, with their relative weight (e.g. popularity)
      • get

        public java.lang.Object get​(java.lang.CharSequence key)
        Returns the bucket (weight) as a Long for the provided key if it exists, or null otherwise.
      • store

        public boolean store​(DataOutput output)
                      throws java.io.IOException
        Description copied from class: Lookup
        Persist the constructed lookup data to a directory. Optional operation.
        Specified by:
        store in class Lookup
        Parameters:
        output - DataOutput to write the data to.
        Returns:
        true if successful, false if unsuccessful or not supported.
        Throws:
        java.io.IOException - when fatal IO error occurs.
      • load

        public boolean load​(DataInput input)
                     throws java.io.IOException
        Description copied from class: Lookup
        Discard current lookup data and load it from a previously saved copy. Optional operation.
        Specified by:
        load in class Lookup
        Parameters:
        input - the DataInput to load the lookup data from.
        Returns:
        true if completed successfully, false if unsuccessful or not supported.
        Throws:
        java.io.IOException - when fatal IO error occurs.
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
      • getChildResources

        public java.util.Collection<Accountable> getChildResources()
        Description copied from interface: Accountable
        Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
        See Also:
        Accountables
      • getCount

        public long getCount()
        Description copied from class: Lookup
        Get the number of entries the lookup was built with.
        Specified by:
        getCount in class Lookup
        Returns:
        total number of suggester entries