Class Lucene40BlockTreeTermsReader

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Iterable<java.lang.String>

    public final class Lucene40BlockTreeTermsReader
    extends FieldsProducer
    A block-based terms index and dictionary that assigns terms to variable-length blocks according to the prefixes they share. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that seekExact can often determine that a term cannot exist without doing any I/O, and intersection with automata is very fast. Note that this terms dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index implementation).
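
    The idea that an in-memory prefix trie can rule out a term before any block is read can be illustrated with a minimal sketch. This is not Lucene's implementation or API: the class PrefixTrieSketch, its methods, and the blockReads counter are hypothetical names, and an in-memory set stands in for a block stored on disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy prefix-trie index over term blocks (hypothetical names, not Lucene's API). */
final class PrefixTrieSketch {
    /** One trie node; a non-null block marks a prefix that owns a block of terms. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        Set<String> block; // assumption: stands in for a block stored on disk
    }

    private final Node root = new Node();

    /** Counts simulated block reads so the saved I/O is observable. */
    int blockReads = 0;

    /** Registers a block of terms that all share the given prefix. */
    void addBlock(String prefix, String... terms) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.block = new TreeSet<>();
        for (String term : terms) {
            if (!term.startsWith(prefix)) {
                throw new IllegalArgumentException(term + " does not start with " + prefix);
            }
            node.block.add(term);
        }
    }

    /** Returns true if the term exists; reads a block only if some prefix of the term owns one. */
    boolean seekExact(String term) {
        Node node = root;
        Node deepestWithBlock = (root.block != null) ? root : null;
        for (int i = 0; i < term.length(); i++) {
            node = node.children.get(term.charAt(i));
            if (node == null) {
                break; // the index has no longer prefix for this term
            }
            if (node.block != null) {
                deepestWithBlock = node;
            }
        }
        if (deepestWithBlock == null) {
            return false; // rejected using the in-memory index alone
        }
        blockReads++; // simulated disk access
        return deepestWithBlock.block.contains(term);
    }

    public static void main(String[] args) {
        PrefixTrieSketch index = new PrefixTrieSketch();
        index.addBlock("ab", "abacus", "abbey");
        index.addBlock("ca", "car", "cat");
        System.out.println(index.seekExact("zebra") + " reads=" + index.blockReads);
        System.out.println(index.seekExact("cat") + " reads=" + index.blockReads);
    }
}
```

    In this sketch a lookup for a term whose prefix is absent from the trie returns without touching any block, while a lookup that reaches a block costs exactly one simulated read.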

    NOTE: this terms dictionary supports min/maxItemsPerBlock during indexing to control how much memory the terms index uses.

    The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
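
    One way to picture the splitting of a too-large block is sketched below. It is an illustration under stated assumptions, not the writer's actual algorithm: the class BlockSplitSketch, the method splitBlock, and the rule of cutting only where the first character after the shared prefix changes are all hypothetical choices made for the example. The minItems and maxItems parameters play the role of min/maxItemsPerBlock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy splitter for an oversized block of terms sharing one prefix (hypothetical names). */
final class BlockSplitSketch {

    /** First character after the shared prefix, or -1 if the term is the prefix itself. */
    private static int leadChar(String term, int prefixLen) {
        return term.length() > prefixLen ? term.charAt(prefixLen) : -1;
    }

    /**
     * Splits sorted terms that share a prefix of length prefixLen into sub-blocks.
     * Assumption: a cut is made only where the lead character changes, and only if
     * the current sub-block already holds minItems terms and more than maxItems
     * terms remain unassigned.
     */
    static List<List<String>> splitBlock(List<String> sortedTerms, int prefixLen,
                                         int minItems, int maxItems) {
        List<List<String>> blocks = new ArrayList<>();
        if (sortedTerms.isEmpty()) {
            return blocks;
        }
        int start = 0;
        for (int i = 1; i < sortedTerms.size(); i++) {
            boolean leadChanged = leadChar(sortedTerms.get(i), prefixLen)
                    != leadChar(sortedTerms.get(i - 1), prefixLen);
            int inCurrent = i - start;
            int remaining = sortedTerms.size() - start;
            if (leadChanged && inCurrent >= minItems && remaining > maxItems) {
                blocks.add(new ArrayList<>(sortedTerms.subList(start, i)));
                start = i;
            }
        }
        // Whatever is left forms the final sub-block.
        blocks.add(new ArrayList<>(sortedTerms.subList(start, sortedTerms.size())));
        return blocks;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("aa", "ab", "ac", "ad", "ae", "af");
        System.out.println(splitBlock(terms, 1, 2, 3));
    }
}
```

    Because cuts happen only at lead-character boundaries in this sketch, a run of terms with the same next character stays together even if it exceeds maxItems.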

    Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary.

    See BlockTreeTermsWriter.

    • Field Detail

      • NO_OUTPUT

        static final BytesRef NO_OUTPUT
      • TERMS_EXTENSION

        static final java.lang.String TERMS_EXTENSION
        Extension of terms file
        See Also:
        Constant Field Values
      • VERSION_START

        public static final int VERSION_START
        Initial terms format.
        See Also:
        Constant Field Values
      • VERSION_META_LONGS_REMOVED

        public static final int VERSION_META_LONGS_REMOVED
        The long[] + byte[] metadata has been replaced with a single byte[].
        See Also:
        Constant Field Values
      • VERSION_COMPRESSED_SUFFIXES

        public static final int VERSION_COMPRESSED_SUFFIXES
        Suffixes are compressed to save space.
        See Also:
        Constant Field Values
      • VERSION_META_FILE

        public static final int VERSION_META_FILE
        Metadata is written to its own file.
        See Also:
        Constant Field Values
      • VERSION_CURRENT

        public static final int VERSION_CURRENT
        Current terms format.
        See Also:
        Constant Field Values
      • TERMS_INDEX_EXTENSION

        static final java.lang.String TERMS_INDEX_EXTENSION
        Extension of terms index file
        See Also:
        Constant Field Values
      • TERMS_INDEX_CODEC_NAME

        static final java.lang.String TERMS_INDEX_CODEC_NAME
        See Also:
        Constant Field Values
      • TERMS_META_EXTENSION

        static final java.lang.String TERMS_META_EXTENSION
        Extension of terms meta file
        See Also:
        Constant Field Values
      • TERMS_META_CODEC_NAME

        static final java.lang.String TERMS_META_CODEC_NAME
        See Also:
        Constant Field Values
      • fieldMap

        private final java.util.Map<java.lang.String,FieldReader> fieldMap
      • fieldList

        private final java.util.List<java.lang.String> fieldList
      • segment

        final java.lang.String segment
      • version

        final int version
    • Constructor Detail

      • Lucene40BlockTreeTermsReader

        public Lucene40BlockTreeTermsReader(PostingsReaderBase postingsReader,
                                            SegmentReadState state)
                                     throws java.io.IOException
        Sole constructor.
        Throws:
        java.io.IOException
    • Method Detail

      • readBytesRef

        private static BytesRef readBytesRef(IndexInput in)
                                      throws java.io.IOException
        Throws:
        java.io.IOException
      • seekDir

        private static void seekDir(IndexInput input)
                             throws java.io.IOException
        Seek input to the directory offset.
        Throws:
        java.io.IOException
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Specified by:
        close in class FieldsProducer
        Throws:
        java.io.IOException
      • iterator

        public java.util.Iterator<java.lang.String> iterator()
        Description copied from class: Fields
        Returns an iterator that will step through all field names. This will not return null.
        Specified by:
        iterator in interface java.lang.Iterable<java.lang.String>
        Specified by:
        iterator in class Fields
      • terms

        public Terms terms(java.lang.String field)
                    throws java.io.IOException
        Description copied from class: Fields
        Get the Terms for this field. This will return null if the field does not exist.
        Specified by:
        terms in class Fields
        Throws:
        java.io.IOException
      • size

        public int size()
        Description copied from class: Fields
        Returns the number of fields, or -1 if the number of distinct field names is unknown. If the value is >= 0, Fields.iterator() will return exactly that many field names.
        Specified by:
        size in class Fields
      • brToString

        java.lang.String brToString(BytesRef b)
      • checkIntegrity

        public void checkIntegrity()
                            throws java.io.IOException
        Description copied from class: FieldsProducer
        Checks consistency of this reader.

        Note that this may be costly in terms of I/O; for example, it may involve computing a checksum over large data files.

        Specified by:
        checkIntegrity in class FieldsProducer
        Throws:
        java.io.IOException
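
        The I/O cost mentioned above comes from having to read every byte of a file to verify it. The following minimal sketch shows that pattern with a streaming checksum. It is an illustration only: the class ChecksumSketch and its methods are hypothetical, and the choice of CRC32 from java.util.zip is an assumption rather than something this page states about the reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Toy streaming integrity check (hypothetical names; CRC32 is an assumed algorithm). */
final class ChecksumSketch {

    /** Reads the whole stream in fixed-size chunks and returns its CRC32 value. */
    static long crc32Of(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            crc.update(buffer, 0, n);
        }
        return crc.getValue();
    }

    /** Throws IOException if the stream's checksum differs from the expected value. */
    static void checkIntegrity(InputStream data, long expected) throws IOException {
        long actual = crc32Of(data);
        if (actual != expected) {
            throw new IOException("checksum mismatch: expected "
                    + Long.toHexString(expected) + ", got " + Long.toHexString(actual));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "123456789".getBytes(StandardCharsets.US_ASCII);
        // 0xCBF43926 is the standard CRC-32 check value for the ASCII string "123456789".
        checkIntegrity(new ByteArrayInputStream(bytes), 0xCBF43926L);
        System.out.println("ok");
    }
}
```

        The loop touches every byte once, which is why a check of this kind scales with file size and is best run deliberately rather than on every open.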
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object