Class IndexedDISI


  • final class IndexedDISI
    extends DocIdSetIterator
    Disk-based implementation of a DocIdSetIterator which can return the index of the current document, i.e. the ordinal of the current document among the list of documents that this iterator can return. This is useful to implement sparse doc values by only having to encode values for documents that actually have a value.
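
    As a rough illustration of why the ordinal is useful, the sketch below keeps values only for documents that have one and uses the ordinal to look a value up. It is a minimal in-memory stand-in, not Lucene code: the class SparseValuesSketch, its fields, and its value() method are hypothetical names introduced for this example. Unlike a forward-only iterator, it uses a binary search and so accepts targets in any order.

```java
import java.util.Arrays;

/** Hypothetical in-memory stand-in for illustration only; not part of Lucene. */
final class SparseValuesSketch {
  private final int[] docsWithValue; // sorted doc IDs that actually have a value
  private final long[] values;       // values[i] belongs to docsWithValue[i]
  private int index = -1;            // ordinal of the current doc, or -1 if none

  SparseValuesSketch(int[] docsWithValue, long[] values) {
    this.docsWithValue = docsWithValue;
    this.values = values;
  }

  /** Returns true if target has a value and, if so, remembers its ordinal. */
  boolean advanceExact(int target) {
    int pos = Arrays.binarySearch(docsWithValue, target);
    index = pos >= 0 ? pos : -1;
    return pos >= 0;
  }

  /** Ordinal of the current document among the documents that have a value. */
  int index() {
    return index;
  }

  /** Value of the current document; only valid after advanceExact returned true. */
  long value() {
    return values[index];
  }
}
```

    Only three values are stored for the documents 3, 70000 and 70005 in such a layout, regardless of how many documents the segment contains.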

    Implementation-wise, this DocIdSetIterator is inspired by roaring bitmaps: it encodes ranges of 65536 documents independently and picks one of three encodings depending on the density of the range:

    • ALL if the range contains 65536 documents exactly,
    • DENSE if the range contains 4096 documents or more; in that case documents are stored in a bit set,
    • SPARSE otherwise; in that case the lower 16 bits of each doc ID are stored as a short.

    Only ranges that contain at least one value are encoded.

    This implementation uses 6 bytes per document in the worst case, which happens when every encoded range contains exactly one document.
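
    The density rule and the worst-case figure can be checked with a small calculation. The sketch below is a hypothetical helper, not Lucene code. It assumes a 4-byte header per range, which is inferred from the stated 6 bytes for a range holding a single 2-byte short. It also assumes that an ALL range stores no payload and that a DENSE range stores a plain 65536-bit set (8192 bytes).

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class BlockEncodingSketch {
  enum Encoding { ALL, DENSE, SPARSE }

  static final int BLOCK_SIZE = 65536;     // documents per range
  static final int DENSE_THRESHOLD = 4096; // 4096 documents or more: bit set
  static final int HEADER_BYTES = 4;       // assumption: inferred from the 6-byte worst case

  /** Picks the encoding for a range that contains the given number of documents. */
  static Encoding choose(int cardinality) {
    if (cardinality <= 0 || cardinality > BLOCK_SIZE) {
      // Empty ranges are never encoded, and a range cannot hold more than 65536 docs.
      throw new IllegalArgumentException("cardinality out of range: " + cardinality);
    }
    if (cardinality == BLOCK_SIZE) {
      return Encoding.ALL;
    }
    return cardinality >= DENSE_THRESHOLD ? Encoding.DENSE : Encoding.SPARSE;
  }

  /** Estimated encoded size of one range, under the assumptions stated above. */
  static int bytesFor(int cardinality) {
    switch (choose(cardinality)) {
      case ALL:
        return HEADER_BYTES;                   // assumption: no payload
      case DENSE:
        return HEADER_BYTES + BLOCK_SIZE / 8;  // one bit per document
      default:
        return HEADER_BYTES + 2 * cardinality; // one short per document
    }
  }
}
```

    Under these assumptions a range with a single document costs 4 + 2 = 6 bytes, and the two encodings cost nearly the same at the threshold: 4095 shorts take 8190 bytes, against 8192 bytes for the bit set.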

    • Field Detail

      • cost

        private final long cost
      • block

        private int block
      • blockEnd

        private long blockEnd
      • nextBlockIndex

        private int nextBlockIndex
      • doc

        private int doc
      • index

        private int index
      • exists

        boolean exists
      • word

        private long word
      • wordIndex

        private int wordIndex
      • numberOfOnes

        private int numberOfOnes
      • gap

        private int gap
    • Constructor Detail

      • IndexedDISI

        IndexedDISI​(IndexInput in,
                    long offset,
                    long length,
                    long cost)
             throws java.io.IOException
        Throws:
        java.io.IOException
      • IndexedDISI

        IndexedDISI​(IndexInput slice,
                    long cost)
             throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • flush

        private static void flush​(int block,
                                  FixedBitSet buffer,
                                  int cardinality,
                                  IndexOutput out)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • writeBitSet

        static void writeBitSet​(DocIdSetIterator it,
                                IndexOutput out)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • advance

        public int advance​(int target)
                    throws java.io.IOException
        Description copied from class: DocIdSetIterator
        Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number. Exhausts the iterator and returns DocIdSetIterator.NO_MORE_DOCS if target is greater than the highest document number in the set.

        The behavior of this method is undefined when called with target ≤ current, or after the iterator has been exhausted. Both cases may result in unpredictable behavior.

        When target > current it behaves as if written:

         int advance(int target) {
           int doc;
           while ((doc = nextDoc()) < target) {
           }
           return doc;
         }
         
        Some implementations are considerably more efficient than that.

        NOTE: this method may be called with DocIdSetIterator.NO_MORE_DOCS for efficiency by some Scorers. If your implementation cannot efficiently determine that it should exhaust, it is recommended that you check for that value in each call to this method.
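
        To make the contract concrete, here is a minimal array-backed iterator whose advance uses a binary search instead of repeated nextDoc() calls. It is a sketch for illustration only: the class ArrayDocIdIteratorSketch is hypothetical and does not extend DocIdSetIterator, and the doc IDs are assumed to be sorted and distinct. Its NO_MORE_DOCS constant mirrors DocIdSetIterator.NO_MORE_DOCS, which is Integer.MAX_VALUE.

```java
import java.util.Arrays;

/** Hypothetical array-backed iterator for illustration only; not part of Lucene. */
final class ArrayDocIdIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same value as DocIdSetIterator.NO_MORE_DOCS

  private final int[] docs; // assumed sorted and distinct
  private int pos = -1;     // position of the current doc; docs.length once exhausted

  ArrayDocIdIteratorSketch(int[] docs) {
    this.docs = docs;
  }

  /** Moves to the next document, or returns NO_MORE_DOCS when none is left. */
  int nextDoc() {
    if (pos < docs.length) {
      pos++;
    }
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }

  /** Moves to the first document beyond the current one that is >= target. */
  int advance(int target) {
    // Search only beyond the current position, as the contract requires.
    int from = Math.min(pos + 1, docs.length);
    int i = Arrays.binarySearch(docs, from, docs.length, target);
    pos = i >= 0 ? i : -i - 1; // a negative result encodes the insertion point
    return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
  }
}
```

        Because no valid doc ID equals Integer.MAX_VALUE, calling advance with NO_MORE_DOCS falls past the end of the array and exhausts this iterator without a special case.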

        Specified by:
        advance in class DocIdSetIterator
        Throws:
        java.io.IOException
      • advanceExact

        public boolean advanceExact​(int target)
                             throws java.io.IOException
        Throws:
        java.io.IOException
      • advanceBlock

        private void advanceBlock​(int targetBlock)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • readBlockHeader

        private void readBlockHeader()
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • nextDoc

        public int nextDoc()
                    throws java.io.IOException
        Description copied from class: DocIdSetIterator
        Advances to the next document in the set and returns the doc it is currently on, or DocIdSetIterator.NO_MORE_DOCS if there are no more docs in the set.
        NOTE: do not call this method after the iterator has been exhausted, as doing so may result in unpredictable behavior.
        Specified by:
        nextDoc in class DocIdSetIterator
        Throws:
        java.io.IOException
      • index

        public int index()
      • cost

        public long cost()
        Description copied from class: DocIdSetIterator
        Returns the estimated cost of this DocIdSetIterator.

        This is generally an upper bound of the number of documents this iterator might match, but may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.

        Specified by:
        cost in class DocIdSetIterator