Class DocumentsWriter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, Accountable

    final class DocumentsWriter
    extends java.lang.Object
    implements java.io.Closeable, Accountable
    This class accepts multiple added documents and directly writes segment files.

    Each added document is passed to the indexing chain, which in turn processes the document into the different codec formats. Some formats write bytes to files immediately, e.g. stored fields and term vectors, while others are buffered by the indexing chain and written only on flush.

    Once we have used our allowed RAM buffer, or the number of added docs is large enough (in the case we are flushing by doc count instead of RAM usage), we create a real segment and flush it to the Directory.

    Threads:

    Multiple threads are allowed into addDocument at once. There is an initial synchronized call to DocumentsWriterFlushControl.obtainAndLock() which allocates a DWPT for this indexing thread. The same thread will not necessarily get the same DWPT over time. Then updateDocuments is called on that DWPT without synchronization (most of the "heavy lifting" is in this call). Once a DWPT fills up enough RAM or hold enough documents in memory the DWPT is checked out for flush and all changes are written to the directory. Each DWPT corresponds to one segment being written.

    When flush is called by IndexWriter we check out all DWPTs that are associated with the current DocumentsWriterDeleteQueue out of the DocumentsWriterPerThreadPool and write them to disk. The flush process can piggy-back on incoming indexing threads or even block them from adding documents if flushing can't keep up with new documents being added. Unless the stall control kicks in to block indexing threads flushes are happening concurrently to actual index requests.

    Exceptions:

    Because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk full while flushing stored fields leaves this file in a corrupt state. Or, an OOM exception while appending to the in-memory posting lists can corrupt that posting list. We call such exceptions "aborting exceptions". In these cases we must call abort() to discard all docs added since the last flush.

    All other exceptions ("non-aborting exceptions") can still partially update the index structures. These updates are consistent, but, they represent only a part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted so that the document is always atomically ("all or none") added to the index.

    • Method Detail

      • deleteQueries

        long deleteQueries​(Query... queries)
                    throws java.io.IOException
        Throws:
        java.io.IOException
      • deleteTerms

        long deleteTerms​(Term... terms)
                  throws java.io.IOException
        Throws:
        java.io.IOException
      • updateDocValues

        long updateDocValues​(DocValuesUpdate... updates)
                      throws java.io.IOException
        Throws:
        java.io.IOException
      • applyDeleteOrUpdate

        private long applyDeleteOrUpdate​(java.util.function.ToLongFunction<DocumentsWriterDeleteQueue> function)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • applyAllDeletes

        private boolean applyAllDeletes()
                                 throws java.io.IOException
        If buffered deletes are using too much heap, resolve them and write disk and return true.
        Throws:
        java.io.IOException
      • getNumDocs

        int getNumDocs()
        Returns how many docs are currently buffered in RAM.
      • abort

        void abort()
            throws java.io.IOException
        Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs. This resets our state, discarding any docs added since last flush.
        Throws:
        java.io.IOException
      • flushOneDWPT

        final boolean flushOneDWPT()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • lockAndAbortAll

        java.io.Closeable lockAndAbortAll()
                                   throws java.io.IOException
        Locks all currently active DWPT and aborts them. The returned Closeable should be closed once the locks for the aborted DWPTs can be released.
        Throws:
        java.io.IOException
      • abortDocumentsWriterPerThread

        private void abortDocumentsWriterPerThread​(DocumentsWriterPerThread perThread)
                                            throws java.io.IOException
        Returns how many documents were aborted.
        Throws:
        java.io.IOException
      • getMaxCompletedSequenceNumber

        long getMaxCompletedSequenceNumber()
        returns the maximum sequence number for all previously completed operations
      • anyChanges

        boolean anyChanges()
      • getBufferedDeleteTermsSize

        int getBufferedDeleteTermsSize()
      • anyDeletions

        boolean anyDeletions()
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • preUpdate

        private boolean preUpdate()
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • postUpdate

        private boolean postUpdate​(DocumentsWriterPerThread flushingDWPT,
                                   boolean hasEvents)
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • maybeFlush

        private boolean maybeFlush()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • doFlush

        private void doFlush​(DocumentsWriterPerThread flushingDWPT)
                      throws java.io.IOException
        Throws:
        java.io.IOException
      • getNextSequenceNumber

        long getNextSequenceNumber()
      • subtractFlushedNumDocs

        void subtractFlushedNumDocs​(int numFlushed)
      • flushAllThreads

        long flushAllThreads()
                      throws java.io.IOException
        Throws:
        java.io.IOException
      • finishFullFlush

        void finishFullFlush​(boolean success)
                      throws java.io.IOException
        Throws:
        java.io.IOException
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable
      • getFlushingBytes

        long getFlushingBytes()
        Returns the number of bytes currently being flushed

        This is a subset of the value returned by ramBytesUsed()