Module org.apache.lucene.core
Class Lucene90CompressingTermVectorsWriter
- java.lang.Object
  - org.apache.lucene.codecs.TermVectorsWriter
    - org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingTermVectorsWriter
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, Accountable
public final class Lucene90CompressingTermVectorsWriter extends TermVectorsWriter
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
private static class
Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub
private class
Lucene90CompressingTermVectorsWriter.DocData
a pending doc
private class
Lucene90CompressingTermVectorsWriter.FieldData
a pending field
-
Field Summary
Fields
Modifier and Type Field Description
(package private) static boolean
BULK_MERGE_ENABLED
(package private) static java.lang.String
BULK_MERGE_ENABLED_SYSPROP
private int
chunkSize
private CompressionMode
compressionMode
private Compressor
compressor
private Lucene90CompressingTermVectorsWriter.DocData
curDoc
private Lucene90CompressingTermVectorsWriter.FieldData
curField
(package private) static int
FLAGS_BITS
private FieldsIndexWriter
indexWriter
private BytesRef
lastTerm
private int[]
lengthsBuf
private int
maxDocsPerChunk
(package private) static int
META_VERSION_START
private IndexOutput
metaStream
private long
numChunks
private long
numDirtyChunks
private long
numDirtyDocs
private int
numDocs
(package private) static int
OFFSETS
(package private) static int
PACKED_BLOCK_SIZE
private ByteBuffersDataOutput
payloadBytes
private int[]
payloadLengthsBuf
(package private) static int
PAYLOADS
private java.util.Deque<Lucene90CompressingTermVectorsWriter.DocData>
pendingDocs
(package private) static int
POSITIONS
private int[]
positionsBuf
private ByteBuffersDataOutput
scratchBuffer
private java.lang.String
segment
private int[]
startOffsetsBuf
private ByteBuffersDataOutput
termSuffixes
(package private) static java.lang.String
VECTORS_EXTENSION
(package private) static java.lang.String
VECTORS_INDEX_CODEC_NAME
(package private) static java.lang.String
VECTORS_INDEX_EXTENSION
(package private) static java.lang.String
VECTORS_META_EXTENSION
private IndexOutput
vectorsStream
(package private) static int
VERSION_CURRENT
(package private) static int
VERSION_START
private BlockPackedWriter
writer
-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors
Constructor Description
Lucene90CompressingTermVectorsWriter(Directory directory, SegmentInfo si, java.lang.String segmentSuffix, IOContext context, java.lang.String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift)
Sole constructor.
-
Method Summary
All Methods Instance Methods Concrete Methods
Modifier and Type Method Description
private Lucene90CompressingTermVectorsWriter.DocData
addDocData(int numVectorFields)
void
addPosition(int position, int startOffset, int endOffset, BytesRef payload)
Adds a term position and offsets.
void
addProx(int numProx, DataInput positions, DataInput offsets)
Called by IndexWriter when writing new segments.
private boolean
canPerformBulkMerge(MergeState mergeState, MatchingReaders matchingReaders, int readerIndex)
void
close()
private void
copyChunks(MergeState mergeState, Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub sub, int fromDocID, int toDocID)
void
finish(int numDocs)
Called before TermVectorsWriter.close(), passing in the number of documents that were written.
void
finishDocument()
Called after a doc and all its fields have been added.
void
finishField()
Called after a field and all its terms have been added.
private void
flush(boolean force)
private int[]
flushFieldNums()
Returns a sorted array containing unique field numbers.
private void
flushFields(int totalFields, int[] fieldNums)
private void
flushFlags(int totalFields, int[] fieldNums)
private int
flushNumFields(int chunkDocs)
private void
flushNumTerms(int totalFields)
private void
flushOffsets(int[] fieldNums)
private void
flushPayloadLengths()
private void
flushPositions()
private void
flushTermFreqs()
private void
flushTermLengths()
java.util.Collection<Accountable>
getChildResources()
Returns nested resources of this class.
int
merge(MergeState mergeState)
Merges in the term vectors from the readers in mergeState.
long
ramBytesUsed()
Return the memory usage of this object in bytes.
void
startDocument(int numVectorFields)
Called before writing the term vectors of the document.
void
startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads)
Called before writing the terms of the field.
void
startTerm(BytesRef term, int freq)
Adds a term and its term frequency freq.
(package private) boolean
tooDirty(Lucene90CompressingTermVectorsReader candidate)
Returns true if we should recompress this reader, even though we could bulk merge compressed data.
private boolean
triggerFlush()
-
Methods inherited from class org.apache.lucene.codecs.TermVectorsWriter
addAllDocVectors, finishTerm
-
-
-
-
Field Detail
-
VECTORS_EXTENSION
static final java.lang.String VECTORS_EXTENSION
- See Also:
- Constant Field Values
-
VECTORS_INDEX_EXTENSION
static final java.lang.String VECTORS_INDEX_EXTENSION
- See Also:
- Constant Field Values
-
VECTORS_META_EXTENSION
static final java.lang.String VECTORS_META_EXTENSION
- See Also:
- Constant Field Values
-
VECTORS_INDEX_CODEC_NAME
static final java.lang.String VECTORS_INDEX_CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_CURRENT
static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
META_VERSION_START
static final int META_VERSION_START
- See Also:
- Constant Field Values
-
PACKED_BLOCK_SIZE
static final int PACKED_BLOCK_SIZE
- See Also:
- Constant Field Values
-
POSITIONS
static final int POSITIONS
- See Also:
- Constant Field Values
-
OFFSETS
static final int OFFSETS
- See Also:
- Constant Field Values
-
PAYLOADS
static final int PAYLOADS
- See Also:
- Constant Field Values
-
FLAGS_BITS
static final int FLAGS_BITS
-
segment
private final java.lang.String segment
-
indexWriter
private FieldsIndexWriter indexWriter
-
metaStream
private IndexOutput metaStream
-
vectorsStream
private IndexOutput vectorsStream
-
compressionMode
private final CompressionMode compressionMode
-
compressor
private final Compressor compressor
-
chunkSize
private final int chunkSize
-
numChunks
private long numChunks
-
numDirtyChunks
private long numDirtyChunks
-
numDirtyDocs
private long numDirtyDocs
-
numDocs
private int numDocs
-
pendingDocs
private final java.util.Deque<Lucene90CompressingTermVectorsWriter.DocData> pendingDocs
-
curDoc
private Lucene90CompressingTermVectorsWriter.DocData curDoc
-
curField
private Lucene90CompressingTermVectorsWriter.FieldData curField
-
lastTerm
private final BytesRef lastTerm
-
positionsBuf
private int[] positionsBuf
-
startOffsetsBuf
private int[] startOffsetsBuf
-
lengthsBuf
private int[] lengthsBuf
-
payloadLengthsBuf
private int[] payloadLengthsBuf
-
termSuffixes
private final ByteBuffersDataOutput termSuffixes
-
payloadBytes
private final ByteBuffersDataOutput payloadBytes
-
writer
private final BlockPackedWriter writer
-
maxDocsPerChunk
private final int maxDocsPerChunk
-
scratchBuffer
private final ByteBuffersDataOutput scratchBuffer
-
BULK_MERGE_ENABLED_SYSPROP
static final java.lang.String BULK_MERGE_ENABLED_SYSPROP
-
BULK_MERGE_ENABLED
static final boolean BULK_MERGE_ENABLED
-
-
Constructor Detail
-
Lucene90CompressingTermVectorsWriter
Lucene90CompressingTermVectorsWriter(Directory directory, SegmentInfo si, java.lang.String segmentSuffix, IOContext context, java.lang.String formatName, CompressionMode compressionMode, int chunkSize, int maxDocsPerChunk, int blockShift) throws java.io.IOException
Sole constructor.
- Throws:
java.io.IOException
-
-
Method Detail
-
addDocData
private Lucene90CompressingTermVectorsWriter.DocData addDocData(int numVectorFields)
-
close
public void close() throws java.io.IOException
- Specified by:
close
in interface java.lang.AutoCloseable
- Specified by:
close
in interface java.io.Closeable
- Specified by:
close
in class TermVectorsWriter
- Throws:
java.io.IOException
-
startDocument
public void startDocument(int numVectorFields) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called before writing the term vectors of the document. TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.
- Specified by:
startDocument
in class TermVectorsWriter
- Throws:
java.io.IOException
-
finishDocument
public void finishDocument() throws java.io.IOException
Description copied from class: TermVectorsWriter
Called after a doc and all its fields have been added.
- Overrides:
finishDocument
in class TermVectorsWriter
- Throws:
java.io.IOException
-
startField
public void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called before writing the terms of the field. TermVectorsWriter.startTerm(BytesRef, int) will be called numTerms times.
- Specified by:
startField
in class TermVectorsWriter
- Throws:
java.io.IOException
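The three booleans passed to startField line up with the POSITIONS, OFFSETS and PAYLOADS constants listed under Field Detail. As a rough illustration of how such per-field options could be packed into a small flag value, here is a minimal sketch. The bit values and the helper names (pack, has, bitsRequired) are assumptions made for this example; this page does not state the constant values or how the writer encodes them.

```java
/** Illustrative only: a stand-in for per-field flag packing, not the Lucene implementation. */
final class VectorFlagsSketch {
    // Assumed bit values; this page names the constants but does not list their values.
    static final int POSITIONS = 0x01;
    static final int OFFSETS = 0x02;
    static final int PAYLOADS = 0x04;

    /** Packs the three startField booleans into one int (hypothetical helper). */
    static int pack(boolean positions, boolean offsets, boolean payloads) {
        return (positions ? POSITIONS : 0)
             | (offsets ? OFFSETS : 0)
             | (payloads ? PAYLOADS : 0);
    }

    /** Returns true if the given flag bit is set. */
    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Number of bits needed to represent values up to maxValue (maxValue > 0). */
    static int bitsRequired(int maxValue) {
        return 32 - Integer.numberOfLeadingZeros(maxValue);
    }
}
```

Under these assumed values, every combination of the three options fits in three bits, which is the kind of quantity a constant such as FLAGS_BITS would plausibly hold.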
-
finishField
public void finishField() throws java.io.IOException
Description copied from class: TermVectorsWriter
Called after a field and all its terms have been added.
- Overrides:
finishField
in class TermVectorsWriter
- Throws:
java.io.IOException
-
startTerm
public void startTerm(BytesRef term, int freq) throws java.io.IOException
Description copied from class: TermVectorsWriter
Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then TermVectorsWriter.addPosition(int, int, int, BytesRef) will be called freq times respectively.
- Specified by:
startTerm
in class TermVectorsWriter
- Throws:
java.io.IOException
-
addPosition
public void addPosition(int position, int startOffset, int endOffset, BytesRef payload) throws java.io.IOException
Description copied from class: TermVectorsWriter
Adds a term position and offsets.
- Specified by:
addPosition
in class TermVectorsWriter
- Throws:
java.io.IOException
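The descriptions of startDocument, startField, startTerm and addPosition together imply a nested call order: one startField per vector field, one startTerm per term, and one addPosition per occurrence when positions or offsets are enabled. The sketch below records that order with a hypothetical stand-in class. It is not the Lucene TermVectorsWriter: terms and field names are plain strings, the payload argument is omitted, and the field name "body" and the sample values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder mirroring the documented call order; not a Lucene class. */
final class CallOrderSketch {
    final List<String> calls = new ArrayList<>();

    void startDocument(int numVectorFields) { calls.add("startDocument(" + numVectorFields + ")"); }

    void startField(String field, int numTerms, boolean positions, boolean offsets, boolean payloads) {
        calls.add("startField(" + field + ", " + numTerms + ")");
    }

    void startTerm(String term, int freq) { calls.add("startTerm(" + term + ", " + freq + ")"); }

    // Payload omitted for brevity; the documented method also takes a BytesRef payload.
    void addPosition(int position, int startOffset, int endOffset) {
        calls.add("addPosition(" + position + ", " + startOffset + ", " + endOffset + ")");
    }

    void finishTerm() { calls.add("finishTerm()"); }
    void finishField() { calls.add("finishField()"); }
    void finishDocument() { calls.add("finishDocument()"); }
    void finish(int numDocs) { calls.add("finish(" + numDocs + ")"); }

    /** Writes one document with one field and one term that occurs twice. */
    static CallOrderSketch writeOneDoc() {
        CallOrderSketch w = new CallOrderSketch();
        w.startDocument(1);                          // one vector field follows
        w.startField("body", 1, true, true, false);  // one term follows
        w.startTerm("lucene", 2);                    // freq = 2, so two addPosition calls
        w.addPosition(0, 0, 6);
        w.addPosition(3, 20, 26);
        w.finishTerm();
        w.finishField();
        w.finishDocument();
        w.finish(1);                                 // total number of documents written
        return w;
    }
}
```

Calling CallOrderSketch.writeOneDoc() yields a list whose first entry is the startDocument call and whose last entry is the finish call, which makes the nesting easy to inspect.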
-
triggerFlush
private boolean triggerFlush()
-
flush
private void flush(boolean force) throws java.io.IOException
- Throws:
java.io.IOException
-
flushNumFields
private int flushNumFields(int chunkDocs) throws java.io.IOException
- Throws:
java.io.IOException
-
flushFieldNums
private int[] flushFieldNums() throws java.io.IOException
Returns a sorted array containing unique field numbers.
- Throws:
java.io.IOException
-
flushFields
private void flushFields(int totalFields, int[] fieldNums) throws java.io.IOException
- Throws:
java.io.IOException
-
flushFlags
private void flushFlags(int totalFields, int[] fieldNums) throws java.io.IOException
- Throws:
java.io.IOException
-
flushNumTerms
private void flushNumTerms(int totalFields) throws java.io.IOException
- Throws:
java.io.IOException
-
flushTermLengths
private void flushTermLengths() throws java.io.IOException
- Throws:
java.io.IOException
-
flushTermFreqs
private void flushTermFreqs() throws java.io.IOException
- Throws:
java.io.IOException
-
flushPositions
private void flushPositions() throws java.io.IOException
- Throws:
java.io.IOException
-
flushOffsets
private void flushOffsets(int[] fieldNums) throws java.io.IOException
- Throws:
java.io.IOException
-
flushPayloadLengths
private void flushPayloadLengths() throws java.io.IOException
- Throws:
java.io.IOException
-
finish
public void finish(int numDocs) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called before TermVectorsWriter.close(), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to TermVectorsWriter.startDocument(int)), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
- Specified by:
finish
in class TermVectorsWriter
- Throws:
java.io.IOException
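The consistency check described for finish can be as simple as counting startDocument calls and comparing the count with the numDocs argument. The following is a minimal sketch under that assumption; the class name, the exception type and the message text are choices made for this example and are not taken from this page.

```java
/** Hypothetical stand-in showing the redundant doc-count check; not a Lucene class. */
final class DocCountCheckSketch {
    private int docsStarted = 0;

    /** Counts each document as it is started. */
    void startDocument() {
        docsStarted++;
    }

    /** Fails loudly if the caller's count disagrees with the number of documents seen. */
    void finish(int numDocs) {
        if (numDocs != docsStarted) {
            throw new IllegalStateException(
                "Wrote " + docsStarted + " docs, finish called with numDocs=" + numDocs);
        }
    }
}
```

With two startDocument calls, finish(2) returns normally while finish(3) throws, so a mismatch surfaces at write time rather than as a corrupt segment later.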
-
addProx
public void addProx(int numProx, DataInput positions, DataInput offsets) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called by IndexWriter when writing new segments.
This is an expert API that allows the codec to consume positions and offsets directly from the indexer.
The default implementation calls TermVectorsWriter.addPosition(int, int, int, BytesRef), but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example.
NOTE: This API is extremely expert and subject to change or removal!
- Overrides:
addProx
in class TermVectorsWriter
- Throws:
java.io.IOException
-
copyChunks
private void copyChunks(MergeState mergeState, Lucene90CompressingTermVectorsWriter.CompressingTermVectorsSub sub, int fromDocID, int toDocID) throws java.io.IOException
- Throws:
java.io.IOException
-
merge
public int merge(MergeState mergeState) throws java.io.IOException
Description copied from class: TermVectorsWriter
Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses TermVectorsWriter.startDocument(int), TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean), TermVectorsWriter.startTerm(BytesRef, int), TermVectorsWriter.addPosition(int, int, int, BytesRef), and TermVectorsWriter.finish(int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc).
- Overrides:
merge
in class TermVectorsWriter
- Throws:
java.io.IOException
-
tooDirty
boolean tooDirty(Lucene90CompressingTermVectorsReader candidate)
Returns true if we should recompress this reader, even though we could bulk merge compressed data.
The last chunk written for a segment is typically incomplete, so without recompressing, in some worst-case situations (e.g. frequent reopen with tiny flushes), the compression ratio can degrade over time. This is a safety switch.
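One way to picture such a safety switch is a rule over the dirty-chunk counters that appear in the field list (numDirtyDocs, numDirtyChunks, numChunks) and the maxDocsPerChunk limit. The sketch below is an assumption-laden illustration: the rule shape and the 1% threshold are invented for this example and are not documented on this page, and the counters are passed as plain arguments instead of being read from a Lucene90CompressingTermVectorsReader.

```java
/** Illustrative heuristic only; thresholds are assumptions, not the documented behavior. */
final class DirtySegmentSketch {
    /**
     * Treats a segment as too dirty when it holds more dirty docs than fit in one
     * full chunk AND more than 1% of its chunks are dirty (assumed threshold).
     */
    static boolean tooDirty(long numDirtyDocs, long numDirtyChunks,
                            long numChunks, int maxDocsPerChunk) {
        return numDirtyDocs > maxDocsPerChunk
            && numDirtyChunks * 100 > numChunks;
    }
}
```

Under this rule a segment with a handful of incomplete chunks is still bulk-copied, while one that has accumulated many tiny chunks is recompressed during merge.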
-
canPerformBulkMerge
private boolean canPerformBulkMerge(MergeState mergeState, MatchingReaders matchingReaders, int readerIndex)
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
-
getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- See Also:
Accountables
-
-