IndexReader is an abstract class, providing an interface for accessing an
index. Search of an index is done entirely through this abstract interface,
so that any subclass which implements it is searchable.
Concrete subclasses of IndexReader are usually constructed with a call to
one of the static open() methods, e.g. open(String).
For efficiency, in this API documents are often referred to via
document numbers, non-negative integers which each name a unique
document in the index. These document numbers are ephemeral--they may change
as documents are added to and deleted from an index. Clients should thus not
rely on a given document having the same number between sessions.
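To make the point about ephemeral document numbers concrete, here is a minimal sketch using a hypothetical in-memory list rather than Lucene's storage format. It assumes that a deleted document's slot is eventually squeezed out when the index is reorganized, so a number saved earlier can end up naming a different document. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: a document's number is simply its position in a list.
class DocNumberSketch {
    // Returns the current document number of the given id, or -1 if absent.
    static int docNumberOf(List<String> docs, String id) {
        return docs.indexOf(id);
    }

    // Models a reorganization that drops one deleted document and renumbers the rest.
    static List<String> compactWithout(List<String> docs, int deletedDocNum) {
        List<String> compacted = new ArrayList<>(docs);
        compacted.remove(deletedDocNum); // int argument selects remove-by-index
        return compacted;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c");
        int before = docNumberOf(docs, "c");
        List<String> after = compactWithout(docs, 0);
        System.out.println("c was " + before + ", is now " + docNumberOf(after, "c"));
    }
}
```

A client that needs a stable handle would typically store its own identifier in a field instead of remembering the document number.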
An IndexReader can be opened on a directory for which an IndexWriter is
already open, but in that case it cannot be used to delete documents from
the index.
close
public final void close()
throws IOException
Closes files associated with this index.
Also saves any new deletions to disk.
No other methods should be called after this has been called.
commit
protected final void commit()
throws IOException
Commits changes resulting from delete, undeleteAll, or
setNorm operations.
If an exception is hit, then either no changes or all
changes will have been committed to the index
(transactional semantics).
deleteDocument
public final void deleteDocument(int docNum)
throws IOException
Deletes the document numbered docNum. Once a document is
deleted it will not appear in TermDocs or TermPositions enumerations.
Attempts to read its fields with the document method will result in an
error. The presence of this document may still be reflected in the
docFreq(Term) statistic, though this will be corrected eventually as the
index is further modified.
deleteDocuments
public final int deleteDocuments(Term term)
throws IOException
Deletes all documents that have a given term indexed.
This is useful if one uses a document field to hold a unique ID string for
the document. Then to delete such a document, one merely constructs a
term with the appropriate field and the unique ID string as its text and
passes it to this method.
See deleteDocument(int) for information about when this deletion will
become effective.
Returns: the number of documents deleted
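The delete-by-unique-ID pattern described above can be sketched with a toy in-memory model. This is not Lucene's implementation: the "field:text" string keys, the postings map, and the BitSet of deletions are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of delete-by-term; not Lucene's implementation.
class DeleteByTermSketch {
    // Marks every document listed under the term as deleted and
    // returns how many documents were newly marked.
    static int deleteDocuments(Map<String, List<Integer>> postings,
                               BitSet deleted, String term) {
        int count = 0;
        for (int docNum : postings.getOrDefault(term, List.of())) {
            if (!deleted.get(docNum)) {
                deleted.set(docNum);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed layout: "field:text" keys mapping to document numbers.
        Map<String, List<Integer>> postings = Map.of(
            "id:doc-42", List.of(3),
            "body:lucene", List.of(0, 3, 5));
        BitSet deleted = new BitSet();
        System.out.println(deleteDocuments(postings, deleted, "id:doc-42"));
        System.out.println(deleted.get(3));
    }
}
```

Because the ID term is unique to one document, the returned count is 1 here; a term shared by several documents would mark all of them.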
directory
public Directory directory()
Returns the directory this index resides in.
doClose
protected abstract void doClose()
throws IOException
Implements close.
doCommit
protected abstract void doCommit()
throws IOException
Implements commit.
doDelete
protected abstract void doDelete(int docNum)
throws IOException
doSetNorm
protected abstract void doSetNorm(int doc,
String field,
byte value)
throws IOException
Implements setNorm in subclass.
doUndeleteAll
protected abstract void doUndeleteAll()
throws IOException
Implements actual undeleteAll() in subclass.
docFreq
public abstract int docFreq(Term t)
throws IOException
Returns the number of documents containing the term t.
document
public Document document(int n)
throws IOException
Returns the stored fields of the nth Document in this index.
document
public abstract Document document(int n,
FieldSelector fieldSelector)
throws IOException
Gets the Document at the nth position. The FieldSelector may be used to
determine which Fields to load and how they should be loaded.
NOTE: If this Reader (more specifically, the underlying FieldsReader) is
closed before a lazy Field is loaded, an exception may be thrown. If you
want the value of a lazy Field to be available after closing, you must
explicitly load it or fetch the Document again with a new loader.
n - get the document at the nth position
fieldSelector - the FieldSelector to use to determine which Fields should be loaded on the Document. May be null, in which case all Fields will be loaded.
Returns: the stored fields of the Document at the nth position
finalize
protected void finalize()
throws Throwable
Release the write lock, if needed.
getCurrentVersion
public static long getCurrentVersion(File directory)
throws IOException
Reads version number from segments files. The version number is
initialized with a timestamp and then increased by one for each change of
the index.
directory - where the index resides.
getCurrentVersion
public static long getCurrentVersion(String directory)
throws IOException
Reads version number from segments files. The version number is
initialized with a timestamp and then increased by one for each change of
the index.
directory - where the index resides.
getCurrentVersion
public static long getCurrentVersion(Directory directory)
throws IOException
Reads version number from segments files. The version number is
initialized with a timestamp and then increased by one for each change of
the index.
directory - where the index resides.
getDeleter
protected org.apache.lucene.index.IndexFileDeleter getDeleter()
getFieldNames
public abstract Collection getFieldNames(IndexReader.FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified
field option information.
fldOption - specifies which field option should be available for the returned fields
Returns: Collection of Strings indicating the names of the fields.
getTermFreqVector
public abstract TermFreqVector getTermFreqVector(int docNumber,
String field)
throws IOException
Return a term frequency vector for the specified document and field. The
returned vector contains terms and frequencies for the terms in
the specified field of this document, if the field had the storeTermVector
flag set. If term vectors were stored with positions or offsets, a
TermPositionsVector is returned.
docNumber - document for which the term frequency vector is returned
field - field for which the term frequency vector is returned
Returns: term frequency vector. May be null if the field does not exist in the
specified document or the term vector was not stored.
getTermFreqVectors
public abstract TermFreqVector[] getTermFreqVectors(int docNumber)
throws IOException
Return an array of term frequency vectors for the specified document.
The array contains a vector for each vectorized field in the document.
Each vector contains terms and frequencies for all terms in a given vectorized field.
If no such fields existed, the method returns null. The term vectors that are
returned may either be of type TermFreqVector or of type TermPositionsVector if
positions or offsets have been stored.
docNumber - document for which term frequency vectors are returned
Returns: array of term frequency vectors. May be null if no term vectors have been
stored for the specified document.
getVersion
public long getVersion()
Version number when this IndexReader was opened.
hasDeletions
public abstract boolean hasDeletions()
Returns true if any documents have been deleted.
hasNorms
public boolean hasNorms(String field)
throws IOException
Returns true if there are norms stored for this field.
indexExists
public static boolean indexExists(File directory)
Returns true if an index exists at the specified directory.
If the directory does not exist or if there is no index in it,
false is returned.
directory - the directory to check for an index
Returns: true if an index exists; false otherwise
indexExists
public static boolean indexExists(String directory)
Returns true if an index exists at the specified directory.
If the directory does not exist or if there is no index in it,
false is returned.
directory - the directory to check for an index
Returns: true if an index exists; false otherwise
indexExists
public static boolean indexExists(Directory directory)
throws IOException
Returns true if an index exists at the specified directory.
If the directory does not exist or if there is no index in it,
false is returned.
directory - the directory to check for an index
Returns: true if an index exists; false otherwise
isCurrent
public boolean isCurrent()
throws IOException
Check whether this IndexReader still works on a current version of the index.
If this is not the case you will need to re-open the IndexReader to
make sure you see the latest changes made to the index.
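The relationship between the version number and this staleness check can be sketched as follows. This is a toy model, not Lucene's code: it assumes that a reader counts as current exactly when the version it recorded at open time still equals the index's version. All names are hypothetical.

```java
// Hypothetical toy model of index versioning; names are illustrative only.
class VersionCheckSketch {
    private long version; // stands in for the version stored with the index

    VersionCheckSketch(long initialVersion) {
        this.version = initialVersion; // e.g. initialized from a timestamp
    }

    // Each change to the index increases the version by one.
    void applyChange() {
        version++;
    }

    long currentVersion() {
        return version;
    }

    // Assumption: a reader is current if the version it saw when opened is unchanged.
    static boolean isCurrent(long versionWhenOpened, VersionCheckSketch index) {
        return versionWhenOpened == index.currentVersion();
    }

    public static void main(String[] args) {
        VersionCheckSketch index = new VersionCheckSketch(1_000L);
        long openedAt = index.currentVersion();
        System.out.println(isCurrent(openedAt, index)); // no changes yet
        index.applyChange();
        System.out.println(isCurrent(openedAt, index)); // reader is now stale
    }
}
```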
isDeleted
public abstract boolean isDeleted(int n)
Returns true if document n has been deleted.
isLocked
public static boolean isLocked(String directory)
throws IOException
Returns true iff the index in the named directory is
currently locked.
directory - the directory to check for a lock
isLocked
public static boolean isLocked(Directory directory)
throws IOException
Returns true iff the index in the named directory is
currently locked.
directory - the directory to check for a lock
isOptimized
public boolean isOptimized()
Checks if the index is optimized (that is, if it has a single segment and no deletions).
Returns: true if the index is optimized; false otherwise
lastModified
public static long lastModified(File fileDirectory)
throws IOException
Returns the time the index in the named directory was last modified.
Do not use this to check whether the reader is still up-to-date; use
isCurrent() instead.
lastModified
public static long lastModified(String directory)
throws IOException
Returns the time the index in the named directory was last modified.
Do not use this to check whether the reader is still up-to-date; use
isCurrent() instead.
lastModified
public static long lastModified(Directory directory2)
throws IOException
Returns the time the index in the named directory was last modified.
Do not use this to check whether the reader is still up-to-date; use
isCurrent() instead.
main
public static void main(String[] args)
Prints the filename and size of each file within a given compound file.
Add the -extract flag to extract files to the current working directory.
In order to make the extracted version of the index work, you have to copy
the segments file from the compound index into the directory where the extracted files are stored.
args - Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile>
maxDoc
public abstract int maxDoc()
Returns one greater than the largest possible document number.
This may be used to, e.g., determine how big to allocate an array which
will have an element for every document number in an index.
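As an illustration of sizing a per-document array by maxDoc, here is a minimal sketch. It assumes a toy model in which deleted documents keep their slot in the numbering, so the array must cover every number from 0 to maxDoc - 1. The BitSet of deletions and the helper names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical toy model; the deletion BitSet stands in for isDeleted(n).
class MaxDocSketch {
    // Allocates one slot per possible document number (0 .. maxDoc - 1).
    static boolean[] liveFlags(int maxDoc, BitSet deleted) {
        boolean[] live = new boolean[maxDoc];
        for (int docNum = 0; docNum < maxDoc; docNum++) {
            live[docNum] = !deleted.get(docNum);
        }
        return live;
    }

    // Counts the documents that are not marked deleted.
    static int countLive(boolean[] live) {
        int count = 0;
        for (boolean flag : live) {
            if (flag) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        boolean[] live = liveFlags(5, deleted);
        System.out.println(live.length + " slots, " + countLive(live) + " live");
    }
}
```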
norms
public abstract byte[] norms(String field)
throws IOException
Returns the byte-encoded normalization factor for the named field of
every document. This is used by the search code to score documents.
norms
public abstract void norms(String field,
byte[] bytes,
int offset)
throws IOException
Reads the byte-encoded normalization factor for the named field of every
document. This is used by the search code to score documents.
numDocs
public abstract int numDocs()
Returns the number of documents in this index.
open
public static IndexReader open(File path)
throws IOException
Returns an IndexReader reading the index in an FSDirectory in the named
path.
open
public static IndexReader open(String path)
throws IOException
Returns an IndexReader reading the index in an FSDirectory in the named
path.
open
public static IndexReader open(Directory directory)
throws IOException
Returns an IndexReader reading the index in the given Directory.
setDeleter
protected void setDeleter(org.apache.lucene.index.IndexFileDeleter deleter)
setNorm
public final void setNorm(int doc,
String field,
byte value)
throws IOException
Expert: Resets the normalization factor for the named field of the named
document. The norm represents the product of the field's boost and its
length normalization. Thus, to preserve the length normalization
values when resetting this, one should base the new value upon the old.
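The advice to base the new value on the old follows from the product form of the norm. A minimal arithmetic sketch, working on plain float values and deliberately ignoring the byte encoding, is shown below. The helper name is hypothetical, and it assumes the previous boost is known and non-zero.

```java
// Hypothetical helper; works on decoded float norms, not the byte encoding.
class NormSketch {
    // Since norm = boost * lengthNorm, changing only the boost means
    // dividing out the old boost and multiplying in the new one.
    static float rebaseNorm(float oldNorm, float oldBoost, float newBoost) {
        return oldNorm / oldBoost * newBoost;
    }

    public static void main(String[] args) {
        float oldNorm = 0.5f; // boost 1.0 times length normalization 0.5
        System.out.println(rebaseNorm(oldNorm, 1.0f, 2.0f));
    }
}
```

Writing the new boost directly as the norm would instead discard the length normalization component.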
setNorm
public void setNorm(int doc,
String field,
float value)
throws IOException
Expert: Resets the normalization factor for the named field of the named
document.
termDocs
public abstract TermDocs termDocs()
throws IOException
Returns an unpositioned TermDocs enumerator.
termDocs
public TermDocs termDocs(Term term)
throws IOException
Returns an enumeration of all the documents which contain term. For each
document, the document number and the frequency of the term in that
document are provided, for use in search scoring.
Thus, this method implements the mapping:
    Term => <docNum, freq>*
The enumeration is ordered by document number. Each document number
is greater than all that precede it in the enumeration.
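The mapping from a term to (document number, frequency) pairs, ordered by document number, can be sketched with a toy in-memory index. This is not how Lucene stores postings: whitespace tokenization and the TreeMap are assumptions chosen to keep the example short.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a term-to-postings lookup.
class TermDocsSketch {
    // Maps a term to (docNum -> freq); a TreeMap iterates in increasing docNum order.
    static Map<Integer, Integer> termDocs(List<String> docs, String term) {
        Map<Integer, Integer> postings = new TreeMap<>();
        for (int docNum = 0; docNum < docs.size(); docNum++) {
            int freq = 0;
            for (String token : docs.get(docNum).split(" ")) {
                if (token.equals(term)) freq++;
            }
            if (freq > 0) postings.put(docNum, freq); // only documents containing the term
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("to be or not to be", "let it be", "hello world");
        System.out.println(termDocs(docs, "be")); // prints {0=2, 1=1}
    }
}
```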
termPositions
public abstract TermPositions termPositions()
throws IOException
termPositions
public TermPositions termPositions(Term term)
throws IOException
Returns an enumeration of all the documents which contain term. For each
document, in addition to the document number and frequency of the term in
that document, a list of all of the ordinal positions of the term in the
document is available. Thus, this method implements the mapping:
    Term => <docNum, freq, <pos_1, pos_2, ..., pos_freq>>*
This positional information facilitates phrase and proximity searching.
The enumeration is ordered by document number. Each document number is
greater than all that precede it in the enumeration.
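To show why ordinal positions help with phrase searching, here is a minimal sketch over a single document. It assumes whitespace tokenization and zero-based positions, and it checks only two-word phrases. The class and method names are hypothetical and the approach is far simpler than a real phrase query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of positional lookups within one document.
class TermPositionsSketch {
    // Returns the zero-based token positions at which the term occurs.
    static List<Integer> positions(String doc, String term) {
        List<Integer> result = new ArrayList<>();
        String[] tokens = doc.split(" ");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) result.add(pos);
        }
        return result;
    }

    // A two-word phrase matches if some position of the second term
    // directly follows a position of the first term.
    static boolean containsPhrase(String doc, String first, String second) {
        List<Integer> secondPositions = positions(doc, second);
        for (int pos : positions(doc, first)) {
            if (secondPositions.contains(pos + 1)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "to be or not to be";
        System.out.println(positions(doc, "be"));            // prints [1, 5]
        System.out.println(containsPhrase(doc, "not", "to")); // prints true
        System.out.println(containsPhrase(doc, "be", "not")); // prints false
    }
}
```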
terms
public abstract TermEnum terms()
throws IOException
Returns an enumeration of all the terms in the index.
The enumeration is ordered by Term.compareTo(). Each term
is greater than all that precede it in the enumeration.
terms
public abstract TermEnum terms(Term t)
throws IOException
Returns an enumeration of all terms after a given term.
The enumeration is ordered by Term.compareTo(). Each term
is greater than all that precede it in the enumeration.
undeleteAll
public final void undeleteAll()
throws IOException
Undeletes all documents currently marked as deleted in this index.
unlock
public static void unlock(Directory directory)
throws IOException
Forcibly unlocks the index in the named directory.
Caution: this should only be used by failure recovery code,
when it is known that no other process nor thread is in fact
currently accessing this index.