org.apache.lucene.index

Class IndexReader

Known Direct Subclasses:
FilterIndexReader, MultiReader, ParallelReader

public abstract class IndexReader
extends Object

IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

Concrete subclasses of IndexReader are usually constructed with a call to one of the static open() methods, e.g. open(String).

For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral--they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
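
The effect of ephemeral document numbers can be illustrated with a toy model. The sketch below is not Lucene code: `ToyIndex` and its methods are hypothetical names, and `compact()` merely stands in for whatever internal reorganization renumbers documents. It suggests why a stable identifier stored in a field is a safer handle than a document number.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a document number is a position in a list, and
// compact() stands in for an index reorganization that drops deleted slots.
final class ToyIndex {
    private final List<String> ids = new ArrayList<>();       // stable ID per slot
    private final List<Boolean> deleted = new ArrayList<>();  // deletion marks

    // Adds a document and returns the document number assigned to it.
    int add(String id) {
        ids.add(id);
        deleted.add(false);
        return ids.size() - 1;
    }

    void delete(int docNum) {
        deleted.set(docNum, true);
    }

    // Removes deleted slots, which renumbers every surviving document after them.
    void compact() {
        for (int i = ids.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                ids.remove(i);
                deleted.remove(i);
            }
        }
    }

    // Current document number for a stable ID, or -1 if it is no longer present.
    int docNumOf(String id) {
        return ids.indexOf(id);
    }
}
```

In this model a document added third receives number 2, but after document 0 is deleted and the index is compacted, the same document is found at number 1.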

An IndexReader can be opened on a directory for which an IndexWriter is already open, but in that case it cannot be used to delete documents from the index.

Version:
$Id: IndexReader.java 497612 2007-01-18 22:47:03Z mikemccand $
Author:
Doug Cutting

Nested Class Summary

static class
IndexReader.FieldOption

Field Summary

protected org.apache.lucene.index.IndexFileDeleter
deleter

Constructor Summary

IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory.

Method Summary

void
close()
Closes files associated with this index.
protected void
commit()
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).
void
deleteDocument(int docNum)
Deletes the document numbered docNum.
int
deleteDocuments(Term term)
Deletes all documents that have a given term indexed.
Directory
directory()
Returns the directory this index resides in.
protected abstract void
doClose()
Implements close.
protected abstract void
doCommit()
Implements commit.
protected abstract void
doDelete(int docNum)
Implements deletion of the document numbered docNum.
protected abstract void
doSetNorm(int doc, String field, byte value)
Implements setNorm in subclass.
protected abstract void
doUndeleteAll()
Implements actual undeleteAll() in subclass.
abstract int
docFreq(Term t)
Returns the number of documents containing the term t.
Document
document(int n)
Returns the stored fields of the nth Document in this index.
abstract Document
document(int n, FieldSelector fieldSelector)
Get the Document at the nth position.
protected void
finalize()
Release the write lock, if needed.
static long
getCurrentVersion(File directory)
Reads version number from segments files.
static long
getCurrentVersion(String directory)
Reads version number from segments files.
static long
getCurrentVersion(Directory directory)
Reads version number from segments files.
protected org.apache.lucene.index.IndexFileDeleter
getDeleter()
abstract Collection
getFieldNames(IndexReader.FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified field option information.
abstract TermFreqVector
getTermFreqVector(int docNumber, String field)
Return a term frequency vector for the specified document and field.
abstract TermFreqVector[]
getTermFreqVectors(int docNumber)
Return an array of term frequency vectors for the specified document.
long
getVersion()
Version number when this IndexReader was opened.
abstract boolean
hasDeletions()
Returns true if any documents have been deleted.
boolean
hasNorms(String field)
Returns true if there are norms stored for this field.
static boolean
indexExists(File directory)
Returns true if an index exists at the specified directory.
static boolean
indexExists(String directory)
Returns true if an index exists at the specified directory.
static boolean
indexExists(Directory directory)
Returns true if an index exists at the specified directory.
boolean
isCurrent()
Check whether this IndexReader still works on a current version of the index.
abstract boolean
isDeleted(int n)
Returns true if document n has been deleted.
static boolean
isLocked(String directory)
Returns true iff the index in the named directory is currently locked.
static boolean
isLocked(Directory directory)
Returns true iff the index in the named directory is currently locked.
boolean
isOptimized()
Checks if the index is optimized (if it has a single segment and no deletions).
static long
lastModified(File fileDirectory)
Returns the time the index in the named directory was last modified.
static long
lastModified(String directory)
Returns the time the index in the named directory was last modified.
static long
lastModified(Directory directory2)
Returns the time the index in the named directory was last modified.
static void
main(String[] args)
Prints the filename and size of each file within a given compound file.
abstract int
maxDoc()
Returns one greater than the largest possible document number.
abstract byte[]
norms(String field)
Returns the byte-encoded normalization factor for the named field of every document.
abstract void
norms(String field, byte[] bytes, int offset)
Reads the byte-encoded normalization factor for the named field of every document.
abstract int
numDocs()
Returns the number of documents in this index.
static IndexReader
open(File path)
Returns an IndexReader reading the index in an FSDirectory in the named path.
static IndexReader
open(String path)
Returns an IndexReader reading the index in an FSDirectory in the named path.
static IndexReader
open(Directory directory)
Returns an IndexReader reading the index in the given Directory.
protected void
setDeleter(org.apache.lucene.index.IndexFileDeleter deleter)
void
setNorm(int doc, String field, byte value)
Expert: Resets the normalization factor for the named field of the named document.
void
setNorm(int doc, String field, float value)
Expert: Resets the normalization factor for the named field of the named document.
abstract TermDocs
termDocs()
Returns an unpositioned TermDocs enumerator.
TermDocs
termDocs(Term term)
Returns an enumeration of all the documents which contain term.
abstract TermPositions
termPositions()
Returns an unpositioned TermPositions enumerator.
TermPositions
termPositions(Term term)
Returns an enumeration of all the documents which contain term.
abstract TermEnum
terms()
Returns an enumeration of all the terms in the index.
abstract TermEnum
terms(Term t)
Returns an enumeration of all terms after a given term.
void
undeleteAll()
Undeletes all documents currently marked as deleted in this index.
static void
unlock(Directory directory)
Forcibly unlocks the index in the named directory.

Field Details

deleter

protected org.apache.lucene.index.IndexFileDeleter deleter

Constructor Details

IndexReader

protected IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory. This is used for IndexReaders that are used within other IndexReaders that take care of locking directories.
Parameters:
directory - Directory where IndexReader files reside.

Method Details

close

public final void close()
            throws IOException
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

commit

protected final void commit()
            throws IOException
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).

deleteDocument

public final void deleteDocument(int docNum)
            throws IOException
Deletes the document numbered docNum.

deleteDocuments

public final int deleteDocuments(Term term)
            throws IOException
Deletes all documents that have a given term indexed. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See deleteDocument(int) for information about when this deletion will become effective.
Returns:
the number of documents deleted
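
The unique-ID pattern described above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not Lucene's implementation: `ToyDeleter`, its `index` method and the `"field:text"` key format are hypothetical, and the model counts only documents that were not already marked as deleted.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: maps a term (field + text) to the document numbers containing it.
final class ToyDeleter {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();

    // Records which document numbers contain the given term.
    void index(String field, String text, List<Integer> docNums) {
        postings.put(field + ":" + text, docNums);
    }

    // Marks every document containing the term as deleted; returns how many were newly marked.
    int deleteDocuments(String field, String text) {
        int count = 0;
        List<Integer> docs =
                postings.getOrDefault(field + ":" + text, Collections.<Integer>emptyList());
        for (int docNum : docs) {
            if (!deleted.get(docNum)) {
                deleted.set(docNum);
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }
}
```

With a unique `id` field, the return value is expected to be 1 when the document exists and 0 when it does not; a term shared by several documents deletes all of them.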

directory

public Directory directory()
Returns the directory this index resides in.

doClose

protected abstract void doClose()
            throws IOException
Implements close.

doCommit

protected abstract void doCommit()
            throws IOException
Implements commit.

doDelete

protected abstract void doDelete(int docNum)
            throws IOException
Implements deletion of the document numbered docNum.

doSetNorm

protected abstract void doSetNorm(int doc,
                                  String field,
                                  byte value)
            throws IOException
Implements setNorm in subclass.

doUndeleteAll

protected abstract void doUndeleteAll()
            throws IOException
Implements actual undeleteAll() in subclass.

docFreq

public abstract int docFreq(Term t)
            throws IOException
Returns the number of documents containing the term t.

document

public Document document(int n)
            throws IOException
Returns the stored fields of the nth Document in this index.

document

public abstract Document document(int n,
                                  FieldSelector fieldSelector)
            throws IOException
Get the Document at the nth position. The FieldSelector may be used to determine what Fields to load and how they should be loaded. NOTE: If this Reader (more specifically, the underlying FieldsReader) is closed before the lazy Field is loaded, an exception may be thrown. If you want the value of a lazy Field to be available after closing, you must explicitly load it or fetch the Document again with a new loader.
Parameters:
n - Get the document at the nth position
fieldSelector - The FieldSelector to use to determine what Fields should be loaded on the Document. May be null, in which case all Fields will be loaded.
Returns:
The stored fields of the Document at the nth position
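
The note about lazy fields can be illustrated with a toy model of deferred loading. Everything here is hypothetical (`ToyFieldsReader`, `ToyLazyField`, and the choice of `IllegalStateException`); the sketch only shows why a value must be loaded before the underlying reader is closed.

```java
// Toy model only: a reader that refuses to serve values once closed.
final class ToyFieldsReader {
    private boolean closed;

    void close() {
        closed = true;
    }

    String read(String name) {
        if (closed) {
            throw new IllegalStateException("reader closed before field was loaded");
        }
        return "value-of-" + name;
    }
}

// Toy model only: a field whose value is fetched on first access and then cached.
final class ToyLazyField {
    private final ToyFieldsReader reader;
    private final String name;
    private String value; // null until the first call to stringValue()

    ToyLazyField(ToyFieldsReader reader, String name) {
        this.reader = reader;
        this.name = name;
    }

    String stringValue() {
        if (value == null) {
            value = reader.read(name); // deferred load; fails if the reader is closed
        }
        return value;
    }
}
```

A field whose value was read before `close()` keeps returning the cached value, while a field that was never read fails on first access after the reader is closed.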

finalize

protected void finalize()
            throws Throwable
Release the write lock, if needed.

getCurrentVersion

public static long getCurrentVersion(File directory)
            throws IOException
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.
Parameters:
directory - where the index resides.
Returns:
version number.

getCurrentVersion

public static long getCurrentVersion(String directory)
            throws IOException
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.
Parameters:
directory - where the index resides.
Returns:
version number.

getCurrentVersion

public static long getCurrentVersion(Directory directory)
            throws IOException
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.
Parameters:
directory - where the index resides.
Returns:
version number.

getDeleter

protected org.apache.lucene.index.IndexFileDeleter getDeleter()

getFieldNames

public abstract Collection getFieldNames(IndexReader.FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified field option information.
Parameters:
fldOption - specifies which field option should be available for the returned fields
Returns:
Collection of Strings indicating the names of the fields.

getTermFreqVector

public abstract TermFreqVector getTermFreqVector(int docNumber,
                                                 String field)
            throws IOException
Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If termvectors had been stored with positions or offsets, a TermPositionsVector is returned.
Parameters:
docNumber - document for which the term frequency vector is returned
field - field for which the term frequency vector is returned.
Returns:
term frequency vector. May be null if field does not exist in the specified document or term vector was not stored.

getTermFreqVectors

public abstract TermFreqVector[] getTermFreqVectors(int docNumber)
            throws IOException
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionsVector if positions or offsets have been stored.
Parameters:
docNumber - document for which term frequency vectors are returned
Returns:
array of term frequency vectors. May be null if no term vectors have been stored for the specified document.

getVersion

public long getVersion()
Version number when this IndexReader was opened.

hasDeletions

public abstract boolean hasDeletions()
Returns true if any documents have been deleted.

hasNorms

public boolean hasNorms(String field)
            throws IOException
Returns true if there are norms stored for this field.

indexExists

public static boolean indexExists(File directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.
Parameters:
directory - the directory to check for an index
Returns:
true if an index exists; false otherwise

indexExists

public static boolean indexExists(String directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.
Parameters:
directory - the directory to check for an index
Returns:
true if an index exists; false otherwise

indexExists

public static boolean indexExists(Directory directory)
            throws IOException
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.
Parameters:
directory - the directory to check for an index
Returns:
true if an index exists; false otherwise

isCurrent

public boolean isCurrent()
            throws IOException
Check whether this IndexReader still works on a current version of the index. If this is not the case you will need to re-open the IndexReader to make sure you see the latest changes made to the index.
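
The relationship between the index version number and isCurrent() can be sketched as follows. This is a toy model, not Lucene's implementation: `ToyVersionedIndex` and `ToyReader` are hypothetical, and the sketch assumes that "current" simply means the version recorded at open time still equals the index's version.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: a version that starts from a timestamp and grows by one per change.
final class ToyVersionedIndex {
    private final AtomicLong version;

    ToyVersionedIndex(long initialTimestamp) {
        this.version = new AtomicLong(initialTimestamp);
    }

    long currentVersion() {
        return version.get();
    }

    // Any change to the index increases the version by one.
    void change() {
        version.incrementAndGet();
    }

    ToyReader open() {
        return new ToyReader(this, version.get());
    }
}

// Toy model only: remembers the version seen when it was opened.
final class ToyReader {
    private final ToyVersionedIndex index;
    private final long versionAtOpen;

    ToyReader(ToyVersionedIndex index, long versionAtOpen) {
        this.index = index;
        this.versionAtOpen = versionAtOpen;
    }

    long getVersion() {
        return versionAtOpen;
    }

    // Assumption: the reader is current while the index version is unchanged.
    boolean isCurrent() {
        return index.currentVersion() == versionAtOpen;
    }
}
```

In this model a reader becomes stale as soon as the index changes, and only a newly opened reader reports itself as current again.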

isDeleted

public abstract boolean isDeleted(int n)
Returns true if document n has been deleted.

isLocked

public static boolean isLocked(String directory)
            throws IOException
Returns true iff the index in the named directory is currently locked.
Parameters:
directory - the directory to check for a lock

isLocked

public static boolean isLocked(Directory directory)
            throws IOException
Returns true iff the index in the named directory is currently locked.
Parameters:
directory - the directory to check for a lock

isOptimized

public boolean isOptimized()
Checks if the index is optimized (if it has a single segment and no deletions).
Returns:
true if the index is optimized; false otherwise

lastModified

public static long lastModified(File fileDirectory)
            throws IOException
Returns the time the index in the named directory was last modified.

lastModified

public static long lastModified(String directory)
            throws IOException
Returns the time the index in the named directory was last modified.

lastModified

public static long lastModified(Directory directory2)
            throws IOException
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date, use isCurrent() instead.

main

public static void main(String[] args)
Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored.
Parameters:
args - Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile>

maxDoc

public abstract int maxDoc()
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
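
The array-sizing use mentioned above can be sketched without a real index. The class and method names below are hypothetical, and the deleted-document set is passed in as a `BitSet` whose set bits are assumed to lie below `maxDoc`; with a real reader the same loop would consult maxDoc() and isDeleted(int).

```java
import java.util.BitSet;

// Toy model only: builds a per-document array indexed by document number.
final class PerDocumentArray {
    // Allocates one slot per possible document number and fills the live ones.
    static float[] valuesFor(int maxDoc, BitSet deleted, float liveValue) {
        float[] perDoc = new float[maxDoc]; // valid indexes are 0 .. maxDoc - 1
        for (int docNum = 0; docNum < maxDoc; docNum++) {
            if (!deleted.get(docNum)) {
                perDoc[docNum] = liveValue; // deleted slots keep the default 0.0f
            }
        }
        return perDoc;
    }

    // Number of live documents in this model: every slot minus the deleted ones.
    static int numDocs(int maxDoc, BitSet deleted) {
        return maxDoc - deleted.cardinality();
    }
}
```

The array is sized by the largest possible document number rather than by the live document count, so slots belonging to deleted documents exist but are left unused.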

norms

public abstract byte[] norms(String field)
            throws IOException
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

norms

public abstract void norms(String field,
                           byte[] bytes,
                           int offset)
            throws IOException
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

numDocs

public abstract int numDocs()
Returns the number of documents in this index.

open

public static IndexReader open(File path)
            throws IOException
Returns an IndexReader reading the index in an FSDirectory in the named path.

open

public static IndexReader open(String path)
            throws IOException
Returns an IndexReader reading the index in an FSDirectory in the named path.

open

public static IndexReader open(Directory directory)
            throws IOException
Returns an IndexReader reading the index in the given Directory.

setDeleter

protected void setDeleter(org.apache.lucene.index.IndexFileDeleter deleter)

setNorm

public final void setNorm(int doc,
                          String field,
                          byte value)
            throws IOException
Expert: Resets the normalization factor for the named field of the named document. The norm represents the product of the field's boost and its length normalization. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old.
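
As a worked example of basing the new value on the old one: if the norm is the product of a boost and a length normalization, the length normalization can be recovered by dividing out the old boost. The sketch below works on plain floats and deliberately ignores the byte encoding used by the index; `NormRescale` and `rescale` are hypothetical names.

```java
// Toy arithmetic only: norm = boost * lengthNorm, handled as unencoded floats.
final class NormRescale {
    // Replaces oldBoost with newBoost while keeping the length normalization.
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        float lengthNorm = oldNorm / oldBoost; // recover the length normalization
        return lengthNorm * newBoost;          // apply the new boost
    }
}
```

For instance, an old norm of 0.5 produced with a boost of 1.0 implies a length normalization of 0.5, so raising the boost to 2.0 gives a new norm of 1.0 rather than 2.0.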

setNorm

public void setNorm(int doc,
                    String field,
                    float value)
            throws IOException
Expert: Resets the normalization factor for the named field of the named document.

termDocs

public abstract TermDocs termDocs()
            throws IOException
Returns an unpositioned TermDocs enumerator.

termDocs

public TermDocs termDocs(Term term)
            throws IOException
Returns an enumeration of all the documents which contain term. For each document, the document number and the frequency of the term in that document are provided, for use in search scoring. Thus, this method implements the mapping:

    Term => <docNum, freq>*

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
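
The mapping from a term to its documents and frequencies can be sketched with a brute-force scan. This is a toy model, not how an index stores postings: `ToyTermDocs` is a hypothetical name, documents are plain whitespace-separated strings, and the array position of each string plays the role of its document number.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: computes <docNum, freq> pairs by scanning every document.
final class ToyTermDocs {
    // docs[i] is the whitespace-tokenized text of document number i.
    // Returns one {docNum, freq} pair per matching document, in increasing docNum order.
    static List<int[]> termDocs(String[] docs, String term) {
        List<int[]> out = new ArrayList<>();
        for (int docNum = 0; docNum < docs.length; docNum++) {
            int freq = 0;
            for (String token : docs[docNum].split("\\s+")) {
                if (token.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                out.add(new int[] {docNum, freq}); // documents without the term are skipped
            }
        }
        return out;
    }
}
```

Because the outer loop walks document numbers in ascending order, each emitted document number is greater than all that precede it, matching the ordering described above.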


termPositions

public abstract TermPositions termPositions()
            throws IOException
Returns an unpositioned TermPositions enumerator.

termPositions

public TermPositions termPositions(Term term)
            throws IOException
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. Thus, this method implements the mapping:

    Term => <docNum, freq, <pos_1, pos_2, ... pos_freq-1>>*

This positional information facilitates phrase and proximity searching.

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
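
How ordinal positions support phrase searching can be sketched on a single document. The names below are hypothetical and the model is deliberately naive: it tokenizes on whitespace and treats a two-word phrase as a match when the second word occurs at the position immediately after the first.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: term positions within one whitespace-tokenized document.
final class ToyTermPositions {
    // Ordinal positions (0-based) at which the term occurs in the document.
    static List<Integer> positions(String doc, String term) {
        List<Integer> out = new ArrayList<>();
        String[] tokens = doc.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                out.add(pos);
            }
        }
        return out;
    }

    // True if `first` is immediately followed by `second` somewhere in the document.
    static boolean containsPhrase(String doc, String first, String second) {
        List<Integer> secondPositions = positions(doc, second);
        for (int pos : positions(doc, first)) {
            if (secondPositions.contains(pos + 1)) {
                return true;
            }
        }
        return false;
    }
}
```

A proximity search would relax the `pos + 1` test to a distance window; the position lists themselves are the same.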


terms

public abstract TermEnum terms()
            throws IOException
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.

terms

public abstract TermEnum terms(Term t)
            throws IOException
Returns an enumeration of all terms after a given term. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.
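
The ordered enumeration can be sketched with a sorted set. This is a toy model with hypothetical names (`ToyTerm`, `ToyTermEnum`); it assumes terms compare by field first and then by text, and it starts the enumeration at the given term when that term is present, which is an assumption about the boundary case rather than something stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: a term is a (field, text) pair ordered by field, then text.
final class ToyTerm implements Comparable<ToyTerm> {
    final String field;
    final String text;

    ToyTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public int compareTo(ToyTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }
}

final class ToyTermEnum {
    // Lists, in sorted order, every term that is not less than `start`.
    static List<String> termsFrom(TreeSet<ToyTerm> all, ToyTerm start) {
        List<String> out = new ArrayList<>();
        for (ToyTerm t : all.tailSet(start, true)) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Because the set is sorted by compareTo, seeking to a term that does not exist simply lands on the next greater term, and the enumeration may run on into later fields.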

undeleteAll

public final void undeleteAll()
            throws IOException
Undeletes all documents currently marked as deleted in this index.

unlock

public static void unlock(Directory directory)
            throws IOException
Forcibly unlocks the index in the named directory.

Caution: this should only be used by failure recovery code, when it is known that no other process nor thread is in fact currently accessing this index.


Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.