Lucene 2.1.1-dev API

Apache Lucene is a high-performance, full-featured text search engine library.

Core

org.apache.lucene: Top-level package.
org.apache.lucene.analysis: API and code to convert text into indexable tokens.
org.apache.lucene.analysis.br: Analyzer for Brazilian.
org.apache.lucene.analysis.cjk: Analyzer for Chinese, Japanese and Korean.
org.apache.lucene.analysis.cn: Analyzer for Chinese.
org.apache.lucene.analysis.cz: Analyzer for Czech.
org.apache.lucene.analysis.de: Analyzer for German.
org.apache.lucene.analysis.el: Analyzer for Greek.
org.apache.lucene.analysis.fr: Analyzer for French.
org.apache.lucene.analysis.ngram
org.apache.lucene.analysis.nl: Analyzer for Dutch.
org.apache.lucene.analysis.ru: Analyzer for Russian.
org.apache.lucene.analysis.snowball: TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard: A grammar-based tokenizer constructed with JavaCC.
org.apache.lucene.analysis.th
org.apache.lucene.ant: Ant task to create Lucene indexes.
org.apache.lucene.benchmark
org.apache.lucene.benchmark.byTask: Benchmarking Lucene By Tasks.
org.apache.lucene.benchmark.byTask.feeds: Sources for benchmark inputs: documents and queries.
org.apache.lucene.benchmark.byTask.programmatic: Sample performance test written programmatically; no algorithm file is needed here.
org.apache.lucene.benchmark.byTask.stats: Statistics maintained when running benchmark tasks.
org.apache.lucene.benchmark.byTask.tasks: Extendable benchmark tasks.
org.apache.lucene.benchmark.byTask.utils: Utilities used for the benchmark, and for the reports.
org.apache.lucene.benchmark.standard
org.apache.lucene.benchmark.stats
org.apache.lucene.benchmark.utils
org.apache.lucene.demo
org.apache.lucene.demo.html
org.apache.lucene.document: The Document abstraction.
org.apache.lucene.index: Code to maintain and access indices.
org.apache.lucene.index.memory: High-performance single-document main memory Apache Lucene fulltext search index.
org.apache.lucene.misc
org.apache.lucene.queryParser: A simple query parser implemented with JavaCC.
org.apache.lucene.queryParser.analyzing
org.apache.lucene.queryParser.precedence
org.apache.lucene.queryParser.surround.parser: This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query: This package contains SrndQuery and its subclasses.
org.apache.lucene.search: Search over indices.

org.apache.lucene.search.highlight: The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.
org.apache.lucene.search.regex: Regular expression Query.
org.apache.lucene.search.spans: The calculus of spans.
org.apache.lucene.search.spell: Suggest alternate spellings for words.
org.apache.lucene.store: Binary i/o API, used for all index data.
org.apache.lucene.swing.models: Decorators for JTable TableModel and JList ListModel encapsulating Lucene indexing and searching functionality.
org.apache.lucene.util: Some utility classes.
org.apache.lucene.wordnet: This package uses synonyms defined by WordNet to build a Lucene index storing them, which in turn can be used for query expansion.
org.apache.regexp: This package exists to allow access to useful package protected data within Jakarta Regexp.

Demo

org.apache.lucene.demo
org.apache.lucene.demo.html

contrib: Analysis

org.apache.lucene.analysis.br: Analyzer for Brazilian.
org.apache.lucene.analysis.cjk: Analyzer for Chinese, Japanese and Korean.
org.apache.lucene.analysis.cn: Analyzer for Chinese.
org.apache.lucene.analysis.cz: Analyzer for Czech.
org.apache.lucene.analysis.de: Analyzer for German.
org.apache.lucene.analysis.el: Analyzer for Greek.
org.apache.lucene.analysis.fr: Analyzer for French.
org.apache.lucene.analysis.ngram
org.apache.lucene.analysis.nl: Analyzer for Dutch.
org.apache.lucene.analysis.ru: Analyzer for Russian.
org.apache.lucene.analysis.snowball: TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard: A grammar-based tokenizer constructed with JavaCC.
org.apache.lucene.analysis.th

contrib: Ant

org.apache.lucene.ant: Ant task to create Lucene indexes.

contrib: Benchmark

org.apache.lucene.benchmark
org.apache.lucene.benchmark.byTask: Benchmarking Lucene By Tasks.
org.apache.lucene.benchmark.byTask.feeds: Sources for benchmark inputs: documents and queries.
org.apache.lucene.benchmark.byTask.programmatic: Sample performance test written programmatically; no algorithm file is needed here.
org.apache.lucene.benchmark.byTask.stats: Statistics maintained when running benchmark tasks.
org.apache.lucene.benchmark.byTask.tasks: Extendable benchmark tasks.
org.apache.lucene.benchmark.byTask.utils: Utilities used for the benchmark, and for the reports.
org.apache.lucene.benchmark.standard
org.apache.lucene.benchmark.stats
org.apache.lucene.benchmark.utils

contrib: Highlighter

org.apache.lucene.search.highlight: The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.

contrib: GData Server (Java1.5)

contrib: Lucli

lucli: Lucene Command Line Interface

contrib: Memory

org.apache.lucene.index.memory: High-performance single-document main memory Apache Lucene fulltext search index.

contrib: Miscellaneous

org.apache.lucene.misc
org.apache.lucene.queryParser.analyzing
org.apache.lucene.queryParser.precedence

contrib: MoreLikeThis

contrib: RegEx

org.apache.lucene.search.regex: Regular expression Query.
org.apache.regexp: This package exists to allow access to useful package protected data within Jakarta Regexp.

contrib: Snowball

net.sf.snowball: Snowball system classes.
net.sf.snowball.ext: Snowball generated stemmer classes.
org.apache.lucene.analysis.snowball: TokenFilter and Analyzer implementations that use Snowball stemmers.

contrib: SpellChecker

org.apache.lucene.search.spell: Suggest alternate spellings for words.

contrib: Surround Parser

org.apache.lucene.queryParser.surround.parser: This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query: This package contains SrndQuery and its subclasses.

contrib: Swing

org.apache.lucene.swing.models: Decorators for JTable TableModel and JList ListModel encapsulating Lucene indexing and searching functionality.

contrib: WordNet

org.apache.lucene.wordnet: This package uses synonyms defined by WordNet to build a Lucene index storing them, which in turn can be used for query expansion.

Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

    Analyzer analyzer = new StandardAnalyzer();

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead (note that the 
    // parameter true will overwrite the index in that directory
    // if one exists):
    //Directory directory = FSDirectory.getDirectory("/tmp/testindex", true);
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
    iwriter.setMaxFieldLength(25000);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.TOKENIZED));
    iwriter.addDocument(doc);
    iwriter.close();
    
    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    Hits hits = isearcher.search(query);
    assertEquals(1, hits.length());
    // Iterate through the results:
    for (int i = 0; i < hits.length(); i++) {
      Document hitDoc = hits.doc(i);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();

The Lucene API is divided into several packages, which are summarized in the package lists above.

To use Lucene, an application should:
  1. Create Documents by adding Fields;
  2. Create an IndexWriter and add documents to it with addDocument();
  3. Call QueryParser.parse() to build a query from a string; and
  4. Create an IndexSearcher and pass the query to its search() method.
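
To make the index-then-search flow of these steps concrete without requiring the Lucene jar, here is a minimal sketch of a toy in-memory inverted index in plain Java. It is not Lucene's implementation: the class `ToyIndexSketch` and its methods are hypothetical names chosen for illustration, the tokenizer is a crude stand-in for an Analyzer, and the query parsing of step 3 is reduced to a single-term lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy illustration only: a hypothetical class, not part of Lucene. */
final class ToyIndexSketch {
    // Stored field values, addressed by document id.
    private final List<String> storedText = new ArrayList<>();
    // Inverted index: token -> sorted ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Crude stand-in for an Analyzer: lower-case, split on non-alphanumerics. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Roughly steps 1 and 2: store the text and index each of its tokens. */
    int addDocument(String text) {
        int docId = storedText.size();
        storedText.add(text);
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Roughly step 4, for a single term: ids of the matching documents. */
    List<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    /** Returns the stored text of a document. */
    String doc(int docId) {
        return storedText.get(docId);
    }

    public static void main(String[] args) {
        ToyIndexSketch index = new ToyIndexSketch();
        index.addDocument("This is the text to be indexed.");
        for (int id : index.search("text")) {
            System.out.println(id + ": " + index.doc(id));
        }
    }
}
```

The sketch keeps stored text and postings separate, which loosely mirrors the distinction between storing a field and indexing it; scoring, stop words, and multi-term queries are left out entirely.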

Simple examples of code that does this are in the org.apache.lucene.demo package, such as IndexFiles and SearchFiles. To demonstrate these, try something like:
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
  [ ... ]

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
  [ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
  [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
    [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
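
The rewriting shown in the "Searching for:" lines can be illustrated with a small sketch. The code below is a toy in plain Java, not Lucene's QueryParser: the class `BooleanQuerySketch` and its methods are hypothetical, it handles only flat queries (no parentheses or fields), and it lower-cases terms itself, whereas in Lucene that is the analyzer's job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration only: this is not Lucene's QueryParser. */
final class BooleanQuerySketch {

    /** Splits on whitespace, keeping a double-quoted phrase as one token. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Rewrites AND / OR / NOT into "+" (required) and "-" (prohibited) prefixes. */
    static String canonicalize(String input) {
        List<String> clauses = new ArrayList<>();
        StringBuilder prefixes = new StringBuilder(); // '+', '-' or ' ' per clause
        boolean requireNext = false;
        boolean prohibitNext = false;
        for (String token : tokenize(input)) {
            if (token.equals("AND")) {
                // AND makes the clauses on both sides required.
                int last = prefixes.length() - 1;
                if (last >= 0 && prefixes.charAt(last) == ' ') {
                    prefixes.setCharAt(last, '+');
                }
                requireNext = true;
            } else if (token.equals("OR")) {
                requireNext = false;
            } else if (token.equals("NOT")) {
                prohibitNext = true;
            } else {
                prefixes.append(prohibitNext ? '-' : (requireNext ? '+' : ' '));
                clauses.add(token.toLowerCase(Locale.ROOT));
                requireNext = false;
                prohibitNext = false;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (prefixes.charAt(i) != ' ') {
                out.append(prefixes.charAt(i));
            }
            out.append(clauses.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: +"clam chowder" +manhattan
        System.out.println(canonicalize("\"clam chowder\" AND Manhattan"));
    }
}
```

Under these assumptions, `chowder` stays `chowder`, and `soup NOT spam` becomes `soup -spam`.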

The IndexHTML demo is more sophisticated.  It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear and re-indexing files as they change.
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
  [ ... create an index containing all the relnotes ]

> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
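
The add, delete and re-index decisions described for IndexHTML can be sketched as a comparison between what was indexed last time and what is on disk now. The following plain-Java sketch is an assumption about one way to make that decision, not the demo's actual code: `IncrementalSyncSketch`, `Plan` and `plan` are hypothetical names, and both inputs are simple maps from file path to last-modified timestamp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only: not the IndexHTML demo's implementation. */
final class IncrementalSyncSketch {

    /** The three kinds of work an incremental pass has to do. */
    static final class Plan {
        final List<String> toAdd = new ArrayList<>();
        final List<String> toDelete = new ArrayList<>();
        final List<String> toReindex = new ArrayList<>();
    }

    /**
     * Compares the previously indexed state with the current files.
     * Both maps go from path to last-modified time.
     */
    static Plan plan(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Plan plan = new Plan();
        // TreeMap copies give a stable, path-sorted order in the result lists.
        for (Map.Entry<String, Long> e : new TreeMap<>(onDisk).entrySet()) {
            Long indexedAt = indexed.get(e.getKey());
            if (indexedAt == null) {
                plan.toAdd.add(e.getKey());      // new file appeared
            } else if (!indexedAt.equals(e.getValue())) {
                plan.toReindex.add(e.getKey());  // file changed since indexing
            }
        }
        for (String path : new TreeMap<>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) {
                plan.toDelete.add(path);         // file disappeared
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<>();
        indexed.put("relnotes/SMICopyright.html", 100L);
        Map<String, Long> onDisk = new HashMap<>(); // file was removed
        Plan p = plan(indexed, onDisk);
        System.out.println("deleting " + p.toDelete);
    }
}
```

A real indexer would then apply the plan: delete the stale documents, re-add the changed ones, and add the new ones.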


Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.