org.apache.lucene.search.highlight
public class Highlighter extends Object
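
A minimal end-to-end sketch of typical usage (not part of the original Javadoc): the field name "contents", the query string, and the sample text are illustrative; `QueryScorer` is the scorer class from this package, and the static `QueryParser.parse` call is the Lucene 1.4-era entry point for parsing queries.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightExample {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        Query query = QueryParser.parse("ipsum", "contents", analyzer);

        // Fragments are scored by the query terms they contain.
        Highlighter highlighter = new Highlighter(new QueryScorer(query));

        String text = "Lorem ipsum dolor sit amet ...";
        // Best single fragment; matches are wrapped in <B>..</B> by the
        // default formatter. Returns null if no query terms were found.
        String fragment = highlighter.getBestFragment(analyzer, "contents", text);
        System.out.println(fragment);
    }
}
```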
Field Summary

| Type | Field |
|---|---|
| `static int` | `DEFAULT_MAX_DOC_BYTES_TO_ANALYZE` |
Constructor Summary

| Constructor |
|---|
| `Highlighter(Scorer fragmentScorer)` |
| `Highlighter(Formatter formatter, Scorer fragmentScorer)` |
| `Highlighter(Formatter formatter, Encoder encoder, Scorer fragmentScorer)` |
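
Which constructor to use depends on how much of the output pipeline needs customizing. A short sketch under the same assumptions as the example above (a parsed `query` in scope); `SimpleHTMLFormatter` is the formatter class from this package:

```java
// Default output: matched terms wrapped in <B>..</B>.
Highlighter plain = new Highlighter(new QueryScorer(query));

// Custom formatter: wrap matches in <em>..</em> instead.
Highlighter custom = new Highlighter(
        new SimpleHTMLFormatter("<em>", "</em>"), new QueryScorer(query));
```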
Method Summary

| Return Type | Method and Description |
|---|---|
| `String` | `getBestFragment(Analyzer analyzer, String fieldName, String text)` Highlights chosen terms in a text, extracting the most relevant section. |
| `String` | `getBestFragment(TokenStream tokenStream, String text)` Highlights chosen terms in a text, extracting the most relevant section. |
| `String[]` | `getBestFragments(Analyzer analyzer, String text, int maxNumFragments)` Deprecated. Highlights chosen terms in a text, extracting the most relevant sections. |
| `String[]` | `getBestFragments(Analyzer analyzer, String fieldName, String text, int maxNumFragments)` Highlights chosen terms in a text, extracting the most relevant sections. |
| `String[]` | `getBestFragments(TokenStream tokenStream, String text, int maxNumFragments)` Highlights chosen terms in a text, extracting the most relevant sections. |
| `String` | `getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator)` Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). |
| `TextFragment[]` | `getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments)` Low-level API to get the most relevant (formatted) sections of the document. |
| `Encoder` | `getEncoder()` |
| `Scorer` | `getFragmentScorer()` |
| `int` | `getMaxDocBytesToAnalyze()` |
| `Fragmenter` | `getTextFragmenter()` |
| `void` | `setEncoder(Encoder encoder)` |
| `void` | `setFragmentScorer(Scorer scorer)` |
| `void` | `setMaxDocBytesToAnalyze(int byteCount)` |
| `void` | `setTextFragmenter(Fragmenter fragmenter)` |
Method Detail

`getBestFragment(Analyzer analyzer, String fieldName, String text)`

Parameters:
- `analyzer` - the analyzer that will be used to split text into chunks
- `fieldName` - name of the field, used to influence the analyzer's tokenization policy
- `text` - text to highlight terms in

Returns: highlighted text fragment, or null if no terms were found
`getBestFragment(TokenStream tokenStream, String text)`

Parameters:
- `tokenStream` - a stream of tokens identified in the text parameter, including offset information. This is typically produced by an analyzer re-parsing a document's text. Some work may be done on retrieving TokenStreams more efficiently by adding support for storing original text position data in the Lucene index, but this support is not currently available (as of Lucene 1.4 rc2).
- `text` - text to highlight terms in

Returns: highlighted text fragment, or null if no terms were found
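
A sketch of how a caller typically obtains the TokenStream, assuming `highlighter`, `analyzer`, and `text` as in the example above; re-analyzing the stored text is the approach the note above describes:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;

// Re-parse the stored text so the stream carries offset information.
TokenStream tokenStream =
        analyzer.tokenStream("contents", new StringReader(text));
String fragment = highlighter.getBestFragment(tokenStream, text);
```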
`getBestFragments(Analyzer analyzer, String text, int maxNumFragments)`

Deprecated. This method incorrectly hardcodes the choice of field name. Use the method of the same name that takes a field name.

Highlights chosen terms in a text, extracting the most relevant sections. This is a convenience method that calls `getBestFragments(TokenStream, String, int)`.

Parameters:
- `analyzer` - the analyzer that will be used to split text into chunks
- `text` - text to highlight terms in
- `maxNumFragments` - the maximum number of fragments

Returns: highlighted text fragments (between 0 and maxNumFragments fragments)
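
Since this variant hardcodes the field name, migrating away from it is a one-line change; a sketch with an illustrative field name:

```java
// Deprecated form: the analyzer cannot apply field-specific tokenization.
String[] fragments = highlighter.getBestFragments(analyzer, text, 3);

// Preferred form: pass the field name explicitly.
String[] fragments2 = highlighter.getBestFragments(analyzer, "contents", text, 3);
```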
`getBestFragments(Analyzer analyzer, String fieldName, String text, int maxNumFragments)`

Parameters:
- `analyzer` - the analyzer that will be used to split text into chunks
- `fieldName` - the name of the field being highlighted (used by the analyzer)
- `text` - text to highlight terms in
- `maxNumFragments` - the maximum number of fragments

Returns: highlighted text fragments (between 0 and maxNumFragments fragments)
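
A sketch of multi-fragment highlighting, again assuming `highlighter`, `analyzer`, and `text` from the earlier example; the fragment count of 3 is arbitrary:

```java
String[] fragments =
        highlighter.getBestFragments(analyzer, "contents", text, 3);
// The array may hold fewer than 3 entries if fewer fragments scored.
for (int i = 0; i < fragments.length; i++) {
    System.out.println(fragments[i]);
}
```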
`getBestFragments(TokenStream tokenStream, String text, int maxNumFragments)`

Parameters:
- `text` - text to highlight terms in
- `maxNumFragments` - the maximum number of fragments

Returns: highlighted text fragments (between 0 and maxNumFragments fragments)
`getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator)`

Parameters:
- `text` - text to highlight terms in
- `maxNumFragments` - the maximum number of fragments
- `separator` - the separator used to intersperse the document fragments (typically "...")

Returns: highlighted text
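
This variant is convenient for building a single search-results snippet. A sketch under the same assumptions as above:

```java
TokenStream tokenStream =
        analyzer.tokenStream("contents", new StringReader(text));
// Up to 3 fragments, joined into one string with "..." between them.
String snippet = highlighter.getBestFragments(tokenStream, text, 3, "...");
```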
`getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments)`

Parameters:
- `tokenStream` - a stream of tokens identified in the text parameter
- `text` - text to highlight terms in
- `mergeContiguousFragments` - whether to merge adjacent fragments into one
- `maxNumFragments` - the maximum number of fragments

Throws: IOException
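
Because this low-level variant returns `TextFragment` objects rather than strings, callers can filter on score before rendering. A sketch under the same assumptions; `getScore()` and `toString()` are methods of `TextFragment`:

```java
TokenStream tokenStream =
        analyzer.tokenStream("contents", new StringReader(text));
TextFragment[] frags =
        highlighter.getBestTextFragments(tokenStream, text, false, 5);
for (int i = 0; i < frags.length; i++) {
    // Skip fragments that contained no scoring terms.
    if (frags[i] != null && frags[i].getScore() > 0) {
        System.out.println(frags[i].toString());
    }
}
```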
`getFragmentScorer()`

Returns: the Object used to score each text fragment

`getMaxDocBytesToAnalyze()`

Returns: the maximum number of bytes to be tokenized per doc

`setFragmentScorer(Scorer scorer)`

Parameters:
- `scorer`

`setMaxDocBytesToAnalyze(int byteCount)`

Parameters:
- `byteCount` - the maximum number of bytes to be tokenized per doc (this can improve performance with large documents)

`setTextFragmenter(Fragmenter fragmenter)`

Parameters:
- `fragmenter`
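
A sketch of tuning a highlighter before use; `SimpleFragmenter` is the fragment-sizing class from this package, and both values below are illustrative:

```java
// Aim for fragments of roughly 80 characters.
highlighter.setTextFragmenter(new SimpleFragmenter(80));
// Stop tokenizing large documents after the first 50 KB.
highlighter.setMaxDocBytesToAnalyze(50 * 1024);
```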