org.apache.lucene.search.similar
public final class SimilarityQueries extends Object
See Also: MoreLikeThis
Method Summary | |
---|---|
static Query | formSimilarQuery(String body, Analyzer a, String field, Set stop): Simple similarity query generators. |
The method builds a query from every unique word in 'body'. So, if you have a code fragment like this:
Query q = formSimilarQuery("I use Lucene to search fast. Fast searchers are good", new StandardAnalyzer(), "contents", null);
The query returned, in string form, will be '(i use lucene to search fast searchers are good)'.
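The deduplication step in the example above can be sketched in plain Java. This is a minimal illustration, not the Lucene implementation: the class UniqueTermsSketch and the method uniqueTerms are hypothetical names, and a regular-expression split with lowercasing stands in for the Analyzer, which in real use decides how the body is tokenized.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class UniqueTermsSketch {

    /**
     * Returns the distinct lowercased words of body in first-seen order,
     * skipping any word found in the optional stop set (which may be null).
     * Hypothetical helper for illustration only; it does not call Lucene.
     */
    public static Set<String> uniqueTerms(String body, Set<String> stop) {
        Set<String> terms = new LinkedHashSet<>();
        // Crude stand-in for an Analyzer: split on anything that is not a letter or digit.
        for (String token : body.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if body starts with a delimiter
            }
            String word = token.toLowerCase(Locale.ROOT);
            if (stop != null && stop.contains(word)) {
                continue; // ignore stop words when a stop set is supplied
            }
            terms.add(word); // the set silently drops repeated words
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms =
            uniqueTerms("I use Lucene to search fast. Fast searchers are good", null);
        // prints "(i use lucene to search fast searchers are good)"
        System.out.println("(" + String.join(" ", terms) + ")");
    }
}
```

The second "Fast" is dropped because it lowercases to a word already in the set, which is why the query string above contains "fast" only once.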
The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.
This method is fail-safe: if a long 'body' is passed in and BooleanQuery.add() (used internally) throws BooleanQuery.TooManyClauses, the query built up to that point is returned.
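The fail-safe behavior can be pictured with a small plain-Java sketch. Everything here is an assumption made for illustration: ClauseList and TooManyClausesException are hypothetical stand-ins for BooleanQuery and BooleanQuery.TooManyClauses, and the clause limit is passed in explicitly rather than read from Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailSafeSketch {

    /** Hypothetical stand-in for BooleanQuery.TooManyClauses. */
    static class TooManyClausesException extends RuntimeException {
    }

    /** Hypothetical stand-in for a BooleanQuery that enforces a clause limit. */
    static class ClauseList {
        private final int maxClauses;
        private final List<String> clauses = new ArrayList<>();

        ClauseList(int maxClauses) {
            this.maxClauses = maxClauses;
        }

        void add(String term) {
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesException();
            }
            clauses.add(term);
        }

        List<String> clauses() {
            return clauses;
        }
    }

    /** Adds terms until the limit is hit, then returns the clauses collected so far. */
    public static List<String> build(Iterable<String> terms, int maxClauses) {
        ClauseList query = new ClauseList(maxClauses);
        try {
            for (String term : terms) {
                query.add(term);
            }
        } catch (TooManyClausesException e) {
            // Fail-safe: stop adding terms and keep the partial query as it is.
        }
        return query.clauses();
    }

    public static void main(String[] args) {
        // With a limit of 2, only the first two terms survive.
        System.out.println(build(Arrays.asList("i", "use", "lucene", "to"), 2)); // prints [i, use]
    }
}
```

The caller never sees the exception; the cost is that words late in a very long body are silently left out of the query.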
Parameters:
body - the body of the document you want to find similar documents to
a - the analyzer to use to parse the body
field - the field you want to search on, probably something like "contents" or "body"
stop - optional set of stop words to ignore
Returns: a query with all unique words in 'body'
Throws: IOException - this can't happen...