org.apache.lucene.search.spell
public class SpellChecker extends Object
Spell checker class (the main class of this package), initially inspired by David Spencer's code.
Example Usage:
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
Version: 1.0
Field Summary | |
---|---|
static String | F_WORD
Field name for each word in the ngram index. |
Constructor Summary | |
---|---|
SpellChecker(Directory spellIndex)
Use the given directory as a spell checker index. |
Method Summary | |
---|---|
void | clearIndex()
Removes all terms from the spell check index. |
boolean | exist(String word)
Check whether the word exists in the index. |
protected void | finalize()
Closes the internal IndexReader. |
void | indexDictionary(Dictionary dict)
Indexes a Dictionary. |
void | setAccuracy(float minScore)
Sets the accuracy 0 < minScore < 1; default 0.5 |
void | setSpellIndex(Directory spellIndex)
Use a different index as the spell checker index, or re-open the existing index if spellIndex is the same value as given in the constructor. |
String[] | suggestSimilar(String word, int numSug)
Suggest similar words. |
String[] | suggestSimilar(String word, int numSug, IndexReader ir, String field, boolean morePopular)
Suggest similar words (optionally restricted to a field of an index). |
Parameters: spellIndex
Throws: IOException
Throws: IOException
Parameters: word
Returns: true iff the word exists in the index
Throws: IOException
Parameters: dict - the dictionary to index
Throws: IOException
Use a different index as the spell checker index, or re-open the existing index if spellIndex is the same value as given in the constructor.
Parameters: spellIndex
Throws: IOException
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found. One therefore usually has to request several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
Returns: String[]
Throws: IOException
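The gap between retrieval order and edit distance order can be illustrated with a small, self-contained sketch. It does not use Lucene: the candidate list stands in for index hits in a hypothetical retrieval order, and plain Levenshtein distance is assumed as the edit distance (the strategy SpellChecker actually uses is not specified here). The names RerankSketch, levenshtein and rerank are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RerankSketch {

    /** Plain Levenshtein distance (assumed here as the edit distance). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Re-orders candidates by edit distance to word and keeps at most numSug. */
    static List<String> rerank(String word, List<String> candidates, int numSug) {
        List<String> sorted = new ArrayList<>(candidates);
        // List.sort is stable, so ties keep their retrieval order.
        sorted.sort(Comparator.comparingInt((String c) -> levenshtein(word, c)));
        return new ArrayList<>(sorted.subList(0, Math.min(numSug, sorted.size())));
    }

    public static void main(String[] args) {
        // Hypothetical retrieval order; not produced by a real index.
        List<String> hits = Arrays.asList("misspelling", "misspell", "misspent");
        // Looking only at the first hit misses the closer words behind it.
        System.out.println(rerank("misspelt", hits.subList(0, 1), 1));
        System.out.println(rerank("misspelt", hits, 1));
    }
}
```

With only the first hit available, re-ranking has nothing better to choose from; with all three hits, a closer word can be selected. This is the reason for asking for several suggestions rather than one.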
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best matching word from the hits that Lucene found. One therefore usually has to request several suggestions in order to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index; if field is not null, the suggested words are restricted to the words present in this field
morePopular - return only the suggested words that are more frequent than the searched word (applies only in restricted mode, i.e. when the IndexReader is not null and field is not null)
Returns: String[] - the suggested words, sorted by two criteria: first, the edit distance; second (only in restricted mode), the popularity of the suggested word in the field of the user index
Throws: IOException
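The two-criteria ordering and the morePopular filter described above can be sketched as follows. This is an illustration under stated assumptions, not the class's implementation: Suggestion and order are hypothetical names, the distances and frequencies are made-up sample values, and breaking distance ties in favor of the more frequent word is an assumption about the direction of the popularity criterion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SuggestionOrderSketch {

    /** Hypothetical holder for one candidate; not a Lucene class. */
    static final class Suggestion {
        final String word;
        final int distance;  // edit distance to the searched word
        final int frequency; // occurrences in the user index field

        Suggestion(String word, int distance, int frequency) {
            this.word = word;
            this.distance = distance;
            this.frequency = frequency;
        }
    }

    /**
     * Orders candidates by edit distance, then by frequency (assumed: higher first).
     * When morePopular is true, keeps only words more frequent than the searched word.
     */
    static List<String> order(List<Suggestion> candidates, int wordFrequency,
                              boolean morePopular) {
        List<Suggestion> kept = new ArrayList<>();
        for (Suggestion s : candidates) {
            if (!morePopular || s.frequency > wordFrequency) {
                kept.add(s);
            }
        }
        kept.sort(Comparator
                .comparingInt((Suggestion s) -> s.distance)
                .thenComparing(
                        Comparator.comparingInt((Suggestion s) -> s.frequency).reversed()));
        List<String> words = new ArrayList<>();
        for (Suggestion s : kept) {
            words.add(s.word);
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        List<Suggestion> candidates = Arrays.asList(
                new Suggestion("misspell", 1, 10),
                new Suggestion("misspent", 1, 40),
                new Suggestion("misspelling", 4, 90));
        System.out.println(order(candidates, 20, false));
        System.out.println(order(candidates, 20, true));
    }
}
```

In unrestricted mode (no IndexReader or field) there is no frequency information, so only the first criterion would apply.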