Package org.languagetool.dev
Class HomophoneOccurrenceDumper
java.lang.Object
org.languagetool.languagemodel.BaseLanguageModel
org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
org.languagetool.dev.HomophoneOccurrenceDumper
- All Implemented Interfaces:
AutoCloseable, org.languagetool.languagemodel.LanguageModel
class HomophoneOccurrenceDumper
extends org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
Dumps the occurrences of homophone 3-grams to STDOUT. This is useful for
producing a more compact file of homophone occurrences, because searching
for the homophones and their contexts in the Lucene index requires
iterating over all terms and is therefore slow.
- Since:
- 2.8
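The idea of a one-time dump can be illustrated with a minimal sketch. It is not the LanguageTool implementation: the class `HomophoneDumpSketch`, the method `dumpLines`, the in-memory `Map` standing in for the Lucene index, and the tab-separated output format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the LanguageTool implementation.
class HomophoneDumpSketch {

    // Scans every n-gram once and keeps the 3-grams that contain one of the
    // given homophones. The "ngram<TAB>count" line format is an assumption.
    static List<String> dumpLines(Map<String, Long> ngramCounts, Set<String> homophones) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {
            String[] words = entry.getKey().split(" ");
            if (words.length != 3) {
                continue; // skip unigrams and bigrams
            }
            for (String word : words) {
                if (homophones.contains(word)) {
                    lines.add(entry.getKey() + "\t" + entry.getValue());
                    break; // emit each 3-gram at most once
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // A tiny in-memory stand-in for the index (assumed data).
        Map<String, Long> ngramCounts = new LinkedHashMap<>();
        ngramCounts.put("over there is", 1200L);
        ngramCounts.put("their own house", 3400L);
        ngramCounts.put("a big house", 900L);
        ngramCounts.put("there", 50000L);
        for (String line : dumpLines(ngramCounts, Set.of("there", "their"))) {
            System.out.println(line);
        }
    }
}
```

Because the full scan happens only once, later lookups can read the much smaller dump instead of iterating over all terms again.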
-
Nested Class Summary
Nested classes/interfaces inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.LuceneSearcher
-
Field Summary
Fields
private static final int MIN_COUNT
Fields inherited from interface org.languagetool.languagemodel.LanguageModel
GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
-
Constructor Summary
Constructors
HomophoneOccurrenceDumper(File topIndexDir)
-
Method Summary
Modifier and Type  Method  Description
private void  dumpOccurrences(Set<String> tokens)
getContext(String... tokens)  Get the context (left and right words) for the given word(s).
private org.apache.lucene.index.TermsEnum  getIterator
long  getTotalTokenCount()
static void  main
private void  run
Methods inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
clearCaches, close, doValidateDirectory, getCount, getCount, getLuceneSearcher, toString, validateDirectory
Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel
getPseudoProbability, getPseudoProbabilityStupidBackoff
-
Field Details
-
MIN_COUNT
private static final int MIN_COUNT
-
-
Constructor Details
-
HomophoneOccurrenceDumper
HomophoneOccurrenceDumper(File topIndexDir) throws IOException
- Throws:
IOException
-
-
Method Details
-
getContext
Get the context (left and right words) for the given word(s). This is slow, as it needs to scan the whole index.
- Throws:
IOException
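A context lookup of this kind might look roughly like the sketch below. It is only an illustration under stated assumptions: the class `ContextSketch`, the in-memory `Map` used in place of the Lucene index, and the choice to match the given word in the middle position of a 3-gram are hypothetical and not taken from the actual method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the LanguageTool implementation.
class ContextSketch {

    // Returns every 3-gram whose middle word is one of the given tokens,
    // mapped to its occurrence count. Every entry is visited, which is
    // why a lookup like this is slow on a large index.
    static Map<String, Long> getContext(Map<String, Long> ngramCounts, String... tokens) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {
            String[] words = entry.getKey().split(" ");
            if (words.length != 3) {
                continue; // only 3-grams carry a left and a right word
            }
            for (String token : tokens) {
                if (words[1].equals(token)) {
                    result.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the index.
        Map<String, Long> ngramCounts = new HashMap<>();
        ngramCounts.put("over there is", 1200L);
        ngramCounts.put("there is a", 800L);
        ngramCounts.put("in their own", 3400L);
        System.out.println(getContext(ngramCounts, "there", "their"));
    }
}
```

In this sketch the left and right words are simply the first and third word of each returned 3-gram key.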
-
run
- Throws:
IOException
-
dumpOccurrences
- Throws:
IOException
-
getIterator
- Throws:
IOException
-
main
- Throws:
IOException
-
getTotalTokenCount
public long getTotalTokenCount()
- Overrides:
getTotalTokenCount in class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
-