Class HomophoneOccurrenceDumper

java.lang.Object
org.languagetool.languagemodel.BaseLanguageModel
org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
org.languagetool.dev.HomophoneOccurrenceDumper
All Implemented Interfaces:
AutoCloseable, org.languagetool.languagemodel.LanguageModel

class HomophoneOccurrenceDumper extends org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
Dumps the occurrences of homophone 3-grams to STDOUT. The result is a more compact file of homophone occurrences. This is useful because looking up the homophones and their contexts directly in the Lucene index requires iterating over all terms and is therefore slow.
Since:
2.8
  • Nested Class Summary

    Nested classes/interfaces inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel

    org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.LuceneSearcher
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int
     

    Fields inherited from interface org.languagetool.languagemodel.LanguageModel

    GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    private void
    dumpOccurrences(Set<String> tokens)
     
    (package private) Map<String,Long>
    getContext(String... tokens)
    Get the context (left and right words) for the given word(s).
    private org.apache.lucene.index.TermsEnum
    getIterator()
     
    long
    getTotalTokenCount()
     
    static void
    main(String[] args)
     
    private void
    run(String confusionSetPath)
     

    Methods inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel

    clearCaches, close, doValidateDirectory, getCount, getCount, getLuceneSearcher, toString, validateDirectory

    Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel

    getPseudoProbability, getPseudoProbabilityStupidBackoff

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Constructor Details

  • Method Details

    • getContext

      Map<String,Long> getContext(String... tokens) throws IOException
      Get the context (left and right words) for the given word(s). This is slow, as it needs to scan the whole index.
      Throws:
      IOException
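
      The cost of the full scan can be illustrated with a minimal sketch. This is not the LanguageTool implementation: the class ContextScanSketch, the method contextsOf, and the in-memory map standing in for the Lucene index are all hypothetical, and the matching rule (the token must be the middle word of a 3-gram) is an assumption made for the example. It only shows why every term has to be visited to collect the contexts of a word.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the index scan; not the LanguageTool code.
class ContextScanSketch {

  /**
   * Visits every entry of the (in-memory) n-gram index and keeps the 3-grams
   * whose middle word equals one of the given tokens (an assumed matching rule).
   */
  static Map<String, Long> contextsOf(Map<String, Long> ngramCounts, String... tokens) {
    Map<String, Long> result = new TreeMap<>();  // sorted for stable output
    for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {  // full scan over all terms
      String[] words = entry.getKey().split(" ");
      if (words.length != 3) {
        continue;  // skip anything that is not a 3-gram
      }
      for (String token : tokens) {
        if (words[1].equals(token)) {
          result.put(entry.getKey(), entry.getValue());
          break;
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up counts, for illustration only.
    Map<String, Long> index = new LinkedHashMap<>();
    index.put("over there now", 12L);
    index.put("in their house", 7L);
    index.put("is there", 40L);
    index.put("a big house", 3L);
    // Print one tab-separated line per matching 3-gram.
    contextsOf(index, "there", "their")
        .forEach((ngram, count) -> System.out.println(ngram + "\t" + count));
  }
}
```

      Because no entry can be skipped without inspecting it, the running time grows with the total number of terms, which is the motivation for dumping the homophone occurrences once into a smaller file.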
    • run

      private void run(String confusionSetPath) throws IOException
      Throws:
      IOException
    • dumpOccurrences

      private void dumpOccurrences(Set<String> tokens) throws IOException
      Throws:
      IOException
    • getIterator

      private org.apache.lucene.index.TermsEnum getIterator() throws IOException
      Throws:
      IOException
    • main

      public static void main(String[] args) throws IOException
      Throws:
      IOException
    • getTotalTokenCount

      public long getTotalTokenCount()
      Overrides:
      getTotalTokenCount in class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel