Class EnglishChunker

java.lang.Object
org.languagetool.chunking.EnglishChunker
All Implemented Interfaces:
org.languagetool.chunking.Chunker

public class EnglishChunker extends Object implements org.languagetool.chunking.Chunker
OpenNLP-based chunker. It also uses the OpenNLP tokenizer and POS tagger, then maps the results back to our own tokens (we have our own tokenizer) wherever a straightforward mapping is possible.
Since:
2.3
  • Field Details

    • TOKENIZER_MODEL

      private static final String TOKENIZER_MODEL
    • POS_TAGGER_MODEL

      private static final String POS_TAGGER_MODEL
    • CHUNKER_MODEL

      private static final String CHUNKER_MODEL
    • tokenModel

      private static volatile opennlp.tools.tokenize.TokenizerModel tokenModel
      This needs to be static to save memory: because Language.LANGUAGES is static, any language that is created there will never be released. As English has several variants, non-static fields would give us as many copies of posModel etc. as we have variants, which would be a huge waste of memory.
    • posModel

      private static volatile opennlp.tools.postag.POSModel posModel
    • chunkerModel

      private static volatile opennlp.tools.chunker.ChunkerModel chunkerModel
    • chunkFilter

      private final EnglishChunkFilter chunkFilter
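
The memory argument behind the static volatile model fields above can be illustrated with a small, self-contained sketch. It is not the actual EnglishChunker code: HeavyModel and SharedModelHolder are hypothetical names, and the double-checked locking shown here is only one common way to initialize such a field lazily. The source states only that the fields are static and volatile.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive-to-load model (e.g. an OpenNLP model). */
final class HeavyModel {
  // Counts how often the "expensive" load actually runs.
  static final AtomicInteger LOAD_COUNT = new AtomicInteger();
  final String name;

  HeavyModel(String name) {
    LOAD_COUNT.incrementAndGet();
    this.name = name;
  }
}

/** Hypothetical holder: one model instance shared by every caller in the JVM. */
final class SharedModelHolder {
  // static: a single copy regardless of how many language variants exist.
  // volatile: the fully constructed model is safely visible to other threads.
  private static volatile HeavyModel model;

  private SharedModelHolder() {
  }

  static HeavyModel getModel() {
    HeavyModel local = model;          // one volatile read on the fast path
    if (local == null) {
      synchronized (SharedModelHolder.class) {
        local = model;                 // re-check under the lock
        if (local == null) {
          local = new HeavyModel("assumed-model-name");
          model = local;
        }
      }
    }
    return local;
  }
}
```

With this layout, every variant that asks for the model receives the same instance, so the load cost and the memory are paid once per JVM rather than once per variant.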
  • Constructor Details

    • EnglishChunker

      public EnglishChunker()
  • Method Details

    • addChunkTags

      public void addChunkTags(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
      Specified by:
      addChunkTags in interface org.languagetool.chunking.Chunker
    • getChunkTagsForReadings

      private List<ChunkTaggedToken> getChunkTagsForReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
    • tokenize

      String[] tokenize(String sentence)
    • posTag

      private String[] posTag(String[] tokens)
    • chunk

      private String[] chunk(String[] tokens, String[] posTags)
    • getTokensWithTokenReadings

      private List<ChunkTaggedToken> getTokensWithTokenReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings, String[] tokens, String[] chunkTags)
    • assignChunksToReadings

      private void assignChunksToReadings(List<ChunkTaggedToken> chunkTaggedTokens)
    • getSentence

      private String getSentence(List<org.languagetool.AnalyzedTokenReadings> sentenceTokens)
    • getAnalyzedTokenReadingsFor

      @Nullable private org.languagetool.AnalyzedTokenReadings getAnalyzedTokenReadingsFor(int startPos, int endPos, List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
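
The signatures of getTokensWithTokenReadings and getAnalyzedTokenReadingsFor suggest that chunk tags produced for one tokenization are matched to our own tokens by character position. The sketch below shows one way such a mapping could work. It is an assumption, not the real implementation: OffsetToken and OffsetMappingSketch are hypothetical, the offsets are treated as half-open ranges, and the chunk tag strings are illustrative only.

```java
import java.util.List;

/** Hypothetical token with character offsets; stands in for AnalyzedTokenReadings. */
final class OffsetToken {
  final String text;
  final int startPos;   // inclusive
  final int endPos;     // exclusive
  String chunkTag;      // stays null when no external token lines up

  OffsetToken(String text, int startPos, int endPos) {
    this.text = text;
    this.startPos = startPos;
    this.endPos = endPos;
  }
}

final class OffsetMappingSketch {
  private OffsetMappingSketch() {
  }

  /** Returns the token covering exactly [startPos, endPos), or null if there is none. */
  static OffsetToken findByOffsets(int startPos, int endPos, List<OffsetToken> tokens) {
    for (OffsetToken token : tokens) {
      if (token.startPos == startPos && token.endPos == endPos) {
        return token;
      }
    }
    return null;
  }

  /** Copies each external chunk tag onto the own token with identical offsets. */
  static void assign(String sentence, String[] externalTokens, String[] chunkTags,
                     List<OffsetToken> ownTokens) {
    int pos = 0;
    for (int i = 0; i < externalTokens.length; i++) {
      int start = sentence.indexOf(externalTokens[i], pos);
      if (start < 0) {
        continue;       // external token not found in the sentence; skip it
      }
      int end = start + externalTokens[i].length();
      OffsetToken match = findByOffsets(start, end, ownTokens);
      if (match != null) {
        match.chunkTag = chunkTags[i];
      }
      pos = end;        // continue searching after this token
    }
  }
}
```

In this sketch, tokens whose boundaries differ between the two tokenizers (for example, a contraction split in different places) simply receive no chunk tag, which matches the "as far as possible" caveat in the class description.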