Package org.languagetool.chunking
Class EnglishChunker
java.lang.Object
org.languagetool.chunking.EnglishChunker
- All Implemented Interfaces:
org.languagetool.chunking.Chunker
OpenNLP-based chunker. It also uses the OpenNLP tokenizer and POS tagger, then maps
the result back to our own tokens (we have our own tokenizer) wherever a trivial mapping exists.
- Since:
- 2.3
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private static final String CHUNKER_MODEL
- private static opennlp.tools.chunker.ChunkerModel chunkerModel
- private final EnglishChunkFilter chunkFilter
- private static final String POS_TAGGER_MODEL
- private static opennlp.tools.postag.POSModel posModel
- private static final String TOKENIZER_MODEL
- private static opennlp.tools.tokenize.TokenizerModel tokenModel: This needs to be static to save memory: as Language.LANGUAGES is static, any language that is once created there will never be released.
-
Constructor Summary
Constructors:
- EnglishChunker()
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void addChunkTags(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
- private void assignChunksToReadings(List<ChunkTaggedToken> chunkTaggedTokens)
- private String[] chunk
- private @Nullable org.languagetool.AnalyzedTokenReadings getAnalyzedTokenReadingsFor(int startPos, int endPos, List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
- private List<ChunkTaggedToken> getChunkTagsForReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
- private String getSentence(List<org.languagetool.AnalyzedTokenReadings> sentenceTokens)
- private List<ChunkTaggedToken> getTokensWithTokenReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings, String[] tokens, String[] chunkTags)
- private String[] posTag
- (package private) String[] tokenize
-
Field Details
-
TOKENIZER_MODEL
-
POS_TAGGER_MODEL
-
CHUNKER_MODEL
-
tokenModel
private static volatile opennlp.tools.tokenize.TokenizerModel tokenModel
This needs to be static to save memory: because Language.LANGUAGES is static, any language that is created there will never be released. As English has several variants, we would otherwise hold as many posModels etc. as we have variants, which would be a huge waste of memory.
-
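The memory note on tokenModel can be illustrated with a small sketch. It is not the EnglishChunker source: `ExpensiveModel`, `SharedModelHolder`, and the `LOADS` counter are hypothetical stand-ins for an OpenNLP model and its loader, and double-checked locking is assumed here only as one common way to fill a static volatile field lazily.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a large model object such as a tokenizer model.
final class ExpensiveModel {
    final String name;

    ExpensiveModel(String name) {
        this.name = name;
    }
}

// Hypothetical holder: all callers share one lazily created model.
final class SharedModelHolder {
    // Counts how often the simulated expensive load actually runs.
    static final AtomicInteger LOADS = new AtomicInteger();

    // static: one copy per class rather than per instance.
    // volatile: the fully constructed model is visible to all threads.
    private static volatile ExpensiveModel model;

    static ExpensiveModel getModel() {
        ExpensiveModel local = model;
        if (local == null) {
            synchronized (SharedModelHolder.class) {
                local = model;
                if (local == null) {
                    LOADS.incrementAndGet();
                    local = new ExpensiveModel("tokenizer");
                    model = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Simulate several language variants asking for the same model.
        for (int i = 0; i < 5; i++) {
            getModel();
        }
        System.out.println("loads = " + LOADS.get());
    }
}
```

With an instance field instead, each variant object would carry its own copy of the model for as long as the variant itself stays reachable.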
posModel
private static volatile opennlp.tools.postag.POSModel posModel
-
chunkerModel
private static volatile opennlp.tools.chunker.ChunkerModel chunkerModel
-
chunkFilter
-
-
Constructor Details
-
EnglishChunker
public EnglishChunker()
-
-
Method Details
-
addChunkTags
- Specified by:
addChunkTags
in interface org.languagetool.chunking.Chunker
-
getChunkTagsForReadings
private List<ChunkTaggedToken> getChunkTagsForReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
-
tokenize
-
posTag
-
chunk
-
getTokensWithTokenReadings
private List<ChunkTaggedToken> getTokensWithTokenReadings(List<org.languagetool.AnalyzedTokenReadings> tokenReadings, String[] tokens, String[] chunkTags)
-
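The signature of getTokensWithTokenReadings pairs an array of externally produced tokens with a parallel array of chunk tags. The following is a minimal sketch of how such parallel arrays might be tied to character offsets in the sentence. It is an illustration only, not the method body: `TaggedSpan`, `OffsetAlignment`, and the explicit `sentence` parameter are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: a chunk tag plus the character span it covers.
final class TaggedSpan {
    final int start;
    final int end;
    final String chunkTag;

    TaggedSpan(int start, int end, String chunkTag) {
        this.start = start;
        this.end = end;
        this.chunkTag = chunkTag;
    }
}

final class OffsetAlignment {
    // Finds each token in the sentence, scanning left to right,
    // and pairs its [start, end) span with the tag at the same index.
    static List<TaggedSpan> align(String sentence, String[] tokens, String[] chunkTags) {
        if (tokens.length != chunkTags.length) {
            throw new IllegalArgumentException("tokens and chunkTags must have the same length");
        }
        List<TaggedSpan> result = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < tokens.length; i++) {
            int start = sentence.indexOf(tokens[i], pos);
            if (start < 0) {
                // Token text not found after pos: skip it rather than guess.
                continue;
            }
            int end = start + tokens[i].length();
            result.add(new TaggedSpan(start, end, chunkTags[i]));
            pos = end;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "do", "n't", "sleep", "."};
        String[] tags = {"B-NP", "B-VP", "I-VP", "I-VP", "O"};
        for (TaggedSpan span : align("I don't sleep.", tokens, tags)) {
            System.out.println(span.start + "-" + span.end + " " + span.chunkTag);
        }
    }
}
```

Working with character offsets rather than token indexes keeps the result usable even when a second tokenizer splits the sentence differently.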
assignChunksToReadings
-
getSentence
-
getAnalyzedTokenReadingsFor
@Nullable private org.languagetool.AnalyzedTokenReadings getAnalyzedTokenReadingsFor(int startPos, int endPos, List<org.languagetool.AnalyzedTokenReadings> tokenReadings)
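The @Nullable return type and the startPos/endPos parameters suggest a lookup by character span that can fail when the two tokenizers disagree. The sketch below shows one way such a lookup could behave. It is an assumption-laden illustration, not the actual implementation: `SpanLookup` and `findTokenFor` are hypothetical, plain strings stand in for AnalyzedTokenReadings, and whitespace is assumed to appear as separate tokens in the list.

```java
import java.util.Arrays;
import java.util.List;

final class SpanLookup {
    // Returns the token that covers exactly [startPos, endPos), or null
    // when no token in the list has exactly that span.
    static String findTokenFor(int startPos, int endPos, List<String> ownTokens) {
        int pos = 0;
        for (String token : ownTokens) {
            int tokenEnd = pos + token.length();
            if (pos == startPos && tokenEnd == endPos) {
                return token;
            }
            pos = tokenEnd;
        }
        return null;
    }

    public static void main(String[] args) {
        // Own tokenization keeps "don't" whole and lists whitespace as tokens.
        List<String> ownTokens = Arrays.asList("I", " ", "don't", " ", "sleep", ".");
        System.out.println(findTokenFor(8, 13, ownTokens));
        System.out.println(findTokenFor(2, 4, ownTokens));
    }
}
```

In this example the span 2 to 4 (an external token "do") has no exact counterpart, so the lookup yields null and that token would simply receive no chunk tag.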
-