Class DutchWordTokenizer

java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.nl.DutchWordTokenizer
All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer

public class DutchWordTokenizer extends org.languagetool.tokenizers.WordTokenizer
  • Field Details

    • QUOTES

      private static final List<String> QUOTES
    • nlTokenizingChars

      private final String nlTokenizingChars
  • Constructor Details

    • DutchWordTokenizer

      public DutchWordTokenizer()
  • Method Details

    • tokenize

      public List<String> tokenize(String text)
      Tokenizes like WordTokenizer, except that words with an apostrophe in the middle, such as "oma's", are kept as a single token.
      Specified by:
      tokenize in interface org.languagetool.tokenizers.Tokenizer
      Overrides:
      tokenize in class org.languagetool.tokenizers.WordTokenizer
      Parameters:
      text - Text to tokenize
      Returns:
      List of tokens
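      The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the LanguageTool implementation: the class name ApostropheTokenizerSketch, the DELIMITERS string, and the contents of QUOTES are assumptions made for illustration. Only the names tokenize, startsWithQuote, and endsWithQuote are taken from this page, and the rule shown here (peel one quote character off each end of a token) is a guess at what those private helpers are for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration class; not part of LanguageTool.
final class ApostropheTokenizerSketch {

  // Assumed set of quote characters; the real QUOTES field may differ.
  private static final List<String> QUOTES =
      Arrays.asList("'", "`", "\u2019", "\u2018", "\u00b4");

  // Assumed delimiters: whitespace and common punctuation.
  // The apostrophe is deliberately absent, so "oma's" is not split.
  private static final String DELIMITERS = " \t\n\r.,;:!?()[]\"";

  static boolean startsWithQuote(String token) {
    return QUOTES.contains(token.substring(0, 1));
  }

  static boolean endsWithQuote(String token) {
    return QUOTES.contains(token.substring(token.length() - 1));
  }

  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    // returnDelims = true keeps whitespace and punctuation as tokens.
    StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
    while (st.hasMoreTokens()) {
      String token = st.nextToken();
      // Split off a quote character at the start of the token.
      if (token.length() > 1 && startsWithQuote(token)) {
        tokens.add(token.substring(0, 1));
        token = token.substring(1);
      }
      // Split off a quote character at the end of the token.
      if (token.length() > 1 && endsWithQuote(token)) {
        tokens.add(token.substring(0, token.length() - 1));
        tokens.add(token.substring(token.length() - 1));
      } else {
        // An apostrophe in the middle is left untouched.
        tokens.add(token);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("oma's fiets"));
    System.out.println(tokenize("'oma's'"));
  }
}
```

      In this sketch, "oma's" survives as one token while quote characters wrapped around a word become separate tokens. Dutch forms that legitimately begin with an apostrophe, such as "'s", would need extra handling that the sketch does not attempt.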
    • startsWithQuote

      private boolean startsWithQuote(String token)
    • endsWithQuote

      private boolean endsWithQuote(String token)
    • getTokenizingCharacters

      public String getTokenizingCharacters()
      Overrides:
      getTokenizingCharacters in class org.languagetool.tokenizers.WordTokenizer