Package org.languagetool.tokenizers.gl
Class GalicianWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.gl.GalicianWordTokenizer
All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer
public class GalicianWordTokenizer
extends org.languagetool.tokenizers.WordTokenizer
Tokenizes a sentence into words. Punctuation marks and whitespace each get their own token.
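In LanguageTool itself you would call new GalicianWordTokenizer().tokenize(text), which returns a List<String>. The following is a minimal JDK-only sketch of the behaviour described above. It assumes the split uses java.util.StringTokenizer over a set of delimiter characters with returnDelims set to true. The delimiter string is a hypothetical stand-in for the real SPLIT_CHARS, not its actual value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Sketch only: SPLIT_CHARS is a small hypothetical delimiter set,
// not the constant defined in GalicianWordTokenizer.
class SplitSketch {
  static final String SPLIT_CHARS = " \u00A0\t\n,.;:!?\u00A1\u00BF()\"'";

  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    // returnDelims = true: each delimiter character comes back as its own token
    StringTokenizer st = new StringTokenizer(text, SPLIT_CHARS, true);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }
}
```

For "Ola, mundo!" this sketch yields Ola, ",", " ", mundo and "!". The comma and the space each survive as separate tokens instead of being discarded.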
Field Summary
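Fields
Modifier and Type              Field
private static final Pattern   COLON_NUMBERS_PATTERN
private static final String    COLON_NUMBERS_REPL
private static final Pattern   DATE_PATTERN
private static final String    DATE_PATTERN_REPL
private static final Pattern   DECIMAL_COMMA_PATTERN
private static final String    DECIMAL_COMMA_REPL
private static final char      DECIMAL_COMMA_SUBST
private static final Pattern   DECIMAL_SPACE_PATTERN
private static final Pattern   DOTTED_NUMBERS_PATTERN
private static final String    DOTTED_NUMBERS_REPL
private static final Pattern   DOTTED_ORDINALS_PATTERN
private static final String    DOTTED_ORDINALS_REPL
private static final char      NON_BREAKING_COLON_SUBST
private static final char      NON_BREAKING_DOT_SUBST
private static final char      NON_BREAKING_SPACE_SUBST
private static final String    SPLIT_CHARS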
Constructor Summary
Constructors
Constructor
GalicianWordTokenizer()
Method Summary
Modifier and Type   Method
List<String>        tokenize(String text)
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, getTokenizingCharacters, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls
Field Details
SPLIT_CHARS
private static final String SPLIT_CHARS
See Also: Constant Field Values
DECIMAL_COMMA_SUBST
private static final char DECIMAL_COMMA_SUBST
See Also: Constant Field Values
NON_BREAKING_SPACE_SUBST
private static final char NON_BREAKING_SPACE_SUBST
See Also: Constant Field Values
NON_BREAKING_DOT_SUBST
private static final char NON_BREAKING_DOT_SUBST
See Also: Constant Field Values
NON_BREAKING_COLON_SUBST
private static final char NON_BREAKING_COLON_SUBST
See Also: Constant Field Values
DECIMAL_COMMA_PATTERN
private static final Pattern DECIMAL_COMMA_PATTERN
DECIMAL_COMMA_REPL
private static final String DECIMAL_COMMA_REPL
See Also: Constant Field Values
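The DECIMAL_COMMA_* names suggest a protect-and-restore step. Before splitting, a comma between two digits is replaced with a placeholder that is not a split character. After splitting, the placeholder is turned back into a comma. The sketch below assumes that reading. The regex and the placeholder value U+E001 (a Private Use Area character) are hypothetical, not the class's actual constants.

```java
import java.util.regex.Pattern;

// Hypothetical values: the real constants in GalicianWordTokenizer may differ.
class DecimalCommaSketch {
  // Private Use Area character, unlikely to appear in ordinary text
  static final char DECIMAL_COMMA_SUBST = '\uE001';
  // A comma with a digit on each side, as in "3,5"
  static final Pattern DECIMAL_COMMA_PATTERN = Pattern.compile("(\\d),(\\d)");
  // Keep both digits (groups 1 and 2) and swap the comma for the placeholder
  static final String DECIMAL_COMMA_REPL = "$1" + DECIMAL_COMMA_SUBST + "$2";

  static String protect(String text) {
    return DECIMAL_COMMA_PATTERN.matcher(text).replaceAll(DECIMAL_COMMA_REPL);
  }

  // Applied to each token after splitting
  static String restore(String token) {
    return token.replace(DECIMAL_COMMA_SUBST, ',');
  }
}
```

protect("Custa 3,5 euros, non 4.") changes only the comma in 3,5. The comma after euros is followed by a space rather than a digit, so it stays a normal split point.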
DECIMAL_SPACE_PATTERN
private static final Pattern DECIMAL_SPACE_PATTERN
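DECIMAL_SPACE_PATTERN presumably targets numbers whose digit groups are separated by spaces, such as 10 000, so that they are not split at the space. The following is a hedged sketch that assumes the spaces inside such a match are swapped for a NON_BREAKING_SPACE_SUBST placeholder. The regex and the value U+E002 are assumptions. It uses the Matcher.replaceAll(Function) overload, available since Java 9, because only the spaces inside each match should change.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pattern and placeholder are assumptions, not the real constants.
class DecimalSpaceSketch {
  static final char NON_BREAKING_SPACE_SUBST = '\uE002';
  // One to three digits, then one or more " ddd" groups,
  // not glued to further digits on either side
  static final Pattern DECIMAL_SPACE_PATTERN =
      Pattern.compile("(?<!\\d)\\d{1,3}(?: \\d{3})+(?!\\d)");

  static String protect(String text) {
    // Replace only the spaces inside each match; quoteReplacement stops
    // '$' or '\' in the result from being read as group references
    return DECIMAL_SPACE_PATTERN.matcher(text).replaceAll(
        m -> Matcher.quoteReplacement(m.group().replace(' ', NON_BREAKING_SPACE_SUBST)));
  }
}
```

"10 000 habitantes" becomes a single number token followed by habitantes. "Hai 3 gatos" is left unchanged because 3 is not followed by a three-digit group.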
DOTTED_NUMBERS_PATTERN
private static final Pattern DOTTED_NUMBERS_PATTERN
DOTTED_NUMBERS_REPL
private static final String DOTTED_NUMBERS_REPL
See Also: Constant Field Values
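For DOTTED_NUMBERS_PATTERN and DOTTED_NUMBERS_REPL, one plausible design is a capture-group pattern with a replacement of the form "$1" + placeholder + "$2". The sketch compares that design with a lookaround variant. Both patterns and the placeholder value U+E003 are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical comparison: neither pattern is claimed to be the class's real one.
class DottedNumbersSketch {
  static final char NON_BREAKING_DOT_SUBST = '\uE003';
  // Capture-group form: each match consumes the digit after the dot
  static final Pattern CAPTURING = Pattern.compile("(\\d)\\.(\\d)");
  static final String CAPTURING_REPL = "$1" + NON_BREAKING_DOT_SUBST + "$2";
  // Lookaround form: matches only the dot, so no digit is consumed
  static final Pattern LOOKAROUND = Pattern.compile("(?<=\\d)\\.(?=\\d)");

  static String protectCapturing(String text) {
    return CAPTURING.matcher(text).replaceAll(CAPTURING_REPL);
  }

  static String protectLookaround(String text) {
    return LOOKAROUND.matcher(text).replaceAll(String.valueOf(NON_BREAKING_DOT_SUBST));
  }
}
```

The two forms agree on thousands groups such as 1.000.000. They differ on a chain of single digits such as 1.2.3. The capturing form consumes the 2 in its first match, so the second dot is never protected. The lookaround form matches only the dots themselves and protects both.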
COLON_NUMBERS_PATTERN
private static final Pattern COLON_NUMBERS_PATTERN
COLON_NUMBERS_REPL
private static final String COLON_NUMBERS_REPL
See Also: Constant Field Values
DATE_PATTERN
private static final Pattern DATE_PATTERN
DATE_PATTERN_REPL
private static final String DATE_PATTERN_REPL
See Also: Constant Field Values
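DATE_PATTERN and DATE_PATTERN_REPL presumably keep a date such as 25.7.1990 in one token. The sketch assumes a day.month.year layout with one or two digits for the day and month and four for the year. It also reuses a hypothetical NON_BREAKING_DOT_SUBST placeholder (U+E003) for both dots. This page confirms none of these choices.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: date layout and placeholder are assumptions.
class DatePatternSketch {
  static final char NON_BREAKING_DOT_SUBST = '\uE003';
  // d.m.yyyy or dd.mm.yyyy; \b prevents matching inside a longer digit run
  static final Pattern DATE_PATTERN =
      Pattern.compile("\\b(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})\\b");
  // Keep day, month and year; replace both dots with the placeholder
  static final String DATE_PATTERN_REPL =
      "$1" + NON_BREAKING_DOT_SUBST + "$2" + NON_BREAKING_DOT_SUBST + "$3";

  static String protect(String text) {
    return DATE_PATTERN.matcher(text).replaceAll(DATE_PATTERN_REPL);
  }
}
```

Only the two dots inside the date are replaced. The sentence-final dot in "Naceu o 25.7.1990." therefore still becomes its own token.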
DOTTED_ORDINALS_PATTERN
private static final Pattern DOTTED_ORDINALS_PATTERN
DOTTED_ORDINALS_REPL
private static final String DOTTED_ORDINALS_REPL
See Also: Constant Field Values
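Galician writes abbreviated ordinals with a dot followed by a raised indicator, as in 1.º (primeiro) and 2.ª (segunda). DOTTED_ORDINALS_PATTERN presumably stops the tokenizer from splitting at that dot. The following is a sketch under these assumptions: the masculine indicator º (U+00BA) or the feminine indicator ª (U+00AA) follows the dot directly, and the placeholder is a hypothetical U+E003. Plural forms are not handled here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the real pattern may cover more ordinal forms.
class DottedOrdinalsSketch {
  static final char NON_BREAKING_DOT_SUBST = '\uE003';
  // A digit, a dot, then a masculine or feminine ordinal indicator
  static final Pattern DOTTED_ORDINALS_PATTERN =
      Pattern.compile("(\\d)\\.([\u00BA\u00AA])");
  static final String DOTTED_ORDINALS_REPL = "$1" + NON_BREAKING_DOT_SUBST + "$2";

  static String protect(String text) {
    return DOTTED_ORDINALS_PATTERN.matcher(text).replaceAll(DOTTED_ORDINALS_REPL);
  }
}
```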
Constructor Details
GalicianWordTokenizer
public GalicianWordTokenizer()
Method Details
tokenize
public List<String> tokenize(String text)
Specified by:
tokenize in interface org.languagetool.tokenizers.Tokenizer
Overrides:
tokenize in class org.languagetool.tokenizers.WordTokenizer
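The field names on this page suggest that tokenize works in three steps: protect separators that belong inside numbers, split, then restore the original characters. The end-to-end sketch below assumes that order. It is not the method's actual body. The delimiter set, the lookaround patterns and the placeholder values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical protect -> split -> restore pipeline.
class PipelineSketch {
  static final String SPLIT_CHARS = " \t\n,.;:!?()";
  static final char DECIMAL_COMMA_SUBST = '\uE001';
  static final char NON_BREAKING_DOT_SUBST = '\uE003';
  static final char NON_BREAKING_COLON_SUBST = '\uE004';

  static final Pattern DECIMAL_COMMA = Pattern.compile("(?<=\\d),(?=\\d)");
  static final Pattern NUMBER_DOT = Pattern.compile("(?<=\\d)\\.(?=\\d)");
  static final Pattern NUMBER_COLON = Pattern.compile("(?<=\\d):(?=\\d)");

  static List<String> tokenize(String text) {
    // 1. Protect separators that sit between two digits
    String s = DECIMAL_COMMA.matcher(text).replaceAll(String.valueOf(DECIMAL_COMMA_SUBST));
    s = NUMBER_DOT.matcher(s).replaceAll(String.valueOf(NON_BREAKING_DOT_SUBST));
    s = NUMBER_COLON.matcher(s).replaceAll(String.valueOf(NON_BREAKING_COLON_SUBST));
    // 2. Split, keeping every delimiter as its own token
    List<String> tokens = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(s, SPLIT_CHARS, true);
    while (st.hasMoreTokens()) {
      // 3. Restore the original characters in each token
      tokens.add(st.nextToken()
          .replace(DECIMAL_COMMA_SUBST, ',')
          .replace(NON_BREAKING_DOT_SUBST, '.')
          .replace(NON_BREAKING_COLON_SUBST, ':'));
    }
    return tokens;
  }
}
```

For "Ás 10:30 pagou 3,5 euros." the sketch yields Ás, 10:30, pagou, 3,5 and euros as single tokens, with each space and the final dot as separate tokens. Restoring inside the loop means no placeholder ever reaches the caller.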