Package org.languagetool.tokenizers.ca
Class CatalanWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.ca.CatalanWordTokenizer
All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer
public class CatalanWordTokenizer
extends org.languagetool.tokenizers.WordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace get their own tokens.
Hyphens and apostrophes are handled specially for Catalan.
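The class description mentions special treatment of hyphens and apostrophes without detailing it. The following is a minimal, hypothetical sketch of what such treatment can look like; it is not LanguageTool's implementation. It assumes that an elided article or pronoun keeps its apostrophe (L'home becomes L' and home), that a pronoun attached with a hyphen or apostrophe keeps that sign (dóna-li becomes dóna and -li), and that hyphenated compounds found in a caller-supplied set, standing in for a dictionary lookup, stay whole. The class CatalanTokenizerSketch and the knownCompounds parameter are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sketch; NOT LanguageTool's CatalanWordTokenizer.
// The regexes and the knownCompounds set are illustrative assumptions.
class CatalanTokenizerSketch {

  // A "word run": letters/digits, optionally joined by an apostrophe
  // (' or U+2019), a middle dot (U+00B7, ela geminada) or a hyphen.
  // Any other single character (space, punctuation) is its own token.
  private static final Pattern WORD_OR_OTHER = Pattern.compile(
      "[\\p{L}\\p{M}\\p{Nd}]+(?:['\\u2019\\u00B7-][\\p{L}\\p{M}\\p{Nd}]+)*|.",
      Pattern.DOTALL);

  // Assumed elision prefixes d', l', m', n', s', t' (e.g. l'home -> l' + home).
  private static final Pattern ELISION =
      Pattern.compile("(?i)^([dlmnst]['\\u2019])");

  static List<String> tokenize(String text, Set<String> knownCompounds) {
    List<String> tokens = new ArrayList<>();
    Matcher m = WORD_OR_OTHER.matcher(text);
    while (m.find()) {
      String run = m.group();
      Matcher elision = ELISION.matcher(run);
      if (elision.find()) {
        tokens.add(elision.group(1));      // the elided form keeps its apostrophe
        run = run.substring(elision.end());
      }
      if (knownCompounds.contains(run.toLowerCase(Locale.ROOT))) {
        tokens.add(run);                   // listed compound: keep it whole
        continue;
      }
      // Split before each hyphen or apostrophe so an attached pronoun keeps
      // its sign (e.g. porta'l -> porta + 'l). A lone delimiter stays as is.
      for (String piece : run.split("(?=[-'\\u2019])")) {
        tokens.add(piece);
      }
    }
    return tokens;
  }
}
```

With a compound set containing only nord-americà, the input L'home dóna-li un nord-americà. yields L', home, dóna, -li, un, nord-americà and the final period, with each space as a separate token.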
Field Summary
Fields (modifier and type, then field name)
private static final Pattern APOSTROF_RECTE
private static final Pattern APOSTROF_RECTE_1
private static final Pattern APOSTROF_RODO
private static final Pattern APOSTROF_RODO_1
private static final Pattern DECIMAL_COMMA
private static final Pattern DECIMAL_POINT
private static final String DICT_FILENAME
private static final Pattern ELA_GEMINADA
private static final Pattern ELA_GEMINADA_UPPERCASE
private static final Pattern HYPHENS
private static final int maxPatterns
private static final Pattern NEARBY_HYPHENS
private final Pattern[] patterns
private static final String PF
private static final Pattern SPACE_DIGITS
private static final Pattern SPACE_DIGITS0
private static final Pattern SPACE_DIGITS2
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller speller
Constructor Summary
Constructors
CatalanWordTokenizer()
Method Summary
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, getTokenizingCharacters, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls
Field Details
PF
See Also: Constant Field Values
maxPatterns
private static final int maxPatterns
See Also: Constant Field Values
patterns
DICT_FILENAME
See Also: Constant Field Values
speller
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller speller
ELA_GEMINADA
ELA_GEMINADA_UPPERCASE
APOSTROF_RECTE
APOSTROF_RODO
APOSTROF_RECTE_1
APOSTROF_RODO_1
NEARBY_HYPHENS
HYPHENS
DECIMAL_POINT
DECIMAL_COMMA
SPACE_DIGITS0
SPACE_DIGITS
SPACE_DIGITS2
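Several field names refer to Catalan-specific sequences that a splitter working on punctuation alone would break: ELA_GEMINADA and ELA_GEMINADA_UPPERCASE (the ela geminada, l·l, as in col·lecció) and DECIMAL_POINT and DECIMAL_COMMA (separators inside numbers such as 3,5). This page does not show the regular expressions themselves, so the following is only a hypothetical illustration of one way to guard such separators: a single regular expression with lookarounds that keeps a middle dot inside a token only between two l's, and a point or comma only between digits. The class GuardedSeparatorSketch is an invented name; the real class may work quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of guarded separators; not LanguageTool code.
class GuardedSeparatorSketch {

  private static final Pattern TOKEN = Pattern.compile(
      // numbers whose internal '.' or ',' sits between digits: 3,5  3.5  1.000.000
      "\\d+(?:[.,]\\d+)*"
      // words; a middle dot (U+00B7) is accepted only between two l/L
      + "|[\\p{L}\\p{M}]+(?:(?<=[lL])\\u00B7(?=[lL])[\\p{L}\\p{M}]+)*"
      // anything else (space, punctuation, a stray middle dot) is one token
      + "|.",
      Pattern.DOTALL);

  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    Matcher m = TOKEN.matcher(text);
    while (m.find()) {
      tokens.add(m.group());  // every alternative consumes at least one char
    }
    return tokens;
  }
}
```

Guarding with lookarounds leaves the input text unchanged, so no restoration step is needed after splitting.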
Constructor Details
CatalanWordTokenizer
public CatalanWordTokenizer()
Method Details
tokenize
public List<String> tokenize(String text)
Specified by:
tokenize in interface org.languagetool.tokenizers.Tokenizer
Overrides:
tokenize in class org.languagetool.tokenizers.WordTokenizer
Parameters:
text - Text to tokenize
Returns:
List of tokens. Note: apostrophes are replaced by the special string CA_APOS, and hyphens by CA_HYPHEN.
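One reading of the note on the return value is a mask-and-restore technique: characters that a generic splitter would cut on are temporarily replaced by placeholder strings, the text is split, and the placeholders are turned back into the original characters. The minimal sketch below assumes placeholder values (the real CA_APOS and CA_HYPHEN strings are not shown on this page), handles only the straight apostrophe, and protects an apostrophe or hyphen only when letters surround it; PlaceholderTokenizerSketch is a hypothetical name. Whether the real tokens still carry the placeholders is not stated here; this sketch restores them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of mask-and-restore tokenization; not LanguageTool code.
class PlaceholderTokenizerSketch {

  // Assumed placeholder values; the real CA_APOS / CA_HYPHEN strings may differ.
  // The U+0001 guards make a clash with ordinary text unlikely.
  static final String CA_APOS = "\u0001CA_APOS\u0001";
  static final String CA_HYPHEN = "\u0001CA_HYPHEN\u0001";

  // A straight apostrophe or a hyphen with a letter on both sides.
  private static final Pattern APOS_IN_WORD = Pattern.compile("(?<=\\p{L})'(?=\\p{L})");
  private static final Pattern HYPHEN_IN_WORD = Pattern.compile("(?<=\\p{L})-(?=\\p{L})");

  // Generic delimiters; ' and - are among them, so unmasked ones split.
  private static final String DELIMS = " \t\n\r.,;:!?()\"'-";

  static List<String> tokenize(String text) {
    // 1. Mask the apostrophes and hyphens that should not split a word.
    String masked = APOS_IN_WORD.matcher(text)
        .replaceAll(Matcher.quoteReplacement(CA_APOS));
    masked = HYPHEN_IN_WORD.matcher(masked)
        .replaceAll(Matcher.quoteReplacement(CA_HYPHEN));

    // 2. Split generically; returnDelims=true makes each delimiter a token.
    StringTokenizer st = new StringTokenizer(masked, DELIMS, true);
    List<String> tokens = new ArrayList<>();
    while (st.hasMoreTokens()) {
      // 3. Restore the original characters inside each token.
      tokens.add(st.nextToken().replace(CA_APOS, "'").replace(CA_HYPHEN, "-"));
    }
    return tokens;
  }
}
```

For the input L'home dóna-li 'hola' - adeu. this returns L'home and dóna-li as single tokens, while the quotation marks, the free-standing hyphen and each space remain separate tokens.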
wordsToAdd