Class RegexTokenizer
java.lang.Object
    org.apache.commons.text.similarity.RegexTokenizer

All Implemented Interfaces:
Tokenizer<java.lang.CharSequence>
final class RegexTokenizer extends java.lang.Object implements Tokenizer<java.lang.CharSequence>
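A simple word tokenizer that uses a regular expression to find words. It applies the regex (\w)+ to the input text and extracts every match from the given character sequence as a word, so each token is a maximal run of word characters (by default, \w matches [a-zA-Z_0-9]).

Since:
1.0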
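A minimal sketch of the technique the description names, using only java.util.regex: stream the successive matches of (\w)+ with Matcher.results() (Java 9+) and keep each whole match. The class WordExtractionSketch and the method extractWords are hypothetical names, and this is not the library's source. Because RegexTokenizer is declared without the public modifier, code outside its package cannot instantiate it directly, which is why the sketch re-creates the regex instead of calling the class.

```java
import java.util.Arrays;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class WordExtractionSketch {

    // Same expression the class description names: one or more word characters.
    private static final Pattern WORD = Pattern.compile("(\\w)+");

    // Hypothetical helper: returns every maximal run of word characters, in input order.
    static String[] extractWords(CharSequence text) {
        return WORD.matcher(text)
                .results()                 // Java 9+: stream of successive find() results
                .map(MatchResult::group)   // group() is group 0, the whole match
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractWords("Hello, world! It's 2024.")));
        // prints [Hello, world, It, s, 2024]
    }
}
```

Note that punctuation splits contractions: "It's" yields the two tokens It and s, because the apostrophe is not a word character.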
Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| private static java.util.regex.Pattern | PATTERN | The pattern that matches words, (\w)+. |
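Two properties of (\w)+ in java.util.regex are worth knowing, shown in the sketch below under the assumption that PATTERN is compiled without flags (this page does not list them). First, \w is ASCII-only by default, so accented letters break a word apart unless Pattern.UNICODE_CHARACTER_CLASS is set. Second, the parentheses form a repeated capturing group, so group(1) holds only the last character of a match; the whole word is group() (group 0). The class WordPatternDemo and its matches helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPatternDemo {

    // Hypothetical helper: collects the whole text of every match, in order.
    static List<String> matches(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group()); // group 0: the whole match
        }
        return out;
    }

    public static void main(String[] args) {
        // Unicode escapes for i-diaeresis and e-acute keep this source file ASCII-only.
        String text = "na\u00EFve caf\u00E9_2";

        // Default flags: \w is [a-zA-Z_0-9], so the accented letters split words.
        Pattern ascii = Pattern.compile("(\\w)+");
        System.out.println(matches(ascii, text)); // prints [na, ve, caf, _2]

        // UNICODE_CHARACTER_CLASS widens \w to Unicode letters, marks, digits and connectors,
        // so the result is two tokens: the full accented words.
        Pattern unicode = Pattern.compile("(\\w)+", Pattern.UNICODE_CHARACTER_CLASS);
        System.out.println(matches(unicode, text));

        // A repeated capturing group keeps only its last iteration.
        Matcher m = ascii.matcher("abc");
        if (m.find()) {
            System.out.println(m.group() + " / " + m.group(1)); // prints abc / c
        }
    }
}
```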
Constructor Summary

| Constructor | Description |
|---|---|
| RegexTokenizer() | |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| java.lang.CharSequence[] | tokenize(java.lang.CharSequence text) | Returns an array of tokens: the words matched in text, in input order. |
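The sketch below shows one way to satisfy the tokenize(CharSequence) to CharSequence[] contract with a find() loop. SimpleTokenizer is a hypothetical stand-in for the Tokenizer interface (only the method shape is taken from this page), and WordTokenizer is not the library class. Rejecting null or blank input with IllegalArgumentException is an assumed design choice; this page does not state how RegexTokenizer handles such input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeSketch {

    // Hypothetical stand-in for Tokenizer<T>; only tokenize(CharSequence) -> T[] is mirrored.
    public interface SimpleTokenizer<T> {
        T[] tokenize(CharSequence text);
    }

    // Hypothetical word tokenizer built on the same (\w)+ expression.
    public static final class WordTokenizer implements SimpleTokenizer<CharSequence> {
        private static final Pattern PATTERN = Pattern.compile("(\\w)+");

        @Override
        public CharSequence[] tokenize(CharSequence text) {
            // Assumption: null or blank input is rejected rather than tokenized.
            if (text == null || text.toString().isBlank()) {
                throw new IllegalArgumentException("Invalid text");
            }
            Matcher matcher = PATTERN.matcher(text);
            List<String> tokens = new ArrayList<>();
            while (matcher.find()) {
                tokens.add(matcher.group()); // whole run of word characters
            }
            // Duplicates are kept and order is preserved; no matches gives an empty array.
            return tokens.toArray(new CharSequence[0]);
        }
    }

    public static void main(String[] args) {
        CharSequence[] tokens = new WordTokenizer().tokenize("to be or not to be");
        System.out.println(Arrays.toString(tokens)); // prints [to, be, or, not, to, be]
    }
}
```

Returning every match, duplicates included, keeps term frequencies intact for whatever consumes the tokens; a set-based result would lose them.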