org.apache.lucene.analysis.standard
public class StandardTokenizer extends Tokenizer implements StandardTokenizerConstants
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
Field Summary | |
---|---|
Token | jj_nt |
Token | token |
StandardTokenizerTokenManager | token_source |
Constructor Summary | |
---|---|
StandardTokenizer(Reader reader) | Constructs a tokenizer for this Reader. |
StandardTokenizer(CharStream stream) | |
StandardTokenizer(StandardTokenizerTokenManager tm) | |
Method Summary | |
---|---|
void | disable_tracing() |
void | enable_tracing() |
ParseException | generateParseException() |
Token | getNextToken() |
Token | getToken(int index) |
Token | next() Returns the next token in the stream, or null at end of stream (EOS). |
void | ReInit(CharStream stream) |
void | ReInit(StandardTokenizerTokenManager tm) |
The token returned by next() has its type set to an element of tokenImage (inherited from StandardTokenizerConstants).
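
The contract of next() (return one token per call, then null at end of stream) implies a simple pull loop on the caller's side. The sketch below illustrates only that loop shape. So that it compiles without Lucene on the classpath, it uses two hypothetical stand-ins that are not part of this package: `SimpleToken` in place of Token and `SketchTokenizer` in place of StandardTokenizer. The splitting rule (runs of letters and digits) and the type labels `WORD` and `NUMBER` are likewise assumptions made for illustration; they are not the grammar or the tokenImage values of the real class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class NextLoopSketch {

    /** Hypothetical stand-in for Token: term text plus a type label. */
    static final class SimpleToken {
        final String text;
        final String type;

        SimpleToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Hypothetical stand-in for a tokenizer whose next() returns null at end of stream. */
    static final class SketchTokenizer {
        private final Reader reader;

        SketchTokenizer(Reader reader) {
            this.reader = reader;
        }

        SimpleToken next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isLetterOrDigit((char) c)) {
                    sb.append((char) c);      // extend the current token
                } else if (sb.length() > 0) {
                    break;                    // a separator ends the current token
                }                             // otherwise skip leading separators
            }
            if (sb.length() == 0) {
                return null;                  // nothing left: signal end of stream
            }
            String text = sb.toString();
            boolean allDigits = text.chars().allMatch(Character::isDigit);
            return new SimpleToken(text, allDigits ? "NUMBER" : "WORD");
        }
    }

    /** Drains the tokenizer with the usual "call next() until null" loop. */
    static List<String> collect(String input) throws IOException {
        SketchTokenizer tokenizer = new SketchTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        for (SimpleToken t = tokenizer.next(); t != null; t = tokenizer.next()) {
            out.add(t.text + "/" + t.type);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String s : collect("Hello, world 42!")) {
            System.out.println(s);
        }
    }
}
```

With the real class, the same loop would presumably be driven by a StandardTokenizer constructed from a Reader, with the null check on next() as the only termination condition.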