org.apache.lucene.analysis.ngram

Class NGramTokenizer


public class NGramTokenizer
extends Tokenizer

Tokenizes the input into character n-grams of the given size(s).
Author:
Otis Gospodnetic
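The following is a minimal, standalone Java sketch of character n-gram generation. It is not NGramTokenizer's own implementation. The class NGramSketch and its method ngrams are hypothetical names. The ordering it produces (all grams of one size, left to right, before moving to the next size) is an assumption of the sketch rather than documented behavior of this class.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Returns every substring of text whose length lies in [minGram, maxGram].
    // Ordering (one size at a time, left to right) is an assumption of this
    // sketch, not a documented guarantee of NGramTokenizer.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            // A gram of size n can start at any position that leaves n characters.
            for (int start = 0; start + n <= text.length(); start++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 1, 2)); // [a, b, c, ab, bc]
    }
}
```

Overlapping grams are intentional: every start position is used, so "abc" yields both "ab" and "bc".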

Field Summary

static int
DEFAULT_MAX_NGRAM_SIZE
static int
DEFAULT_MIN_NGRAM_SIZE

Fields inherited from class org.apache.lucene.analysis.Tokenizer

input

Constructor Summary

NGramTokenizer(Reader input)
Creates an NGramTokenizer using the default minimum and maximum n-gram sizes.
NGramTokenizer(Reader input, int minGram, int maxGram)
Creates an NGramTokenizer using the given minimum and maximum n-gram sizes.

Method Summary

Token
next()
Returns the next token in the stream, or null at end of stream (EOS).

Methods inherited from class org.apache.lucene.analysis.Tokenizer

close

Methods inherited from class org.apache.lucene.analysis.TokenStream

close, next

Field Details

DEFAULT_MAX_NGRAM_SIZE

public static final int DEFAULT_MAX_NGRAM_SIZE
Field Value:
2

DEFAULT_MIN_NGRAM_SIZE

public static final int DEFAULT_MIN_NGRAM_SIZE
Field Value:
1
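As a worked example of these defaults: a string of length L yields max(0, L - n + 1) overlapping n-grams of each size n, so with sizes 1 through 2 a five-character input produces 5 + 4 = 9 grams. The sketch below (hypothetical class NGramCount) assumes every overlapping substring of the whole input is emitted; this page does not say how whitespace or very long input is treated.

```java
public class NGramCount {
    // Number of n-grams with sizes minGram..maxGram in a string of length len:
    // each size n contributes max(0, len - n + 1) overlapping grams.
    public static int countGrams(int len, int minGram, int maxGram) {
        int total = 0;
        for (int n = minGram; n <= maxGram; n++) {
            total += Math.max(0, len - n + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults from this page: DEFAULT_MIN_NGRAM_SIZE = 1, DEFAULT_MAX_NGRAM_SIZE = 2.
        System.out.println(countGrams(5, 1, 2)); // 9 (five 1-grams + four 2-grams)
    }
}
```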

Constructor Details

NGramTokenizer

public NGramTokenizer(Reader input)
Creates an NGramTokenizer using the default minimum and maximum n-gram sizes, DEFAULT_MIN_NGRAM_SIZE (1) and DEFAULT_MAX_NGRAM_SIZE (2).
Parameters:
input - Reader holding the input to be tokenized

NGramTokenizer

public NGramTokenizer(Reader input,
                      int minGram,
                      int maxGram)
Creates an NGramTokenizer using the given minimum and maximum n-gram sizes.
Parameters:
input - Reader holding the input to be tokenized
minGram - the length, in characters, of the shortest n-gram to generate
maxGram - the length, in characters, of the longest n-gram to generate
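The parameter descriptions imply that minGram should be at least 1 and no larger than maxGram. The hypothetical helper below is one way a caller might check that before constructing a tokenizer; this page does not state whether the NGramTokenizer constructor performs such checks itself.

```java
public class GramSizeCheck {
    // Hypothetical helper: rejects size pairs that cannot describe a
    // non-empty range of n-gram lengths. Not part of NGramTokenizer's
    // documented API.
    public static void checkGramSizes(int minGram, int maxGram) {
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be at least 1");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not exceed maxGram");
        }
    }

    public static void main(String[] args) {
        checkGramSizes(1, 2); // accepted: matches the documented defaults
        try {
            checkGramSizes(3, 2); // shortest gram longer than the longest
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // minGram must not exceed maxGram
        }
    }
}
```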

Method Details

next

public final Token next()
            throws IOException
Returns the next token in the stream, or null at end of stream (EOS).
Overrides:
next in class TokenStream
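The next() contract (return tokens until the stream is exhausted, then null) can be sketched as a small pull-style class. PullNGrams and its nested Gram type are hypothetical stand-ins for NGramTokenizer and Token. The sketch reads the whole Reader into memory and emits grams size by size; both are assumptions, not documented behavior. Each Gram records its start and end character offsets, much as a Lucene Token carries start and end offsets.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class PullNGrams {
    // Hypothetical stand-in for a token: gram text plus [start, end) offsets.
    public static final class Gram {
        public final String text;
        public final int start;
        public final int end;

        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    private final String text;
    private final int maxGram;
    private int gramSize;
    private int pos = 0;

    public PullNGrams(Reader input, int minGram, int maxGram) throws IOException {
        // Read the entire Reader up front (an assumption of this sketch).
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = input.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        this.text = sb.toString();
        this.gramSize = minGram;
        this.maxGram = maxGram;
    }

    // Returns the next gram, or null once every size up to maxGram is exhausted.
    public Gram next() {
        while (gramSize <= maxGram) {
            if (pos + gramSize <= text.length()) {
                Gram g = new Gram(text.substring(pos, pos + gramSize), pos, pos + gramSize);
                pos++;
                return g;
            }
            // Current size exhausted: restart at position 0 with the next size.
            gramSize++;
            pos = 0;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        PullNGrams grams = new PullNGrams(new StringReader("abc"), 1, 2);
        for (Gram g = grams.next(); g != null; g = grams.next()) {
            System.out.println(g);
        }
    }
}
```

For input "abc" with sizes 1 and 2, main prints a[0,1), b[1,2), c[2,3), ab[0,2) and bc[1,3), one per line, after which next() keeps returning null.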

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.