org.apache.lucene.analysis.cz
Class CzechAnalyzer
public final class CzechAnalyzer
Analyzer for the Czech language. Supports an external list of stopwords (words that
will not be indexed at all).
A default set of stopwords is used unless an alternative list is specified. The
exclusion list is empty by default.
- Lukas Zapletal [lzap@root.cz]
void | loadStopWords(InputStream wordfile, String encoding) - Loads stopwords hash from resource stream (file, database...).
TokenStream | tokenStream(String fieldName, Reader reader) - Creates a TokenStream which tokenizes all the text in the provided Reader.
CZECH_STOP_WORDS
public static final String[] CZECH_STOP_WORDS
List of typical stopwords.
CzechAnalyzer
public CzechAnalyzer()
Builds an analyzer with the default stop words.
CzechAnalyzer
public CzechAnalyzer(File stopwords)
throws IOException
Builds an analyzer with the given stop words.
CzechAnalyzer
public CzechAnalyzer(HashSet stopwords)
Builds an analyzer with the given stop words.
CzechAnalyzer
public CzechAnalyzer(String[] stopwords)
Builds an analyzer with the given stop words.
loadStopWords
public void loadStopWords(InputStream wordfile,
String encoding)
Loads stopwords hash from resource stream (file, database...).
wordfile - File containing the wordlist
encoding - Encoding used (win-1250, iso-8859-2, ...), null for default system encoding
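The following is a minimal plain-Java sketch of what loading a stopword list from a stream with an explicit encoding can look like. It is not the Lucene implementation: the class name StopWordLoaderSketch, the one-word-per-line file format, and the wrapping of IOException in an unchecked exception are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class StopWordLoaderSketch {

    // Assumes one stop word per line; blank lines are skipped and
    // surrounding whitespace is trimmed.
    // A null encoding falls back to the platform default charset.
    static Set<String> load(InputStream wordfile, String encoding) {
        Charset charset = (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
        Set<String> words = new HashSet<>();
        try (BufferedReader in =
                 new BufferedReader(new InputStreamReader(wordfile, charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Wrapped only to keep this sketch free of checked exceptions.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // "\u017Ee" is the Czech word "ze" with a caron on the z.
        byte[] data = "a\naby\n\u017Ee\n".getBytes(StandardCharsets.UTF_8);
        Set<String> words = load(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(words);
    }
}
```

Decoding with the wrong charset would silently corrupt words containing Czech diacritics, so such words would never match the indexed text. That is why the encoding is passed explicitly.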
tokenStream
public final TokenStream tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
- tokenStream in interface Analyzer
- A TokenStream built from a StandardTokenizer filtered with
StandardFilter, LowerCaseFilter, and StopFilter
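The filter chain described above can be approximated in plain Java to show the order of operations: tokenize, lower-case, then drop stopwords. This is only a rough sketch and does not use the Lucene classes. The class name CzechPipelineSketch and the regex-based tokenizer are assumptions; StandardTokenizer applies richer rules than a simple split, and the StandardFilter step is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the tokenizer and filter chain, not Lucene code.
class CzechPipelineSketch {

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Tokenize: split on any run of characters that are neither
        // Unicode letters nor digits (a crude substitute for StandardTokenizer).
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields an empty first element
            }
            // Lower-case step (the role LowerCaseFilter plays).
            String lower = raw.toLowerCase(Locale.ROOT);
            // Stop step (the role StopFilter plays): stopwords are never emitted.
            if (!stopWords.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("a", "ve", "jsou"));
        System.out.println(analyze("Praha a Brno jsou ve m\u011bst\u011b.", stop));
    }
}
```

Lower-casing runs before the stopword check, so a stopword that starts a sentence ("A", "Ve") is still removed as long as the stopword list itself is lower-case.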
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.