org.apache.lucene.analysis
Class WordlistLoader
public class WordlistLoader
Loader for text files that represent a list of stopwords.
$Id: WordlistLoader.java 472959 2006-11-09 16:21:50Z yonik $
static HashMap | getStemDict(File wordstemfile) - Reads a stem dictionary.
static HashSet | getWordSet(File wordfile) - Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet | getWordSet(Reader reader) - Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
getStemDict
public static HashMap getStemDict(File wordstemfile)
throws IOException
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Returns: a stem dictionary that overrules the stemming algorithm
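The word\tstem line format can be illustrated with a minimal sketch. This is not the WordlistLoader source: StemDictSketch and parseStemDict are hypothetical names, the sketch reads from a Reader rather than a File so that it runs without a file on disk, and skipping lines that contain no tab is an assumption, since the documentation above does not say how malformed lines are handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

// Hypothetical stand-in for illustration only; not Lucene's implementation.
class StemDictSketch {

    // Parses "word\tstem" lines into a map from word to stem.
    // Assumption: lines that contain no tab are skipped.
    static HashMap<String, String> parseStemDict(Reader reader) throws IOException {
        HashMap<String, String> result = new HashMap<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Split on the first tab only: left part is the word, right part the stem.
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    result.put(parts[0], parts[1]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> dict =
                parseStemDict(new StringReader("running\trun\nmice\tmouse\n"));
        System.out.println(dict.get("running")); // prints "run"
        System.out.println(dict.get("mice"));    // prints "mouse"
    }
}
```

A stemmer that consults such a map before applying its algorithm would return the mapped stem for listed words, which is the sense in which the dictionary overrules the algorithm.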
getWordSet
public static HashSet getWordSet(File wordfile)
throws IOException
Loads a text file and adds every line as an entry to a HashSet (omitting
leading and trailing whitespace). Every line of the file should contain only
one word. The words need to be in lowercase if you make use of an
Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
Parameters: wordfile - File containing the wordlist
Returns: A HashSet with the file's words
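The behaviour described above (one word per line, surrounding whitespace removed) can be sketched as follows. WordFileSketch and loadWordSet are hypothetical names, not the library's code; the sketch writes a temporary file so that it is runnable on its own. Under this simple reading a blank line would be stored as an empty string, and whether the real loader does the same is not stated here.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashSet;

// Hypothetical stand-in for illustration only; not Lucene's implementation.
class WordFileSketch {

    // Reads the file line by line and stores each trimmed line in a set.
    static HashSet<String> loadWordSet(File wordfile) throws IOException {
        HashSet<String> result = new HashSet<>();
        try (BufferedReader br = new BufferedReader(new FileReader(wordfile))) {
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Temporary word list with stray spaces and a tab around the entries.
        File tmp = File.createTempFile("stopwords", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "  the \nand\n\tof\n".getBytes(StandardCharsets.UTF_8));

        HashSet<String> words = loadWordSet(tmp);
        System.out.println(words.contains("the")); // prints "true"
        System.out.println(words.size());          // prints "3"
    }
}
```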
getWordSet
public static HashSet getWordSet(Reader reader)
throws IOException
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting
leading and trailing whitespace). Every line of the Reader should contain only
one word. The words need to be in lowercase if you make use of an
Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
Parameters: reader - Reader containing the wordlist
Returns: A HashSet with the reader's words
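The lowercase requirement can be seen in a small sketch. WordReaderSketch, readWordSet and isStopWord are hypothetical names; isStopWord is only a stand-in for an analysis chain that lowercases each token before the stop word lookup, and it does not reproduce LowerCaseFilter itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;

// Hypothetical stand-in for illustration only; not Lucene's implementation.
class WordReaderSketch {

    // Reads one word per line from the reader, trimming surrounding whitespace.
    static HashSet<String> readWordSet(Reader reader) throws IOException {
        HashSet<String> result = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line.trim());
            }
        }
        return result;
    }

    // Assumption: the token is lowercased first, then looked up verbatim in the set.
    static boolean isStopWord(HashSet<String> stopWords, String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        // "the" is stored in lowercase, "And" is not.
        HashSet<String> stop = readWordSet(new StringReader("the\nAnd\n"));
        System.out.println(isStopWord(stop, "The")); // prints "true"
        System.out.println(isStopWord(stop, "AND")); // prints "false"
    }
}
```

The entry "And" never matches because every token reaches the lookup as "and", which is why the word list itself has to be lowercase when a lowercasing filter runs first.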
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.