org.apache.lucene.analysis

Class WordlistLoader


public class WordlistLoader
extends Object

Loader for text files that represent a list of stopwords.
Version:
$Id: WordlistLoader.java 472959 2006-11-09 16:21:50Z yonik $
Author:
Gerhard Schwarz

Method Summary

static HashMap
getStemDict(File wordstemfile)
Reads a stem dictionary.
static HashSet
getWordSet(File wordfile)
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet
getWordSet(Reader reader)
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).

Method Details

getStemDict

public static HashMap getStemDict(File wordstemfile)
            throws IOException
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Parameters:
wordstemfile - File containing the stem dictionary
Returns:
A stem dictionary whose entries override the stemming algorithm
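
The word\tstem line format can be illustrated with a small JDK-only sketch. This is not Lucene's implementation: the class StemDictSketch and its parse method are hypothetical names, and skipping lines without a tab is an assumption, since the documentation does not say how malformed lines are treated.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

// Hypothetical helper, not part of Lucene: mimics the documented
// "word<TAB>stem" line format using only the JDK.
final class StemDictSketch {

    // Reads "word\tstem" lines and maps each word to its stem.
    static HashMap<String, String> parse(Reader in) throws IOException {
        HashMap<String, String> dict = new HashMap<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            // Split on the first tab only: left side is the word, right side the stem.
            String[] parts = line.split("\t", 2);
            // Assumption: lines without a tab are skipped; the source does not specify this.
            if (parts.length == 2) {
                dict.put(parts[0], parts[1]);
            }
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> dict =
                parse(new StringReader("mice\tmouse\ngeese\tgoose\n"));
        System.out.println(dict.get("mice")); // prints "mouse"
    }
}
```

A stemmer that consults such a map before applying its rules would return the mapped stem for listed words, which is the sense in which the dictionary overrules the algorithm.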

getWordSet

public static HashSet getWordSet(File wordfile)
            throws IOException
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Each line of the file should contain exactly one word. The words must be lowercase if the Analyzer in use applies a LowerCaseFilter (as StandardAnalyzer does).
Parameters:
wordfile - File containing the wordlist
Returns:
A HashSet with the file's words
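
The lowercase requirement can be shown with a plain HashSet. The sketch below is a JDK-only illustration, not Lucene code: StopwordCaseSketch and isStopword are hypothetical names, and the call to toLowerCase stands in for an analysis chain that lowercases tokens before the stopword lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;

// Hypothetical illustration, not part of Lucene: shows why wordlist entries
// must be lowercase when tokens are lowercased before the stopword check.
final class StopwordCaseSketch {

    // Stand-in for a lowercasing step followed by a stopword lookup.
    static boolean isStopword(HashSet<String> stopwords, String rawToken) {
        String token = rawToken.toLowerCase(Locale.ROOT);
        return stopwords.contains(token);
    }

    public static void main(String[] args) {
        HashSet<String> mixedCase = new HashSet<>(Arrays.asList("The", "And"));
        HashSet<String> lowerCase = new HashSet<>(Arrays.asList("the", "and"));

        // The token becomes "the", which is not equal to the entry "The".
        System.out.println(isStopword(mixedCase, "The")); // prints "false"
        // With lowercase entries the lookup succeeds.
        System.out.println(isStopword(lowerCase, "The")); // prints "true"
    }
}
```

Because HashSet lookups are case-sensitive, a wordlist with capitalized entries would silently fail to filter anything in such a chain.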

getWordSet

public static HashSet getWordSet(Reader reader)
            throws IOException
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Each line of the Reader should contain exactly one word. The words must be lowercase if the Analyzer in use applies a LowerCaseFilter (as StandardAnalyzer does).
Parameters:
reader - Reader containing the wordlist
Returns:
A HashSet with the reader's words
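
The line-by-line trimming described above might look roughly like the following JDK-only sketch. It is not the Lucene source: ReaderWordSetSketch and load are hypothetical names, and leaving the Reader open for the caller to close is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

// Hypothetical helper, not part of Lucene: mimics the documented behavior of
// adding every trimmed line of a Reader to a HashSet.
final class ReaderWordSetSketch {

    static HashSet<String> load(Reader reader) throws IOException {
        HashSet<String> words = new HashSet<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Every line becomes an entry, so a blank line would add an empty string.
            words.add(line.trim());
        }
        // Assumption: the caller owns the Reader and is responsible for closing it.
        return words;
    }

    public static void main(String[] args) throws IOException {
        HashSet<String> words = load(new StringReader("  the \nand\n\tof\t\n"));
        System.out.println(words.contains("the")); // prints "true"
        System.out.println(words.size());          // prints "3"
    }
}
```

Accepting a Reader rather than a File lets the wordlist come from any character source, such as a classpath resource or an in-memory string.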

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.