org.apache.lucene.analysis

Class StopFilter


public final class StopFilter
extends TokenFilter

Removes stop words from a token stream.

Field Summary

Fields inherited from class org.apache.lucene.analysis.TokenFilter

input

Constructor Summary

StopFilter(TokenStream in, Set stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(TokenStream input, Set stopWords, boolean ignoreCase)
Construct a token stream filtering the given input.
StopFilter(TokenStream input, String[] stopWords)
Construct a token stream filtering the given input.
StopFilter(TokenStream in, String[] stopWords, boolean ignoreCase)
Constructs a filter which removes words from the input TokenStream that are named in the array of words.

Method Summary

static Set
makeStopSet(String[] stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static Set
makeStopSet(String[] stopWords, boolean ignoreCase)
Builds a Set from an array of stop words, lower-casing each word first if ignoreCase is true.
Token
next()
Returns the next input Token whose termText() is not a stop word.

Methods inherited from class org.apache.lucene.analysis.TokenFilter

close

Methods inherited from class org.apache.lucene.analysis.TokenStream

close, next

Constructor Details

StopFilter

public StopFilter(TokenStream in,
                  Set stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set. For best performance, pass a Set implementation with fast lookups, since the set is consulted for every token.
See Also:
makeStopSet(java.lang.String[])

StopFilter

public StopFilter(TokenStream input,
                  Set stopWords,
                  boolean ignoreCase)
Construct a token stream filtering the given input.
Parameters:
input - The TokenStream to filter.
stopWords - The set of stop words, as Strings. If ignoreCase is true, all strings must be lowercase.
ignoreCase - Ignore case when stopping. If true, the stopWords set must be set up to contain only lowercase words.
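
The lowercase requirement on stopWords is easy to miss, so here is a minimal stand-alone sketch of the lookup it implies. This is not Lucene's code: IgnoreCaseLookupSketch and isStopWord are hypothetical names, the sketch uses only java.util, and it assumes the filter lower-cases each term before checking the set when ignoreCase is true.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a case-insensitive stop-word lookup; not part of Lucene.
class IgnoreCaseLookupSketch {

    // Returns true if the term would be treated as a stop word.
    // Assumption: with ignoreCase, only the term is lower-cased, never the set.
    static boolean isStopWord(Set<String> stopWords, String term, boolean ignoreCase) {
        String key = ignoreCase ? term.toLowerCase(Locale.ROOT) : term;
        return stopWords.contains(key);
    }

    public static void main(String[] args) {
        Set<String> lower = new HashSet<>();
        lower.add("the");
        Set<String> mixed = new HashSet<>();
        mixed.add("The");

        // A lowercase set matches any casing of the term.
        System.out.println(isStopWord(lower, "THE", true));
        // A mixed-case entry is never matched, because the term is lower-cased first.
        System.out.println(isStopWord(mixed, "The", true));
    }
}
```

Under that assumption, a mixed-case entry in the set silently never matches, which is why the set has to be lower-cased up front.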

StopFilter

public StopFilter(TokenStream input,
                  String[] stopWords)
Construct a token stream filtering the given input.

StopFilter

public StopFilter(TokenStream in,
                  String[] stopWords,
                  boolean ignoreCase)
Constructs a filter which removes words from the input TokenStream that are named in the array of words.

Method Details

makeStopSet

public static final Set makeStopSet(String[] stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop-word Set to be built once, when an Analyzer is constructed, and then reused for every StopFilter that the Analyzer creates.
See Also:
makeStopSet(java.lang.String[], boolean), passing false to ignoreCase
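
The caching idea can be sketched without Lucene on the classpath. In the example below, CachedStopSetSketch is a hypothetical analyzer-like class (it is not Lucene's Analyzer) that builds its stop set once in the constructor and reuses it on every call; plain strings stand in for tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical analyzer-like class; it only illustrates building the set once.
class CachedStopSetSketch {
    // Built once in the constructor and shared by every call below.
    private final Set<String> stopSet;

    CachedStopSetSketch(String[] stopWords) {
        this.stopSet = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(stopWords)));
    }

    Set<String> stopSet() {
        return stopSet;
    }

    // Stand-in for wrapping a token stream: drops every term found in the cached set.
    List<String> removeStopWords(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!stopSet.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        CachedStopSetSketch analyzer = new CachedStopSetSketch(new String[] {"the", "and"});
        // prints [quick, fox]
        System.out.println(analyzer.removeStopWords(Arrays.asList("the", "quick", "and", "fox")));
    }
}
```

With the real API, the analogous pattern would be to call makeStopSet once and pass the resulting Set to the Set-based constructor, rather than passing a String[] each time.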

makeStopSet

public static final Set makeStopSet(String[] stopWords,
                                    boolean ignoreCase)
Parameters:
stopWords - The array of stop words.
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words
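
A rough sketch of the behavior described above, assuming a HashSet is used and that lower-casing is locale-independent; neither detail is confirmed by this page. MakeStopSetSketch is a hypothetical class, not the Lucene implementation.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical re-creation of the documented behavior; not Lucene's source.
class MakeStopSetSketch {

    // Copies the words into a set, lower-casing each one first when ignoreCase is true.
    static Set<String> makeStopSet(String[] stopWords, boolean ignoreCase) {
        Set<String> stopSet = new HashSet<>();
        for (String word : stopWords) {
            stopSet.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return stopSet;
    }

    public static void main(String[] args) {
        String[] words = {"The", "AND", "the"};
        // With ignoreCase, "The" and "the" collapse into one entry.
        System.out.println(makeStopSet(words, true).size());
        // Without it, all three spellings are kept as given.
        System.out.println(makeStopSet(words, false).size());
    }
}
```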

next

public final Token next()
            throws IOException
Returns the next input Token whose termText() is not a stop word.
Overrides:
next in class TokenStream
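
The skipping behavior of next() can be illustrated with a stand-alone sketch. StopFilterNextSketch is a hypothetical class that uses an Iterator of plain strings in place of a TokenStream and returns null once the input is exhausted; it is an illustration under those assumptions, not the Lucene implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-in for a stop filter over plain strings; not Lucene's source.
class StopFilterNextSketch {
    private final Iterator<String> input;
    private final Set<String> stopWords;

    StopFilterNextSketch(Iterator<String> input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    // Pulls terms from the input until one is not a stop word.
    // Returns null when the input has no more terms.
    String next() {
        while (input.hasNext()) {
            String term = input.next();
            if (!stopWords.contains(term)) {
                return term;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        StopFilterNextSketch filter = new StopFilterNextSketch(
                Arrays.asList("the", "quick", "and", "fox").iterator(), stopWords);
        // Prints the surviving terms, one per line.
        for (String term = filter.next(); term != null; term = filter.next()) {
            System.out.println(term);
        }
    }
}
```

Note that several consecutive stop words are consumed inside a single call, so the caller never sees them.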

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.