- java.lang.Object
  - org.apache.lucene.analysis.fa.PersianStemmer
public class PersianStemmer extends java.lang.Object
Stemmer for Persian. Stemming is done in-place for efficiency, operating on a term buffer.
Stemming is defined as:
- Removal of attached definite article, conjunction, and prepositions.
- Stemming of common suffixes.
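The description above can be illustrated with a minimal, self-contained sketch of in-place suffix stemming. It is not the Lucene source. The Unicode values given to the letter constants, the suffix list and its order, and the rule that at least two characters must remain after stripping are all assumptions made for illustration. The sketch also assumes input that has already been normalized so that yeh is U+064A. It covers only suffix stemming, not prefix removal. `PersianSuffixStemmerSketch` and `stemToString` are hypothetical names.

```java
/**
 * Minimal sketch of in-place Persian suffix stemming on a term buffer.
 * ASSUMPTION: code points, suffix list, and the 2-char minimum are
 * illustrative choices, not copied from Lucene.
 */
final class PersianSuffixStemmerSketch {
  // Assumed Unicode code points for the letters named in Field Detail.
  private static final char ALEF = '\u0627';
  private static final char HEH = '\u0647';
  private static final char TEH = '\u062A';
  private static final char REH = '\u0631';
  private static final char NOON = '\u0646';
  private static final char YEH = '\u064A';
  private static final char ZWNJ = '\u200C'; // zero-width non-joiner

  // Assumed suffix list, tried in this order.
  private static final char[][] SUFFIXES = {
    {ALEF, TEH},           // -at (plural)
    {ALEF, NOON},          // -an (plural)
    {TEH, REH, YEH, NOON}, // -tarin (superlative)
    {TEH, REH},            // -tar (comparative)
    {YEH, YEH},
    {YEH},
    {HEH, ALEF},           // -ha (plural)
    {ZWNJ},
  };

  /** Stems s[0..len) in place and returns the new length. */
  int stem(char[] s, int len) {
    return stemSuffix(s, len);
  }

  private int stemSuffix(char[] s, int len) {
    for (char[] suffix : SUFFIXES) {
      if (endsWithCheckLength(s, len, suffix)) {
        // The suffix sits at the end, so "deleting" it only shrinks len.
        len -= suffix.length;
      }
    }
    return len;
  }

  private boolean endsWithCheckLength(char[] s, int len, char[] suffix) {
    if (len < suffix.length + 2) { // assumed: keep at least 2 chars
      return false;
    }
    for (int i = 0; i < suffix.length; i++) {
      if (s[len - suffix.length + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  /** Convenience wrapper: copies a word into a buffer and stems it. */
  static String stemToString(String word) {
    char[] buf = word.toCharArray();
    int newLen = new PersianSuffixStemmerSketch().stem(buf, buf.length);
    return new String(buf, 0, newLen);
  }

  public static void main(String[] args) {
    // bozorgtarin ("biggest") -> bozorg ("big")
    String word = "\u0628\u0632\u0631\u06AF\u062A\u0631\u064A\u0646";
    String stem = stemToString(word);
    System.out.println(word.length() + " -> " + stem.length()); // prints "8 -> 4"
  }
}
```

Because every suffix is removed from the end of the word, the sketch never moves characters; shrinking the returned length is enough, which is what makes in-place operation cheap.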
Constructor Summary
- PersianStemmer()
Method Summary
- private boolean endsWithCheckLength(char[] s, int len, char[] suffix): Returns true if the suffix matches and can be stemmed.
- int stem(char[] s, int len): Stem an input buffer of Persian text.
- private int stemSuffix(char[] s, int len): Stem suffix(es) off a Persian word.
Field Detail
ALEF
private static final char ALEF
See Also: Constant Field Values

HEH
private static final char HEH
See Also: Constant Field Values

TEH
private static final char TEH
See Also: Constant Field Values

REH
private static final char REH
See Also: Constant Field Values

NOON
private static final char NOON
See Also: Constant Field Values

YEH
private static final char YEH
See Also: Constant Field Values

ZWNJ
private static final char ZWNJ
See Also: Constant Field Values

suffixes
private static final char[][] suffixes
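The Field Detail lists the letters the stemmer works with but not their values. The sketch below pairs each constant with the Unicode code point it is assumed to hold (this page only links to Constant Field Values) and builds a `char[][]` suffix table from them; the suffix list itself is also an assumption. `PersianStemmerConstantsSketch` and `codePoints` are hypothetical names.

```java
/**
 * Sketch: the letter constants from Field Detail with ASSUMED Unicode
 * code points, and a suffix table assembled from them.
 */
final class PersianStemmerConstantsSketch {
  static final char ALEF = '\u0627'; // ARABIC LETTER ALEF
  static final char HEH = '\u0647';  // ARABIC LETTER HEH
  static final char TEH = '\u062A';  // ARABIC LETTER TEH
  static final char REH = '\u0631';  // ARABIC LETTER REH
  static final char NOON = '\u0646'; // ARABIC LETTER NOON
  static final char YEH = '\u064A';  // ARABIC LETTER YEH (assumed, not U+06CC FARSI YEH)
  static final char ZWNJ = '\u200C'; // ZERO WIDTH NON-JOINER

  // The leading "" forces String concatenation before toCharArray().
  static final char[][] SUFFIXES = {
    ("" + ALEF + TEH).toCharArray(),
    ("" + ALEF + NOON).toCharArray(),
    ("" + TEH + REH + YEH + NOON).toCharArray(),
    ("" + TEH + REH).toCharArray(),
    ("" + YEH + YEH).toCharArray(),
    ("" + YEH).toCharArray(),
    ("" + HEH + ALEF).toCharArray(),
    ("" + ZWNJ).toCharArray(),
  };

  /** Formats a char[] as space-separated U+XXXX code points. */
  static String codePoints(char[] chars) {
    StringBuilder sb = new StringBuilder();
    for (char c : chars) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("U+%04X", (int) c));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    for (char[] suffix : SUFFIXES) {
      System.out.println(codePoints(suffix));
    }
  }
}
```

The leading `""` matters: `ALEF + TEH` on its own is `int` addition of the two character values, not concatenation, and would not compile as a `String` expression with `.toCharArray()`.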
Method Detail
stem
public int stem(char[] s, int len)
Stem an input buffer of Persian text.
Parameters:
s - input buffer
len - number of valid characters in s, starting at index 0
Returns:
length of the text in s after stemming
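`stem` follows a buffer-plus-length convention: `s` may be larger than the term, only the first `len` characters are examined, and the result is a new length rather than a new array. The sketch below shows a caller using that contract. To stay short it calls `stripPluralHa`, a hypothetical stand-in that removes only a trailing plural "-ha" (HEH + ALEF, with assumed code points), rather than the real stemmer.

```java
/**
 * Sketch of the buffer-plus-length calling convention of stem(char[], int).
 * stripPluralHa is a HYPOTHETICAL stand-in stemmer, not Lucene code.
 */
final class StemCallingConventionSketch {
  private static final char HEH = '\u0647';  // assumed code point
  private static final char ALEF = '\u0627'; // assumed code point

  /** Hypothetical stand-in: drops a trailing "-ha" if at least 2 chars remain. */
  static int stripPluralHa(char[] s, int len) {
    if (len >= 4 && s[len - 2] == HEH && s[len - 1] == ALEF) {
      return len - 2; // shrink the logical length; nothing is copied
    }
    return len;
  }

  /** Copies term into a buffer of at least the given capacity, stems, returns the stem. */
  static String stemViaBuffer(String term, int capacity) {
    char[] buffer = new char[Math.max(capacity, term.length())];
    term.getChars(0, term.length(), buffer, 0); // valid region: [0, term.length())
    int newLen = stripPluralHa(buffer, term.length());
    // Only buffer[0..newLen) is meaningful; chars after newLen are stale.
    return new String(buffer, 0, newLen);
  }

  public static void main(String[] args) {
    String term = "\u0645\u0631\u062F\u0647\u0627"; // mardha, "men"
    String stem = stemViaBuffer(term, 16);
    System.out.println(term.length() + " -> " + stem.length()); // prints "5 -> 3"
  }
}
```

In a Lucene `TokenFilter`, the buffer would typically be the one returned by `CharTermAttribute.buffer()`, with the returned length passed to `setLength(int)`; that wiring is an assumption here, not something this page documents.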

stemSuffix
private int stemSuffix(char[] s, int len)
Stem suffix(es) off a Persian word.
Parameters:
s - input buffer
len - number of valid characters in s, starting at index 0
Returns:
new length of the text in s after stemming
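The "(es)" in "Stem suffix(es)" suggests that more than one suffix can be removed from a single word. One way that can happen, sketched below under assumptions, is a loop that keeps testing the remaining suffixes after a match. The two-entry suffix table (plural "-ha" followed by ZWNJ) is an excerpt chosen for illustration, and the `stopAfterFirst` flag is hypothetical, added only to contrast the two behaviors.

```java
/**
 * Sketch: a suffix loop that continues after a match can strip several
 * suffixes in one pass. Uses a two-entry EXCERPT of an assumed suffix list.
 */
final class SequentialSuffixSketch {
  private static final char[][] SUFFIXES = {
    {'\u0647', '\u0627'}, // HEH ALEF: plural "-ha"
    {'\u200C'},           // ZWNJ, often written before "-ha"
  };

  static boolean endsWith(char[] s, int len, char[] suffix) {
    if (len < suffix.length + 2) { // assumed: keep at least 2 chars
      return false;
    }
    for (int i = 0; i < suffix.length; i++) {
      if (s[len - suffix.length + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  /** Tries every suffix in order; optionally stops after the first match. */
  static int stemSuffix(char[] s, int len, boolean stopAfterFirst) {
    for (char[] suffix : SUFFIXES) {
      if (endsWith(s, len, suffix)) {
        len -= suffix.length;
        if (stopAfterFirst) {
          break;
        }
      }
    }
    return len;
  }

  public static void main(String[] args) {
    char[] word = "\u06A9\u062A\u0627\u0628\u200C\u0647\u0627".toCharArray(); // ketab + ZWNJ + ha
    System.out.println(stemSuffix(word.clone(), word.length, false)); // prints 4
    System.out.println(stemSuffix(word.clone(), word.length, true));  // prints 5
  }
}
```

With the loop running to the end, "ketab‌ha" (book + ZWNJ + plural) loses both the plural and the now-trailing ZWNJ; stopping after the first match leaves the ZWNJ behind. In a design like this, a ZWNJ entry would therefore come after the suffixes it usually precedes.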

endsWithCheckLength
private boolean endsWithCheckLength(char[] s, int len, char[] suffix)
Returns true if the suffix matches and can be stemmed.
Parameters:
s - input buffer
len - number of valid characters in s, starting at index 0
suffix - suffix to check
Returns:
true if the suffix matches and can be stemmed
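The "and can be stemmed" part of `endsWithCheckLength` implies a length condition in addition to the character comparison. A minimal sketch, assuming that stripping must leave at least two characters (`MIN_STEM` is a hypothetical constant, and the code points are assumed):

```java
/**
 * Sketch of a suffix test with a minimum-stem-length guard.
 * The 2-character minimum is an ASSUMPTION for illustration.
 */
final class EndsWithCheckLengthSketch {
  static final int MIN_STEM = 2; // assumed shortest stem left after stripping

  static boolean endsWithCheckLength(char[] s, int len, char[] suffix) {
    // Reject when stripping would leave fewer than MIN_STEM chars
    // (this also rules out suffixes longer than the word).
    if (len < suffix.length + MIN_STEM) {
      return false;
    }
    // Compare the last suffix.length chars of s[0..len) with suffix.
    for (int i = 0; i < suffix.length; i++) {
      if (s[len - suffix.length + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    char[] tar = {'\u062A', '\u0631'}; // comparative "-tar"
    char[] bozorgtar = "\u0628\u0632\u0631\u06AF\u062A\u0631".toCharArray(); // "bigger"
    System.out.println(endsWithCheckLength(bozorgtar, bozorgtar.length, tar)); // true
    System.out.println(endsWithCheckLength(tar, tar.length, tar)); // false: nothing would remain
  }
}
```

Checking the length first keeps the index `len - suffix.length + i` from going negative, so the comparison loop never needs its own bounds check.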