org.apache.lucene.analysis.ru
Class RussianLetterTokenizer
public class RussianLetterTokenizer
A RussianLetterTokenizer is a tokenizer that extends LetterTokenizer by additionally looking up letters
in a given "Russian charset". The problem with LetterTokenizer is that it uses the Character.isLetter() method,
which does not detect letters in encodings such as CP1252 and KOI8
(well-known problems with the 0xD7 and 0xF7 chars).
Version: $Id: RussianLetterTokenizer.java 472959 2006-11-09 16:21:50Z yonik $
Author: Boris Okner, b.okner@rogers.com
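The 0xD7 and 0xF7 problem mentioned above can be illustrated with a small, self-contained sketch. It assumes one plausible reading of the description: the bytes of a single-byte Russian encoding reach the tokenizer widened one-to-one into chars, so 0xD7 and 0xF7 arrive as U+00D7 and U+00F7. The class name `IsLetterDemo` and the helper `decodeRaw` are hypothetical and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene.
class IsLetterDemo {

    // Assumption: bytes are widened one-to-one into chars (ISO-8859-1 decoding),
    // so byte 0xD7 becomes U+00D7 and byte 0xF7 becomes U+00F7.
    static String decodeRaw(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String raw = decodeRaw(new byte[] {(byte) 0xD7, (byte) 0xF7});
        for (char c : raw.toCharArray()) {
            // U+00D7 (multiplication sign) and U+00F7 (division sign) are
            // math symbols in Unicode, so Character.isLetter rejects them.
            System.out.printf("U+%04X isLetter=%b%n", (int) c, Character.isLetter(c));
        }
        // A properly decoded Cyrillic letter is accepted.
        System.out.printf("U+0427 isLetter=%b%n", Character.isLetter('\u0427'));
    }
}
```

Under that assumption, a tokenizer relying on Character.isLetter() alone would split words at these two characters, which is the gap the extra charset lookup is meant to close.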
protected boolean isTokenChar(char c) - Collects characters that satisfy Character.isLetter(char) or that appear in the given charset.
RussianLetterTokenizer
public RussianLetterTokenizer(Reader in, char[] charset)
isTokenChar
protected boolean isTokenChar(char c)
Collects characters that satisfy Character.isLetter(char) or that appear in the charset passed to the constructor.
Specified by: isTokenChar in class CharTokenizer
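The charset lookup described for this class can be sketched without any Lucene dependency. The following is a minimal illustration, not the library's source: the class `CharsetLetterCheck` and its `tokens` helper are hypothetical names, and the splitting loop only approximates what a character-based tokenizer does with the predicate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the tokenizer's character test; not Lucene code.
class CharsetLetterCheck {
    private final char[] charset;

    CharsetLetterCheck(char[] charset) {
        this.charset = charset.clone();
    }

    // Accepts a char if Character.isLetter says so, or if it is listed
    // in the caller-supplied charset.
    boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char known : charset) {
            if (c == known) {
                return true;
            }
        }
        return false;
    }

    // Splits text into maximal runs of accepted characters.
    List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }
}
```

With a charset containing U+00D7 and U+00F7, those characters stay inside a token. With an empty charset they act as separators, which matches the behavior of a plain letter-based test.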
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.