org.apache.lucene.analysis.ru
Class RussianLetterTokenizer
public class RussianLetterTokenizer
extends CharTokenizer
A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally
accepting characters found in a given "russian charset". The problem with LetterTokenizer is that it relies on
the Character.isLetter() method, which does not recognize letters in encodings such as CP1252 and KOI8
(the well-known problems with the 0xD7 and 0xF7 chars).
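The 0xD7 and 0xF7 problem can be reproduced without Lucene. The sketch below assumes the raw bytes reach the tokenizer as chars with the same numeric value, which is what decoding as ISO-8859-1 does. The class name IsLetterGap and the helper decodeLatin1 are hypothetical and are not part of the Lucene API. U+00D7 and U+00F7 are the multiplication and division signs, so Character.isLetter() rejects them, while a neighbouring value such as 0xD6 is accepted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene.
class IsLetterGap {
    // Decode one raw byte as ISO-8859-1, so byte value N becomes char U+00NN.
    static char decodeLatin1(int rawByte) {
        byte[] single = {(byte) rawByte};
        return new String(single, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        // 0xD7 and 0xF7 are the values named in the class description;
        // 0xD6 is included only for contrast.
        for (int b : new int[] {0xD7, 0xF7, 0xD6}) {
            char c = decodeLatin1(b);
            System.out.printf("0x%02X -> U+%04X isLetter=%b%n",
                    b, (int) c, Character.isLetter(c));
        }
    }
}
```

Passing such characters in the charset array is how the tokenizer is meant to keep them inside tokens.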
Version: $Id: RussianLetterTokenizer.java 472959 2006-11-09 16:21:50Z yonik $
Author: Boris Okner, b.okner@rogers.com
public RussianLetterTokenizer(Reader in, char[] charset)
protected boolean isTokenChar(char c)
Collects only characters that satisfy Character.isLetter(char)
or that appear in the charset passed to the constructor.
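Read together with the class description, the token-character check can be sketched as a letter test followed by a lookup in the supplied charset. The code below is a stand-alone illustration under that assumption, not the Lucene source: the class CharsetLetterSketch and its tokenize method are hypothetical, and details of CharTokenizer such as buffering, offsets, and any maximum token length are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not depend on Lucene.
class CharsetLetterSketch {
    private final char[] charset;

    CharsetLetterSketch(char[] charset) {
        this.charset = charset.clone();
    }

    // A char belongs to a token if it is a letter or is listed in the charset.
    boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char extra : charset) {
            if (c == extra) {
                return true;
            }
        }
        return false;
    }

    // Split text into maximal runs of token characters.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        char[] extras = {'\u00D7', '\u00F7'};
        CharsetLetterSketch withExtras = new CharsetLetterSketch(extras);
        CharsetLetterSketch lettersOnly = new CharsetLetterSketch(new char[0]);
        String text = "ab\u00D7cd, ef\u00F7 gh";
        System.out.println(withExtras.tokenize(text));
        System.out.println(lettersOnly.tokenize(text));
    }
}
```

With the two extra characters supplied, a run such as "ab", 0xD7, "cd" stays one token; with an empty charset the same input is split at 0xD7, which is the LetterTokenizer behavior the class description calls a problem.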
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.