org.apache.lucene.analysis.ru

Class RussianLetterTokenizer

public class RussianLetterTokenizer extends CharTokenizer

A RussianLetterTokenizer is a tokenizer that behaves like LetterTokenizer but additionally looks up letters in a given "russian charset". The problem with LetterTokenizer is that it relies on the Character.isLetter() method, which doesn't know how to detect letters in 8-bit encodings like CP1251 and KOI8 (well-known problems with the 0xD7 and 0xF7 chars).
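The 0xD7/0xF7 problem can be reproduced without Lucene. The sketch below is a hypothetical demo class (RussianByteDemo is not part of Lucene), and it assumes the 8-bit Russian text reached Java byte-for-byte, as ISO-8859-1, so byte 0xNN became char U+00NN. In CP1251 the bytes 0xD7/0xF7 are 'Ч'/'ч', and in KOI8-R they are 'в'/'В', but under this mapping they become the multiplication and division signs, which Character.isLetter() rejects.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Lucene). Assumes 8-bit Russian text was
// turned into chars byte-for-byte (ISO-8859-1), so byte 0xNN became U+00NN.
class RussianByteDemo {

    // Returns Character.isLetter() for each char produced from the raw bytes.
    static boolean[] letterFlags(byte[] raw) {
        String s = new String(raw, StandardCharsets.ISO_8859_1);
        boolean[] flags = new boolean[s.length()];
        for (int i = 0; i < s.length(); i++) {
            flags[i] = Character.isLetter(s.charAt(i));
        }
        return flags;
    }

    public static void main(String[] args) {
        // 0xD7 and 0xF7 are Cyrillic letters in CP1251 and KOI8-R, but here they
        // decode to U+00D7 (multiplication sign) and U+00F7 (division sign).
        // 0xC0 decodes to U+00C0, which Java does treat as a letter.
        byte[] raw = { (byte) 0xD7, (byte) 0xF7, (byte) 0xC0 };
        boolean[] flags = letterFlags(raw);
        for (int i = 0; i < raw.length; i++) {
            System.out.printf("byte 0x%02X -> isLetter=%b%n", raw[i] & 0xFF, flags[i]);
        }
    }
}
```

Running main reports isLetter=false for 0xD7 and 0xF7 and isLetter=true for 0xC0. This is why a tokenizer driven only by Character.isLetter() would split Russian words at those two characters.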

Version: $Id: RussianLetterTokenizer.java 472959 2006-11-09 16:21:50Z yonik $

Author: Boris Okner, b.okner@rogers.com

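Constructor Summary
RussianLetterTokenizer(Reader in, char[] charset)
Creates a tokenizer that reads from in and also treats every character in charset as a letter.

Method Summary
protected boolean isTokenChar(char c)
Collects only characters which satisfy Character#isLetter(char) or appear in the charset passed to the constructor.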

Constructor Detail

RussianLetterTokenizer

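public RussianLetterTokenizer(Reader in, char[] charset)
Creates a tokenizer that reads from in and, in addition to characters that satisfy Character#isLetter(char), treats every character in charset as part of a token.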

Method Detail

isTokenChar

protected boolean isTokenChar(char c)
Collects only characters which satisfy Character#isLetter(char) or appear in the charset passed to the constructor.
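The isTokenChar rule can be illustrated with a standalone sketch. RussianLetterSketch is a hypothetical class that does not depend on Lucene. It accepts a char if Character.isLetter(c) is true or if c occurs in the supplied charset, and it forms tokens from maximal runs of such chars, which is roughly how CharTokenizer uses isTokenChar. CharTokenizer's buffer limits and normalize() hook are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch (no Lucene dependency) of the documented rule:
// a char belongs to a token if Character.isLetter(c) is true OR c is listed in
// the supplied "russian charset". Buffer limits and normalize() are omitted.
class RussianLetterSketch {
    private final char[] charset;

    RussianLetterSketch(char[] charset) {
        this.charset = charset;
    }

    // Mirrors the isTokenChar(char) contract described above.
    boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char x : charset) {
            if (c == x) {
                return true;
            }
        }
        return false;
    }

    // Splits text into maximal runs of token chars; everything else separates tokens.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With a charset of { '\u00D7', '\u00F7' }, the input "a\u00D7b c\u00F7" yields the two tokens "a\u00D7b" and "c\u00F7". With an empty charset it falls back to plain letter tokenization and yields "a", "b", "c".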
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.