com.ibm.icu.text

Class BreakDictionary


public class BreakDictionary
extends Object

This class represents the list of known words used by DictionaryBasedBreakIterator. The conceptual data structure is a trie: the root node has a child node for every letter that can start a word. Each of these nodes has a child for every letter that can be the second letter of a word beginning with that node's letter, and so on. The trie is represented as a two-dimensional array that can be treated as a table of state transitions. Because this array is always very sparse, indexes are used to compress it.
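The following is a minimal sketch of the idea described above, not the class's actual storage format. It builds an uncompressed transition table for a small word list and counts how many cells are used, which illustrates why the table is sparse. The class name TrieTableSketch, the 26-letter alphabet, and the use of row 0 as the root are assumptions made for illustration, and end-of-word marking is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of com.ibm.icu.text.
class TrieTableSketch {
    // Assumption: words contain only lowercase 'a'..'z'.
    static final int ALPHABET = 26;

    // Builds a dense table: each row is a trie node (state), each column a letter.
    // A cell holds the row of the child node, or 0 when there is no transition.
    // Row 0 is the root; no transition ever leads back to the root, so 0 is free
    // to mean "no transition".
    static short[][] build(String[] words) {
        List<short[]> rows = new ArrayList<>();
        rows.add(new short[ALPHABET]); // row 0: root
        for (String word : words) {
            int state = 0;
            for (int i = 0; i < word.length(); i++) {
                int col = word.charAt(i) - 'a';
                short next = rows.get(state)[col];
                if (next == 0) {
                    // First time this prefix is seen: allocate a new node.
                    next = (short) rows.size();
                    rows.add(new short[ALPHABET]);
                    rows.get(state)[col] = next;
                }
                state = next;
            }
        }
        return rows.toArray(new short[0][]);
    }

    // Counts the cells that actually hold a transition.
    static int countNonZero(short[][] table) {
        int count = 0;
        for (short[] row : table) {
            for (short cell : row) {
                if (cell != 0) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

For the words "cat", "car", and "dog", this sketch produces 8 rows of 26 columns (208 cells), of which only 7 hold a transition. A table that is mostly empty in this way is the kind that benefits from index-based compression.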

Constructor Summary

BreakDictionary(InputStream dictionaryStream)

Method Summary

short
at(int row, char ch)
Uses the column map to map the character to a column number, then passes the row and column number to the other version of at().
short
at(int row, int col)
Returns the value in the cell with the specified (logical) row and column numbers.
static void
main(String[] args)
void
printWordList(String partialWord, int state, PrintWriter out)
void
readDictionaryFile(DataInputStream in)

Constructor Details

BreakDictionary

public BreakDictionary(InputStream dictionaryStream)
            throws IOException

Method Details

at

public final short at(int row,
                      char ch)
Uses the column map to map the character to a column number, then passes the row and column number to the other version of at().
Parameters:
row - The current state
ch - The character whose column we're interested in
Returns:
The new state to transition to
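
As a rough illustration of what a column map does, the sketch below assigns a small column number to each character that appears in a word list and maps every other character to column 0. The class ColumnMapSketch and its HashMap-based storage are hypothetical; the way BreakDictionary stores its column map is not described here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of com.ibm.icu.text.
class ColumnMapSketch {
    private final Map<Character, Integer> columns = new HashMap<>();

    // Assigns columns 1, 2, 3, ... in order of first appearance.
    // Column 0 is reserved for characters that never occur in the word list.
    ColumnMapSketch(String[] words) {
        for (String word : words) {
            for (char ch : word.toCharArray()) {
                if (!columns.containsKey(ch)) {
                    columns.put(ch, columns.size() + 1);
                }
            }
        }
    }

    // Returns the column for ch, or 0 if ch is not a dictionary character.
    int columnOf(char ch) {
        return columns.getOrDefault(ch, 0);
    }

    // Number of columns a transition table would need, including column 0.
    int columnCount() {
        return columns.size() + 1;
    }
}
```

With the words "cat" and "car", the characters c, a, t, and r receive columns 1 through 4, so a table needs only 5 columns instead of one per possible char value.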

at

public final short at(int row,
                      int col)
Returns the value in the cell with the specified (logical) row and column numbers. In DictionaryBasedBreakIterator, the row number is a state number, the column number is an input, and the return value is the row number of the new state to transition to. In a dictionary, 0 is the "error" state and -1 is the "end of word" state.
Parameters:
row - The row number of the current state
col - The column number of the input character (0 means "not a dictionary character")
Returns:
The row number of the new state to transition to
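
The sketch below shows one way a caller might walk such a table, using the conventions stated above: a return value of 0 is the error state, -1 is the end-of-word state, and column 0 stands for a non-dictionary character. The table contents, the class TransitionWalkSketch, and its columnOf and isWord helpers are hypothetical. The toy table encodes only the words "ab" and "aca", and the sketch does not handle a word that is a prefix of another word.

```java
// Hypothetical illustration class; not part of com.ibm.icu.text.
class TransitionWalkSketch {
    // Toy table: rows are states, columns are inputs.
    // Columns: 0 = not a dictionary character, 1 = 'a', 2 = 'b', 3 = 'c'.
    static final short[][] TABLE = {
        {0, 1, 0, 0},   // row 0: start state, 'a' leads to row 1
        {0, 0, -1, 2},  // row 1: after "a", 'b' ends a word, 'c' leads to row 2
        {0, -1, 0, 0},  // row 2: after "ac", 'a' ends a word
    };

    // Stand-in for a column map.
    static int columnOf(char ch) {
        switch (ch) {
            case 'a': return 1;
            case 'b': return 2;
            case 'c': return 3;
            default:  return 0;
        }
    }

    // Stand-in for at(int row, int col).
    static short at(int row, int col) {
        return TABLE[row][col];
    }

    // Returns true only if the whole string is a word in the toy table.
    static boolean isWord(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            state = at(state, columnOf(s.charAt(i)));
            if (state == 0) {
                return false;                 // error state: no such word
            }
            if (state == -1) {
                return i == s.length() - 1;   // end of word must coincide with end of input
            }
        }
        return false; // input ran out before reaching the end-of-word state
    }
}
```

Here isWord("ab") and isWord("aca") return true, while isWord("ac"), isWord("ax"), and isWord("abc") return false.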

main

public static void main(String[] args)
            throws FileNotFoundException,
                   UnsupportedEncodingException,
                   IOException

printWordList

public void printWordList(String partialWord,
                          int state,
                          PrintWriter out)
            throws IOException

readDictionaryFile

public void readDictionaryFile(DataInputStream in)
            throws IOException

Copyright (c) 2006 IBM Corporation and others.