Package com.ibm.icu.impl
Class TrieBuilder
- java.lang.Object
-
- com.ibm.icu.impl.TrieBuilder
-
- Direct Known Subclasses:
IntTrieBuilder
public class TrieBuilder extends java.lang.Object
Builder class to manipulate and generate a trie. This is useful for ICU data in primitive types. Provides a compact way to store information that is indexed by Unicode values, such as character properties, types, keyboard values, etc. This is very useful when you have a block of Unicode data that contains significant values while the rest of the Unicode data is unused in the application or when you have a lot of redundance, such as where all 21,000 Han ideographs have the same value. However, lookup is much faster than a hash table. A trie of any primitive data type serves two purposes:- Fast access of the indexed values.
- Smaller memory footprint.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static interface
TrieBuilder.DataManipulate
Character data in com.ibm.impl.Trie have different user-specified format for different purposes.
-
Field Summary
Fields Modifier and Type Field Description protected static int
BMP_INDEX_LENGTH_
Length of the BMP portion of the index (stage 1) array.static int
DATA_BLOCK_LENGTH
Number of data values in a stage 2 (data array) block.protected static int
DATA_GRANULARITY_
The alignment size of a stage 2 data block.protected static int
INDEX_SHIFT_
Shift size for shifting left the index array values.protected int
m_dataCapacity_
protected int
m_dataLength_
protected int[]
m_index_
Index values at build-time are 32 bits wide for easier processing.protected int
m_indexLength_
protected boolean
m_isCompacted_
protected boolean
m_isLatin1Linear_
protected int[]
m_map_
Map of adjusted indexes, used in utrie_compact().protected static int
MASK_
Mask for getting the lower bits from the input index.private static int
MAX_BUILD_TIME_DATA_LENGTH_
Maximum length of the build-time data (stage 2) array.protected static int
MAX_DATA_LENGTH_
Maximum length of the runtime data (stage 2) array.protected static int
MAX_INDEX_LENGTH_
Length of the index (stage 1) array before folding.protected static int
OPTIONS_DATA_IS_32_BIT_
If set, then the data (stage 2) array is 32 bits wide.protected static int
OPTIONS_INDEX_SHIFT_
Shifting to position the index value in optionsprotected static int
OPTIONS_LATIN1_IS_LINEAR_
If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.protected static int
SHIFT_
Shift size for shifting right the input index.protected static int
SURROGATE_BLOCK_COUNT_
Number of index (stage 1) entries per lead surrogate.
-
Constructor Summary
Constructors Modifier Constructor Description protected
TrieBuilder()
protected
TrieBuilder(TrieBuilder table)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected static boolean
equal_int(int[] array, int start1, int start2, int length)
Compare two sections of an array for equality.protected static int
findSameIndexBlock(int[] index, int indexLength, int otherBlock)
Finds the same index block as the otherBlockprotected void
findUnusedBlocks()
Set a value in the trie index map to indicate which data block is referenced and which one is not.boolean
isInZeroBlock(int ch)
Checks if the character belongs to a zero block in the trie
-
-
-
Field Detail
-
DATA_BLOCK_LENGTH
public static final int DATA_BLOCK_LENGTH
Number of data values in a stage 2 (data array) block. 2, 4, 8, .., 0x200- See Also:
- Constant Field Values
-
m_index_
protected int[] m_index_
Index values at build-time are 32 bits wide for easier processing. Bit 31 is set if the data block is used by multiple index values (from setRange()).
-
m_indexLength_
protected int m_indexLength_
-
m_dataCapacity_
protected int m_dataCapacity_
-
m_dataLength_
protected int m_dataLength_
-
m_isLatin1Linear_
protected boolean m_isLatin1Linear_
-
m_isCompacted_
protected boolean m_isCompacted_
-
m_map_
protected int[] m_map_
Map of adjusted indexes, used in utrie_compact(). Maps from original indexes to new ones.
-
SHIFT_
protected static final int SHIFT_
Shift size for shifting right the input index. 1..9- See Also:
- Constant Field Values
-
MAX_INDEX_LENGTH_
protected static final int MAX_INDEX_LENGTH_
Length of the index (stage 1) array before folding. Maximum number of Unicode code points (0x110000) shifted right by SHIFT.- See Also:
- Constant Field Values
-
BMP_INDEX_LENGTH_
protected static final int BMP_INDEX_LENGTH_
Length of the BMP portion of the index (stage 1) array.- See Also:
- Constant Field Values
-
SURROGATE_BLOCK_COUNT_
protected static final int SURROGATE_BLOCK_COUNT_
Number of index (stage 1) entries per lead surrogate. Same as number of indexe entries for 1024 trail surrogates, ==0x400>>UTRIE_SHIFT 10 - SHIFT == Number of bits of a trail surrogate that are used in index table lookups.- See Also:
- Constant Field Values
-
MASK_
protected static final int MASK_
Mask for getting the lower bits from the input index. DATA_BLOCK_LENGTH - 1.- See Also:
- Constant Field Values
-
INDEX_SHIFT_
protected static final int INDEX_SHIFT_
Shift size for shifting left the index array values. Increases possible data size with 16-bit index values at the cost of compactability. This requires blocks of stage 2 data to be aligned by UTRIE_DATA_GRANULARITY. 0..UTRIE_SHIFT- See Also:
- Constant Field Values
-
MAX_DATA_LENGTH_
protected static final int MAX_DATA_LENGTH_
Maximum length of the runtime data (stage 2) array. Limited by 16-bit index values that are left-shifted by INDEX_SHIFT_.- See Also:
- Constant Field Values
-
OPTIONS_INDEX_SHIFT_
protected static final int OPTIONS_INDEX_SHIFT_
Shifting to position the index value in options- See Also:
- Constant Field Values
-
OPTIONS_DATA_IS_32_BIT_
protected static final int OPTIONS_DATA_IS_32_BIT_
If set, then the data (stage 2) array is 32 bits wide.- See Also:
- Constant Field Values
-
OPTIONS_LATIN1_IS_LINEAR_
protected static final int OPTIONS_LATIN1_IS_LINEAR_
If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.- See Also:
- Constant Field Values
-
DATA_GRANULARITY_
protected static final int DATA_GRANULARITY_
The alignment size of a stage 2 data block. Also the granularity for compaction.- See Also:
- Constant Field Values
-
MAX_BUILD_TIME_DATA_LENGTH_
private static final int MAX_BUILD_TIME_DATA_LENGTH_
Maximum length of the build-time data (stage 2) array. The maximum length is 0x110000 + DATA_BLOCK_LENGTH + 0x400. (Number of Unicode code points + one all-initial-value block + possible duplicate entries for 1024 lead surrogates.)- See Also:
- Constant Field Values
-
-
Constructor Detail
-
TrieBuilder
protected TrieBuilder()
-
TrieBuilder
protected TrieBuilder(TrieBuilder table)
-
-
Method Detail
-
isInZeroBlock
public boolean isInZeroBlock(int ch)
Checks if the character belongs to a zero block in the trie- Parameters:
ch
- codepoint which data is to be retrieved- Returns:
- true if ch is in the zero block
-
equal_int
protected static final boolean equal_int(int[] array, int start1, int start2, int length)
Compare two sections of an array for equality.
-
findUnusedBlocks
protected void findUnusedBlocks()
Set a value in the trie index map to indicate which data block is referenced and which one is not. utrie_compact() will remove data blocks that are not used at all. Set - 0 if it is used - -1 if it is not used
-
findSameIndexBlock
protected static final int findSameIndexBlock(int[] index, int indexLength, int otherBlock)
Finds the same index block as the otherBlock- Parameters:
index
- arrayindexLength
- size of indexotherBlock
-- Returns:
- same index block
-
-