Package com.ibm.icu.impl.coll
Class CollationData
- java.lang.Object
-
- com.ibm.icu.impl.coll.CollationData
-
public final class CollationData extends java.lang.Object
Collation data container. Immutable data created by a CollationDataBuilder, or loaded from a file, or deserialized from API-provided binary data. Includes data for the collation base (root/default), aliased if this is not the base.
-
-
Field Summary
Fields
Modifier and Type, Field: Description
CollationData base: Base collation data, or null if this data itself is a base.
(package private) int[] ce32s: Array of CE32 values.
(package private) long[] ces: Array of CE values for expansions and OFFSET_TAG.
boolean[] compressibleBytes: 256 flags for which primary-weight lead bytes are compressible.
(package private) java.lang.String contexts: Array of prefix and contraction-suffix matching data.
private static int[] EMPTY_INT_ARRAY
char[] fastLatinTable: Fast Latin table for common-Latin-text string comparisons.
(package private) char[] fastLatinTableHeader: Header portion of the fastLatinTable.
(package private) static int JAMO_CE32S_LENGTH
(package private) int[] jamoCE32s: Simple array of JAMO_CE32S_LENGTH=19+21+27 CE32s, one per canonical Jamo L/V/T.
(package private) static int MAX_NUM_SPECIAL_REORDER_CODES
Normalizer2Impl nfcImpl
(package private) long numericPrimary: The single-byte primary weight (xx000000) for numeric collation.
(package private) int numScripts: Data for scripts and reordering groups.
(package private) static int REORDER_RESERVED_AFTER_LATIN
(package private) static int REORDER_RESERVED_BEFORE_LATIN
long[] rootElements: Collation elements in the root collator.
(package private) char[] scriptsIndex: The length of scriptsIndex is numScripts+16.
(package private) char[] scriptStarts: Start primary weight (top 16 bits only) for a group/script/reserved range indexed by scriptsIndex.
(package private) Trie2_32 trie: Main lookup trie.
(package private) UnicodeSet unsafeBackwardSet: Set of code points that are unsafe for starting string comparison after an identical prefix, or in backwards CE iteration.
-
Constructor Summary
Constructors
Constructor: Description
CollationData(Normalizer2Impl nfc)
-
Method Summary
Methods
Modifier and Type, Method: Description
private int addHighScriptRange(short[] table, int index, int highLimit)
private int addLowScriptRange(short[] table, int index, int lowStart)
int getCE32(int c)
(package private) int getCE32FromContexts(int index): Returns the CE32 from two contexts words.
(package private) int getCE32FromSupplementary(int c)
(package private) long getCEFromOffsetCE32(int c, int ce32): Computes a CE from c's ce32 which has the OFFSET_TAG.
int[] getEquivalentScripts(int script)
(package private) int getFCD16(int c): Returns the FCD16 value for code point c.
(package private) int getFinalCE32(int ce32): Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG), if ce32 is special.
(package private) long getFirstPrimaryForGroup(int script): Returns the first primary for the script's reordering group.
int getGroupForPrimary(long p): Finds the reordering group which contains the primary weight.
(package private) int getIndirectCE32(int ce32): Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG).
long getLastPrimaryForGroup(int script): Returns the last primary for the script's reordering group.
private int getScriptIndex(int script)
(package private) long getSingleCE(int c): Returns the single CE that c maps to.
boolean isCompressibleLeadByte(int b)
boolean isCompressiblePrimary(long p)
(package private) boolean isDigit(int c)
boolean isUnsafeBackward(int c, boolean numeric)
private void makeReorderRanges(int[] reorder, boolean latinMustMove, UVector32 ranges)
(package private) void makeReorderRanges(int[] reorder, UVector32 ranges): Writes the permutation of primary-weight ranges for the given reordering of scripts and groups.
private static java.lang.String scriptCodeString(int script)
-
-
-
Field Detail
-
REORDER_RESERVED_BEFORE_LATIN
static final int REORDER_RESERVED_BEFORE_LATIN
- See Also:
- Constant Field Values
-
REORDER_RESERVED_AFTER_LATIN
static final int REORDER_RESERVED_AFTER_LATIN
- See Also:
- Constant Field Values
-
MAX_NUM_SPECIAL_REORDER_CODES
static final int MAX_NUM_SPECIAL_REORDER_CODES
- See Also:
- Constant Field Values
-
EMPTY_INT_ARRAY
private static final int[] EMPTY_INT_ARRAY
-
JAMO_CE32S_LENGTH
static final int JAMO_CE32S_LENGTH
- See Also:
- jamoCE32s, Constant Field Values
-
trie
Trie2_32 trie
Main lookup trie.
-
ce32s
int[] ce32s
Array of CE32 values. At index 0 there must be CE32(U+0000) to support U+0000's special-tag for NUL-termination handling.
-
ces
long[] ces
Array of CE values for expansions and OFFSET_TAG.
-
contexts
java.lang.String contexts
Array of prefix and contraction-suffix matching data.
-
base
public CollationData base
Base collation data, or null if this data itself is a base.
-
jamoCE32s
int[] jamoCE32s
Simple array of JAMO_CE32S_LENGTH=19+21+27 CE32s, one per canonical Jamo L/V/T. They are normally simple CE32s and rarely expansions. Used for fast handling of HANGUL_TAG.
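The 19+21+27 layout can be illustrated with the standard Unicode Hangul syllable arithmetic. The following is a minimal sketch, not ICU code: the class `JamoIndexSketch` and the method `jamoIndexes` are hypothetical names, and the sketch assumes that the array stores the 19 leading consonants first, then the 21 vowels, then the 27 trailing consonants.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of ICU. */
final class JamoIndexSketch {
    static final int HANGUL_BASE = 0xAC00;  // first precomposed Hangul syllable
    static final int HANGUL_COUNT = 11172;  // 19 * 21 * 28 syllables
    static final int L_COUNT = 19;          // leading consonants
    static final int V_COUNT = 21;          // vowels
    static final int T_COUNT = 28;          // 27 trailing consonants plus "no T"

    /**
     * Returns the assumed jamoCE32s indexes for the L, V and optional T of a
     * Hangul syllable, or an empty array if c is not a Hangul syllable.
     * Assumption: the array is laid out as L block, V block, T block.
     */
    static int[] jamoIndexes(int c) {
        int s = c - HANGUL_BASE;
        if (s < 0 || s >= HANGUL_COUNT) {
            return new int[0];
        }
        int t = s % T_COUNT;
        s /= T_COUNT;
        int v = s % V_COUNT;
        int l = s / V_COUNT;
        if (t == 0) {
            // Syllable without a trailing consonant: only L and V.
            return new int[] { l, L_COUNT + v };
        }
        // t is 1..27, so the T block index is t - 1.
        return new int[] { l, L_COUNT + v, L_COUNT + V_COUNT + (t - 1) };
    }

    public static void main(String[] args) {
        // U+D55C decomposes into L index 18, V index 0, T index 4.
        System.out.println(Arrays.toString(jamoIndexes(0xD55C))); // prints [18, 19, 43]
    }
}
```

With this layout the largest index is 18 for L, 39 for V and 66 for T, which matches a total length of 67.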
-
nfcImpl
public Normalizer2Impl nfcImpl
-
numericPrimary
long numericPrimary
The single-byte primary weight (xx000000) for numeric collation.
-
compressibleBytes
public boolean[] compressibleBytes
256 flags for which primary-weight lead bytes are compressible.
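As a rough illustration of how such a flag table can be consulted, the sketch below models the two lookup methods of this class. It is not the ICU implementation: the class name `CompressibleBytesSketch` is hypothetical, and it assumes that a primary weight is a 32-bit value held in a long whose lead byte occupies bits 31..24 (consistent with the xx000000 notation used for numericPrimary).

```java
/** Hypothetical stand-in for illustration only; not part of ICU. */
final class CompressibleBytesSketch {
    // One flag per possible primary-weight lead byte.
    private final boolean[] compressibleBytes = new boolean[256];

    CompressibleBytesSketch(int... compressibleLeadBytes) {
        for (int b : compressibleLeadBytes) {
            compressibleBytes[b] = true;
        }
    }

    boolean isCompressibleLeadByte(int b) {
        return compressibleBytes[b];
    }

    boolean isCompressiblePrimary(long p) {
        // Assumption: the lead byte is the top byte of the 32-bit primary.
        return isCompressibleLeadByte((int) (p >>> 24) & 0xff);
    }

    public static void main(String[] args) {
        CompressibleBytesSketch data = new CompressibleBytesSketch(0x2a);
        System.out.println(data.isCompressiblePrimary(0x2a050607L)); // prints true
        System.out.println(data.isCompressiblePrimary(0x2b000000L)); // prints false
    }
}
```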
-
unsafeBackwardSet
UnicodeSet unsafeBackwardSet
Set of code points that are unsafe for starting string comparison after an identical prefix, or in backwards CE iteration.
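To show why a comparison cannot always resume right after an identical prefix, the sketch below backs up from the first difference while the code units there are in an "unsafe" set. This is a simplified illustration under stated assumptions, not the ICU algorithm: `SafeStartSketch` and `safeCompareStart` are hypothetical names, a plain `Set<Integer>` stands in for the UnicodeSet, and the sketch works on UTF-16 code units rather than code points.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not part of ICU. */
final class SafeStartSketch {
    /**
     * Returns the index at which a collation comparison could start:
     * the length of the identical prefix, moved backwards if the text
     * at the first difference is unsafe to start from.
     */
    static int safeCompareStart(CharSequence left, CharSequence right,
                                Set<Integer> unsafeBackward) {
        int max = Math.min(left.length(), right.length());
        int n = 0;
        while (n < max && left.charAt(n) == right.charAt(n)) {
            n++;
        }
        boolean boundaryUnsafe =
                (n < left.length() && unsafeBackward.contains((int) left.charAt(n)))
                || (n < right.length() && unsafeBackward.contains((int) right.charAt(n)));
        if (n > 0 && boundaryUnsafe) {
            // Back up until a safe code unit (or the string start) is reached.
            while (--n > 0 && unsafeBackward.contains((int) left.charAt(n))) {
                // keep backing up over unsafe code units
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Toy unsafe set: combining grave and combining acute.
        Set<Integer> unsafe = Set.of(0x0300, 0x0301);
        // The strings differ at index 3, but that position holds a combining
        // mark, so the comparison restarts at the base letter 'e' (index 2).
        System.out.println(safeCompareStart("abe\u0301", "abe\u0300", unsafe)); // prints 2
    }
}
```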
-
fastLatinTable
public char[] fastLatinTable
Fast Latin table for common-Latin-text string comparisons. For the data structure, see class CollationFastLatin.
-
fastLatinTableHeader
char[] fastLatinTableHeader
Header portion of the fastLatinTable. In C++, these are one array, and the header is skipped for mapping characters. In Java, two arrays work better.
-
numScripts
int numScripts
Data for scripts and reordering groups. Uses include building a reordering permutation table and providing script boundaries to AlphabeticIndex.
-
scriptsIndex
char[] scriptsIndex
The length of scriptsIndex is numScripts+16. It maps from a UScriptCode or a special reorder code to an entry in scriptStarts. 16 special reorder codes (not all used) are mapped starting at numScripts. Up to MAX_NUM_SPECIAL_REORDER_CODES are codes for special groups like space/punct/digit. There are special codes at the end for reorder-reserved primary ranges. Multiple scripts may share a range and index, for example Hira & Kana.
-
scriptStarts
char[] scriptStarts
Start primary weight (top 16 bits only) for a group/script/reserved range indexed by scriptsIndex. The first range (separators & terminators) and the last range (trailing weights) are not reorderable, and no scriptsIndex entry points to them.
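The two-level lookup from a script or special reorder code to a start primary can be sketched as follows. This is a toy model, not the ICU implementation: the class `ScriptIndexSketch`, its toy data and the constant `FIRST_SPECIAL_REORDER_CODE` (assumed to be 0x1000, the value of Collator.ReorderCodes.FIRST) are illustrative assumptions, as is the use of index 0 to mean "unknown script".

```java
/** Hypothetical toy model for illustration only; not part of ICU. */
final class ScriptIndexSketch {
    // Assumption: special reorder codes start at 0x1000.
    static final int FIRST_SPECIAL_REORDER_CODE = 0x1000;
    static final int NUM_SPECIAL_SLOTS = 16;

    final int numScripts;
    final char[] scriptsIndex;  // length numScripts + 16
    final char[] scriptStarts;  // top 16 bits of each range's start primary

    ScriptIndexSketch(int numScripts, char[] scriptsIndex, char[] scriptStarts) {
        this.numScripts = numScripts;
        this.scriptsIndex = scriptsIndex;
        this.scriptStarts = scriptStarts;
    }

    /** Maps a script or special reorder code to a scriptStarts index; 0 means unknown. */
    int getScriptIndex(int script) {
        if (script < 0) {
            return 0;
        }
        if (script < numScripts) {
            return scriptsIndex[script];
        }
        int special = script - FIRST_SPECIAL_REORDER_CODE;
        if (special >= 0 && special < NUM_SPECIAL_SLOTS) {
            return scriptsIndex[numScripts + special];
        }
        return 0;
    }

    /** Returns the start primary of the script's range, or 0 if the script is unknown. */
    long getFirstPrimaryForGroup(int script) {
        int index = getScriptIndex(script);
        return index == 0 ? 0 : (long) scriptStarts[index] << 16;
    }

    /** Toy data: scripts 0 and 1 share a range, script 2 has its own range. */
    static ScriptIndexSketch toy() {
        char[] starts = { 0x0000, 0x0500, 0x2A00, 0x6000, 0xFF00 };
        char[] index = new char[3 + NUM_SPECIAL_SLOTS];
        index[0] = 2;      // script 0
        index[1] = 2;      // script 1 shares the range of script 0
        index[2] = 3;      // script 2
        index[3 + 0] = 1;  // first special reorder code
        return new ScriptIndexSketch(3, index, starts);
    }

    public static void main(String[] args) {
        ScriptIndexSketch data = toy();
        System.out.println(Long.toHexString(data.getFirstPrimaryForGroup(1))); // prints 2a000000
    }
}
```

In the toy data no scriptsIndex entry points to the first or the last scriptStarts element, mirroring the non-reorderable first and last ranges.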
-
rootElements
public long[] rootElements
Collation elements in the root collator. Used by the CollationRootElements class, which describes the data structure. This field is null in a tailoring.
-
-
Constructor Detail
-
CollationData
CollationData(Normalizer2Impl nfc)
-
-
Method Detail
-
getCE32
public int getCE32(int c)
-
getCE32FromSupplementary
int getCE32FromSupplementary(int c)
-
isDigit
boolean isDigit(int c)
-
isUnsafeBackward
public boolean isUnsafeBackward(int c, boolean numeric)
-
isCompressibleLeadByte
public boolean isCompressibleLeadByte(int b)
-
isCompressiblePrimary
public boolean isCompressiblePrimary(long p)
-
getCE32FromContexts
int getCE32FromContexts(int index)
Returns the CE32 from two contexts words. Access to the defaultCE32 for contraction and prefix matching.
-
getIndirectCE32
int getIndirectCE32(int ce32)
Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG). Requires that ce32 is special.
-
getFinalCE32
int getFinalCE32(int ce32)
Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG), if ce32 is special. Otherwise returns ce32 unchanged.
-
getCEFromOffsetCE32
long getCEFromOffsetCE32(int c, int ce32)
Computes a CE from c's ce32 which has the OFFSET_TAG.
-
getSingleCE
long getSingleCE(int c)
Returns the single CE that c maps to. Throws UnsupportedOperationException if c does not map to a single CE.
-
getFCD16
int getFCD16(int c)
Returns the FCD16 value for code point c. c must be >= 0.
-
getFirstPrimaryForGroup
long getFirstPrimaryForGroup(int script)
Returns the first primary for the script's reordering group.
- Returns:
- the primary with only the first primary lead byte of the group (not necessarily an actual root collator primary weight), or 0 if the script is unknown
-
getLastPrimaryForGroup
public long getLastPrimaryForGroup(int script)
Returns the last primary for the script's reordering group.
- Returns:
- the last primary of the group (not an actual root collator primary weight), or 0 if the script is unknown
-
getGroupForPrimary
public int getGroupForPrimary(long p)
Finds the reordering group which contains the primary weight.
- Returns:
- the first script of the group, or -1 if the weight is beyond the last group
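The reverse lookup, from a primary weight back to a group, and the computation of a group's last primary can be sketched over the same kind of data. This is a toy model, not the ICU implementation: `GroupForPrimarySketch`, its toy data and the assumed special reorder code base 0x1000 are illustrative. Returning -1 for weights below the first reorderable range is also an assumption of this sketch.

```java
/** Hypothetical toy model for illustration only; not part of ICU. */
final class GroupForPrimarySketch {
    static final int FIRST_SPECIAL_REORDER_CODE = 0x1000; // assumption
    static final int NUM_SPECIAL_SLOTS = 16;

    final int numScripts;
    final char[] scriptsIndex;
    final char[] scriptStarts;

    GroupForPrimarySketch(int numScripts, char[] scriptsIndex, char[] scriptStarts) {
        this.numScripts = numScripts;
        this.scriptsIndex = scriptsIndex;
        this.scriptStarts = scriptStarts;
    }

    /** Last primary of the range at index: one less than the next range's start. */
    long lastPrimaryForIndex(int index) {
        if (index == 0) {
            return 0;
        }
        return ((long) scriptStarts[index + 1] << 16) - 1;
    }

    /** Returns the first script or special code whose range contains p, or -1. */
    int groupForPrimary(long p) {
        long top16 = p >>> 16;
        if (top16 < scriptStarts[1] || top16 >= scriptStarts[scriptStarts.length - 1]) {
            return -1; // in the first (non-reorderable) range or beyond the last group
        }
        int index = 1;
        while (top16 >= scriptStarts[index + 1]) {
            index++;
        }
        for (int i = 0; i < numScripts; i++) {
            if (scriptsIndex[i] == index) {
                return i;
            }
        }
        for (int i = 0; i < NUM_SPECIAL_SLOTS; i++) {
            if (scriptsIndex[numScripts + i] == index) {
                return FIRST_SPECIAL_REORDER_CODE + i;
            }
        }
        return -1;
    }

    /** Toy data: scripts 0 and 1 share range 2, script 2 uses range 3. */
    static GroupForPrimarySketch toy() {
        char[] starts = { 0x0000, 0x0500, 0x2A00, 0x6000, 0xFF00 };
        char[] index = new char[3 + NUM_SPECIAL_SLOTS];
        index[0] = 2;
        index[1] = 2;
        index[2] = 3;
        index[3] = 1; // first special reorder code
        return new GroupForPrimarySketch(3, index, starts);
    }

    public static void main(String[] args) {
        GroupForPrimarySketch data = toy();
        System.out.println(data.groupForPrimary(0x2A123456L));               // prints 0
        System.out.println(Long.toHexString(data.lastPrimaryForIndex(2)));   // prints 5fffffff
    }
}
```

Because scripts 0 and 1 share a range in the toy data, the lookup reports script 0, the first script of the group.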
-
getScriptIndex
private int getScriptIndex(int script)
-
getEquivalentScripts
public int[] getEquivalentScripts(int script)
-
makeReorderRanges
void makeReorderRanges(int[] reorder, UVector32 ranges)
Writes the permutation of primary-weight ranges for the given reordering of scripts and groups. The caller checks for illegal arguments and takes care of [DEFAULT] and memory allocation. Each list element will be a (limit, offset) pair as described for CollationSettings.reorderRanges. The list will be empty if no ranges are reordered.
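How a list of (limit, offset) pairs can express a permutation of primary ranges is easiest to see on a small example. The sketch below is a simplified model and does not reproduce the packed representation documented for CollationSettings.reorderRanges: `ReorderRangesSketch` is a hypothetical name, each pair is stored as a plain `int[] {limit, offset}`, the limit is assumed to be compared against the top 16 bits of the primary, and the offset is assumed to be added to the lead byte.

```java
/** Hypothetical simplified model for illustration only; not part of ICU. */
final class ReorderRangesSketch {
    /**
     * Applies the first range whose limit is above the primary's top 16 bits.
     * Each range is {limit, offset}; ranges must be sorted by ascending limit.
     * Primaries at or beyond the last limit are returned unchanged.
     */
    static long reorder(long p, int[][] ranges) {
        int top16 = (int) (p >>> 16);
        for (int[] range : ranges) {
            if (top16 < range[0]) {
                return p + ((long) range[1] << 24); // shift the lead byte
            }
        }
        return p;
    }

    /**
     * Toy permutation: group A (lead bytes 0x2A..0x2F) and group B
     * (lead bytes 0x30..0x31) swap places, so B moves down by 6 lead
     * bytes and A moves up by 2.
     */
    static int[][] toyRanges() {
        return new int[][] {
            { 0x2A00, 0 },   // everything below group A stays in place
            { 0x3000, 2 },   // group A moves up
            { 0x3200, -6 },  // group B moves down
        };
    }

    public static void main(String[] args) {
        int[][] ranges = toyRanges();
        System.out.println(Long.toHexString(reorder(0x2A000000L, ranges))); // prints 2c000000
        System.out.println(Long.toHexString(reorder(0x30010000L, ranges))); // prints 2a010000
    }
}
```

An empty list leaves every primary unchanged, which matches the statement that the list is empty if no ranges are reordered.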
-
makeReorderRanges
private void makeReorderRanges(int[] reorder, boolean latinMustMove, UVector32 ranges)
-
addLowScriptRange
private int addLowScriptRange(short[] table, int index, int lowStart)
-
addHighScriptRange
private int addHighScriptRange(short[] table, int index, int highLimit)
-
scriptCodeString
private static java.lang.String scriptCodeString(int script)
-
-