Package com.ibm.icu.impl.coll
Class CollationData
java.lang.Object
com.ibm.icu.impl.coll.CollationData
Collation data container.
Immutable data created by a CollationDataBuilder, or loaded from a file,
or deserialized from API-provided binary data.
Includes data for the collation base (root/default), aliased if this is not the base.

Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| | base | Base collation data, or null if this data itself is a base. |
| (package private) int[] | ce32s | Array of CE32 values. |
| (package private) long[] | ces | Array of CE values for expansions and OFFSET_TAG. |
| boolean[] | compressibleBytes | 256 flags for which primary-weight lead bytes are compressible. |
| (package private) String | contexts | Array of prefix and contraction-suffix matching data. |
| private static final int[] | EMPTY_INT_ARRAY | |
| char[] | fastLatinTable | Fast Latin table for common-Latin-text string comparisons. |
| (package private) char[] | fastLatinTableHeader | Header portion of the fastLatinTable. |
| (package private) static final int | JAMO_CE32S_LENGTH | |
| (package private) int[] | jamoCE32s | Simple array of JAMO_CE32S_LENGTH=19+21+27 CE32s, one per canonical Jamo L/V/T. |
| (package private) static final int | MAX_NUM_SPECIAL_REORDER_CODES | |
| | nfcImpl | |
| (package private) long | numericPrimary | The single-byte primary weight (xx000000) for numeric collation. |
| (package private) int | numScripts | Data for scripts and reordering groups. |
| (package private) static final int | REORDER_RESERVED_AFTER_LATIN | |
| (package private) static final int | REORDER_RESERVED_BEFORE_LATIN | |
| long[] | rootElements | Collation elements in the root collator. |
| (package private) char[] | scriptsIndex | The length of scriptsIndex is numScripts+16. |
| (package private) char[] | scriptStarts | Start primary weight (top 16 bits only) for a group/script/reserved range indexed by scriptsIndex. |
| (package private) Trie2_32 | trie | Main lookup trie. |
| (package private) UnicodeSet | unsafeBackwardSet | Set of code points that are unsafe for starting string comparison after an identical prefix, or in backwards CE iteration. |

Constructor Summary

| Constructor | Description |
|---|---|
| CollationData(Normalizer2Impl nfc) | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| private int | addHighScriptRange(short[] table, int index, int highLimit) | |
| private int | addLowScriptRange(short[] table, int index, int lowStart) | |
| int | getCE32(int c) | |
| (package private) int | getCE32FromContexts(int index) | Returns the CE32 from two contexts words. |
| (package private) int | getCE32FromSupplementary(int c) | |
| (package private) long | getCEFromOffsetCE32(int c, int ce32) | Computes a CE from c's ce32 which has the OFFSET_TAG. |
| int[] | getEquivalentScripts(int script) | |
| (package private) int | getFCD16(int c) | Returns the FCD16 value for code point c. |
| (package private) int | getFinalCE32(int ce32) | Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG), if ce32 is special. |
| (package private) long | getFirstPrimaryForGroup(int script) | Returns the first primary for the script's reordering group. |
| int | getGroupForPrimary(long p) | Finds the reordering group which contains the primary weight. |
| (package private) int | getIndirectCE32(int ce32) | Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG). |
| long | getLastPrimaryForGroup(int script) | Returns the last primary for the script's reordering group. |
| private int | getScriptIndex(int script) | |
| (package private) long | getSingleCE(int c) | Returns the single CE that c maps to. |
| boolean | isCompressibleLeadByte(int b) | |
| boolean | isCompressiblePrimary(long p) | |
| (package private) boolean | isDigit(int c) | |
| boolean | isUnsafeBackward(int c, boolean numeric) | |
| private void | makeReorderRanges(int[] reorder, boolean latinMustMove, UVector32 ranges) | |
| (package private) void | makeReorderRanges(int[] reorder, UVector32 ranges) | Writes the permutation of primary-weight ranges for the given reordering of scripts and groups. |
| private static String | scriptCodeString(int script) | |

Field Details

REORDER_RESERVED_BEFORE_LATIN
static final int REORDER_RESERVED_BEFORE_LATIN

REORDER_RESERVED_AFTER_LATIN
static final int REORDER_RESERVED_AFTER_LATIN

MAX_NUM_SPECIAL_REORDER_CODES
static final int MAX_NUM_SPECIAL_REORDER_CODES

EMPTY_INT_ARRAY
private static final int[] EMPTY_INT_ARRAY

JAMO_CE32S_LENGTH
static final int JAMO_CE32S_LENGTH

trie
Trie2_32 trie
Main lookup trie.

ce32s
int[] ce32s
Array of CE32 values. At index 0 there must be CE32(U+0000) to support U+0000's special-tag for NUL-termination handling.

ces
long[] ces
Array of CE values for expansions and OFFSET_TAG.

contexts
String contexts
Array of prefix and contraction-suffix matching data.

base
Base collation data, or null if this data itself is a base.

jamoCE32s
int[] jamoCE32s
Simple array of JAMO_CE32S_LENGTH=19+21+27 CE32s, one per canonical Jamo L/V/T. They are normally simple CE32s, rarely expansions. For fast handling of HANGUL_TAG.
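
The 19+21+27 layout can be illustrated with the standard Unicode Hangul arithmetic. The following is a minimal sketch, not ICU's code: the class name `JamoIndexSketch` and both helper methods are hypothetical, and the slot order (all L, then all V, then all T) is an assumption based on the length formula above.

```java
/** Sketch only: mapping conjoining Jamo to slots in a 19+21+27 table. */
final class JamoIndexSketch {
    // Constants from the Unicode Hangul syllable algorithm.
    static final int S_BASE = 0xAC00, S_COUNT = 11172;
    static final int L_BASE = 0x1100, L_COUNT = 19;
    static final int V_BASE = 0x1161, V_COUNT = 21;
    // T_BASE (U+11A7) stands for "no trailing consonant", so only
    // T_COUNT - 1 = 27 real trailing consonants need a table slot.
    static final int T_BASE = 0x11A7, T_COUNT = 28;
    static final int TABLE_LENGTH = L_COUNT + V_COUNT + (T_COUNT - 1);

    /** Returns the assumed table slot for a canonical Jamo, or -1 if c is not one. */
    static int jamoIndex(int c) {
        if (L_BASE <= c && c < L_BASE + L_COUNT) {
            return c - L_BASE;
        }
        if (V_BASE <= c && c < V_BASE + V_COUNT) {
            return L_COUNT + (c - V_BASE);
        }
        if (T_BASE < c && c < T_BASE + T_COUNT) {
            return L_COUNT + V_COUNT + (c - T_BASE - 1);
        }
        return -1;
    }

    /** Decomposes a precomposed syllable into {L slot, V slot, T slot or -1}. */
    static int[] syllableSlots(int syllable) {
        int s = syllable - S_BASE;
        if (s < 0 || s >= S_COUNT) {
            throw new IllegalArgumentException("not a Hangul syllable");
        }
        int l = s / (V_COUNT * T_COUNT);
        int v = (s % (V_COUNT * T_COUNT)) / T_COUNT;
        int t = s % T_COUNT;  // 0 means the syllable has no trailing consonant
        int tSlot = (t == 0) ? -1 : L_COUNT + V_COUNT + (t - 1);
        return new int[] { l, L_COUNT + v, tSlot };
    }
}
```

With this layout a Hangul syllable needs at most three table reads and no trie lookup, which is consistent with the "fast handling" remark above.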

nfcImpl

numericPrimary
long numericPrimary
The single-byte primary weight (xx000000) for numeric collation.

compressibleBytes
public boolean[] compressibleBytes
256 flags for which primary-weight lead bytes are compressible.

unsafeBackwardSet
UnicodeSet unsafeBackwardSet
Set of code points that are unsafe for starting string comparison after an identical prefix, or in backwards CE iteration.
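
To make "unsafe for starting string comparison after an identical prefix" concrete, here is a hedged sketch of how a comparator might back up from the end of a shared prefix. The class `UnsafeBackwardSketch`, its toy `UNSAFE` set, and `comparisonStart` are hypothetical stand-ins. Treating digits as unsafe when `numeric` is true is also an assumption about what that flag means. The sketch works on UTF-16 code units for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch only: backing up from an identical prefix to a safe starting point. */
final class UnsafeBackwardSketch {
    // Hypothetical stand-in for unsafeBackwardSet: two combining accents.
    static final Set<Integer> UNSAFE = new HashSet<>(Arrays.asList(0x0301, 0x0302));

    static boolean isUnsafeBackward(int c, boolean numeric) {
        // Assumption: with numeric collation, digits are unsafe too,
        // because a run of digits is compared as one number.
        return UNSAFE.contains(c) || (numeric && Character.isDigit(c));
    }

    /** Returns the index at which element-by-element comparison may start. */
    static int comparisonStart(String left, String right, boolean numeric) {
        int limit = Math.min(left.length(), right.length());
        int i = 0;
        while (i < limit && left.charAt(i) == right.charAt(i)) {
            ++i;
        }
        if (i > 0) {
            boolean unsafeHere =
                (i < left.length() && isUnsafeBackward(left.charAt(i), numeric))
                || (i < right.length() && isUnsafeBackward(right.charAt(i), numeric));
            if (unsafeHere) {
                // Back up until a safe code unit (or the string start) is reached.
                while (--i > 0 && isUnsafeBackward(left.charAt(i), numeric)) {
                    // keep backing up
                }
            }
        }
        return i;
    }
}
```

For example, "abe\u0301" and "abe\u0302" share the prefix "abe", but the first differing character is a combining mark, so the comparison has to restart at the "e" rather than at the accent.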

fastLatinTable
public char[] fastLatinTable
Fast Latin table for common-Latin-text string comparisons. For the data structure, see class CollationFastLatin.

fastLatinTableHeader
char[] fastLatinTableHeader
Header portion of the fastLatinTable. In C++, these are one array, and the header is skipped for mapping characters. In Java, two arrays work better.

numScripts
int numScripts
Data for scripts and reordering groups. Uses include building a reordering permutation table and providing script boundaries to AlphabeticIndex.

scriptsIndex
char[] scriptsIndex
The length of scriptsIndex is numScripts+16. It maps from a UScriptCode or a special reorder code to an entry in scriptStarts. 16 special reorder codes (not all used) are mapped starting at numScripts. Up to MAX_NUM_SPECIAL_REORDER_CODES are codes for special groups like space/punct/digit. There are special codes at the end for reorder-reserved primary ranges.
Multiple scripts may share a range and index, for example Hira & Kana.

scriptStarts
char[] scriptStarts
Start primary weight (top 16 bits only) for a group/script/reserved range indexed by scriptsIndex. The first range (separators & terminators) and the last range (trailing weights) are not reorderable, and no scriptsIndex entry points to them.
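
The two-level lookup described for scriptsIndex and scriptStarts can be sketched with toy data. Everything below is illustrative: the class `ScriptRangeSketch`, the three-script table, the weights, the base value 0x1000 for special reorder codes, and the value 8 for MAX_NUM_SPECIAL_REORDER_CODES are assumptions, and index 0 is taken to mean "unknown" because no scriptsIndex entry points to the first range.

```java
/** Sketch only: script code -> scriptsIndex -> scriptStarts -> primary weight. */
final class ScriptRangeSketch {
    static final int NUM_SCRIPTS = 3;                    // toy value
    static final int SPECIAL_CODE_BASE = 0x1000;         // assumed first special reorder code
    static final int MAX_NUM_SPECIAL_REORDER_CODES = 8;  // assumed value

    // Top 16 bits of each range's start primary. Entry 0 is the first,
    // non-reorderable range; the last entry starts the trailing weights.
    static final char[] SCRIPT_STARTS = { 0x0000, 0x0400, 0x2A00, 0x6000, 0xFE00 };

    static final char[] SCRIPTS_INDEX = new char[NUM_SCRIPTS + 16];
    static {
        SCRIPTS_INDEX[0] = 2;            // toy script 0 has its own range
        SCRIPTS_INDEX[1] = 3;            // toy scripts 1 and 2 share a range,
        SCRIPTS_INDEX[2] = 3;            // in the way Hira & Kana do
        SCRIPTS_INDEX[NUM_SCRIPTS] = 1;  // first special group
    }

    /** Returns the scriptStarts index for a code, or 0 if the code is unknown. */
    static int scriptIndex(int code) {
        if (code < 0) {
            return 0;
        }
        if (code < NUM_SCRIPTS) {
            return SCRIPTS_INDEX[code];
        }
        int special = code - SPECIAL_CODE_BASE;
        if (0 <= special && special < MAX_NUM_SPECIAL_REORDER_CODES) {
            return SCRIPTS_INDEX[NUM_SCRIPTS + special];
        }
        return 0;
    }

    /** First primary of the code's range (top 16 bits only), or 0 if unknown. */
    static long firstPrimary(int code) {
        int index = scriptIndex(code);
        return index == 0 ? 0 : (long) SCRIPT_STARTS[index] << 16;
    }

    /** Last possible primary before the next range starts, or 0 if unknown. */
    static long lastPrimary(int code) {
        int index = scriptIndex(code);
        return index == 0 ? 0 : ((long) SCRIPT_STARTS[index + 1] << 16) - 1;
    }
}
```

Because only the top 16 bits are stored, the computed boundaries are range limits and not necessarily weights that any character actually has.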

rootElements
public long[] rootElements
Collation elements in the root collator. Used by the CollationRootElements class. The data structure is described there. null in a tailoring.

Constructor Details

CollationData
CollationData(Normalizer2Impl nfc)

Method Details

getCE32
public int getCE32(int c)

getCE32FromSupplementary
int getCE32FromSupplementary(int c)

isDigit
boolean isDigit(int c)

isUnsafeBackward
public boolean isUnsafeBackward(int c, boolean numeric)

isCompressibleLeadByte
public boolean isCompressibleLeadByte(int b)

isCompressiblePrimary
public boolean isCompressiblePrimary(long p)
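
The relationship between the two compressibility checks and the 256-entry compressibleBytes array can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: `CompressibleSketch` is a hypothetical name, primaries are assumed to be 32-bit values held in a long with the lead byte in bits 31..24 (matching the xx000000 notation used for numericPrimary), and the lead byte 0x2A used below is an arbitrary toy value.

```java
/** Sketch only: a 256-flag table consulted by lead byte or by full primary. */
final class CompressibleSketch {
    private final boolean[] compressibleBytes = new boolean[256];

    /** Marks the given lead bytes (0..255) as compressible. */
    CompressibleSketch(int... compressibleLeadBytes) {
        for (int b : compressibleLeadBytes) {
            compressibleBytes[b] = true;
        }
    }

    boolean isCompressibleLeadByte(int b) {
        return compressibleBytes[b];
    }

    boolean isCompressiblePrimary(long p) {
        // Assumption: the lead byte is the top byte of a 32-bit primary.
        return isCompressibleLeadByte((int) (p >>> 24) & 0xff);
    }
}
```

A caller holding a full primary weight does not need to extract the lead byte itself; `new CompressibleSketch(0x2A).isCompressiblePrimary(0x2A050000L)` consults the flag for 0x2A.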

getCE32FromContexts
int getCE32FromContexts(int index)
Returns the CE32 from two contexts words. Access to the defaultCE32 for contraction and prefix matching.

getIndirectCE32
int getIndirectCE32(int ce32)
Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG). Requires that ce32 is special.

getFinalCE32
int getFinalCE32(int ce32)
Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG), if ce32 is special.
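
The descriptions of getIndirectCE32 and getFinalCE32 differ only in their precondition: the first requires a special ce32, while the second accepts any ce32. One plausible reading, sketched below, is that getFinalCE32 is a guarded wrapper. The class `FinalCE32Sketch` is hypothetical, and the "is special" test and the indirection step are injected as functions because their real bit layouts are not described here.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

/** Sketch only: getFinalCE32 as a guarded call to getIndirectCE32. */
final class FinalCE32Sketch {
    private final IntPredicate isSpecial;      // hypothetical "is special" test
    private final IntUnaryOperator resolve;    // hypothetical indirection step

    FinalCE32Sketch(IntPredicate isSpecial, IntUnaryOperator resolve) {
        this.isSpecial = isSpecial;
        this.resolve = resolve;
    }

    /** Requires a special ce32, mirroring the documented precondition. */
    int getIndirectCE32(int ce32) {
        if (!isSpecial.test(ce32)) {
            throw new IllegalArgumentException("ce32 must be special");
        }
        return resolve.applyAsInt(ce32);
    }

    /** Accepts any ce32 and resolves it only if it is special. */
    int getFinalCE32(int ce32) {
        return isSpecial.test(ce32) ? getIndirectCE32(ce32) : ce32;
    }
}
```

Under this reading, callers that have not yet checked the tag would use getFinalCE32, and callers that already know the value is special can skip the check.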

getCEFromOffsetCE32
long getCEFromOffsetCE32(int c, int ce32)
Computes a CE from c's ce32 which has the OFFSET_TAG.

getSingleCE
long getSingleCE(int c)
Returns the single CE that c maps to. Throws UnsupportedOperationException if c does not map to a single CE.

getFCD16
int getFCD16(int c)
Returns the FCD16 value for code point c. c must be >= 0.

getFirstPrimaryForGroup
long getFirstPrimaryForGroup(int script)
Returns the first primary for the script's reordering group.
Returns: the primary with only the first primary lead byte of the group (not necessarily an actual root collator primary weight), or 0 if the script is unknown

getLastPrimaryForGroup
public long getLastPrimaryForGroup(int script)
Returns the last primary for the script's reordering group.
Returns: the last primary of the group (not an actual root collator primary weight), or 0 if the script is unknown

getGroupForPrimary
public int getGroupForPrimary(long p)
Finds the reordering group which contains the primary weight.
Returns: the first script of the group, or -1 if the weight is beyond the last group
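
The reverse lookup, from a primary weight back to a group, could work along these lines. This is a sketch over toy data, not the actual implementation: the class `GroupForPrimarySketch`, the table contents, the special-code base 0x1000, and the value 8 for MAX_NUM_SPECIAL_REORDER_CODES are all assumptions. Returning -1 for weights in the first (non-reorderable) range is also an assumption, since the text above only mentions weights beyond the last group.

```java
/** Sketch only: finding the first script whose range contains a primary. */
final class GroupForPrimarySketch {
    static final int NUM_SCRIPTS = 3;                    // toy value
    static final int SPECIAL_CODE_BASE = 0x1000;         // assumed first special reorder code
    static final int MAX_NUM_SPECIAL_REORDER_CODES = 8;  // assumed value

    // Top 16 bits of each range start; the last entry starts the trailing weights.
    static final char[] SCRIPT_STARTS = { 0x0000, 0x0400, 0x2A00, 0x6000, 0xFE00 };
    static final char[] SCRIPTS_INDEX = new char[NUM_SCRIPTS + 16];
    static {
        SCRIPTS_INDEX[0] = 2;
        SCRIPTS_INDEX[1] = 3;
        SCRIPTS_INDEX[2] = 3;            // shares a range with toy script 1
        SCRIPTS_INDEX[NUM_SCRIPTS] = 1;  // first special group
    }

    static int groupForPrimary(long p) {
        long top16 = p >>> 16;
        int last = SCRIPT_STARTS.length - 1;
        if (top16 < SCRIPT_STARTS[1] || SCRIPT_STARTS[last] <= top16) {
            return -1;  // outside every reorderable range
        }
        // Find the range that contains the weight.
        int index = 1;
        while (top16 >= SCRIPT_STARTS[index + 1]) {
            ++index;
        }
        // Report the first script, then the first special code, mapped to it.
        for (int i = 0; i < NUM_SCRIPTS; ++i) {
            if (SCRIPTS_INDEX[i] == index) {
                return i;
            }
        }
        for (int i = 0; i < MAX_NUM_SPECIAL_REORDER_CODES; ++i) {
            if (SCRIPTS_INDEX[NUM_SCRIPTS + i] == index) {
                return SPECIAL_CODE_BASE + i;
            }
        }
        return -1;
    }
}
```

When several scripts share a range, only the lowest-numbered one is reported, which matches the "first script of the group" wording.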

getScriptIndex
private int getScriptIndex(int script)

getEquivalentScripts
public int[] getEquivalentScripts(int script)

makeReorderRanges
void makeReorderRanges(int[] reorder, UVector32 ranges)
Writes the permutation of primary-weight ranges for the given reordering of scripts and groups. The caller checks for illegal arguments and takes care of [DEFAULT] and memory allocation.
Each list element will be a (limit, offset) pair as described for CollationSettings.reorderRanges. The list will be empty if no ranges are reordered.
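
As a rough illustration of how a list of (limit, offset) pairs can express a permutation of primary ranges, consider the sketch below. It does not reproduce the packed format of CollationSettings.reorderRanges. The class `ReorderRangesSketch`, the unpacked `int[][]` representation, and the interpretation of offset as a signed shift of the primary's lead byte are all assumptions made for the example.

```java
/** Sketch only: applying (limit, offset) pairs to remap a primary weight. */
final class ReorderRangesSketch {
    /**
     * Each row is {limit, offset}: limit is compared with the top 16 bits of
     * the primary, and offset is a signed lead-byte shift (assumed meaning).
     * Rows must be sorted by ascending limit.
     */
    static long reorder(long p, int[][] ranges) {
        int top16 = (int) (p >>> 16);
        for (int[] range : ranges) {
            if (top16 < range[0]) {
                return p + ((long) range[1] << 24);
            }
        }
        return p;  // beyond the last limit, or nothing is reordered
    }

    public static void main(String[] args) {
        // Toy permutation: swap group A (lead bytes 2A..5F) with group B (60..6F).
        int[][] ranges = {
            { 0x2A00, 0 },      // everything below group A stays put
            { 0x6000, 0x10 },   // group A moves up by the size of group B
            { 0x7000, -0x36 },  // group B moves down by the size of group A
        };
        System.out.println(Long.toHexString(reorder(0x2A050000L, ranges)));
        System.out.println(Long.toHexString(reorder(0x61000000L, ranges)));
    }
}
```

An empty `ranges` array leaves every primary unchanged, which corresponds to the "list will be empty if no ranges are reordered" case.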
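
makeReorderRanges
private void makeReorderRanges(int[] reorder, boolean latinMustMove, UVector32 ranges)

addLowScriptRange
private int addLowScriptRange(short[] table, int index, int lowStart)

addHighScriptRange
private int addHighScriptRange(short[] table, int index, int highLimit)

scriptCodeString
private static String scriptCodeString(int script)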