Class UCharacterProperty


  • public final class UCharacterProperty
    extends java.lang.Object

    Internal class used for Unicode character property database.

    This class stores binary data read from uprops.icu. It does not parse the data into higher-level information; it only returns bytes of information when required.

    Because that is the form most commonly used for retrieval, the binary data is stored as an array of char.

    UCharacterPropertyDB also contains information on accessing indexes to significant points in the binary data.

    Responsibility for molding the binary data into a more meaningful form lies with UCharacter.

    Since:
    release 2.1, February 1st 2002
    • Field Detail

      • m_trie_

        public Trie2_16 m_trie_
        Trie data
      • m_unicodeVersion_

        public VersionInfo m_unicodeVersion_
        Unicode version
      • LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_

        public static final char LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
        Latin capital letter i with dot above
        See Also:
        Constant Field Values
      • LATIN_SMALL_LETTER_DOTLESS_I_

        public static final char LATIN_SMALL_LETTER_DOTLESS_I_
        Latin small letter dotless i
        See Also:
        Constant Field Values
      • LATIN_SMALL_LETTER_I_

        public static final char LATIN_SMALL_LETTER_I_
        Latin lowercase i
        See Also:
        Constant Field Values
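These three constants are the letters that Turkic case mapping treats specially: in Turkish, capital I lowercases to dotless ı, and capital İ (I with dot above) lowercases to plain i. As a minimal sketch, assuming the constants hold the standard code points U+0130, U+0131 and U+0069, the hypothetical class below lowercases a single char with those rules. It is an illustration, not ICU's case-mapping code.

```java
import java.util.Locale;

// Hypothetical helper: lowercases I and I-with-dot-above using Turkic rules.
// The constant values are assumed to be the standard Unicode code points.
final class TurkicI {
    static final char LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE = '\u0130';
    static final char LATIN_SMALL_LETTER_DOTLESS_I = '\u0131';
    static final char LATIN_SMALL_LETTER_I = 'i'; // U+0069

    /** Lowercases one char; only I and I-with-dot-above get Turkic treatment. */
    static char toLowerTurkic(char c) {
        if (c == 'I') {
            return LATIN_SMALL_LETTER_DOTLESS_I;   // I -> dotless i
        }
        if (c == LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE) {
            return LATIN_SMALL_LETTER_I;           // dotted capital I -> i
        }
        return Character.toLowerCase(c);           // everything else: default mapping
    }

    public static void main(String[] args) {
        // Compare with the JDK's locale-sensitive lowercasing for Turkish.
        String jdk = "I".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(jdk.charAt(0) == toLowerTurkic('I'));
    }
}
```

The JDK's String.toLowerCase(Locale) applies the same I to dotless i rule for a Turkish locale, which the main method uses as a cross-check.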
      • SRC_NONE

        public static final int SRC_NONE
        No source, not a supported property.
        See Also:
        Constant Field Values
      • SRC_CHAR

        public static final int SRC_CHAR
        From uchar.c/uprops.icu main trie
        See Also:
        Constant Field Values
      • SRC_PROPSVEC

        public static final int SRC_PROPSVEC
        From uchar.c/uprops.icu properties vectors trie
        See Also:
        Constant Field Values
      • SRC_BIDI

        public static final int SRC_BIDI
        From ubidi_props.c/ubidi.icu
        See Also:
        Constant Field Values
      • SRC_CHAR_AND_PROPSVEC

        public static final int SRC_CHAR_AND_PROPSVEC
        From uchar.c/uprops.icu main trie as well as properties vectors trie
        See Also:
        Constant Field Values
      • SRC_CASE_AND_NORM

        public static final int SRC_CASE_AND_NORM
        From ucase.c/ucase.icu as well as unorm.cpp/unorm.icu
        See Also:
        Constant Field Values
      • SRC_NFC

        public static final int SRC_NFC
        From normalizer2impl.cpp/nfc.nrm
        See Also:
        Constant Field Values
      • SRC_NFKC

        public static final int SRC_NFKC
        From normalizer2impl.cpp/nfkc.nrm
        See Also:
        Constant Field Values
      • SRC_NFKC_CF

        public static final int SRC_NFKC_CF
        From normalizer2impl.cpp/nfkc_cf.nrm
        See Also:
        Constant Field Values
      • SRC_NFC_CANON_ITER

        public static final int SRC_NFC_CANON_ITER
        From normalizer2impl.cpp/nfc.nrm canonical iterator data
        See Also:
        Constant Field Values
      • SRC_COUNT

        public static final int SRC_COUNT
        One more than the highest UPropertySource (SRC_) constant.
        See Also:
        Constant Field Values
      • GC_CN_MASK

        private static final int GC_CN_MASK
      • GC_CC_MASK

        private static final int GC_CC_MASK
      • GC_CS_MASK

        private static final int GC_CS_MASK
      • GC_ZS_MASK

        private static final int GC_ZS_MASK
      • GC_ZL_MASK

        private static final int GC_ZL_MASK
      • GC_ZP_MASK

        private static final int GC_ZP_MASK
      • GC_Z_MASK

        private static final int GC_Z_MASK
        Mask constant for multiple UCharCategory bits (Z Separators).
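The GC_*_MASK constants appear to follow a one-bit-per-category scheme: a group such as Z (Separators) is the OR of the Zs, Zl and Zp bits, so a membership test is a single AND. The sketch below shows the idea using java.lang.Character's category numbers as a stand-in; ICU's category values are not assumed to equal Java's, and the class name is hypothetical.

```java
// Sketch: one bit per general category, using java.lang.Character's numbering.
final class CategoryMasks {
    /** Bit n set for category n, analogous to the GC_*_MASK constants. */
    static int mask(int category) {
        return 1 << category;
    }

    static final int ZS_MASK = mask(Character.SPACE_SEPARATOR);
    static final int ZL_MASK = mask(Character.LINE_SEPARATOR);
    static final int ZP_MASK = mask(Character.PARAGRAPH_SEPARATOR);
    // The Z group is the union of its three member categories.
    static final int Z_MASK = ZS_MASK | ZL_MASK | ZP_MASK;

    /** True if the code point's general category is Zs, Zl or Zp. */
    static boolean isSeparator(int codePoint) {
        return (mask(Character.getType(codePoint)) & Z_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isSeparator(' '));     // space (Zs)
        System.out.println(isSeparator(0x2028));  // LINE SEPARATOR (Zl)
        System.out.println(isSeparator('x'));     // a letter
    }
}
```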
      • ID_COMPAT_MATH_CONTINUE

        private static final int[] ID_COMPAT_MATH_CONTINUE
        Ranges (start/limit pairs) of ID_Compat_Math_Continue (only), from UCD PropList.txt.
      • ID_COMPAT_MATH_START

        private static final int[] ID_COMPAT_MATH_START
        ID_Compat_Math_Start characters, from UCD PropList.txt.
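ID_COMPAT_MATH_CONTINUE is stored as sorted start/limit pairs; in ICU usage a limit is exclusive. One common way to test membership in such an array is to binary-search for the number of boundaries that are less than or equal to the code point: an odd count means the code point lies inside a range. The sketch below assumes that layout; the class name and the sample ranges are made up and are not the real property data.

```java
// Sketch: membership test over a sorted array of half-open [start, limit) pairs.
final class RangeLookup {
    /**
     * Returns true if c lies in [ranges[0], ranges[1]), [ranges[2], ranges[3]), ...
     * The array must be sorted ascending and have even length.
     */
    static boolean contains(int[] ranges, int c) {
        // Binary search for the count of boundaries <= c.
        int lo = 0;
        int hi = ranges.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ranges[mid] <= c) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Odd count: c is at or past a start but before that range's limit.
        return (lo & 1) == 1;
    }

    public static void main(String[] args) {
        // Hypothetical sample: ASCII digits and uppercase letters, NOT real property ranges.
        int[] sample = {0x30, 0x3A, 0x41, 0x5B};
        System.out.println(contains(sample, '7'));
        System.out.println(contains(sample, ':'));
    }
}
```

For a plain list of single characters such as ID_COMPAT_MATH_START, assuming it is sorted, java.util.Arrays.binarySearch(list, c) >= 0 is enough.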
      • gcbToHst

        private static final int[] gcbToHst
      • m_additionalTrie_

        Trie2_16 m_additionalTrie_
        Extra property trie
      • m_additionalVectors_

        int[] m_additionalVectors_
        Extra property vectors, 1st column for age and second for binary properties.
      • m_additionalColumnsCount_

        int m_additionalColumnsCount_
        Number of additional columns
      • m_maxBlockScriptValue_

        int m_maxBlockScriptValue_
        Maximum values for block, bits used as in vector word 0
      • m_maxJTGValue_

        int m_maxJTGValue_
        Maximum values for the properties stored in vector word 2, bits used as in that word
      • m_scriptExtensions_

        public char[] m_scriptExtensions_
        Script_Extensions data
      • DATA_FILE_NAME_

        private static final java.lang.String DATA_FILE_NAME_
        Default name of the datafile
        See Also:
        Constant Field Values
      • NUMERIC_TYPE_VALUE_SHIFT_

        private static final int NUMERIC_TYPE_VALUE_SHIFT_
        Numeric types and values in the main properties words.
        See Also:
        Constant Field Values
      • NTV_DECIMAL_START_

        private static final int NTV_DECIMAL_START_
        Decimal digits: nv=0..9
        See Also:
        Constant Field Values
      • NTV_DIGIT_START_

        private static final int NTV_DIGIT_START_
        Other digits: nv=0..9
        See Also:
        Constant Field Values
      • NTV_NUMERIC_START_

        private static final int NTV_NUMERIC_START_
        Small integers: nv=0..154
        See Also:
        Constant Field Values
      • NTV_FRACTION_START_

        private static final int NTV_FRACTION_START_
        Fractions: ((ntv>>4)-12) / ((ntv&0xf)+1) = -1..17 / 1..16
        See Also:
        Constant Field Values
      • NTV_LARGE_START_

        private static final int NTV_LARGE_START_
        Large integers: ((ntv>>5)-14) * 10^((ntv&0x1f)+2) = (1..9)*(10^2..10^33) (only one significant decimal digit)
        See Also:
        Constant Field Values
      • NTV_BASE60_START_

        private static final int NTV_BASE60_START_
        Sexagesimal numbers: ((ntv>>2)-0xbf) * 60^((ntv&3)+1) = (1..9)*(60^1..60^4)
        See Also:
        Constant Field Values
      • NTV_FRACTION20_START_

        private static final int NTV_FRACTION20_START_
        Fraction-20 values: frac20 = ntv-0x324 = 0..0x17 -> 1|3|5|7 / 20|40|80|160|320|640
        numerator: num = 2*(frac20&3)+1
        denominator: den = 20<<(frac20>>2)
        See Also:
        Constant Field Values
      • NTV_FRACTION32_START_

        private static final int NTV_FRACTION32_START_
        Fraction-32 values: frac32 = ntv-0x34c = 0..15 -> 1|3|5|7 / 32|64|128|256
        numerator: num = 2*(frac32&3)+1
        denominator: den = 32<<(frac32>>2)
        See Also:
        Constant Field Values
      • NTV_RESERVED_START_

        private static final int NTV_RESERVED_START_
        No numeric value (yet).
        See Also:
        Constant Field Values
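The NTV_*_START_ comments above define one formula per value range. The following sketch decodes an already-extracted ntv field into a double by applying those formulas in order (the shift out of the main properties word is omitted). The starts 0x324 and 0x34C come from the comments; the other start values are assumptions derived from the documented ranges, with 0 assumed to mean "no numeric value", and NaN stands in for whatever sentinel the real code returns.

```java
// Sketch: decode a numeric type/value (ntv) field using the documented formulas.
final class NumericTypeValue {
    // Assumed start values, consistent with the ranges documented above.
    static final int NTV_NONE = 0;
    static final int NTV_DECIMAL_START = 1;
    static final int NTV_DIGIT_START = 11;
    static final int NTV_NUMERIC_START = 21;
    static final int NTV_FRACTION_START = 0xB0;
    static final int NTV_LARGE_START = 0x1E0;
    static final int NTV_BASE60_START = 0x300;
    static final int NTV_FRACTION20_START = 0x324;
    static final int NTV_FRACTION32_START = 0x34C;
    static final int NTV_RESERVED_START = 0x35C;

    /** Returns the numeric value, or NaN when ntv encodes no value. */
    static double decode(int ntv) {
        if (ntv == NTV_NONE || ntv >= NTV_RESERVED_START) {
            return Double.NaN;
        } else if (ntv < NTV_DIGIT_START) {
            return ntv - NTV_DECIMAL_START;            // decimal digit 0..9
        } else if (ntv < NTV_NUMERIC_START) {
            return ntv - NTV_DIGIT_START;              // other digit 0..9
        } else if (ntv < NTV_FRACTION_START) {
            return ntv - NTV_NUMERIC_START;            // small integer 0..154
        } else if (ntv < NTV_LARGE_START) {
            int numerator = (ntv >> 4) - 12;           // -1..17
            int denominator = (ntv & 0xF) + 1;         // 1..16
            return (double) numerator / denominator;
        } else if (ntv < NTV_BASE60_START) {
            double value = (ntv >> 5) - 14;            // mantissa 1..9
            int exp = (ntv & 0x1F) + 2;                // 10^2..10^33
            for (int i = 0; i < exp; i++) {
                value *= 10;
            }
            return value;
        } else if (ntv < NTV_FRACTION20_START) {
            double value = (ntv >> 2) - 0xBF;          // 1..9
            int exp = (ntv & 3) + 1;                   // 60^1..60^4
            for (int i = 0; i < exp; i++) {
                value *= 60;
            }
            return value;
        } else if (ntv < NTV_FRACTION32_START) {
            int frac20 = ntv - NTV_FRACTION20_START;   // 0..0x17
            return (double) (2 * (frac20 & 3) + 1) / (20 << (frac20 >> 2));
        } else {
            int frac32 = ntv - NTV_FRACTION32_START;   // 0..15
            return (double) (2 * (frac32 & 3) + 1) / (32 << (frac32 >> 2));
        }
    }
}
```

Under these assumptions, ntv 0xD1 decodes to (13-12)/(1+1) = 0.5, and ntv 0x304 decodes to 2 * 60 = 120.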
      • SCRIPT_X_MASK

        public static final int SCRIPT_X_MASK
        Script_Extensions: mask includes Script
        See Also:
        Constant Field Values
      • EAST_ASIAN_MASK_

        private static final int EAST_ASIAN_MASK_
        Integer properties mask and shift values for East Asian cell width. Equivalent to icu4c UPROPS_EA_MASK
        See Also:
        Constant Field Values
      • EAST_ASIAN_SHIFT_

        private static final int EAST_ASIAN_SHIFT_
        Integer properties mask and shift values for East Asian cell width. Equivalent to icu4c UPROPS_EA_SHIFT
        See Also:
        Constant Field Values
      • BLOCK_MASK_

        private static final int BLOCK_MASK_
        Integer properties mask and shift values for blocks. Equivalent to icu4c UPROPS_BLOCK_MASK
        See Also:
        Constant Field Values
      • BLOCK_SHIFT_

        private static final int BLOCK_SHIFT_
        Integer properties mask and shift values for blocks. Equivalent to icu4c UPROPS_BLOCK_SHIFT
        See Also:
        Constant Field Values
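The mask/shift pairs above pack several enumerated properties into one 32-bit word: a value is read by ANDing the word with the mask and shifting the result right by the shift. The sketch below shows that pattern with HYPOTHETICAL bit positions; the real EAST_ASIAN_* and BLOCK_* values are not reproduced here, and the class name is made up.

```java
// Sketch of the mask-and-shift packing pattern. All bit positions are HYPOTHETICAL.
final class PackedProps {
    // Hypothetical layout: East Asian width in bits 17..19, block in bits 8..16.
    static final int EA_SHIFT = 17;
    static final int EA_MASK = 0x7 << EA_SHIFT;          // 3 bits
    static final int BLOCK_SHIFT = 8;
    static final int BLOCK_MASK = 0x1FF << BLOCK_SHIFT;  // 9 bits

    /** Reads a field: isolate its bits with the mask, then move them down to bit 0. */
    static int field(int word, int mask, int shift) {
        return (word & mask) >>> shift;
    }

    /** Writes both fields into one word (the inverse of field()). */
    static int pack(int eastAsianWidth, int block) {
        return ((eastAsianWidth << EA_SHIFT) & EA_MASK)
             | ((block << BLOCK_SHIFT) & BLOCK_MASK);
    }

    public static void main(String[] args) {
        int word = pack(5, 300);
        System.out.println(field(word, EA_MASK, EA_SHIFT));
        System.out.println(field(word, BLOCK_MASK, BLOCK_SHIFT));
    }
}
```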
      • SCRIPT_LOW_MASK

        public static final int SCRIPT_LOW_MASK
        Integer properties mask for the low bits of the script code. Equivalent to icu4c UPROPS_SCRIPT_LOW_MASK.
        See Also:
        Constant Field Values
      • SCRIPT_X_WITH_INHERITED

        public static final int SCRIPT_X_WITH_INHERITED
        See Also:
        Constant Field Values
      • WHITE_SPACE_PROPERTY_

        private static final int WHITE_SPACE_PROPERTY_
        Additional properties used in internal trie data
        See Also:
        Constant Field Values
      • QUOTATION_MARK_PROPERTY_

        private static final int QUOTATION_MARK_PROPERTY_
        See Also:
        Constant Field Values
      • TERMINAL_PUNCTUATION_PROPERTY_

        private static final int TERMINAL_PUNCTUATION_PROPERTY_
        See Also:
        Constant Field Values
      • ASCII_HEX_DIGIT_PROPERTY_

        private static final int ASCII_HEX_DIGIT_PROPERTY_
        See Also:
        Constant Field Values
      • NONCHARACTER_CODE_POINT_PROPERTY_

        private static final int NONCHARACTER_CODE_POINT_PROPERTY_
        See Also:
        Constant Field Values
      • GRAPHEME_EXTEND_PROPERTY_

        private static final int GRAPHEME_EXTEND_PROPERTY_
        See Also:
        Constant Field Values
      • GRAPHEME_LINK_PROPERTY_

        private static final int GRAPHEME_LINK_PROPERTY_
        See Also:
        Constant Field Values
      • IDS_BINARY_OPERATOR_PROPERTY_

        private static final int IDS_BINARY_OPERATOR_PROPERTY_
        See Also:
        Constant Field Values
      • IDS_TRINARY_OPERATOR_PROPERTY_

        private static final int IDS_TRINARY_OPERATOR_PROPERTY_
        See Also:
        Constant Field Values
      • UNIFIED_IDEOGRAPH_PROPERTY_

        private static final int UNIFIED_IDEOGRAPH_PROPERTY_
        See Also:
        Constant Field Values
      • DEFAULT_IGNORABLE_CODE_POINT_PROPERTY_

        private static final int DEFAULT_IGNORABLE_CODE_POINT_PROPERTY_
        See Also:
        Constant Field Values
      • LOGICAL_ORDER_EXCEPTION_PROPERTY_

        private static final int LOGICAL_ORDER_EXCEPTION_PROPERTY_
        See Also:
        Constant Field Values
      • XID_CONTINUE_PROPERTY_

        private static final int XID_CONTINUE_PROPERTY_
        See Also:
        Constant Field Values
      • GRAPHEME_BASE_PROPERTY_

        private static final int GRAPHEME_BASE_PROPERTY_
        See Also:
        Constant Field Values
      • VARIATION_SELECTOR_PROPERTY_

        private static final int VARIATION_SELECTOR_PROPERTY_
        See Also:
        Constant Field Values
      • PREPENDED_CONCATENATION_MARK

        private static final int PREPENDED_CONCATENATION_MARK
        See Also:
        Constant Field Values
      • ID_TYPE_DEFAULT_IGNORABLE

        private static final int ID_TYPE_DEFAULT_IGNORABLE
        See Also:
        Constant Field Values
      • idTypeToEncoded

        private static final int[] idTypeToEncoded
        Maps UIdentifierType to encoded bits. If UPROPS_ID_TYPE_BIT is set, use "&" to test whether the value bit is set; otherwise, compare ("==") the array value with the data value.
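The two matching modes described for idTypeToEncoded can be sketched as below. The flag value and the sample encodings are hypothetical stand-ins for UPROPS_ID_TYPE_BIT and the real table entries; only the "&" versus "==" decision follows the description.

```java
// Sketch of the idTypeToEncoded matching rule. ID_TYPE_BIT is a HYPOTHETICAL value.
final class IdTypeMatch {
    static final int ID_TYPE_BIT = 0x80;  // hypothetical stand-in for UPROPS_ID_TYPE_BIT

    /**
     * Flag set in the encoded entry: test its value bit(s) against the data with '&'.
     * Flag clear: the data value must equal the encoded value exactly.
     */
    static boolean matches(int encoded, int dataValue) {
        if ((encoded & ID_TYPE_BIT) != 0) {
            int valueBits = encoded & ~ID_TYPE_BIT;  // drop the flag itself
            return (dataValue & valueBits) != 0;
        }
        return dataValue == encoded;
    }

    public static void main(String[] args) {
        System.out.println(matches(ID_TYPE_BIT | 0x04, 0x86));  // bit 0x04 present
        System.out.println(matches(0x03, 0x07));                // not an exact match
    }
}
```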
      • DECOMPOSITION_TYPE_MASK_

        private static final int DECOMPOSITION_TYPE_MASK_
        Integer properties mask for decomposition type. Equivalent to icu4c UPROPS_DT_MASK.
        See Also:
        Constant Field Values
      • FIRST_NIBBLE_SHIFT_

        private static final int FIRST_NIBBLE_SHIFT_
        First nibble shift
        See Also:
        Constant Field Values
      • LAST_NIBBLE_MASK_

        private static final int LAST_NIBBLE_MASK_
        Second nibble mask
        See Also:
        Constant Field Values
    • Constructor Detail

      • UCharacterProperty

        private UCharacterProperty()
                            throws java.io.IOException
        Constructor
        Throws:
        java.io.IOException - thrown when data reading fails or the data is corrupted
    • Method Detail

      • getProperty

        public final int getProperty(int ch)
        Gets the main property value for code point ch.
        Parameters:
        ch - code point whose property value is to be retrieved
        Returns:
        property value of code point
      • getAdditional

        public int getAdditional(int codepoint,
                                 int column)
        Gets the additional Unicode properties. Java version of C u_getUnicodeProperties().
        Parameters:
        codepoint - code point whose additional properties are to be retrieved
        column - the column index
        Returns:
        Unicode properties
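getAdditional() reads one word from the extra property vectors. Assuming, as the field descriptions suggest, that m_additionalTrie_ maps a code point to the start of its row in m_additionalVectors_ and that the column is added to that offset, the lookup can be sketched as follows. The trie is replaced by a plain function, returning 0 for an out-of-range column is an assumption, and the class is hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Sketch of a row/column lookup into a flat array of property vectors.
// The trie is replaced by a caller-supplied function (hypothetical).
final class PropsVectors {
    private final int[] vectors;           // rows stored back to back
    private final int columns;             // words per row
    private final IntUnaryOperator rowOf;  // code point -> index of its row's first word

    PropsVectors(int[] vectors, int columns, IntUnaryOperator rowOf) {
        this.vectors = vectors;
        this.columns = columns;
        this.rowOf = rowOf;
    }

    /** Returns word 'column' of the code point's row; 0 for an out-of-range column (assumed). */
    int getAdditional(int codePoint, int column) {
        if (column < 0 || column >= columns) {
            return 0;
        }
        return vectors[rowOf.applyAsInt(codePoint) + column];
    }

    public static void main(String[] args) {
        // Two toy rows of three words each: ASCII uses row 0, everything else row 1.
        int[] data = {10, 11, 12, 20, 21, 22};
        PropsVectors p = new PropsVectors(data, 3, cp -> cp < 0x80 ? 0 : 3);
        System.out.println(p.getAdditional('A', 1));
    }
}
```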
      • getAge

        public VersionInfo getAge(int codepoint)

        Get the "age" of the code point.

        The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.

        This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.

        The data is from the UCD file DerivedAge.txt.

        This API does not check the validity of the codepoint.

        Parameters:
        codepoint - The code point.
        Returns:
        the Unicode version number
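As an example of the use case mentioned above, the sketch below drops code points that are newer than what a receiving process supports. The age lookup is a HYPOTHETICAL caller-supplied function returning {major, minor}; with ICU it would come from getAge() as a VersionInfo, which is left out here so the example has no dependencies.

```java
import java.util.function.IntFunction;

// Sketch: keep only code points whose age is at or below a maximum version.
// ageOf is a HYPOTHETICAL lookup returning {major, minor}.
final class AgeFilter {
    /** Compares two {major, minor} versions. */
    static int compare(int[] a, int[] b) {
        if (a[0] != b[0]) {
            return Integer.compare(a[0], b[0]);
        }
        return Integer.compare(a[1], b[1]);
    }

    /** Returns s without the code points first assigned after maxVersion. */
    static String keepUpTo(String s, IntFunction<int[]> ageOf, int[] maxVersion) {
        StringBuilder out = new StringBuilder();
        s.codePoints()
         .filter(cp -> compare(ageOf.apply(cp), maxVersion) <= 0)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy age table: everything at or above U+1F600 is treated as version 6.1.
        IntFunction<int[]> age = cp -> cp >= 0x1F600 ? new int[] {6, 1} : new int[] {1, 1};
        String text = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(keepUpTo(text, age, new int[] {6, 0}));
    }
}
```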
      • isgraphPOSIX

        private static final boolean isgraphPOSIX(int c)
        Checks if c is in [^\p{space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}] with space=\p{Whitespace} and Control=Cc. Implements UCHAR_POSIX_GRAPH.
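The set in the isgraphPOSIX description can be approximated with the JDK alone: java.util.regex supports the Unicode White_Space property as \p{IsWhite_Space}, and Character.getType gives the general category. This is a sketch, not ICU's implementation; it uses Java's category numbering and the JDK's Unicode version, which may differ from the ICU data, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch of [^\p{Whitespace}\p{Cc}\p{Cs}\p{Cn}] using only the JDK.
final class PosixGraph {
    // Unicode White_Space binary property, as supported by java.util.regex.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    // One bit per excluded general category (Cc, Cs, Cn) in Java's numbering.
    private static final int EXCLUDED =
        (1 << Character.CONTROL) | (1 << Character.SURROGATE) | (1 << Character.UNASSIGNED);

    static boolean isGraph(int c) {
        if ((EXCLUDED & (1 << Character.getType(c))) != 0) {
            return false;  // control, surrogate or unassigned
        }
        return !WHITE_SPACE.matcher(new String(Character.toChars(c))).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGraph('A'));
        System.out.println(isGraph(0x00A0));  // NO-BREAK SPACE has White_Space
    }
}
```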
      • hasBinaryProperty

        public boolean hasBinaryProperty(int c,
                                         int which)
      • getType

        public int getType(int c)
      • getIntPropertyValue

        public int getIntPropertyValue(int c,
                                       int which)
      • getIntPropertyMaxValue

        public int getIntPropertyMaxValue(int which)
      • getSource

        final int getSource(int which)
      • getMaxValues

        public int getMaxValues(int column)
        Gets the maximum values for some enum/int properties.
        Parameters:
        column - the properties vector column index
        Returns:
        maximum values for the integer properties.
      • getMask

        public static final int getMask(int type)
        Gets the type mask
        Parameters:
        type - character type
        Returns:
        mask
      • getEuropeanDigit

        public static int getEuropeanDigit(int ch)
        Returns the digit value of Latin letters 'A' - 'Z' (with 'A' = 10), in both normal (half-width) and full-width forms. This method assumes that the calling method has already checked other digit characters.
        Parameters:
        ch - character to test
        Returns:
        -1 if ch is not a character of the form 'A' - 'Z'; otherwise, its corresponding digit value.
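A minimal sketch of the mapping described for getEuropeanDigit, where 'A' maps to 10 and 'Z' to 35. It assumes that lowercase letters and their full-width forms are handled the same way as uppercase; the class name is hypothetical.

```java
// Sketch: Latin letter -> digit value (A/a = 10 ... Z/z = 35), ASCII and full-width.
// Treating lowercase like uppercase is an assumption.
final class EuropeanDigit {
    static int value(int ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return ch - 'A' + 10;
        }
        if (ch >= 'a' && ch <= 'z') {
            return ch - 'a' + 10;
        }
        if (ch >= 0xFF21 && ch <= 0xFF3A) {  // FULLWIDTH LATIN CAPITAL LETTER A..Z
            return ch - 0xFF21 + 10;
        }
        if (ch >= 0xFF41 && ch <= 0xFF5A) {  // FULLWIDTH LATIN SMALL LETTER A..Z
            return ch - 0xFF41 + 10;
        }
        return -1;  // not a Latin letter; other digits are checked by the caller
    }

    public static void main(String[] args) {
        System.out.println(value('F'));
        System.out.println(value('0'));
    }
}
```

For these letters the result agrees with java.lang.Character.digit(ch, 36), which also accepts the full-width Latin letters.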
      • digit

        public int digit(int c)
      • getNumericValue

        public int getNumericValue(int c)
      • getUnicodeNumericValue

        public double getUnicodeNumericValue(int c)
      • getNumericTypeValue

        private static final int getNumericTypeValue(int props)
      • ntvGetType

        private static final int ntvGetType(int ntv)
      • mergeScriptCodeOrIndex

        public static final int mergeScriptCodeOrIndex(int scriptX)
      • upropsvec_addPropertyStarts

        public void upropsvec_addPropertyStarts(UnicodeSet set)
      • ulayout_addPropertyStarts

        static UnicodeSet ulayout_addPropertyStarts(int src,
                                                    UnicodeSet set)
      • mathCompat_addPropertyStarts

        static void mathCompat_addPropertyStarts(UnicodeSet set)
      • hasIDType

        public boolean hasIDType(int c,
                                 int typeIndex)