Class IntTrieBuilder


  • public class IntTrieBuilder
    extends TrieBuilder
    Builder class to manipulate and generate a trie. This is useful for ICU data in primitive types. It provides a compact way to store information that is indexed by Unicode values, such as character properties, types, and keyboard values. This is especially useful when a block of Unicode data contains significant values while the rest of the Unicode data is unused in the application, or when there is a lot of redundancy, such as where all 21,000 Han ideographs have the same value. Lookup is also much faster than in a hash table. A trie of any primitive data type serves two purposes:
    • Fast access of the indexed values.
    • Smaller memory footprint.
    This is a direct port from the ICU4C version.
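The two-stage lookup behind these two purposes can be sketched as follows. This is a minimal illustration only, not the ICU data layout: the class `ToyIntTrie`, its block size of 32 code points, and all member names are assumptions made for the example. Every range of code points that has never been written shares one block of initial values, so only ranges holding significant values cost extra data storage, and a lookup is two array reads.

```java
import java.util.Arrays;

// Toy two-stage trie (illustrative only; names and layout are assumptions, not ICU's).
final class ToyIntTrie {
    static final int SHIFT = 5;            // assumed block size: 32 code points
    static final int BLOCK = 1 << SHIFT;
    static final int MASK = BLOCK - 1;
    static final int MAX_CP = 0x10FFFF;

    // index[i] holds the start offset in data of the block for code points [i*32, i*32+32)
    private final int[] index = new int[(MAX_CP >> SHIFT) + 1];
    private int[] data = new int[BLOCK];   // block 0 is the shared initial-value block
    private int dataLength = BLOCK;

    ToyIntTrie(int initialValue) {
        Arrays.fill(data, initialValue);
        // index entries default to 0, so every range starts out pointing at block 0
    }

    int get(int cp) {
        return data[index[cp >> SHIFT] + (cp & MASK)];
    }

    void set(int cp, int value) {
        int i = cp >> SHIFT;
        if (index[i] == 0) {               // range still shares block 0: give it its own block
            if (dataLength + BLOCK > data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            System.arraycopy(data, 0, data, dataLength, BLOCK); // start from initial values
            index[i] = dataLength;
            dataLength += BLOCK;
        }
        data[index[i] + (cp & MASK)] = value;
    }

    int dataLength() {
        return dataLength;
    }
}
```

Setting a single value, for example `set(0x4E00, 42)`, allocates only one extra 32-entry block; all other code points keep reading the shared initial value.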
    • Field Detail

      • m_data_

        protected int[] m_data_
      • m_initialValue_

        protected int m_initialValue_
      • m_leadUnitValue_

        private int m_leadUnitValue_
    • Constructor Detail

      • IntTrieBuilder

        public IntTrieBuilder​(IntTrieBuilder table)
        Copy constructor
      • IntTrieBuilder

        public IntTrieBuilder​(int[] aliasdata,
                              int maxdatalength,
                              int initialvalue,
                              int leadunitvalue,
                              boolean latin1linear)
        Constructs a build table
        Parameters:
        aliasdata - data to be filled into table
        maxdatalength - maximum data length allowed in table
        initialvalue - initial data value
        leadunitvalue - data value for lead surrogate code units
        latin1linear - whether the Latin-1 range is to be laid out linearly
    • Method Detail

      • getValue

        public int getValue​(int ch)
        Gets the 32-bit data value for a code point from the table data.
        Parameters:
        ch - code point for which data is to be retrieved
        Returns:
        the 32-bit data value
      • getValue

        public int getValue​(int ch,
                            boolean[] inBlockZero)
        Gets the 32-bit data value for a code point from the table data, and reports whether the code point maps into block zero.
        Parameters:
        ch - code point for which data is to be retrieved
        inBlockZero - output parameter; inBlockZero[0] is set to true if the code point maps into block zero, otherwise false
        Returns:
        the 32-bit data value
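Java has no output parameters, so the flag is returned through a one-element `boolean[]`. The sketch below shows that calling pattern on a toy table. The class `BlockZeroLookup`, its block size, and the choice to accept a null array are assumptions for illustration, not the behavior of `IntTrieBuilder` itself.

```java
// Toy lookup showing the one-element boolean[] output-parameter idiom.
// Hypothetical class; block size and null handling are assumptions.
final class BlockZeroLookup {
    static final int BLOCK = 32;

    // index[i] is the start offset in data of the block for code points [i*32, i*32+32)
    static int getValue(int[] index, int[] data, int ch, boolean[] inBlockZero) {
        int block = index[ch / BLOCK];
        if (inBlockZero != null) {
            inBlockZero[0] = (block == 0); // report whether the shared first block was hit
        }
        return data[block + ch % BLOCK];
    }
}
```

A caller allocates `new boolean[1]` once, passes it on each lookup, and reads element 0 afterwards.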
      • setValue

        public boolean setValue​(int ch,
                                int value)
        Sets a 32-bit data value in the table data.
        Parameters:
        ch - code point for which data is to be set
        value - the value to set
        Returns:
        true if the set is successful, false if the table has already been compacted
      • serialize

        public IntTrie serialize​(TrieBuilder.DataManipulate datamanipulate,
                                 Trie.DataManipulate triedatamanipulate)
        Serializes the build table with 32-bit data.
        Parameters:
        datamanipulate - builder raw fold method implementation
        triedatamanipulate - result trie fold method
        Returns:
        a new trie
      • serialize

        public int serialize​(java.io.OutputStream os,
                             boolean reduceTo16Bits,
                             TrieBuilder.DataManipulate datamanipulate)
                      throws java.io.IOException
        Serializes the build table to an output stream. Compacts the build-time trie after all values are set, and then writes the serialized form onto an output stream. After this, the build-time trie can only be serialized again and/or closed; no further values can be added. This function is the rough equivalent of utrie_serialize() in ICU4C.
        Parameters:
        os - the output stream to which the serialized trie will be written. If null, the function still returns the size of the serialized trie.
        reduceTo16Bits - If true, reduce the data size to 16 bits. The resulting serialized form can then be used to create a CharTrie.
        datamanipulate - builder raw fold method implementation
        Returns:
        the number of bytes written to the output stream.
        Throws:
        java.io.IOException
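The contract described above (a null stream acts as a size query, and `reduceTo16Bits` halves the width of each data unit) can be sketched like this. The helper `SizeOrWrite.write` is hypothetical and writes only a bare data array; it does not reproduce the ICU serialized trie format, which this page does not describe.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper; the byte layout here is NOT ICU's serialized trie format.
final class SizeOrWrite {
    static int write(OutputStream os, int[] data, boolean reduceTo16Bits) throws IOException {
        int size = data.length * (reduceTo16Bits ? 2 : 4);
        if (os == null) {
            return size;                   // size query only, nothing is written
        }
        DataOutputStream out = new DataOutputStream(os);
        for (int v : data) {
            if (reduceTo16Bits) {
                out.writeShort(v);         // keeps only the low 16 bits of each value
            } else {
                out.writeInt(v);
            }
        }
        out.flush();
        return size;
    }
}
```

Calling the helper once with a null stream lets a caller size a buffer before the real write.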
      • setRange

        public boolean setRange​(int start,
                                int limit,
                                int value,
                                boolean overwrite)
        Sets a value for a range of code points [start..limit). All code points c with start <= c < limit get the value if overwrite is true or if the old value is the initial value.
        Parameters:
        start - the first code point to get the value
        limit - one past the last code point to get the value
        value - the value
        overwrite - flag for whether old non-initial values are to be overwritten
        Returns:
        false if a failure occurred (illegal argument or data array overrun), true otherwise
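The half-open range and the overwrite flag can be illustrated on a flat array. This is a sketch under assumptions: `RangeFill.setRange` is a hypothetical helper, it works on a plain `int[]` rather than on trie blocks, and it treats "not yet set" as "equal to the initial value", following the description of the overwrite parameter.

```java
// Hypothetical flat-array analogue of a range fill; not the trie implementation.
final class RangeFill {
    static boolean setRange(int[] table, int initialValue,
                            int start, int limit, int value, boolean overwrite) {
        if (start < 0 || limit > table.length || start > limit) {
            return false;                  // illegal argument
        }
        for (int c = start; c < limit; c++) {  // limit itself is excluded
            if (overwrite || table[c] == initialValue) {
                table[c] = value;
            }
        }
        return true;
    }
}
```

With overwrite set to false, a second range that overlaps an earlier one fills only the entries the earlier call left untouched.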
      • allocDataBlock

        private int allocDataBlock()
      • getDataBlock

        private int getDataBlock​(int ch)
        No error checking for illegal arguments.
        Parameters:
        ch - code point to look for
        Returns:
        -1 if no new data block available (out of memory in data array)
      • compact

        private void compact​(boolean overlap)
        Compacts a folded build-time trie.
        The compaction:
        • removes blocks that are identical with earlier ones
        • overlaps adjacent blocks as much as possible (if overlap == true)
        • moves blocks in steps of the data granularity
        • moves and overlaps blocks that overlap with multiple values in the overlap region
        It does not:
        • try to move and overlap blocks that are not already adjacent
        Parameters:
        overlap - flag for whether adjacent blocks are to be overlapped
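The first compaction step, removing blocks that are identical with earlier ones, can be sketched as below. This is a simplified illustration, not the private `compact` implementation: `BlockDedup`, its method names, and the fixed block size are assumptions, and the sketch does not attempt the overlapping of adjacent blocks.

```java
// Hypothetical sketch of block deduplication only (no overlapping of adjacent blocks).
final class BlockDedup {
    // Rewrites data and index in place and returns the new data length.
    // index entries are block start offsets and must be multiples of blockSize.
    static int compact(int[] index, int[] data, int dataLength, int blockSize) {
        int[] remap = new int[dataLength / blockSize]; // old block number -> new start offset
        int newLength = 0;
        for (int start = 0; start < dataLength; start += blockSize) {
            int found = findSameBlock(data, newLength, start, blockSize);
            if (found >= 0) {
                remap[start / blockSize] = found;      // duplicate: reuse the earlier block
            } else {
                System.arraycopy(data, start, data, newLength, blockSize);
                remap[start / blockSize] = newLength;  // unique: move it down
                newLength += blockSize;
            }
        }
        for (int i = 0; i < index.length; i++) {
            index[i] = remap[index[i] / blockSize];
        }
        return newLength;
    }

    // Returns the start of a kept block equal to the block at otherBlock, or -1 if none.
    static int findSameBlock(int[] data, int keptLength, int otherBlock, int blockSize) {
        for (int b = 0; b < keptLength; b += blockSize) {
            int i = 0;
            while (i < blockSize && data[b + i] == data[otherBlock + i]) {
                i++;
            }
            if (i == blockSize) {
                return b;
            }
        }
        return -1;
    }
}
```

After the pass, several index entries may point at the same block, which is where the space saving comes from.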
      • findSameDataBlock

        private static final int findSameDataBlock​(int[] data,
                                                   int dataLength,
                                                   int otherBlock,
                                                   int step)
        Find the same data block
        Parameters:
        data - array
        dataLength -
        otherBlock -
        step -
      • fold

        private final void fold​(TrieBuilder.DataManipulate manipulate)
        Folds the normalization data for supplementary code points into a compact area on top of the BMP part of the trie index, with the lead surrogates indexing this compact area. Duplicates the index values for lead surrogates: from inside the BMP area, where some may be overridden with folded values, to just after the BMP area, where they can be retrieved for code point lookups.
        Parameters:
        manipulate - fold implementation
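Folding is keyed on lead surrogates because each lead surrogate covers a fixed run of 1024 supplementary code points. The sketch below shows only that relationship, using standard `java.lang.Character` methods; the class `LeadSurrogateRanges` is hypothetical and says nothing about how the folded index area is laid out.

```java
// Hypothetical helper showing which supplementary code points one lead surrogate covers.
final class LeadSurrogateRanges {
    static final int RANGE = 0x400; // 1024 trail surrogates per lead surrogate

    // First supplementary code point encoded with this lead surrogate.
    static int firstCodePoint(char lead) {
        return Character.toCodePoint(lead, '\uDC00');
    }

    // One past the last supplementary code point encoded with this lead surrogate.
    static int limitCodePoint(char lead) {
        return firstCodePoint(lead) + RANGE;
    }

    // Lead surrogate under which data for a supplementary code point would be grouped.
    static char leadFor(int supplementaryCodePoint) {
        return Character.highSurrogate(supplementaryCodePoint);
    }
}
```

For example, U+1F600 is grouped under lead surrogate U+D83D, which spans U+1F400 up to but not including U+1F800.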
      • fillBlock

        private void fillBlock​(int block,
                               int start,
                               int limit,
                               int value,
                               boolean overwrite)