Package com.ibm.icu.impl
Class BMPSet
- java.lang.Object
-
- com.ibm.icu.impl.BMPSet
-
public final class BMPSet extends java.lang.Object
Helper class for frozen UnicodeSets. Implements contains() and span(), optimized for BMP code points. The lookup strategy depends on the code point range (2-byte and 3-byte refer to the UTF-8 encoded length):
Latin-1: Look up one boolean per character (latin1Contains).
2-byte characters: Bits organized vertically (table7FF).
3-byte characters: Use zero/one/mixed data per 64-block in U+0000..U+FFFF, with mixed for illegal ranges (bmpBlockBits).
Supplementary characters: Binary search over the supplementary part of the parent set's inversion list.
-
-
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| private int[] | bmpBlockBits | One bit per 64 BMP code points. |
| private boolean[] | latin1Contains | One boolean ('true' or 'false') per Latin-1 character. |
| private int[] | list | The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points. |
| private int[] | list4kStarts | Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000). |
| private int | listLength | |
| private int[] | table7FF | One bit per code point from U+0000..U+07FF. |
| static int | U16_SURROGATE_OFFSET | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| boolean | contains(int c) | |
| private boolean | containsSlow(int c, int lo, int hi) | |
| private int | findCodePoint(int c, int lo, int hi) | Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted for finding code points in a certain range. |
| private void | initBits() | |
| private static void | set32x64Bits(int[] table, int start, int limit) | Set bits in a bit rectangle in "vertical" bit organization. |
| int | span(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount) | Span the initial substring for which each character c has spanCondition==contains(c). |
| int | spanBack(java.lang.CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition) | Symmetrical with span(). |
-
-
-
Field Detail
-
U16_SURROGATE_OFFSET
public static int U16_SURROGATE_OFFSET
-
latin1Contains
private boolean[] latin1Contains
One boolean ('true' or 'false') per Latin-1 character.
-
table7FF
private int[] table7FF
One bit per code point from U+0000..U+07FF. The bits are organized vertically; consecutive code points correspond to the same bit positions in consecutive table words. With code point parts lead=c{10..6} and trail=c{5..0}, it is set.contains(c)==(table7FF[trail] bit lead). Bits for 0..FF are unused (0).
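The vertical lookup described above can be sketched as follows. This is a minimal illustration, assuming a 64-word table indexed by trail with bit number lead in each word; the class name Table7FFSketch and its static contains() helper are hypothetical and not part of ICU. Unlike the real field, the sketch does not leave the Latin-1 bits unused.

```java
// Hypothetical illustration class; not part of ICU.
final class Table7FFSketch {
    // Tests the bit for code point c (U+0000..U+07FF) in a 64-word table.
    // trail = c{5..0} selects the word, lead = c{10..6} selects the bit.
    static boolean contains(int[] table7FF, int c) {
        if (c < 0 || c > 0x7ff) {
            return false; // outside the range this table covers
        }
        int trail = c & 0x3f;
        int lead = c >> 6;
        return (table7FF[trail] & (1 << lead)) != 0;
    }

    public static void main(String[] args) {
        int[] table = new int[64];
        // Mark U+03B1 by hand: trail = 0x31, lead = 0xE.
        table[0x3b1 & 0x3f] |= 1 << (0x3b1 >> 6);
        System.out.println(contains(table, 0x3b1));
        System.out.println(contains(table, 0x3b2));
    }
}
```

Because consecutive code points land in consecutive words at the same bit position, a run of 64 code points that share a lead value sets one bit column across all 64 words.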
-
bmpBlockBits
private int[] bmpBlockBits
One bit per 64 BMP code points. The bits are organized vertically; consecutive 64-code point blocks correspond to the same bit position in consecutive table words. With code point parts lead=c{15..12} and t1=c{11..6}, test bits (lead+16) and lead in bmpBlockBits[t1]. If the upper bit is 0, then the lower bit indicates if contains(c) for all code points in the 64-block. If the upper bit is 1, then the block is mixed and set.contains(c) must be called. Bits for 0..7FF are unused (0).
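The two-bit test described above can be sketched as follows. This is an illustration under stated assumptions: the class BmpBlockBitsSketch, the classify() helper, and the three result constants are hypothetical names, and the table is passed in as a plain 64-word array.

```java
// Hypothetical illustration class; not part of ICU.
final class BmpBlockBitsSketch {
    static final int ALL_OUT = 0; // no code point of the 64-block is in the set
    static final int ALL_IN = 1;  // every code point of the 64-block is in the set
    static final int MIXED = 2;   // the block is mixed; a slower lookup is needed

    // Classifies the 64-block that contains BMP code point c.
    static int classify(int[] bmpBlockBits, int c) {
        int lead = c >> 12;        // c{15..12}
        int t1 = (c >> 6) & 0x3f;  // c{11..6}
        // Move bit lead to position 0 and bit (lead+16) to position 16.
        int twoBits = (bmpBlockBits[t1] >>> lead) & 0x10001;
        if (twoBits == 0) {
            return ALL_OUT;
        }
        if (twoBits == 1) {
            return ALL_IN;
        }
        return MIXED; // upper bit set
    }

    public static void main(String[] args) {
        int[] bits = new int[64];
        bits[(0x4e00 >> 6) & 0x3f] |= 1 << (0x4e00 >> 12); // U+4E00..U+4E3F all in
        System.out.println(classify(bits, 0x4e10));
        System.out.println(classify(bits, 0x4e40));
    }
}
```

A single shift and mask yields both bits at once, so the common all-in and all-out cases need no search at all.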
-
list4kStarts
private int[] list4kStarts
Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000). U+0800 is the first 3-byte-UTF-8 code point. Code points below U+0800 are always looked up in the bit tables. The last pair of indexes is for finding supplementary code points.
-
list
private final int[] list
The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points. The list is terminated with list[listLength-1]=0x110000.
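For readers unfamiliar with inversion lists, the following sketch shows how containment can be read off such a list. It assumes the list holds sorted range boundaries (start inclusive, end exclusive, alternating) and ends with the 0x110000 terminator described above. The class InversionListSketch and its methods are hypothetical, and the plain binary search is a generic one rather than ICU's own code.

```java
// Hypothetical illustration class; not part of ICU.
final class InversionListSketch {
    // Returns the smallest index i such that c < list[i].
    // Assumes 0 <= c <= 0x10FFFF and that list ends with 0x110000.
    static int findCodePoint(int[] list, int c) {
        int lo = 0;
        int hi = list.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (c < list[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    // c is in the set exactly when an odd number of boundaries are <= c.
    static boolean contains(int[] list, int c) {
        return (findCodePoint(list, c) & 1) != 0;
    }

    public static void main(String[] args) {
        // The set [A-Z] plus U+10000..U+1000F, with the terminator.
        int[] list = {0x41, 0x5b, 0x10000, 0x10010, 0x110000};
        System.out.println(contains(list, 'Q'));
        System.out.println(contains(list, 'q'));
    }
}
```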
-
listLength
private final int listLength
-
-
Constructor Detail
-
BMPSet
public BMPSet(int[] parentList, int parentListLength)
-
BMPSet
public BMPSet(BMPSet otherBMPSet, int[] newParentList, int newParentListLength)
-
-
Method Detail
-
contains
public boolean contains(int c)
-
span
public final int span(java.lang.CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
Span the initial substring for which each character c has spanCondition==contains(c). It must be spanCondition==0 or 1.
- Parameters:
start - The start index
outCount - If not null: Receives the number of code points in the span.
- Returns:
the limit (exclusive end) of the span
NOTE: to reduce the overhead of function call to contains(c), it is manually inlined here. Check for sufficient length for trail unit for each surrogate pair. Handle single surrogates as surrogate code points as usual in ICU.
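The span contract can be illustrated with a simplified sketch. It is not the inlined implementation the note refers to: it assumes a boolean in place of UnicodeSet.SpanCondition, an IntPredicate in place of the set's contains(), and it omits outCount. The class SpanSketch is hypothetical.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration class; not part of ICU.
final class SpanSketch {
    // Returns the exclusive end of the longest prefix of s, beginning at start,
    // in which every code point c satisfies contains.test(c) == spanContained.
    static int span(CharSequence s, int start, boolean spanContained, IntPredicate contains) {
        int i = start;
        int limit = s.length();
        while (i < limit) {
            char c = s.charAt(i);
            int cp = c;
            int units = 1;
            // Combine a surrogate pair only if the trail unit is really there;
            // a single surrogate is treated as a surrogate code point.
            if (Character.isHighSurrogate(c) && i + 1 < limit
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                cp = Character.toCodePoint(c, s.charAt(i + 1));
                units = 2;
            }
            if (contains.test(cp) != spanContained) {
                break;
            }
            i += units;
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate isLetter = cp -> Character.isLetter(cp);
        System.out.println(span("abc1", 0, true, isLetter));
    }
}
```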
-
spanBack
public final int spanBack(java.lang.CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition)
Symmetrical with span(). Span the trailing substring for which each character c has spanCondition==contains(c). It must be s.length >= limit and spanCondition==0 or 1.
- Returns:
The string index which starts the span (i.e. inclusive).
-
set32x64Bits
private static void set32x64Bits(int[] table, int start, int limit)
Set bits in a bit rectangle in "vertical" bit organization. start-
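To show what setting a range means in the vertical organization, here is a deliberately naive sketch that sets one bit per code point. It assumes a 64-word table with 32 bits per word, so the range must lie within 0..0x800. The class VerticalBitsSketch and the method setRange() are hypothetical, and the real method is presumably more efficient than a per-code-point loop.

```java
// Hypothetical illustration class; not part of ICU.
final class VerticalBitsSketch {
    // Sets the bits for code points start..limit-1 (limit exclusive).
    // Word index = c{5..0}, bit number = c >> 6.
    static void setRange(int[] table, int start, int limit) {
        if (start < 0 || limit > 0x800) {
            throw new IllegalArgumentException("range must lie within 0..0x800");
        }
        for (int c = start; c < limit; c++) {
            table[c & 0x3f] |= 1 << (c >> 6);
        }
    }

    public static void main(String[] args) {
        int[] table = new int[64];
        setRange(table, 0x100, 0x180); // two full 64-blocks: bit columns 4 and 5
        System.out.println(Integer.toHexString(table[0]));
    }
}
```

A range that covers whole 64-blocks fills complete bit columns, which is why the result looks like a rectangle of bits.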
initBits
private void initBits()
-
findCodePoint
private int findCodePoint(int c, int lo, int hi)
Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted for finding code points in a certain range. For restricting the search for finding in the range start..end, pass in lo=findCodePoint(start) and hi=findCodePoint(end) with 0<=lo<=hi
- Parameters:
c - a character in a subrange of MIN_VALUE..MAX_VALUE
lo - The lowest index to be returned.
hi - The highest index to be returned.
- Returns:
the smallest integer i in the range lo..hi, inclusive, such that c < list[i]
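The restricted search and its precomputed bounds can be sketched together as follows. This is an illustration under assumptions, not ICU's code: the class RestrictedSearchSketch, the buildStarts() helper, and the 18-entry bounds array (indexes for U+0800, U+1000, .., U+10000, plus the index of the terminator) are hypothetical, and the binary search is a generic one.

```java
// Hypothetical illustration class; not part of ICU.
final class RestrictedSearchSketch {
    // Returns the smallest i in lo..hi such that c < list[i].
    // The caller must guarantee that the unrestricted answer lies in lo..hi.
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (c < list[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    // Precomputes search bounds; each search starts where the previous one ended.
    static int[] buildStarts(int[] list) {
        int last = list.length - 1; // index of the 0x110000 terminator
        int[] starts = new int[0x12];
        starts[0] = findCodePoint(list, 0x800, 0, last);
        for (int i = 1; i <= 0x10; i++) {
            starts[i] = findCodePoint(list, i << 12, starts[i - 1], last);
        }
        starts[0x11] = last;
        return starts;
    }

    // Slow-path containment for c >= U+0800 using the precomputed bounds.
    static boolean containsSlow(int[] list, int[] starts, int c) {
        if (c < 0x800 || c > 0x10ffff) {
            throw new IllegalArgumentException("sketch covers U+0800..U+10FFFF only");
        }
        int lead = Math.min(c >> 12, 0x10); // all supplementary code points share one pair
        return (findCodePoint(list, c, starts[lead], starts[lead + 1]) & 1) != 0;
    }

    public static void main(String[] args) {
        int[] list = {0x41, 0x5b, 0x900, 0xa00, 0x4e00, 0xa000, 0x10000, 0x10010, 0x110000};
        int[] starts = buildStarts(list);
        System.out.println(containsSlow(list, starts, 0x4e00));
    }
}
```

The restriction works because the index returned by findCodePoint never decreases as c grows, so the answers for the two ends of a 4k range bracket the answer for every code point inside it.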
-
containsSlow
private final boolean containsSlow(int c, int lo, int hi)
-
-
-