CollationElementIterator is an iterator created by a
RuleBasedCollator to walk through a string. The return result of
each iteration is a 32-bit collation element that defines the
ordering priority of the next character or sequence of characters
in the source string.
For illustration, consider the following in traditional Spanish:
"ca" -> the first collation element is collation_element('c') and the
second collation element is collation_element('a').
Since "ch" in traditional Spanish sorts as one entity, the example below
returns one collation element for the two characters 'c' and 'h':
"cha" -> the first collation element is collation_element('ch') and the
second collation element is collation_element('a').
In German, since the character 'æ' is a composed character of 'a' and
'e', the iterator returns two collation elements for the single
character 'æ':
"æb" -> the first collation element is collation_element('a'), the
second collation element is collation_element('e'), and the
third collation element is collation_element('b').
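As a rough illustration of the contraction case, the sketch below counts the collation elements produced for "ca" and "cha". It assumes ICU4J may not be on the classpath and therefore uses the JDK's java.text.RuleBasedCollator and java.text.CollationElementIterator as a stand-in, since they offer a comparable next()/NULLORDER loop. The rule string is a hypothetical, minimal traditional-Spanish-style fragment, and the class and helper names are illustrative only.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Sketch using the JDK's java.text classes as a stand-in for ICU4J.
class ContractionDemo {
    // Hypothetical rules in which "ch" sorts as a single unit between "c" and "d".
    static final String RULES = "< a < b < c < ch < d < e < h";

    // Builds a collator, turning the checked ParseException into an unchecked one.
    static RuleBasedCollator collator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("invalid rules: " + rules, e);
        }
    }

    // Counts the collation elements returned by next() until NULLORDER.
    static int countElements(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator spanish = collator(RULES);
        System.out.println(countElements(spanish, "ca"));  // 2: 'c', 'a'
        System.out.println(countElements(spanish, "cha")); // 2: "ch", 'a'
    }
}
```

Both strings yield two elements even though "cha" has three characters. With ICU4J available, the same loop could be written against com.ibm.icu.text.RuleBasedCollator, with the rules adapted to ICU's rule syntax.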
For collation ordering comparison, the collation element results
cannot be compared simply by using basic arithmetic operators,
e.g. <, == or >; further processing has to be done. Details
can be found in the ICU user guide. An example of using the
CollationElementIterator for collation ordering comparison is the
class com.ibm.icu.text.StringSearch.
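To make "further processing" concrete, here is a simplified sketch. It is not ICU's algorithm. It compares two sequences of collation elements level by level: all primary weights first, then secondary, then tertiary, skipping zero weights. It uses the bit layout documented for primaryOrder(), secondaryOrder() and tertiaryOrder(). It also shows why a raw int comparison misleads: a primary weight of 0x8000 or above makes the int negative. The class and method names are hypothetical. Real ICU comparison handles further details, such as variable weighting and additional levels, that this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified multi-level comparison of collation element sequences (not ICU's implementation).
class LevelCompare {
    // Bit layout as documented: 16-bit primary, 8-bit secondary, 8-bit tertiary.
    static int primary(int ce)   { return (ce >>> 16) & 0xFFFF; }
    static int secondary(int ce) { return (ce >>> 8) & 0xFF; }
    static int tertiary(int ce)  { return ce & 0xFF; }

    // Collects the non-zero weights of one level (1, 2 or 3), skipping ignorable weights.
    static List<Integer> weights(int[] ces, int level) {
        List<Integer> out = new ArrayList<>();
        for (int ce : ces) {
            int w = level == 1 ? primary(ce) : level == 2 ? secondary(ce) : tertiary(ce);
            if (w != 0) {
                out.add(w);
            }
        }
        return out;
    }

    // Compares all primaries first, then secondaries, then tertiaries; returns -1, 0 or 1.
    static int compare(int[] a, int[] b) {
        for (int level = 1; level <= 3; level++) {
            List<Integer> wa = weights(a, level), wb = weights(b, level);
            int n = Math.min(wa.size(), wb.size());
            for (int i = 0; i < n; i++) {
                int c = Integer.compare(wa.get(i), wb.get(i));
                if (c != 0) {
                    return c;
                }
            }
            if (wa.size() != wb.size()) {
                return Integer.compare(wa.size(), wb.size());
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int low = 0x7FFF0505;  // primary 0x7FFF
        int high = 0x80000505; // primary 0x8000, but negative as a signed int
        System.out.println(high < low);                                 // true: raw int comparison is misleading
        System.out.println(compare(new int[] {high}, new int[] {low})); // 1: level-by-level comparison
    }
}
```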
To construct a CollationElementIterator object, users
call the method getCollationElementIterator() on a
RuleBasedCollator that defines the desired sorting order.
Example:
import com.ibm.icu.text.CollationElementIterator;
import com.ibm.icu.text.RuleBasedCollator;

String testString = "This is a test";
// The constructor declares a checked exception for invalid rules,
// so the enclosing method must catch or declare it.
RuleBasedCollator rbc = new RuleBasedCollator("&a<b");
CollationElementIterator iterator = rbc.getCollationElementIterator(testString);
int order;
// Test the element itself so the loop ends once NULLORDER is returned.
while ((order = iterator.next()) != CollationElementIterator.NULLORDER) {
    if (order != CollationElementIterator.IGNORABLE) {
        // order is valid and not ignorable: print its primary order
        int primaryOrder = CollationElementIterator.primaryOrder(order);
        System.out.println("Next primary order 0x"
                + Integer.toHexString(primaryOrder));
    }
}
This class is not subclassable.
equals
public boolean equals(Object that)
Tests whether the argument object is equal to this CollationElementIterator.
Iterators are equal if they use the same RuleBasedCollator, have
the same source text, and are at the same position in the iteration.
that
- the object to compare with this
CollationElementIterator
getMaxExpansion
public int getMaxExpansion(int ce)
Returns the maximum length of any expansion sequence that ends with
the specified collation element. If there is no expansion with this
collation element as the last element, returns 1.
ce
- a collation element returned by previous() or next().
- the maximum length of any expansion sequence ending
with the specified collation element.
getOffset
public int getOffset()
Returns the character offset in the source string
corresponding to the next collation element. I.e., getOffset()
returns the position in the source string corresponding to the
collation element that will be returned by the next call to
next(). This value could be any of:
- The index of the first character corresponding to
the next collation element. (This means that if setOffset(offset)
sets the index in the middle of a contraction, getOffset() returns
the index of the first character in the contraction, which may not
be equal to the original offset that was set. Hence calling
getOffset() immediately after setOffset(offset) does not guarantee
that the offset that was set will be returned.)
- If normalization is on, the index of the immediate
subsequent character, or composite character with the first
character, having a combining class of 0.
- The length of the source string, if iteration has reached
the end.
- The character offset in the source string corresponding to the
collation element that will be returned by the next call to
next().
next
public int next()
Get the next collation element in the source string.
This iterator iterates over a sequence of collation elements
that were built from the string. Because there isn't
necessarily a one-to-one mapping from characters to collation
elements, this doesn't mean the same thing as "return the
collation element [or ordering priority] of the next character
in the string".
This function returns the collation element that the
iterator is currently pointing to, and then updates the
internal pointer to point to the next element. In contrast, previous()
updates the pointer first, and then returns the element. This
means that when you change direction while iterating (i.e.,
call next() and then call previous(), or call previous() and
then call next()), you'll get back the same element twice.
- the next collation element or NULLORDER if the end of the
iteration has been reached.
previous
public int previous()
Get the previous collation element in the source string.
This iterator iterates over a sequence of collation elements
that were built from the string. Because there isn't
necessarily a one-to-one mapping from characters to collation
elements, this doesn't mean the same thing as "return the
collation element [or ordering priority] of the previous
character in the string".
This function updates the iterator's internal pointer to
point to the collation element preceding the one it's currently
pointing to and then returns that element, while next() returns
the current element and then updates the pointer. This means
that when you change direction while iterating (i.e., call
next() and then call previous(), or call previous() and then
call next()), you'll get back the same element twice.
- the previous collation element, or NULLORDER when the start of
the iteration has been reached.
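The direction-change behaviour described for next() and previous() can be modelled with a single cursor into a list of elements. The toy class below is hypothetical and is not ICU's implementation. It uses strings in place of 32-bit collation elements and null in place of NULLORDER. It also assumes that the element sequence for "cha" has already been computed, with "ch" treated as a contraction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the next()/previous() cursor contract; NOT ICU's implementation.
class ToyElementIterator {
    private final List<String> elements;
    private int pos = 0; // index of the element that next() will return

    ToyElementIterator(List<String> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Returns the element at the cursor, then advances the cursor (null at the end).
    String next() {
        return pos < elements.size() ? elements.get(pos++) : null;
    }

    // Moves the cursor back first, then returns the element it now points to (null at the start).
    String previous() {
        return pos > 0 ? elements.get(--pos) : null;
    }

    public static void main(String[] args) {
        // Hypothetical element sequence for "cha" with "ch" as a contraction.
        ToyElementIterator it = new ToyElementIterator(List.of("ch", "a"));
        System.out.println(it.next());     // ch
        System.out.println(it.next());     // a
        System.out.println(it.previous()); // a  (same element again after changing direction)
        System.out.println(it.previous()); // ch
        System.out.println(it.previous()); // null (start reached)
    }
}
```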
primaryOrder
public static final int primaryOrder(int ce)
Return the primary order of the specified collation element,
i.e., the most significant 16 bits (bits 16 to 31, counting from
the least significant bit 0). This value is unsigned.
ce
- the collation element
- the element's 16-bit primary order.
reset
public void reset()
Resets the cursor to the beginning of the string. The next
call to next() or previous() will return the first and last
collation element in the string, respectively.
If the RuleBasedCollator used by this iterator has had its
attributes changed, calling reset() will reinitialize the
iterator to use the new attributes.
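Here is a minimal sketch of the reset() cursor behaviour. It again uses the JDK's java.text classes as a stand-in for ICU4J, on the assumption that the example should run without ICU on the classpath. Once the iterator has been drained, reset() returns it to the start, so next() yields the first element again. The rules and class name are hypothetical. The attribute-reinitialization behaviour described above is specific to ICU and is not shown.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Sketch using the JDK's java.text classes as a stand-in for ICU4J.
class ResetDemo {
    // Returns true if next() after reset() repeats the first element of the text.
    static boolean firstElementRepeatsAfterReset(String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator("< a < b"); // hypothetical minimal rules
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int first = it.next();
        // Drain the iterator until it reports the end of the text.
        while (it.next() != CollationElementIterator.NULLORDER) {
            // nothing to do per element
        }
        it.reset(); // move the cursor back to the beginning of the text
        return it.next() == first;
    }

    public static void main(String[] args) {
        System.out.println(firstElementRepeatsAfterReset("ab")); // true
    }
}
```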
secondaryOrder
public static final int secondaryOrder(int ce)
Return the secondary order of the specified collation element,
i.e., the 8 bits immediately below the 16-bit primary order
(bits 8 to 15, counting from the least significant bit 0). This
value is unsigned.
ce
- the collation element
- the element's 8-bit secondary order
setOffset
public void setOffset(int offset)
Sets the iterator to point to the collation element
corresponding to the character at the specified offset. The
value returned by the next call to next() will be the collation
element corresponding to the characters at offset.
If offset is in the middle of a contracting character
sequence, the iterator is adjusted to the start of the
contracting sequence. This means that getOffset() is not
guaranteed to return the same value set by this method.
If the decomposition mode is on, and offset is in the middle
of a decomposable range of source text, the iterator may not
return a correct result for the next forwards or backwards
iteration. The user must ensure that the offset is not in the
middle of a decomposable range.
offset
- the character offset into the original source string to
set. Note that this is not an offset into the corresponding
sequence of collation elements.
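The snapping rule for contractions can be illustrated with a small helper. The helper is hypothetical and is not ICU's implementation. It splits the text into units, matching the longest listed contraction at each position, and maps an offset to the start of the unit that contains it. It works on UTF-16 char indices and ignores surrogate pairs, normalization, and expansions.

```java
import java.util.List;

// Toy model of how setOffset() snaps into a contraction; NOT ICU's implementation.
class OffsetSnapDemo {
    // Walks the text unit by unit and returns the start offset of the unit containing offset.
    static int snapToUnitStart(String text, int offset, List<String> contractions) {
        int start = 0;
        while (start < text.length()) {
            int len = 1;
            // Use the longest listed contraction that begins at this position, if any.
            for (String c : contractions) {
                if (c.length() > len && text.startsWith(c, start)) {
                    len = c.length();
                }
            }
            if (offset < start + len) {
                return start; // offset falls inside this unit: report the unit's start
            }
            start += len;
        }
        return text.length(); // offsets at or past the end map to the end
    }

    public static void main(String[] args) {
        List<String> contractions = List.of("ch");
        System.out.println(snapToUnitStart("cha", 1, contractions)); // 0: offset 1 is inside "ch"
        System.out.println(snapToUnitStart("cha", 2, contractions)); // 2: 'a' starts its own unit
        System.out.println(snapToUnitStart("cha", 3, contractions)); // 3: end of text
    }
}
```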
setText
public void setText(CharacterIterator source)
Set a new source string iterator for iteration, and reset the
offset to the beginning of the text.
source
- the new source string iterator for iteration.
setText
public void setText(String source)
Set a new source string for iteration, and reset the offset
to the beginning of the text.
source
- the new source string for iteration.
setText
public void setText(UCharacterIterator source)
Set a new source string iterator for iteration, and reset the
offset to the beginning of the text.
The caller's source iterator is left unchanged, because a copy of
it is created for use by this iterator.
source
- the new source string iterator for iteration.
tertiaryOrder
public static final int tertiaryOrder(int ce)
Return the tertiary order of the specified collation element, i.e., the
least significant 8 bits (bits 0 to 7). This value is unsigned.
ce
- the collation element
- the element's 8-bit tertiary order
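The three accessors can be checked against a worked value. The sketch below re-implements the bit layout described for primaryOrder(), secondaryOrder() and tertiaryOrder() with plain shifts and masks; it does not call ICU. It splits a hypothetical collation element 0x89ABCD05, which is negative as a Java int, into its fields. The unsigned shift keeps the primary order positive.

```java
// Mirrors the documented layout: [ primary: 16 bits | secondary: 8 bits | tertiary: 8 bits ],
// most significant bits first. Illustrative re-implementation, not ICU's source.
class CeFields {
    static int primary(int ce)   { return ce >>> 16; }         // upper 16 bits, unsigned
    static int secondary(int ce) { return (ce >>> 8) & 0xFF; } // next 8 bits
    static int tertiary(int ce)  { return ce & 0xFF; }         // lowest 8 bits

    public static void main(String[] args) {
        int ce = 0x89AB_CD05; // hypothetical collation element value; negative as a Java int
        System.out.printf("primary=0x%X secondary=0x%X tertiary=0x%X%n",
                primary(ce), secondary(ce), tertiary(ce));
        // prints: primary=0x89AB secondary=0xCD tertiary=0x5
    }
}
```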