com.ibm.icu.text

Class Collator

Implemented Interfaces:
Cloneable, Comparator
Known Direct Subclasses:
RuleBasedCollator

public abstract class Collator
extends Object
implements Comparator, Cloneable

Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

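Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons:

PRIMARY strength: typically used to denote differences between base characters.
SECONDARY strength: accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language.
TERTIARY strength: upper and lower case differences in characters are distinguished at this level. In addition, a variant of a letter differs from the base form on the tertiary level.
QUATERNARY strength: when punctuation is ignored at PRIMARY to TERTIARY strength, this additional level can be used to distinguish words with and without punctuation.
IDENTICAL strength: when all other strengths are equal, the IDENTICAL strength is used as a tiebreaker.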

Unlike the JDK, ICU4J's Collator supports only 2 decomposition modes: the canonical decomposition mode and a mode that does not use any decomposition. The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION, is not supported here. If the canonical decomposition mode is set, the Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.

For more information about the collation service, see the ICU User Guide.

Examples of use

 import com.ibm.icu.text.Collator;
 import java.util.Locale;

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 // At PRIMARY strength, case differences are ignored
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }

 The following example shows how to compare two strings using the
 Collator for the default locale.

 // Compare two strings in the default locale.
 // U+00E0 U+0325 and a U+0325 U+0300 are canonically equivalent sequences.
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("\u00e0\u0325 is not equal to a\u0325\u0300 without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: \u00e0\u0325 should be equal to a\u0325\u0300 with decomposition");
     }
     else {
         System.out.println("\u00e0\u0325 is equal to a\u0325\u0300 with decomposition");
     }
 }
 else {
     System.out.println("Error: \u00e0\u0325 should not be equal to a\u0325\u0300 without decomposition");
 }
 
Author:
Syn Wee Quek
See Also:
RuleBasedCollator, CollationKey

Nested Class Summary

static class
Collator.CollatorFactory
A factory used with registerFactory to register multiple collators and provide display names for them.

Field Summary

static int
CANONICAL_DECOMPOSITION
Decomposition mode value.
static int
FULL_DECOMPOSITION
This is for backwards compatibility with Java APIs only.
static int
IDENTICAL
Smallest Collator strength value.
static int
NO_DECOMPOSITION
Decomposition mode value.
static int
PRIMARY
Strongest collator strength value.
static int
QUATERNARY
Fourth level collator strength value.
static int
SECONDARY
Second level collator strength value.
static int
TERTIARY
Third level collator strength value.

Constructor Summary

Collator()
Empty default constructor to make javadocs happy

Method Summary

Object
clone()
Clone the collator.
int
compare(Object source, Object target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
abstract int
compare(String source, String target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.
boolean
equals(String source, String target)
Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.
static Locale[]
getAvailableLocales()
Get the set of locales, as Locale objects, for which collators are installed.
static ULocale[]
getAvailableULocales()
Get the set of locales, as ULocale objects, for which collators are installed.
abstract CollationKey
getCollationKey(String source)
Transforms the String into a CollationKey suitable for efficient repeated comparison.
int
getDecomposition()
Get the decomposition mode of this Collator.
static String
getDisplayName(Locale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.
static String
getDisplayName(Locale objectLocale, Locale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.
static String
getDisplayName(ULocale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.
static String
getDisplayName(ULocale objectLocale, ULocale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.
static ULocale
getFunctionalEquivalent(String keyword, ULocale locID)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
static ULocale
getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
static Collator
getInstance()
Gets the Collator for the current default locale.
static Collator
getInstance(Locale locale)
Gets the Collator for the desired locale.
static Collator
getInstance(ULocale locale)
Gets the Collator for the desired locale.
static String[]
getKeywordValues(String keyword)
Given a keyword, return an array of all values for that keyword that are currently in use.
static String[]
getKeywords()
Return an array of all possible keywords that are relevant to collation.
ULocale
getLocale(ULocale.Type type)
Return the locale that was used to create this object, or null.
abstract RawCollationKey
getRawCollationKey(String source, RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key.
int
getStrength()
Returns this Collator's strength property.
UnicodeSet
getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
abstract VersionInfo
getUCAVersion()
Get the UCA version of this collator object.
abstract int
getVariableTop()
Gets the variable top value of a Collator.
abstract VersionInfo
getVersion()
Get the version of this collator object.
static Object
registerFactory(Collator.CollatorFactory factory)
Register a collator factory.
static Object
registerInstance(Collator collator, ULocale locale)
Register a collator as the default collator for the provided locale.
void
setDecomposition(int decomposition)
Set the decomposition mode of this Collator.
void
setStrength(int newStrength)
Sets this Collator's strength property.
abstract int
setVariableTop(String varTop)
Variable top is a two-byte primary value that causes all the code points with primary values less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.
abstract void
setVariableTop(int varTop)
Sets the variable top to the supplied collation element value.
static boolean
unregister(Object registryKey)
Unregister a collator previously registered using registerInstance.

Field Details

CANONICAL_DECOMPOSITION

public static final int CANONICAL_DECOMPOSITION
Field Value:
17

FULL_DECOMPOSITION

public static final int FULL_DECOMPOSITION
This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.
Field Value:
15

IDENTICAL

public static final int IDENTICAL
Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared only when no difference is found at the other levels. See class documentation for more explanation.

Note this value is different from the JDK's.

Field Value:
15

NO_DECOMPOSITION

public static final int NO_DECOMPOSITION
Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note this value is different from the JDK's.

Field Value:
16

PRIMARY

public static final int PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.
Field Value:
0

QUATERNARY

public static final int QUATERNARY
Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.
Field Value:
3

SECONDARY

public static final int SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.
Field Value:
1

TERTIARY

public static final int TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.
Field Value:
2

Constructor Details

Collator

protected Collator()
Empty default constructor to make javadocs happy

Method Details

clone

public Object clone()
            throws CloneNotSupportedException
Clone the collator.
Returns:
a clone of this collator.

compare

public int compare(Object source,
                   Object target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.
Parameters:
source - the source String.
target - the target String.
Returns:
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
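
Because Collator implements Comparator, an instance can be passed directly to the standard sort methods. The following is a minimal sketch of that idea, not an excerpt from ICU4J. To stay runnable without the ICU4J jar it uses the JDK's java.text.Collator as a stand-in; the assumption is that swapping the import to com.ibm.icu.text.Collator leaves the rest unchanged. The class name CollatorAsComparatorDemo and the word list are made up for illustration.

```java
import java.text.Collator; // JDK stand-in (assumption: com.ibm.icu.text.Collator is used the same way)
import java.util.Arrays;
import java.util.Locale;

class CollatorAsComparatorDemo {

    /** Returns a sorted copy of words, ordered by the given collator. */
    static String[] sortedCopy(String[] words, Collator collator) {
        String[] copy = words.clone();
        // Collator implements Comparator, so it can be handed straight to the sort.
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};

        // Plain String ordering compares UTF-16 code units, so "Cherry" sorts first.
        String[] byCodeUnit = words.clone();
        Arrays.sort(byCodeUnit);
        System.out.println(Arrays.toString(byCodeUnit)); // [Cherry, apple, banana]

        // Locale-sensitive ordering places the words in dictionary order.
        String[] byCollator = sortedCopy(words, Collator.getInstance(Locale.US));
        System.out.println(Arrays.toString(byCollator)); // [apple, banana, Cherry]
    }
}
```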

compare

public abstract int compare(String source,
                            String target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.
Parameters:
source - the source String.
target - the target String.
Returns:
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.

equals

public boolean equals(String source,
                      String target)
Convenience method for comparing the equality of two text Strings using this Collator's rules, strength and decomposition mode.
Parameters:
source - the source string to be compared.
target - the target string to be compared.
Returns:
true if the strings are equal according to the collation rules, otherwise false.
See Also:
compare

getAvailableLocales

public static Locale[] getAvailableLocales()
Get the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.
Returns:
the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.

getAvailableULocales

public static final ULocale[] getAvailableULocales()
Get the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.
Returns:
the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.

getCollationKey

public abstract CollationKey getCollationKey(String source)
Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.

See the CollationKey class documentation for more information.

Parameters:
source - the string to be transformed into a CollationKey.
Returns:
the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.
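
When the same strings are compared many times, as in a sort, computing one key per string and comparing the keys can avoid re-running the collation rules on every comparison. The sketch below illustrates that pattern under stated assumptions: it uses the JDK's java.text.Collator and java.text.CollationKey as stand-ins so that it runs without the ICU4J jar, on the assumption that the ICU4J classes of the same names offer the same getCollationKey, compareTo and getSourceString calls. The class and method names are hypothetical.

```java
import java.text.CollationKey; // JDK stand-in for com.ibm.icu.text.CollationKey (assumption)
import java.text.Collator;     // JDK stand-in for com.ibm.icu.text.Collator (assumption)
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationKeySortDemo {

    /** Sorts words by building one CollationKey per word and sorting the keys. */
    static List<String> sortWithKeys(Collator collator, List<String> words) {
        List<CollationKey> keys = new ArrayList<>(words.size());
        for (String word : words) {
            // The key depends on the collator's rules, strength and decomposition mode.
            keys.add(collator.getCollationKey(word));
        }
        // CollationKey is Comparable, so the keys sort without consulting the collator again.
        Collections.sort(keys);

        List<String> sorted = new ArrayList<>(keys.size());
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> words = Arrays.asList("pear", "Apple", "banana");
        System.out.println(sortWithKeys(collator, words)); // [Apple, banana, pear]
    }
}
```

Keys produced by different collators, or by the same collator after its strength or decomposition mode has changed, should not be compared with each other.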

getDecomposition

public int getDecomposition()
Get the decomposition mode of this Collator. Decomposition mode determines how Unicode composed characters are handled.

See the Collator class description for more details.

Returns:
the decomposition mode

getDisplayName

public static String getDisplayName(Locale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.
Parameters:
objectLocale - the locale of the collator
Returns:
the display name

getDisplayName

public static String getDisplayName(Locale objectLocale,
                                    Locale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.
Parameters:
objectLocale - the locale of the collator
displayLocale - the locale for the collator's display name
Returns:
the display name

getDisplayName

public static String getDisplayName(ULocale objectLocale)
Get the name of the collator for the objectLocale, localized for the current locale.
Parameters:
objectLocale - the locale of the collator
Returns:
the display name

getDisplayName

public static String getDisplayName(ULocale objectLocale,
                                    ULocale displayLocale)
Get the name of the collator for the objectLocale, localized for the displayLocale.
Parameters:
objectLocale - the locale of the collator
displayLocale - the locale for the collator's display name
Returns:
the display name

getFunctionalEquivalent

public static final ULocale getFunctionalEquivalent(String keyword,
                                                    ULocale locID)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
Parameters:
keyword - a particular keyword as enumerated by getKeywords.
locID - The requested locale
Returns:
the locale
See Also:
getFunctionalEquivalent(String,ULocale,boolean[])

getFunctionalEquivalent

public static final ULocale getFunctionalEquivalent(String keyword,
                                                    ULocale locID,
                                                    boolean[] isAvailable)
Return the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent, but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.
Parameters:
keyword - a particular keyword as enumerated by getKeywords.
locID - The requested locale
Returns:
the locale

getInstance

public static final Collator getInstance()
Gets the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().
Returns:
the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the default UCA collator will be returned.

getInstance

public static final Collator getInstance(Locale locale)
Gets the Collator for the desired locale.
Parameters:
locale - the desired locale.
Returns:
Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the desired locale, a default UCA collator will be returned.

getInstance

public static final Collator getInstance(ULocale locale)
Gets the Collator for the desired locale.
Parameters:
locale - the desired locale.
Returns:
Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the desired locale, a default UCA collator will be returned.

getKeywordValues

public static final String[] getKeywordValues(String keyword)
Given a keyword, return an array of all values for that keyword that are currently in use.
Parameters:
keyword - one of the keywords returned by getKeywords.

getKeywords

public static final String[] getKeywords()
Return an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".
Returns:
an array of valid collation keywords.

getLocale

public final ULocale getLocale(ULocale.Type type)
Return the locale that was used to create this object, or null. This may differ from the locale requested at the time of this object's creation. For example, if an object is created for locale en_US_CALIFORNIA, the actual data may be drawn from en (the actual locale), and en_US may be the most specific locale that exists (the valid locale).

Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.

Parameters:
type - type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.
Returns:
the information specified by type, or null if this object was not constructed from locale data.

getRawCollationKey

public abstract RawCollationKey getRawCollationKey(String source,
                                                   RawCollationKey key)
Gets the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key. If key has an internal byte array whose length is too small for the result, the internal byte array will be grown to the exact required size.
Parameters:
source - the text String to be transformed into a RawCollationKey
key - the RawCollationKey that receives the result; may be null
Returns:
If key is null, a new instance of RawCollationKey will be created and returned, otherwise the user provided key will be returned.

getStrength

public int getStrength()
Returns this Collator's strength property. The strength property determines the minimum level of difference considered significant.

See the Collator class description for more details.

Returns:
this Collator's current strength property.

getTailoredSet

public UnicodeSet getTailoredSet()
Get a UnicodeSet that contains all the characters and sequences tailored in this collator.
Returns:
a UnicodeSet object containing all the code points and sequences that may sort differently than in the UCA.

getUCAVersion

public abstract VersionInfo getUCAVersion()
Get the UCA version of this collator object.
Returns:
the version object associated with this collator

getVariableTop

public abstract int getVariableTop()
Gets the variable top value of a Collator. Lower 16 bits are undefined and should be ignored.
Returns:
the variable top value of a Collator.
See Also:
setVariableTop

getVersion

public abstract VersionInfo getVersion()
Get the version of this collator object.
Returns:
the version object associated with this collator

registerFactory

public static final Object registerFactory(Collator.CollatorFactory factory)
Register a collator factory.
Parameters:
factory - the factory to register
Returns:
an object that can be used to unregister the registered factory.

registerInstance

public static final Object registerInstance(Collator collator,
                                            ULocale locale)
Register a collator as the default collator for the provided locale. The collator should not be modified after it is registered.
Parameters:
collator - the collator to register
locale - the locale for which this is the default collator
Returns:
an object that can be used to unregister the registered collator.

setDecomposition

public void setDecomposition(int decomposition)
Set the decomposition mode of this Collator. Setting this decomposition property with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.

Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode. The default decomposition mode for the Collator is NO_DECOMPOSITION, unless specified otherwise by the locale used to create the Collator.

See getDecomposition for a description of decomposition mode.

Parameters:
decomposition - the new decomposition mode
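
With NO_DECOMPOSITION, the caller is responsible for supplying text that is already normalized. One way to do that, sketched here as an assumption rather than a recommendation from the ICU documentation, is to convert each string to NFD before it reaches the collator. The example uses the JDK's java.text.Normalizer so that it runs without the ICU4J jar; the class name NfdDemo and the helper toNfd are hypothetical. It shows that the two canonically equivalent sequences from the class description differ as raw strings but become identical after NFD normalization.

```java
import java.text.Normalizer; // JDK normalizer, used here as a stand-in (assumption)

class NfdDemo {

    /** Returns the canonical decomposition (NFD) of the given text. */
    static String toNfd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // Precomposed a-grave (U+00E0) followed by combining ring below (U+0325).
        String precomposed = "\u00e0\u0325";
        // Letter a, then combining ring below (U+0325), then combining grave (U+0300).
        String decomposed = "a\u0325\u0300";

        // The raw strings hold different code unit sequences.
        System.out.println(precomposed.equals(decomposed)); // false

        // NFD decomposes U+00E0 and reorders the combining marks, so both forms match.
        System.out.println(toNfd(precomposed).equals(toNfd(decomposed))); // true
    }
}
```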

setStrength

public void setStrength(int newStrength)
Sets this Collator's strength property. The strength property determines the minimum level of difference considered significant during comparison.

The default strength for the Collator is TERTIARY, unless specified otherwise by the locale used to create the Collator.

See the Collator class description for an example of use.

Parameters:
newStrength - the new strength value.
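
The effect of the strength property can be seen by comparing the same pair of strings at two settings. The following is a minimal sketch, assuming the JDK's java.text.Collator as a stand-in so that it runs without the ICU4J jar; the PRIMARY and TERTIARY constants and the setStrength call are assumed to be used the same way with com.ibm.icu.text.Collator. The class name StrengthDemo and the helper equalAt are hypothetical.

```java
import java.text.Collator; // JDK stand-in (assumption: com.ibm.icu.text.Collator is used the same way)
import java.util.Locale;

class StrengthDemo {

    /** Reports whether a and b compare as equal under the US English collator at the given strength. */
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Case is a tertiary difference, so it is ignored at PRIMARY strength.
        System.out.println(equalAt(Collator.PRIMARY, "abc", "ABC"));  // true

        // At TERTIARY strength the case difference becomes significant.
        System.out.println(equalAt(Collator.TERTIARY, "abc", "ABC")); // false

        // Different base letters differ at every strength, including PRIMARY.
        System.out.println(equalAt(Collator.PRIMARY, "abc", "abd"));  // false
    }
}
```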

setVariableTop

public abstract int setVariableTop(String varTop)
Variable top is a two-byte primary value that causes all the code points with primary values less than or equal to the variable top to be shifted when alternate handling is set to SHIFTED.

Sets the variable top to the collation element value of the supplied string.

Parameters:
varTop - one or more (if contraction) characters to which the variable top should be set
Returns:
an int value containing the value of the variable top in the upper 16 bits. The lower 16 bits are undefined.

setVariableTop

public abstract void setVariableTop(int varTop)
Sets the variable top to the supplied collation element value. Variable top is set to the upper 16 bits. Lower 16 bits are ignored.
Parameters:
varTop - Collation element value, as returned by setVariableTop or getVariableTop
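
Since only the upper 16 bits of a variable top value are defined, code that stores or compares these values may want to discard the lower half first. The sketch below shows one way to do that with plain bit operations. It does not call ICU4J; the class VariableTopBits, its method names and the sample values are made up for illustration.

```java
class VariableTopBits {

    /** Extracts the two-byte primary weight held in the upper 16 bits. */
    static int primaryWeight(int variableTop) {
        // Unsigned shift, so a set sign bit does not smear into the result.
        return variableTop >>> 16;
    }

    /** Clears the undefined lower 16 bits, leaving the value in its original position. */
    static int clearUndefinedBits(int variableTop) {
        return variableTop & 0xFFFF0000;
    }

    /** Compares two variable top values while ignoring their undefined lower halves. */
    static boolean sameVariableTop(int a, int b) {
        return primaryWeight(a) == primaryWeight(b);
    }

    public static void main(String[] args) {
        int value = 0x1234ABCD; // hypothetical value; the lower half 0xABCD is noise
        System.out.println(Integer.toHexString(primaryWeight(value)));      // 1234
        System.out.println(Integer.toHexString(clearUndefinedBits(value))); // 12340000
        System.out.println(sameVariableTop(value, 0x12340000));             // true
    }
}
```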

unregister

public static final boolean unregister(Object registryKey)
Unregister a collator previously registered using registerInstance.
Parameters:
registryKey - the object previously returned by registerInstance.
Returns:
true if the collator was successfully unregistered.

Copyright (c) 2006 IBM Corporation and others.