The IDNA API implements the IDNA protocol as defined in the
IDNA RFC.
The RFC defines two operations: ToASCII and ToUnicode. Domain labels
containing non-ASCII code points must be processed by the
ToASCII operation before they are passed to resolver libraries. Domain names
obtained from resolver libraries must be processed by the
ToUnicode operation before they are displayed to the user.
IDNA requires that implementations process input strings with Nameprep,
which is a profile of Stringprep, and then with Punycode.
Implementations of IDNA MUST fully implement Nameprep and Punycode;
neither Nameprep nor Punycode is optional.
The input and output of the ToASCII and ToUnicode operations are Unicode strings,
and the operations are designed to be chainable, i.e., applying ToASCII or ToUnicode
multiple times to an input string yields the same result as applying the operation
once:
ToUnicode(ToUnicode(ToUnicode...(ToUnicode(string)))) == ToUnicode(string)
ToASCII(ToASCII(ToASCII...(ToASCII(string)))) == ToASCII(string)
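The chaining property can be checked with a short sketch. To stay self-contained, it uses the JDK's java.net.IDN, which implements the same two RFC 3490 operations, as a stand-in for the class documented here. The class name IdnIdempotenceSketch, its helper methods, and the sample domain name are illustrative assumptions.

```java
import java.net.IDN;

// Sketch only: java.net.IDN (JDK) stands in for the IDNA class documented here.
final class IdnIdempotenceSketch {

    // Applies ToASCII to the name the given number of times in a row.
    static String repeatToAscii(String name, int times) {
        String result = name;
        for (int i = 0; i < times; i++) {
            result = IDN.toASCII(result);
        }
        return result;
    }

    // Applies ToUnicode to the name the given number of times in a row.
    static String repeatToUnicode(String name, int times) {
        String result = name;
        for (int i = 0; i < times; i++) {
            result = IDN.toUnicode(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample name with a non-ASCII letter (u with diaeresis).
        String unicodeName = "b\u00fccher.example";
        String asciiName = repeatToAscii(unicodeName, 1);

        // Chaining the same operation should not change the result.
        System.out.println(repeatToAscii(unicodeName, 3).equals(asciiName));
        System.out.println(repeatToUnicode(asciiName, 3).equals(repeatToUnicode(asciiName, 1)));
    }
}
```

Both comparisons are expected to hold, because an already converted label passes through the same operation unchanged.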
compare
public static int compare(String s1,
String s2,
int options)
throws StringPrepParseException
Compare two IDN strings for equivalence.
This function splits the domain names into labels and compares them.
According to the IDNA RFC, whenever two labels are compared, they are
considered equal if and only if their ASCII forms (obtained by
applying toASCII) match using a case-insensitive ASCII comparison.
Two domain names are considered a match if and only if all labels
match regardless of whether label separators match.
s1
- First IDN string
s2
- Second IDN string
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- 0 if the strings are equal, > 0 if s1 > s2, and < 0 if s1 < s2
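The label-by-label comparison described above can be sketched as follows. This is not the implementation behind compare: it uses the JDK's java.net.IDN as a stand-in for ToASCII, and the class IdnCompareSketch, the separator pattern, and the ordering chosen for names with different label counts are assumptions made for illustration.

```java
import java.net.IDN;
import java.util.regex.Pattern;

// Sketch only: java.net.IDN (JDK) stands in for the IDNA class documented here.
final class IdnCompareSketch {

    // Label separators named by the IDNA RFC: U+002E, U+3002, U+FF0E, U+FF61.
    private static final Pattern SEPARATORS =
            Pattern.compile("[.\\u3002\\uFF0E\\uFF61]");

    // Returns 0 when every label matches; otherwise the sign of the first difference.
    static int compare(String s1, String s2) {
        String[] labels1 = SEPARATORS.split(s1);
        String[] labels2 = SEPARATORS.split(s2);
        int shared = Math.min(labels1.length, labels2.length);
        for (int i = 0; i < shared; i++) {
            // Compare the ASCII forms; the results are pure ASCII, so
            // compareToIgnoreCase acts as a case-insensitive ASCII comparison.
            String ascii1 = IDN.toASCII(labels1[i]);
            String ascii2 = IDN.toASCII(labels2[i]);
            int result = ascii1.compareToIgnoreCase(ascii2);
            if (result != 0) {
                return result;
            }
        }
        // Assumption: a name with fewer labels sorts first.
        return Integer.compare(labels1.length, labels2.length);
    }

    public static void main(String[] args) {
        // Same name, once in upper-case Unicode and once in ACE form.
        System.out.println(compare("B\u00dcCHER.example", "xn--bcher-kva.EXAMPLE"));
        // Same labels, different separator characters.
        System.out.println(compare("b\u00fccher\u3002example", "b\u00fccher.example"));
    }
}
```

Because the names are split before conversion, the separator characters themselves never take part in the comparison, which matches the rule that labels must match regardless of whether label separators match.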
compare
public static int compare(StringBuffer s1,
StringBuffer s2,
int options)
throws StringPrepParseException
Compare two IDN strings for equivalence.
This function splits the domain names into labels and compares them.
According to the IDNA RFC, whenever two labels are compared, they are
considered equal if and only if their ASCII forms (obtained by
applying toASCII) match using a case-insensitive ASCII comparison.
Two domain names are considered a match if and only if all labels
match regardless of whether label separators match.
s1
- First IDN string as StringBuffer
s2
- Second IDN string as StringBuffer
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- 0 if the strings are equal, > 0 if s1 > s2, and < 0 if s1 < s2
compare
public static int compare(UCharacterIterator s1,
UCharacterIterator s2,
int options)
throws StringPrepParseException
Compare two IDN strings for equivalence.
This function splits the domain names into labels and compares them.
According to the IDNA RFC, whenever two labels are compared, they are
considered equal if and only if their ASCII forms (obtained by
applying toASCII) match using a case-insensitive ASCII comparison.
Two domain names are considered a match if and only if all labels
match regardless of whether label separators match.
s1
- First IDN string as UCharacterIterator
s2
- Second IDN string as UCharacterIterator
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- 0 if the strings are equal, > 0 if s1 > s2, and < 0 if s1 < s2
convertIDNToASCII
public static StringBuffer convertIDNToASCII(String src,
int options)
throws StringPrepParseException
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.
This operation is performed on complete domain names, e.g., "www.example.com".
It is important to note that this operation can fail. If it fails, then the input
domain name cannot be used as an Internationalized Domain Name and the application
should have methods defined to deal with the failure.
Note: The IDNA RFC specifies that a conformant application should divide a domain name
into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each,
and then convert. This function does not offer that level of granularity. Once set, the options
apply to all labels in the domain name.
src
- The input string to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
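Since the conversion of a complete domain name can fail, callers need a failure path. The sketch below shows one possible shape for it. It uses the JDK's java.net.IDN as a stand-in, which reports failure with an unchecked IllegalArgumentException rather than StringPrepParseException; the class DomainToAsciiSketch, the Optional-based return type, and the sample names are assumptions for illustration.

```java
import java.net.IDN;
import java.util.Optional;

// Sketch only: java.net.IDN (JDK) stands in for the IDNA class documented here.
final class DomainToAsciiSketch {

    // Converts a complete domain name. An empty Optional signals that the
    // name cannot be used as an IDN under the given options.
    static Optional<String> toAscii(String domain, int options) {
        try {
            // The same options value is applied to every label of the name.
            return Optional.of(IDN.toASCII(domain, options));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A name with a non-ASCII label, converted with default options.
        System.out.println(toAscii("www.b\u00fccher.example", 0));

        // An underscore is not allowed under STD3 host name rules,
        // so this conversion is expected to be rejected.
        System.out.println(toAscii("under_score.example", IDN.USE_STD3_ASCII_RULES));
    }
}
```

Returning an empty value keeps the failure explicit at the call site; an application could equally rethrow or log, depending on how it treats unusable names.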
convertIDNToASCII
public static StringBuffer convertIDNToASCII(StringBuffer src,
int options)
throws StringPrepParseException
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.
This operation is performed on complete domain names, e.g., "www.example.com".
It is important to note that this operation can fail. If it fails, then the input
domain name cannot be used as an Internationalized Domain Name and the application
should have methods defined to deal with the failure.
Note: The IDNA RFC specifies that a conformant application should divide a domain name
into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each,
and then convert. This function does not offer that level of granularity. Once set, the options
apply to all labels in the domain name.
src
- The input string as a StringBuffer to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertIDNToASCII
public static StringBuffer convertIDNToASCII(UCharacterIterator src,
int options)
throws StringPrepParseException
Convenience function that implements the IDNToASCII operation as defined in the IDNA RFC.
This operation is performed on complete domain names, e.g., "www.example.com".
It is important to note that this operation can fail. If it fails, then the input
domain name cannot be used as an Internationalized Domain Name and the application
should have methods defined to deal with the failure.
Note: The IDNA RFC specifies that a conformant application should divide a domain name
into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each,
and then convert. This function does not offer that level of granularity. Once set, the options
apply to all labels in the domain name.
src
- The input string as a UCharacterIterator to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertIDNToUnicode
public static StringBuffer convertIDNToUnicode(String src,
int options)
throws StringPrepParseException
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.
This operation is performed on complete domain names, e.g., "www.example.com".
Note: The IDNA RFC specifies that a conformant application should divide a domain name
into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each,
and then convert. This function does not offer that level of granularity. Once set, the options
apply to all labels in the domain name.
src
- The input string to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
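As a rough illustration of the display-side conversion, the sketch below turns a complete domain name obtained from a resolver into its Unicode form. It relies on the JDK's java.net.IDN as a stand-in for the function documented here; the class DomainToUnicodeSketch, the method forDisplay, and the sample names are hypothetical.

```java
import java.net.IDN;

// Sketch only: java.net.IDN (JDK) stands in for the IDNA class documented here.
final class DomainToUnicodeSketch {

    // Converts every label of a complete domain name for display to the user.
    static String forDisplay(String resolverName) {
        return IDN.toUnicode(resolverName);
    }

    public static void main(String[] args) {
        // A name whose middle label is in ACE ("xn--") form.
        System.out.println(forDisplay("www.xn--mnchen-3ya.example"));
        // Labels without the ACE prefix are expected to come back unchanged.
        System.out.println(forDisplay("www.example.com"));
    }
}
```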
convertIDNToUnicode
public static StringBuffer convertIDNToUnicode(StringBuffer src,
int options)
throws StringPrepParseException
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.
This operation is performed on complete domain names, e.g., "www.example.com".
Note: The IDNA RFC specifies that a conformant application should divide a domain name
into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each,
and then convert. This function does not offer that level of granularity. Once set, the options
apply to all labels in the domain name.
src
- The input string as a StringBuffer to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertIDNToUnicode
public static StringBuffer convertIDNToUnicode(UCharacterIterator src,
int options)
throws StringPrepParseException
Convenience function that implements the IDNToUnicode operation as defined in the IDNA RFC.
This operation is performed on complete domain names, e.g., "www.example.com".
Note: The IDNA RFC specifies that a conformant application should divide a domain name
into separate labels, decide whether to apply allowUnassigned and useSTD3ASCIIRules to each,
and then convert. This function does not offer that level of granularity. Once set, the options
apply to all labels in the domain name.
src
- The input string as a UCharacterIterator to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertToASCII
public static StringBuffer convertToASCII(String src,
int options)
throws StringPrepParseException
This function implements the ToASCII operation as defined in the IDNA RFC.
This operation is performed on a single label before it is sent to something that expects
ASCII names. A label is an individual part of a domain name. Labels are usually
separated by dots; e.g., "www.example.com" is composed of 3 labels:
"www", "example", and "com".
src
- The input string to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
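Working on single labels makes it possible to choose options per label, which the whole-name convenience functions do not allow. The sketch below illustrates the idea with the JDK's java.net.IDN as a stand-in for the label-level ToASCII operation; the class LabelToAsciiSketch, the method convertLabels, and the parallel-array layout of labels and options are assumptions for illustration.

```java
import java.net.IDN;
import java.util.StringJoiner;

// Sketch only: java.net.IDN (JDK) stands in for the IDNA class documented here.
final class LabelToAsciiSketch {

    // Converts each label with its own options value and joins the results with dots.
    static String convertLabels(String[] labels, int[] optionsPerLabel) {
        if (labels.length != optionsPerLabel.length) {
            throw new IllegalArgumentException("one options value is needed per label");
        }
        StringJoiner joined = new StringJoiner(".");
        for (int i = 0; i < labels.length; i++) {
            // Each call sees exactly one label, so the options apply to it alone.
            joined.add(IDN.toASCII(labels[i], optionsPerLabel[i]));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // The first label contains an underscore, so STD3 rules are applied
        // only to the second label.
        String[] labels = {"_dmarc", "example"};
        int[] options = {0, IDN.USE_STD3_ASCII_RULES};
        System.out.println(convertLabels(labels, options));
    }
}
```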
convertToASCII
public static StringBuffer convertToASCII(StringBuffer src,
int options)
throws StringPrepParseException
This function implements the ToASCII operation as defined in the IDNA RFC.
This operation is performed on a single label before it is sent to something that expects
ASCII names. A label is an individual part of a domain name. Labels are usually
separated by dots; e.g., "www.example.com" is composed of 3 labels:
"www", "example", and "com".
src
- The input string as a StringBuffer to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertToASCII
public static StringBuffer convertToASCII(UCharacterIterator src,
int options)
throws StringPrepParseException
This function implements the ToASCII operation as defined in the IDNA RFC.
This operation is performed on a single label before it is sent to something that expects
ASCII names. A label is an individual part of a domain name. Labels are usually
separated by dots; e.g., "www.example.com" is composed of 3 labels:
"www", "example", and "com".
src
- The input string as a UCharacterIterator to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertToUnicode
public static StringBuffer convertToUnicode(String src,
int options)
throws StringPrepParseException
This function implements the ToUnicode operation as defined in the IDNA RFC.
This operation is performed on a single label before it is sent to something that expects
Unicode names. A label is an individual part of a domain name. Labels are usually
separated by dots; e.g., "www.example.com" is composed of 3 labels:
"www", "example", and "com".
src
- The input string to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
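A single-label round trip shows how ToUnicode relates to ToASCII: the label that comes back is the Nameprep-normalized form, so its case may differ from the original input. The sketch below uses the JDK's java.net.IDN as a stand-in for the label-level operations documented here; the class LabelRoundTripSketch and the sample label are hypothetical.

```java
import java.net.IDN;

// Sketch only: java.net.IDN (JDK) stands in for the IDNA class documented here.
final class LabelRoundTripSketch {

    // Converts one label to its ASCII form and back to Unicode.
    static String roundTrip(String label) {
        String ascii = IDN.toASCII(label);   // Nameprep, then Punycode with the ACE prefix
        return IDN.toUnicode(ascii);         // decodes the ACE form for display
    }

    public static void main(String[] args) {
        // A capitalized label with a non-ASCII letter; Nameprep folds case
        // before encoding, so the round trip is expected to return lower case.
        System.out.println(roundTrip("B\u00fccher"));
    }
}
```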
convertToUnicode
public static StringBuffer convertToUnicode(StringBuffer src,
int options)
throws StringPrepParseException
This function implements the ToUnicode operation as defined in the IDNA RFC.
This operation is performed on a single label before it is sent to something that expects
Unicode names. A label is an individual part of a domain name. Labels are usually
separated by dots; e.g., "www.example.com" is composed of 3 labels:
"www", "example", and "com".
src
- The input string as a StringBuffer to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string
convertToUnicode
public static StringBuffer convertToUnicode(UCharacterIterator src,
int options)
throws StringPrepParseException
This function implements the ToUnicode operation as defined in the IDNA RFC.
This operation is performed on a single label before it is sent to something that expects
Unicode names. A label is an individual part of a domain name. Labels are usually
separated by dots; e.g., "www.example.com" is composed of 3 labels:
"www", "example", and "com".
src
- The input string as a UCharacterIterator to be processed
options
- A bit set of options:
- IDNA.DEFAULT: Use default options, i.e., do not process unassigned code points
and do not use STD3 ASCII rules.
If unassigned code points are found, the operation fails with
StringPrepParseException.
- IDNA.ALLOW_UNASSIGNED: Unassigned values can be converted to ASCII for query operations.
If this option is set, unassigned code points in the input
are treated as normal Unicode code points.
- IDNA.USE_STD3_RULES: Use STD3 ASCII rules for host name syntax restrictions.
If this option is set and the input does not satisfy STD3 rules,
the operation fails with StringPrepParseException.
Returns
- A StringBuffer containing the converted string