com.ibm.icu.util

Class StringTokenizer

Implemented Interfaces:
Enumeration

public final class StringTokenizer
extends Object
implements Enumeration

The string tokenizer class allows an application to break a string into tokens by performing code point comparison. The StringTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.

The set of delimiters (the code points that separate tokens) may be specified either at creation time or on a per-token basis.

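An instance of StringTokenizer behaves in one of three ways, depending on the values of the returndelims and coalescedelims constructor flags it was created with. If returndelims is false, delimiter code points only separate tokens and are never returned. If returndelims is true and coalescedelims is false, each delimiter code point is itself returned as a token. If both flags are true, each maximal run of consecutive delimiter code points is returned as a single token.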

A token is thus either one delimiter code point, a maximal sequence of consecutive code points that are delimiters, or a maximal sequence of consecutive code points that are not delimiters.
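The three behaviours can be made concrete with a small stand-alone model. This is only a sketch of the rules stated above, not ICU4J's implementation: the class TokenModes and the methods tokenize and isDelim are hypothetical names, and the model is used instead of the four-argument constructor because that constructor is marked ICU internal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented token rules; not part of ICU4J.
class TokenModes {
    // Compares whole code points, so a lone surrogate never matches
    // one half of a surrogate-pair delimiter.
    static boolean isDelim(int cp, String delims) {
        return delims.codePoints().anyMatch(d -> d == cp);
    }

    static List<String> tokenize(String s, String delims,
                                 boolean returndelims, boolean coalescedelims) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            boolean inDelims = isDelim(s.codePointAt(i), delims);
            i += Character.charCount(s.codePointAt(i));
            // A returned delimiter token stops after one code point
            // unless runs of delimiters are coalesced.
            boolean extend = !inDelims || coalescedelims || !returndelims;
            while (extend && i < s.length()
                    && isDelim(s.codePointAt(i), delims) == inDelims) {
                i += Character.charCount(s.codePointAt(i));
            }
            // Delimiter runs are dropped when returndelims is false.
            if (!inDelims || returndelims) {
                tokens.add(s.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a--b", "-", false, false)); // [a, b]
        System.out.println(tokenize("a--b", "-", true, false));  // [a, -, -, b]
        System.out.println(tokenize("a--b", "-", true, true));   // [a, --, b]
    }
}
```

Because the model advances by Character.charCount, a delimiter outside the Basic Multilingual Plane yields a token that is two chars long.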

A StringTokenizer object internally maintains a current position within the string to be tokenized. Some operations advance this current position past the code point processed.

A token is returned by taking a substring of the string that was used to create the StringTokenizer object.

Example of the use of the default delimiter tokenizer.

 StringTokenizer st = new StringTokenizer("this is a test");
 while (st.hasMoreTokens()) {
     System.out.println(st.nextToken());
 }
 

prints the following output:

     this
     is
     a
     test
 

Example of the use of the tokenizer with user-specified delimiters (a space and the supplementary code point U+10000, written as the surrogate pair \ud800\udc00).

     StringTokenizer st = new StringTokenizer(
         "this is a test with supplementary characters \ud800\ud800\udc00\udc00",
         " \ud800\udc00");
     while (st.hasMoreTokens()) {
         System.out.println(st.nextToken());
     }
 

prints the following output:

     this
     is
     a
     test
     with
     supplementary
     characters
     \ud800
     \udc00
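The last two tokens in that output follow from comparing code points rather than chars. The sketch below uses only the standard String.codePoints method to show how the tail of the input and the delimiter string break down into code points; the class name CodePointView and the method describe are hypothetical.

```java
import java.util.stream.Collectors;

// Hypothetical helper that lists the code points of a string.
class CodePointView {
    // Valid surrogate pairs are combined into one code point;
    // unpaired surrogates are reported as they are.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Tail of the tokenized string: lone high, pair, lone low.
        System.out.println(describe("\ud800\ud800\udc00\udc00")); // U+D800 U+10000 U+DC00
        // Delimiter string: a space and one supplementary code point.
        System.out.println(describe(" \ud800\udc00"));            // U+0020 U+10000
    }
}
```

Only the middle code point, U+10000, is in the delimiter set, so the unpaired surrogates on either side of it are returned as separate tokens.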
 
Author:
syn wee

Constructor Summary

StringTokenizer(String str)
Constructs a string tokenizer for the specified string.
StringTokenizer(String str, String delim)
Constructs a string tokenizer for the specified string.
StringTokenizer(String str, String delim, boolean returndelims)
Constructs a string tokenizer for the specified string.
StringTokenizer(String str, String delim, boolean returndelims, boolean coalescedelims)
Deprecated. This API is ICU internal only.
StringTokenizer(String str, UnicodeSet delim)
Constructs a string tokenizer for the specified string.
StringTokenizer(String str, UnicodeSet delim, boolean returndelims)
Constructs a string tokenizer for the specified string.
StringTokenizer(String str, UnicodeSet delim, boolean returndelims, boolean coalescedelims)
Deprecated. This API is ICU internal only.

Method Summary

int
countTokens()
Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception.
boolean
hasMoreElements()
Returns the same value as the hasMoreTokens method.
boolean
hasMoreTokens()
Tests if there are more tokens available from this tokenizer's string.
Object
nextElement()
Returns the same value as the nextToken method, except that its declared return value is Object rather than String.
String
nextToken()
Returns the next token from this string tokenizer.
String
nextToken(String delim)
Returns the next token in this string tokenizer's string.
String
nextToken(UnicodeSet delim)
Returns the next token in this string tokenizer's string.

Constructor Details

StringTokenizer

public StringTokenizer(String str)
Constructs a string tokenizer for the specified string. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character.

Delimiter characters themselves will not be treated as tokens.

Parameters:
str - a string to be parsed

StringTokenizer

public StringTokenizer(String str,
                       String delim)
Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens.

Delimiter characters themselves will not be treated as tokens.

Parameters:
str - a string to be parsed.
delim - the delimiters.

StringTokenizer

public StringTokenizer(String str,
                       String delim,
                       boolean returndelims)
Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

If the returndelims flag is false, the delimiter characters are skipped and serve only as separators between tokens.

If the returndelims flag is true, the delimiter characters are also returned as tokens, one per delimiter.

Parameters:
str - a string to be parsed.
delim - the delimiters.
returndelims - flag indicating whether to return the delimiters as tokens.

StringTokenizer

public StringTokenizer(String str,
                       String delim,
                       boolean returndelims,
                       boolean coalescedelims)

Deprecated. This API is ICU internal only.

Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

If the returndelims flag is false, the delimiter characters are skipped and serve only as separators between tokens.

If the returndelims flag is true, the delimiter characters are also returned as tokens. If coalescedelims is true, one token is returned for each run of delimiter characters; otherwise one token is returned per delimiter. Because a surrogate pair can be a delimiter, a token holding a single delimiter may be two chars long.

Parameters:
str - a string to be parsed.
delim - the delimiters.
returndelims - flag indicating whether to return the delimiters as tokens.
coalescedelims - flag indicating whether to return a run of delimiters as a single token or as one token per delimiter. This only takes effect if returndelims is true.

StringTokenizer

public StringTokenizer(String str,
                       UnicodeSet delim)
Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens.

Delimiter characters themselves will not be treated as tokens.

Parameters:
str - a string to be parsed.
delim - the delimiters.

StringTokenizer

public StringTokenizer(String str,
                       UnicodeSet delim,
                       boolean returndelims)
Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

If the returndelims flag is false, the delimiter characters are skipped and serve only as separators between tokens.

If the returndelims flag is true, the delimiter characters are also returned as tokens, one per delimiter.

Parameters:
str - a string to be parsed.
delim - the delimiters.
returndelims - flag indicating whether to return the delimiters as tokens.

StringTokenizer

public StringTokenizer(String str,
                       UnicodeSet delim,
                       boolean returndelims,
                       boolean coalescedelims)

Deprecated. This API is ICU internal only.

Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

If the returndelims flag is false, the delimiter characters are skipped and serve only as separators between tokens.

If the returndelims flag is true, the delimiter characters are also returned as tokens. If coalescedelims is true, one token is returned for each run of delimiter characters; otherwise one token is returned per delimiter. Because a surrogate pair can be a delimiter, a token holding a single delimiter may be two chars long.

Parameters:
str - a string to be parsed.
delim - the delimiters.
returndelims - flag indicating whether to return the delimiters as tokens.
coalescedelims - flag indicating whether to return a run of delimiters as a single token or as one token per delimiter. This only takes effect if returndelims is true.

Method Details

countTokens

public int countTokens()
Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. The current position is not advanced.
Returns:
the number of tokens remaining in the string using the current delimiter set.
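To illustrate what "the current position is not advanced" means, here is a minimal model of a tokenizer cursor. It is a sketch under stated assumptions, not ICU4J code: the class CursorModel and its helper skip are hypothetical, delimiters are never returned as tokens, and the exception type is this sketch's own choice.

```java
import java.util.NoSuchElementException;

// Hypothetical model of a tokenizer with an internal current position.
class CursorModel {
    private final String str;
    private final String delims;
    private int pos = 0; // current position, as a char index

    CursorModel(String str, String delims) {
        this.str = str;
        this.delims = delims;
    }

    private boolean isDelim(int cp) {
        return delims.codePoints().anyMatch(d -> d == cp);
    }

    // Skips code points from 'from' while their delimiter status equals 'whileDelim'.
    private int skip(int from, boolean whileDelim) {
        int i = from;
        while (i < str.length() && isDelim(str.codePointAt(i)) == whileDelim) {
            i += Character.charCount(str.codePointAt(i));
        }
        return i;
    }

    boolean hasMoreTokens() {
        return skip(pos, true) < str.length();
    }

    String nextToken() {
        int start = skip(pos, true);
        if (start >= str.length()) {
            throw new NoSuchElementException(); // assumption of this sketch
        }
        pos = skip(start, false); // advance past the token
        return str.substring(start, pos);
    }

    // Scans with a local index, so 'pos' is left untouched.
    int countTokens() {
        int count = 0;
        int i = skip(pos, true);
        while (i < str.length()) {
            count++;
            i = skip(skip(i, false), true);
        }
        return count;
    }

    public static void main(String[] args) {
        CursorModel t = new CursorModel("this is a test", " ");
        System.out.println(t.countTokens()); // 4
        System.out.println(t.nextToken());   // this
        System.out.println(t.countTokens()); // 3
    }
}
```

In this model, calling countTokens repeatedly returns the same value until nextToken consumes a token.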

hasMoreElements

public boolean hasMoreElements()
Returns the same value as the hasMoreTokens method. It exists so that this class can implement the Enumeration interface.
Returns:
true if there are more tokens; false otherwise.

hasMoreTokens

public boolean hasMoreTokens()
Tests if there are more tokens available from this tokenizer's string. If this method returns true, then a subsequent call to nextToken with no argument will successfully return a token.
Returns:
true if and only if there is at least one token in the string after the current position; false otherwise.

nextElement

public Object nextElement()
Returns the same value as the nextToken method, except that its declared return value is Object rather than String. It exists so that this class can implement the Enumeration interface.
Returns:
the next token in the string.

nextToken

public String nextToken()
Returns the next token from this string tokenizer.
Returns:
the next token from this string tokenizer.

nextToken

public String nextToken(String delim)
Returns the next token in this string tokenizer's string. First, the set of characters considered to be delimiters by this StringTokenizer object is changed to be the characters in the string delim. Then the next token in the string after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
Parameters:
delim - the new delimiters.
Returns:
the next token, after switching to the new delimiter set.

nextToken

public String nextToken(UnicodeSet delim)
Returns the next token in this string tokenizer's string. First, the set of characters considered to be delimiters by this StringTokenizer object is changed to be the characters in the UnicodeSet delim. Then the next token in the string after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
Parameters:
delim - the new delimiters.
Returns:
the next token, after switching to the new delimiter set.

Copyright (c) 2006 IBM Corporation and others.