Class StringTokenizer

  • All Implemented Interfaces:
    java.util.Enumeration<java.lang.Object>

    public final class StringTokenizer
    extends java.lang.Object
    implements java.util.Enumeration<java.lang.Object>

    The string tokenizer class allows an application to break a string into tokens by performing code point comparison. The StringTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.

    The set of delimiters (the codepoints that separate tokens) may be specified either at creation time or on a per-token basis.

    An instance of StringTokenizer behaves in one of three ways, depending on whether it was created with the returnDelims and coalesceDelims flags having the value true or false:

    • If returnDelims is false, delimiter code points serve to separate tokens. A token is a maximal sequence of consecutive code points that are not delimiters.
    • If returnDelims is true, delimiter code points are themselves considered to be tokens. In this case, if coalesceDelims is true, such tokens will be the maximal sequence of consecutive code points that are delimiters. If coalesceDelims is false, a token will be received for each delimiter code point.

    A token is thus either one delimiter code point, a maximal sequence of consecutive code points that are delimiters, or a maximal sequence of consecutive code points that are not delimiters.
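
    The three behaviors described above can be sketched in plain Java. This is only an illustration of the documented rules, not ICU's implementation: the class name TokenModesSketch and the helpers tokenize and isDelim are hypothetical, and the delimiters are passed as a String instead of a UnicodeSet.

```java
import java.util.ArrayList;
import java.util.List;

class TokenModesSketch {
    // Splits source into tokens by comparing code points against the delimiter set.
    static List<String> tokenize(String source, String delims,
                                 boolean returnDelims, boolean coalesceDelims) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            int start = i;
            boolean delimRun = isDelim(source.codePointAt(i), delims);
            // Always consume the first code point of the run.
            i += Character.charCount(source.codePointAt(i));
            // Extend the run, unless every delimiter must be its own token.
            boolean oneTokenPerDelimiter = delimRun && returnDelims && !coalesceDelims;
            if (!oneTokenPerDelimiter) {
                while (i < source.length()
                        && isDelim(source.codePointAt(i), delims) == delimRun) {
                    i += Character.charCount(source.codePointAt(i));
                }
            }
            // Delimiter runs are reported only when returnDelims is true.
            if (!delimRun || returnDelims) {
                tokens.add(source.substring(start, i));
            }
        }
        return tokens;
    }

    // True if codePoint is one of the code points in delims.
    static boolean isDelim(int codePoint, String delims) {
        return delims.codePoints().anyMatch(d -> d == codePoint);
    }
}
```

    For the input "a,,b" with "," as the only delimiter, this sketch yields the tokens a and b when returnDelims is false; a, ",", ",", b when returnDelims is true and coalesceDelims is false; and a, ",,", b when both flags are true.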

    A StringTokenizer object internally maintains a current position within the string to be tokenized. Some operations advance this current position past the code point processed.

    A token is returned by taking a substring of the string that was used to create the StringTokenizer object.

    Example of the use of the default delimiter tokenizer.

     StringTokenizer st = new StringTokenizer("this is a test");
     while (st.hasMoreTokens()) {
         System.out.println(st.nextToken());
     }
     

    prints the following output:

         this
         is
         a
         test
     

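    Example of the use of the tokenizer with user-specified delimiters (here the space character and the supplementary code point U+10000, written as the surrogate pair \ud800\udc00).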

         StringTokenizer st = new StringTokenizer(
             "this is a test with supplementary characters \ud800\ud800\udc00\udc00",
             " \ud800\udc00");
         while (st.hasMoreTokens()) {
             System.out.println(st.nextToken());
         }
     

    prints the following output:

         this
         is
         a
         test
         with
         supplementary
         characters
         \ud800
         \udc00
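
    The last two output lines follow from how the tail of the source string splits into code points. The sketch below uses only the JDK to list them; the class name CodePointView and the method describe are hypothetical names chosen for illustration.

```java
import java.util.stream.Collectors;

class CodePointView {
    // Formats each code point of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Tail of the sample: lone high surrogate, a valid pair, lone low surrogate.
        System.out.println(describe("\ud800\ud800\udc00\udc00"));
        // prints: U+D800 U+10000 U+DC00
    }
}
```

    Only the middle code point, U+10000, is in the delimiter set, so the unpaired surrogates on either side of it are returned as separate tokens.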
     
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static UnicodeSet DEFAULT_DELIMITERS_
      Default set of delimiters " \t\n\r\f"
      private boolean[] delims  
      private static UnicodeSet EMPTY_DELIMITER_
      An empty delimiter UnicodeSet, used when the user specifies null delimiters
      private boolean m_coalesceDelimiters_
      Flag indicating whether to coalesce runs of delimiters into single tokens
      private UnicodeSet m_delimiters_
      UnicodeSet containing delimiters
      private int m_length_
      Length of m_source_
      private int m_nextOffset_
      Current position in string to parse for tokens
      private boolean m_returnDelimiters_
      Flag indicating whether delimiters are also treated as tokens
      private java.lang.String m_source_
      String to parse for tokens
      private int[] m_tokenLimit_
      Array of pre-calculated token limit indexes in the source string.
      private int m_tokenOffset_
      Current offset to the token array.
      private int m_tokenSize_
      Size of the token array.
      private int[] m_tokenStart_
      Array of pre-calculated token start indexes in the source string, terminated by -1.
      private static int TOKEN_SIZE_
      Array size increments
    • Constructor Summary

      Constructors 
      Constructor Description
      StringTokenizer​(java.lang.String str)
      Constructs a string tokenizer for the specified string.
      StringTokenizer​(java.lang.String str, UnicodeSet delim)
      Constructs a string tokenizer for the specified string.
      StringTokenizer​(java.lang.String str, UnicodeSet delim, boolean returndelims)
      Constructs a string tokenizer for the specified string.
      StringTokenizer​(java.lang.String str, UnicodeSet delim, boolean returndelims, boolean coalescedelims)
      Deprecated.
      This API is ICU internal only.
      StringTokenizer​(java.lang.String str, java.lang.String delim)
      Constructs a string tokenizer for the specified string.
      StringTokenizer​(java.lang.String str, java.lang.String delim, boolean returndelims)
      Constructs a string tokenizer for the specified string.
      StringTokenizer​(java.lang.String str, java.lang.String delim, boolean returndelims, boolean coalescedelims)
      Deprecated.
      This API is ICU internal only.
    • Method Summary

      Modifier and Type Method Description
      (package private) void checkDelimiters()  
      int countTokens()
      Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception.
      private int getNextDelimiter​(int offset)
      Gets the index of the next delimiter after offset
      private int getNextNonDelimiter​(int offset)
      Gets the index of the next non-delimiter after offset
      boolean hasMoreElements()
      Returns the same value as the hasMoreTokens method.
      boolean hasMoreTokens()
      Tests if there are more tokens available from this tokenizer's string.
      java.lang.Object nextElement()
      Returns the same value as the nextToken method, except that its declared return value is Object rather than String.
      java.lang.String nextToken()
      Returns the next token from this string tokenizer.
      java.lang.String nextToken​(UnicodeSet delim)
      Returns the next token in this string tokenizer's string.
      java.lang.String nextToken​(java.lang.String delim)
      Returns the next token in this string tokenizer's string.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Enumeration

        asIterator
    • Field Detail

      • m_tokenOffset_

        private int m_tokenOffset_
        Current offset into the token array. If the token array has not been set up yet, this value is -1.
      • m_tokenSize_

        private int m_tokenSize_
        Size of the token array. If the token array has not been set up yet, this value is -1.
      • m_tokenStart_

        private int[] m_tokenStart_
        Array of pre-calculated token start indexes in the source string, terminated by -1. This is set up only during countTokens() and stores only the remaining tokens, not tokens that have already been parsed.
      • m_tokenLimit_

        private int[] m_tokenLimit_
        Array of pre-calculated token limit indexes in the source string. This is set up only during countTokens() and stores only the remaining tokens, not tokens that have already been parsed.
      • m_delimiters_

        private UnicodeSet m_delimiters_
        UnicodeSet containing delimiters
      • m_source_

        private java.lang.String m_source_
        String to parse for tokens
      • m_length_

        private int m_length_
        Length of m_source_
      • m_nextOffset_

        private int m_nextOffset_
        Current position in string to parse for tokens
      • m_returnDelimiters_

        private boolean m_returnDelimiters_
        Flag indicating whether delimiters are also treated as tokens
      • m_coalesceDelimiters_

        private boolean m_coalesceDelimiters_
        Flag indicating whether to coalesce runs of delimiters into single tokens
      • DEFAULT_DELIMITERS_

        private static final UnicodeSet DEFAULT_DELIMITERS_
        Default set of delimiters " \t\n\r\f"
      • TOKEN_SIZE_

        private static final int TOKEN_SIZE_
        Array size increments
        See Also:
        Constant Field Values
      • EMPTY_DELIMITER_

        private static final UnicodeSet EMPTY_DELIMITER_
        An empty delimiter UnicodeSet, used when the user specifies null delimiters
      • delims

        private boolean[] delims
    • Constructor Detail

      • StringTokenizer

        public StringTokenizer​(java.lang.String str,
                               UnicodeSet delim,
                               boolean returndelims)
        Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

        If the returnDelims flag is false, the delimiter characters are skipped and only serve as separators between tokens.

        If the returnDelims flag is true, then the delimiter characters are also returned as tokens, one per delimiter.

        Parameters:
        str - a string to be parsed.
        delim - the delimiters.
        returndelims - flag indicating whether to return the delimiters as tokens.
        Throws:
        java.lang.NullPointerException - if str is null
      • StringTokenizer

        @Deprecated
        public StringTokenizer​(java.lang.String str,
                               UnicodeSet delim,
                               boolean returndelims,
                               boolean coalescedelims)
        Deprecated.
        This API is ICU internal only.
        Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

        If the returnDelims flag is false, the delimiter characters are skipped and only serve as separators between tokens.

        If the returnDelims flag is true, then the delimiter characters are also returned as tokens. If coalescedelims is true, one token is returned for each run of delimiter characters, otherwise one token is returned per delimiter. Since surrogate pairs can be delimiters, the returned token might be two chars in length.

        Parameters:
        str - a string to be parsed.
        delim - the delimiters.
        returndelims - flag indicating whether to return the delimiters as tokens.
        coalescedelims - flag indicating whether to return a run of delimiters as a single token or as one token per delimiter. This only takes effect if returndelims is true.
        Throws:
        java.lang.NullPointerException - if str is null
      • StringTokenizer

        public StringTokenizer​(java.lang.String str,
                               UnicodeSet delim)
        Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens.

        Delimiter characters themselves will not be treated as tokens.

        Parameters:
        str - a string to be parsed.
        delim - the delimiters.
        Throws:
        java.lang.NullPointerException - if str is null
      • StringTokenizer

        public StringTokenizer​(java.lang.String str,
                               java.lang.String delim,
                               boolean returndelims)

        Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

        If the returnDelims flag is false, the delimiter characters are skipped and only serve as separators between tokens.

        If the returnDelims flag is true, then the delimiter characters are also returned as tokens, one per delimiter.

        Parameters:
        str - a string to be parsed.
        delim - the delimiters.
        returndelims - flag indicating whether to return the delimiters as tokens.
        Throws:
        java.lang.NullPointerException - if str is null
      • StringTokenizer

        @Deprecated
        public StringTokenizer​(java.lang.String str,
                               java.lang.String delim,
                               boolean returndelims,
                               boolean coalescedelims)
        Deprecated.
        This API is ICU internal only.

        Constructs a string tokenizer for the specified string. All characters in the delim argument are the delimiters for separating tokens.

        If the returnDelims flag is false, the delimiter characters are skipped and only serve as separators between tokens.

        If the returnDelims flag is true, then the delimiter characters are also returned as tokens. If coalescedelims is true, one token is returned for each run of delimiter characters, otherwise one token is returned per delimiter. Since surrogate pairs can be delimiters, the returned token might be two chars in length.

        Parameters:
        str - a string to be parsed.
        delim - the delimiters.
        returndelims - flag indicating whether to return the delimiters as tokens.
        coalescedelims - flag indicating whether to return a run of delimiters as a single token or as one token per delimiter. This only takes effect if returndelims is true.
        Throws:
        java.lang.NullPointerException - if str is null
      • StringTokenizer

        public StringTokenizer​(java.lang.String str,
                               java.lang.String delim)

        Constructs a string tokenizer for the specified string. The characters in the delim argument are the delimiters for separating tokens.

        Delimiter characters themselves will not be treated as tokens.

        Parameters:
        str - a string to be parsed.
        delim - the delimiters.
        Throws:
        java.lang.NullPointerException - if str is null
      • StringTokenizer

        public StringTokenizer​(java.lang.String str)

        Constructs a string tokenizer for the specified string. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character.

        Delimiter characters themselves will not be treated as tokens.

        Parameters:
        str - a string to be parsed
        Throws:
        java.lang.NullPointerException - if str is null
    • Method Detail

      • hasMoreTokens

        public boolean hasMoreTokens()
        Tests if there are more tokens available from this tokenizer's string. If this method returns true, then a subsequent call to nextToken with no argument will successfully return a token.
        Returns:
        true if and only if there is at least one token in the string after the current position; false otherwise.
      • nextToken

        public java.lang.String nextToken()
        Returns the next token from this string tokenizer.
        Returns:
        the next token from this string tokenizer.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's string.
      • nextToken

        public java.lang.String nextToken​(java.lang.String delim)
        Returns the next token in this string tokenizer's string. First, the set of characters considered to be delimiters by this StringTokenizer object is changed to be the characters in the string delim. Then the next token in the string after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
        Parameters:
        delim - the new delimiters.
        Returns:
        the next token, after switching to the new delimiter set.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's string.
      • nextToken

        public java.lang.String nextToken​(UnicodeSet delim)
        Returns the next token in this string tokenizer's string. First, the set of characters considered to be delimiters by this StringTokenizer object is changed to be the characters in the UnicodeSet delim. Then the next token in the string after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
        Parameters:
        delim - the new delimiters.
        Returns:
        the next token, after switching to the new delimiter set.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's string.
      • hasMoreElements

        public boolean hasMoreElements()
        Returns the same value as the hasMoreTokens method. It exists so that this class can implement the Enumeration interface.
        Specified by:
        hasMoreElements in interface java.util.Enumeration<java.lang.Object>
        Returns:
        true if there are more tokens; false otherwise.
        See Also:
        hasMoreTokens()
      • nextElement

        public java.lang.Object nextElement()
        Returns the same value as the nextToken method, except that its declared return value is Object rather than String. It exists so that this class can implement the Enumeration interface.
        Specified by:
        nextElement in interface java.util.Enumeration<java.lang.Object>
        Returns:
        the next token in the string.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's string.
        See Also:
        nextToken()
      • countTokens

        public int countTokens()
        Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. The current position is not advanced.
        Returns:
        the number of tokens remaining in the string using the current delimiter set.
        See Also:
        nextToken()
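
        The statement that counting does not advance the current position can be checked with a small sketch. To stay runnable without extra libraries, it uses java.util.StringTokenizer from the JDK as a stand-in, since that class exposes the same countTokens and nextToken method names. It has not been run against the ICU class here; the same result is assumed, not confirmed, when the import is changed to com.ibm.icu.util.StringTokenizer. The class name CountTokensSketch and the method countsAroundOneRead are hypothetical.

```java
import java.util.StringTokenizer; // JDK stand-in, not the ICU class

class CountTokensSketch {
    // Returns the count before a read, the count when asked again,
    // and the count after one token has been read.
    static int[] countsAroundOneRead(String text) {
        StringTokenizer st = new StringTokenizer(text);
        int before = st.countTokens();
        int again = st.countTokens();   // counting twice does not consume tokens
        st.nextToken();                 // consumes exactly one token
        int after = st.countTokens();
        return new int[] {before, again, after};
    }

    public static void main(String[] args) {
        int[] c = countsAroundOneRead("this is a test");
        System.out.println(c[0] + " " + c[1] + " " + c[2]);
    }
}
```

        With the JDK class, the three values for "this is a test" are 4, 4 and 3.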
      • getNextDelimiter

        private int getNextDelimiter​(int offset)
        Gets the index of the next delimiter after offset
        Parameters:
        offset - to the source string
        Returns:
        offset of the immediate next delimiter, otherwise (- source string length - 1) if there are no more delimiters after offset
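
        The negative return value acts as a "not found" marker that can never collide with a valid index. The sketch below illustrates that convention only; it is not the private ICU method. The class name SentinelSketch and the method nextDelimiter are hypothetical, the delimiters are passed as a String, and the search is assumed to start at offset itself.

```java
class SentinelSketch {
    // Returns the index of the first delimiter at or after offset,
    // or (-1 - source.length()) when no delimiter remains.
    static int nextDelimiter(String source, String delims, int offset) {
        int i = offset;
        while (i < source.length()) {
            int cp = source.codePointAt(i);
            if (delims.codePoints().anyMatch(d -> d == cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over a surrogate pair as one unit
        }
        return -1 - source.length();
    }

    public static void main(String[] args) {
        System.out.println(nextDelimiter("ab cd", " ", 0)); // prints 2
        System.out.println(nextDelimiter("ab cd", " ", 3)); // prints -6
    }
}
```

        A caller can test the result with a simple "less than zero" check, and the value stays negative even for an empty source string.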
      • getNextNonDelimiter

        private int getNextNonDelimiter​(int offset)
        Gets the index of the next non-delimiter after offset
        Parameters:
        offset - to the source string
        Returns:
        offset of the immediate next non-delimiter, otherwise (- source string length - 1) if there are no more non-delimiters after offset
      • checkDelimiters

        void checkDelimiters()