Class ClassicTokenizerImpl


  • class ClassicTokenizerImpl
    extends java.lang.Object
    This class implements the classic Lucene StandardTokenizer up until 3.0.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int ACRONYM  
      static int ACRONYM_DEP  
      static int ALPHANUM  
      static int APOSTROPHE  
      static int CJ  
      static int COMPANY  
      static int EMAIL  
      static int HOST  
      static int NUM  
      static java.lang.String[] TOKEN_TYPES  
      private long yychar
      Number of characters up to the start of the matched text.
      private int yycolumn
      Number of characters from the last newline up to the start of the matched text.
      static int YYEOF
      This character denotes the end of file.
      static int YYINITIAL
      Lexical states.
      private int yyline
      Number of newlines encountered up to the start of the matched text.
      private static int[] ZZ_ACTION
      Translates DFA states to action switch labels.
      private static java.lang.String ZZ_ACTION_PACKED_0  
      private static int[] ZZ_ATTRIBUTE
      ZZ_ATTRIBUTE[aState] contains the attributes of state aState
      private static java.lang.String ZZ_ATTRIBUTE_PACKED_0  
      private static int ZZ_BUFFERSIZE
      Initial size of the lookahead buffer.
      private static int[] ZZ_CMAP_BLOCKS
      Second-level tables for translating characters to character classes
      private static java.lang.String ZZ_CMAP_BLOCKS_PACKED_0  
      private static int[] ZZ_CMAP_TOP
      Top-level table for translating characters to character classes
      private static java.lang.String ZZ_CMAP_TOP_PACKED_0  
      private static java.lang.String[] ZZ_ERROR_MSG
      Error messages for ZZ_UNKNOWN_ERROR, ZZ_NO_MATCH, and ZZ_PUSHBACK_2BIG respectively.
      private static int[] ZZ_LEXSTATE
      ZZ_LEXSTATE[l] is the state in the DFA for the lexical state l; ZZ_LEXSTATE[l+1] is the state in the DFA for the lexical state l at the beginning of a line. l is of the form l = 2*k, where k is a non-negative integer.
      private static int ZZ_NO_MATCH
      Error code for "could not match input".
      private static int ZZ_PUSHBACK_2BIG
      Error code for "pushback value was too large".
      private static int[] ZZ_ROWMAP
      Translates a state to a row index in the transition table
      private static java.lang.String ZZ_ROWMAP_PACKED_0  
      private static int[] ZZ_TRANS
      The transition table of the DFA
      private static java.lang.String ZZ_TRANS_PACKED_0  
      private static int ZZ_UNKNOWN_ERROR
      Error code for "Unknown internal scanner error".
      private boolean zzAtBOL
      Whether the scanner is currently at the beginning of a line.
      private boolean zzAtEOF
      Whether the scanner is at the end of file.
      private char[] zzBuffer
      This buffer contains the current text to be matched and is the source of the yytext() string.
      private int zzCurrentPos
      Current text position in the buffer.
      private int zzEndRead
      Marks the last character in the buffer that has been read from input.
      private boolean zzEOFDone
      Whether the user-EOF-code has already been executed.
      private int zzFinalHighSurrogate
      The number of occupied positions in zzBuffer beyond zzEndRead.
      private int zzLexicalState
      Current lexical state.
      private int zzMarkedPos
      Text position at the last accepting state.
      private java.io.Reader zzReader
      Input device.
      private int zzStartRead
      Marks the beginning of the yytext() string in the buffer.
      private int zzState
      Current state of the DFA.
    • Constructor Summary

      Constructors 
      Constructor Description
      ClassicTokenizerImpl​(java.io.Reader in)
      Creates a new scanner
    • Method Summary

      Methods
      Modifier and Type Method Description
      int getNextToken()
      Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
      void getText​(CharTermAttribute t)
      Fills CharTermAttribute with the current token text.
      void setBufferSize​(int numChars)  
      boolean yyatEOF()
      Returns whether the scanner has reached the end of the reader it reads from.
      void yybegin​(int newState)
      Enters a new lexical state.
      int yychar()  
      char yycharat​(int position)
      Returns the character at the given position from the matched text.
      void yyclose()
      Closes the input reader.
      int yylength()
      How many characters were matched.
      void yypushback​(int number)
      Pushes the specified amount of characters back into the input stream.
      void yyreset​(java.io.Reader reader)
      Resets the scanner to read from a new input stream.
      private void yyResetPosition()
      Resets the input position.
      int yystate()
      Returns the current lexical state.
      java.lang.String yytext()
      Returns the text matched by the current regular expression.
      private static int zzCMap​(int input)
      Translates raw input code points to DFA table rows.
      private boolean zzRefill()
      Refills the input buffer.
      private static void zzScanError​(int errorCode)
      Reports an error that occurred while scanning.
      private static int[] zzUnpackAction()  
      private static int zzUnpackAction​(java.lang.String packed, int offset, int[] result)  
      private static int[] zzUnpackAttribute()  
      private static int zzUnpackAttribute​(java.lang.String packed, int offset, int[] result)  
      private static int[] zzUnpackcmap_blocks()  
      private static int zzUnpackcmap_blocks​(java.lang.String packed, int offset, int[] result)  
      private static int[] zzUnpackcmap_top()  
      private static int zzUnpackcmap_top​(java.lang.String packed, int offset, int[] result)  
      private static int[] zzUnpackRowMap()  
      private static int zzUnpackRowMap​(java.lang.String packed, int offset, int[] result)  
      private static int[] zzUnpackTrans()  
      private static int zzUnpackTrans​(java.lang.String packed, int offset, int[] result)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • YYEOF

        public static final int YYEOF
        This character denotes the end of file.
        See Also:
        Constant Field Values
      • ZZ_BUFFERSIZE

        private static final int ZZ_BUFFERSIZE
        Initial size of the lookahead buffer.
        See Also:
        Constant Field Values
      • ZZ_LEXSTATE

        private static final int[] ZZ_LEXSTATE
        ZZ_LEXSTATE[l] is the state in the DFA for the lexical state l; ZZ_LEXSTATE[l+1] is the state in the DFA for the lexical state l at the beginning of a line. l is of the form l = 2*k, where k is a non-negative integer.
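
The indexing rule for this table can be sketched with a toy example. Only the paired layout (even index for the normal entry, odd index for the beginning-of-line entry) comes from the description; the table contents, the second lexical state `IN_QUOTE`, and the helper `startState` are hypothetical.

```java
final class LexStateTableSketch {
    // Lexical state identifiers are even numbers of the form 2*k.
    static final int INITIAL = 0;   // k = 0
    static final int IN_QUOTE = 2;  // k = 1 (hypothetical second state)

    // Hypothetical table: one {normal, at-beginning-of-line} pair per lexical state.
    static final int[] LEXSTATE = {0, 0, 5, 6};

    /** Picks the DFA start state for a lexical state, honoring beginning-of-line. */
    static int startState(int lexicalState, boolean atBeginningOfLine) {
        return LEXSTATE[atBeginningOfLine ? lexicalState + 1 : lexicalState];
    }

    public static void main(String[] args) {
        System.out.println(startState(INITIAL, false));  // prints 0
        System.out.println(startState(IN_QUOTE, true));  // prints 6
    }
}
```

Because every lexical state owns two adjacent slots, the beginning-of-line variant is always one index past the normal one.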
      • ZZ_CMAP_TOP

        private static final int[] ZZ_CMAP_TOP
        Top-level table for translating characters to character classes
      • ZZ_CMAP_TOP_PACKED_0

        private static final java.lang.String ZZ_CMAP_TOP_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_CMAP_BLOCKS

        private static final int[] ZZ_CMAP_BLOCKS
        Second-level tables for translating characters to character classes
      • ZZ_CMAP_BLOCKS_PACKED_0

        private static final java.lang.String ZZ_CMAP_BLOCKS_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_ACTION

        private static final int[] ZZ_ACTION
        Translates DFA states to action switch labels.
      • ZZ_ACTION_PACKED_0

        private static final java.lang.String ZZ_ACTION_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_ROWMAP

        private static final int[] ZZ_ROWMAP
        Translates a state to a row index in the transition table
      • ZZ_ROWMAP_PACKED_0

        private static final java.lang.String ZZ_ROWMAP_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_TRANS

        private static final int[] ZZ_TRANS
        The transition table of the DFA
      • ZZ_TRANS_PACKED_0

        private static final java.lang.String ZZ_TRANS_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_UNKNOWN_ERROR

        private static final int ZZ_UNKNOWN_ERROR
        Error code for "Unknown internal scanner error".
        See Also:
        Constant Field Values
      • ZZ_NO_MATCH

        private static final int ZZ_NO_MATCH
        Error code for "could not match input".
        See Also:
        Constant Field Values
      • ZZ_PUSHBACK_2BIG

        private static final int ZZ_PUSHBACK_2BIG
        Error code for "pushback value was too large".
        See Also:
        Constant Field Values
      • ZZ_ATTRIBUTE

        private static final int[] ZZ_ATTRIBUTE
        ZZ_ATTRIBUTE[aState] contains the attributes of state aState
      • ZZ_ATTRIBUTE_PACKED_0

        private static final java.lang.String ZZ_ATTRIBUTE_PACKED_0
        See Also:
        Constant Field Values
      • zzReader

        private java.io.Reader zzReader
        Input device.
      • zzState

        private int zzState
        Current state of the DFA.
      • zzLexicalState

        private int zzLexicalState
        Current lexical state.
      • zzBuffer

        private char[] zzBuffer
        This buffer contains the current text to be matched and is the source of the yytext() string.
      • zzMarkedPos

        private int zzMarkedPos
        Text position at the last accepting state.
      • zzCurrentPos

        private int zzCurrentPos
        Current text position in the buffer.
      • zzStartRead

        private int zzStartRead
        Marks the beginning of the yytext() string in the buffer.
      • zzEndRead

        private int zzEndRead
        Marks the last character in the buffer that has been read from input.
      • zzAtEOF

        private boolean zzAtEOF
        Whether the scanner is at the end of file.
        See Also:
        yyatEOF()
      • zzFinalHighSurrogate

        private int zzFinalHighSurrogate
        The number of occupied positions in zzBuffer beyond zzEndRead.

        When a lead/high surrogate has been read from the input stream into the final zzBuffer position, this will have a value of 1; otherwise, it will have a value of 0.
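
A minimal sketch of the bookkeeping described above, assuming the buffer has just been filled. The helper `splitTrailingSurrogate` and its return shape are hypothetical and are not part of this class; only the 0/1 rule for a high surrogate in the final buffer position comes from the description.

```java
final class FinalHighSurrogateSketch {
    /**
     * Given a buffer whose first {@code filled} chars hold input, returns
     * {endRead, finalHighSurrogate}. A high surrogate sitting in the final
     * buffer position is excluded from the readable region, because its low
     * surrogate has not been read yet.
     */
    static int[] splitTrailingSurrogate(char[] buffer, int filled) {
        boolean dangling = filled == buffer.length
                && filled > 0
                && Character.isHighSurrogate(buffer[filled - 1]);
        int finalHighSurrogate = dangling ? 1 : 0;
        return new int[] {filled - finalHighSurrogate, finalHighSurrogate};
    }

    public static void main(String[] args) {
        // '\uD83D' is the high half of a surrogate pair.
        int[] r = splitTrailingSurrogate(new char[] {'a', '\uD83D'}, 2);
        System.out.println(r[0] + " " + r[1]);  // prints "1 1"
    }
}
```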

      • yyline

        private int yyline
        Number of newlines encountered up to the start of the matched text.
      • yycolumn

        private int yycolumn
        Number of characters from the last newline up to the start of the matched text.
      • yychar

        private long yychar
        Number of characters up to the start of the matched text.
      • zzAtBOL

        private boolean zzAtBOL
        Whether the scanner is currently at the beginning of a line.
      • zzEOFDone

        private boolean zzEOFDone
        Whether the user-EOF-code has already been executed.
      • TOKEN_TYPES

        public static final java.lang.String[] TOKEN_TYPES
    • Constructor Detail

      • ClassicTokenizerImpl

        ClassicTokenizerImpl​(java.io.Reader in)
        Creates a new scanner
        Parameters:
        in - the java.io.Reader to read input from.
    • Method Detail

      • zzUnpackcmap_top

        private static int[] zzUnpackcmap_top()
      • zzUnpackcmap_top

        private static int zzUnpackcmap_top​(java.lang.String packed,
                                            int offset,
                                            int[] result)
      • zzUnpackcmap_blocks

        private static int[] zzUnpackcmap_blocks()
      • zzUnpackcmap_blocks

        private static int zzUnpackcmap_blocks​(java.lang.String packed,
                                               int offset,
                                               int[] result)
      • zzUnpackAction

        private static int[] zzUnpackAction()
      • zzUnpackAction

        private static int zzUnpackAction​(java.lang.String packed,
                                          int offset,
                                          int[] result)
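
This page does not document the format of the `*_PACKED_0` strings. The sketch below assumes the run-length layout commonly seen in JFlex-generated scanners, where the string is a sequence of (count, value) character pairs; the class name and the sample data are hypothetical.

```java
import java.util.Arrays;

final class PackedTableSketch {
    /**
     * Expands (count, value) pairs from {@code packed} into {@code result},
     * starting at {@code offset}. Returns the index just past the last
     * element written. The pair layout is an assumption, not documented here.
     */
    static int unpack(String packed, int offset, int[] result) {
        int i = 0;       // index into the packed string
        int j = offset;  // index into the unpacked array
        while (i < packed.length()) {
            int count = packed.charAt(i++);
            int value = packed.charAt(i++);
            do {
                result[j++] = value;
            } while (--count > 0);
        }
        return j;
    }

    public static void main(String[] args) {
        int[] table = new int[5];
        int end = unpack("\3\0\2\7", 0, table);      // three 0s, then two 7s
        System.out.println(Arrays.toString(table));  // prints [0, 0, 0, 7, 7]
        System.out.println(end);                     // prints 5
    }
}
```

Returning the next free index would let a no-argument overload chain several packed strings into one array, which is consistent with the `offset` parameter in the signature above.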
      • zzUnpackRowMap

        private static int[] zzUnpackRowMap()
      • zzUnpackRowMap

        private static int zzUnpackRowMap​(java.lang.String packed,
                                          int offset,
                                          int[] result)
      • zzUnpackTrans

        private static int[] zzUnpackTrans()
      • zzUnpackTrans

        private static int zzUnpackTrans​(java.lang.String packed,
                                         int offset,
                                         int[] result)
      • zzUnpackAttribute

        private static int[] zzUnpackAttribute()
      • zzUnpackAttribute

        private static int zzUnpackAttribute​(java.lang.String packed,
                                             int offset,
                                             int[] result)
      • yychar

        public final int yychar()
      • getText

        public final void getText​(CharTermAttribute t)
        Fills CharTermAttribute with the current token text.
      • setBufferSize

        public final void setBufferSize​(int numChars)
      • zzCMap

        private static int zzCMap​(int input)
        Translates raw input code points to DFA table rows.
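
The two tables ZZ_CMAP_TOP and ZZ_CMAP_BLOCKS suggest a two-level lookup. The following is a sketch of that general technique rather than the generated code: it assumes 256-entry pages, and the table contents and character class numbers are made up. It covers only code points below 0x200.

```java
import java.util.Arrays;

final class TwoLevelCharMapSketch {
    // Hypothetical top-level table: for each 256-code-point page,
    // the offset of that page's block in CMAP_BLOCKS.
    static final int[] CMAP_TOP = {0, 256};

    // Hypothetical second-level table: two blocks of 256 character classes.
    static final int[] CMAP_BLOCKS = new int[512];
    static {
        for (int c = 'a'; c <= 'z'; c++) {
            CMAP_BLOCKS[c] = 1;                  // page 0: ASCII lowercase is class 1
        }
        Arrays.fill(CMAP_BLOCKS, 256, 512, 2);   // page 1: every code point is class 2
    }

    /** Maps a code point (below 0x200 in this toy table) to its character class. */
    static int charClass(int codePoint) {
        int page = codePoint >> 8;       // which 256-entry page
        int offset = codePoint & 0xFF;   // position within the page
        return CMAP_BLOCKS[CMAP_TOP[page] + offset];
    }

    public static void main(String[] args) {
        System.out.println(charClass('q'));     // prints 1
        System.out.println(charClass(' '));     // prints 0
        System.out.println(charClass(0x0142));  // prints 2
    }
}
```

Splitting the map this way lets pages with identical contents share one block, which keeps the tables small for large code point ranges.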
      • zzRefill

        private boolean zzRefill()
                          throws java.io.IOException
        Refills the input buffer.
        Returns:
        false iff there was new input.
        Throws:
        java.io.IOException - if any I/O-Error occurs
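
Note the inverted return convention: false signals that new input arrived. A toy sketch of a caller loop under that convention follows; the class, its fixed four-char buffer, and the `drain` helper are hypothetical and much simpler than the real buffer management.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class RefillConventionSketch {
    private final Reader reader;
    private final char[] buffer = new char[4];
    private int endRead = 0;

    RefillConventionSketch(Reader reader) {
        this.reader = reader;
    }

    /** Returns false iff new input was read into the buffer; true means end of input. */
    boolean refill() throws IOException {
        int numRead = reader.read(buffer, 0, buffer.length);
        if (numRead > 0) {
            endRead = numRead;
            return false;
        }
        endRead = 0;
        return true;
    }

    /** Reads everything from the reader, looping while refill() reports new input. */
    static String drain(Reader reader) {
        RefillConventionSketch s = new RefillConventionSketch(reader);
        StringBuilder out = new StringBuilder();
        try {
            while (!s.refill()) {
                out.append(s.buffer, 0, s.endRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(new StringReader("hello world")));  // prints "hello world"
    }
}
```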
      • yyclose

        public final void yyclose()
                           throws java.io.IOException
        Closes the input reader.
        Throws:
        java.io.IOException - if the reader could not be closed.
      • yyreset

        public final void yyreset​(java.io.Reader reader)
        Resets the scanner to read from a new input stream.

        Does not close the old reader.

        All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to YYINITIAL.

        Internal scan buffer is resized down to its initial length, if it has grown.

        Parameters:
        reader - The new input stream.
      • yyResetPosition

        private final void yyResetPosition()
        Resets the input position.
      • yyatEOF

        public final boolean yyatEOF()
        Returns whether the scanner has reached the end of the reader it reads from.
        Returns:
        whether the scanner has reached EOF.
      • yystate

        public final int yystate()
        Returns the current lexical state.
        Returns:
        the current lexical state.
      • yybegin

        public final void yybegin​(int newState)
        Enters a new lexical state.
        Parameters:
        newState - the new lexical state
      • yytext

        public final java.lang.String yytext()
        Returns the text matched by the current regular expression.
        Returns:
        the matched text.
      • yycharat

        public final char yycharat​(int position)
        Returns the character at the given position from the matched text.

        It is equivalent to yytext().charAt(position), but faster.

        Parameters:
        position - the position of the character to fetch. A value from 0 to yylength()-1.
        Returns:
        the character at position.
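
The equivalence with yytext().charAt(position) can be illustrated with a toy model of the match window. The field roles follow the descriptions of zzBuffer, zzStartRead and zzMarkedPos on this page; the index arithmetic and the class itself are assumptions made for illustration.

```java
final class MatchedTextSketch {
    private final char[] buffer;
    private final int startRead;  // start of the matched text in the buffer
    private final int markedPos;  // position just past the matched text

    MatchedTextSketch(String input, int startRead, int markedPos) {
        this.buffer = input.toCharArray();
        this.startRead = startRead;
        this.markedPos = markedPos;
    }

    /** Copies the matched region into a new String. */
    String text() {
        return new String(buffer, startRead, markedPos - startRead);
    }

    /** Reads one char straight from the buffer, with no String allocation. */
    char charAt(int position) {
        return buffer[startRead + position];
    }

    int length() {
        return markedPos - startRead;
    }

    public static void main(String[] args) {
        MatchedTextSketch m = new MatchedTextSketch("foo bar baz", 4, 7);
        System.out.println(m.text());     // prints "bar"
        System.out.println(m.charAt(1));  // prints "a"
    }
}
```

Under this model the direct accessor is faster simply because it skips building the intermediate String.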
      • yylength

        public final int yylength()
        How many characters were matched.
        Returns:
        the length of the matched text region.
      • zzScanError

        private static void zzScanError​(int errorCode)
        Reports an error that occurred while scanning.

        In a well-formed scanner (no or only correct usage of yypushback(int) and a match-all fallback rule) this method will only be called with things that "Can't Possibly Happen".

        If this method is called, something is seriously wrong (e.g. a JFlex bug producing a faulty scanner etc.).

        Usual syntax/scanner level error handling should be done in error fallback rules.

        Parameters:
        errorCode - the code of the error message to display.
      • yypushback

        public void yypushback​(int number)
        Pushes the specified amount of characters back into the input stream.

        They will be read again by the next call of the scanning method.

        Parameters:
        number - the number of characters to be read again. This number must not be greater than yylength().
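
A sketch of the pushback rule on a toy match window: pushing back shrinks the matched region from its end, and a request larger than the current match is rejected. The class is hypothetical, and IllegalArgumentException stands in for whatever error path the real scanner takes for ZZ_PUSHBACK_2BIG.

```java
final class PushbackSketch {
    private final char[] buffer;
    private final int startRead;
    private int markedPos;

    PushbackSketch(String input, int startRead, int markedPos) {
        this.buffer = input.toCharArray();
        this.startRead = startRead;
        this.markedPos = markedPos;
    }

    int length() {
        return markedPos - startRead;
    }

    String text() {
        return new String(buffer, startRead, length());
    }

    /** Returns the last {@code number} matched chars to the unread input. */
    void pushback(int number) {
        if (number > length()) {
            // Stand-in for the scanner's "pushback value was too large" error.
            throw new IllegalArgumentException("pushback value was too large");
        }
        markedPos -= number;
    }

    public static void main(String[] args) {
        PushbackSketch s = new PushbackSketch("foobar", 0, 6);
        s.pushback(3);
        System.out.println(s.text());  // prints "foo"
    }
}
```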
      • getNextToken

        public int getNextToken()
                         throws java.io.IOException
        Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
        Returns:
        the next token.
        Throws:
        java.io.IOException - if any I/O-Error occurs.
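
The typical calling pattern is a loop that runs until YYEOF is returned. Because this class and its constructor are package-private, the sketch below uses a toy stand-in scanner (`ToyScanner`, hypothetical) that splits on whitespace; the numeric values of `YYEOF` and `ALPHANUM` are assumptions, since this page does not list them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenLoopSketch {
    static final int YYEOF = -1;    // assumption: the usual JFlex end-of-input value
    static final int ALPHANUM = 0;  // assumption: illustrative token type value

    /** Toy stand-in: emits each whitespace-separated word as an ALPHANUM token. */
    static final class ToyScanner {
        private final Reader reader;
        private final StringBuilder text = new StringBuilder();

        ToyScanner(Reader reader) {
            this.reader = reader;
        }

        int getNextToken() throws IOException {
            text.setLength(0);
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (text.length() > 0) {
                        return ALPHANUM;
                    }
                } else {
                    text.append((char) c);
                }
            }
            return text.length() > 0 ? ALPHANUM : YYEOF;
        }

        String yytext() {
            return text.toString();
        }
    }

    /** Collects token texts until the scanner reports end of input. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        ToyScanner scanner = new ToyScanner(new StringReader(input));
        try {
            while (scanner.getNextToken() != YYEOF) {
                tokens.add(scanner.yytext());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("classic  tokenizer impl"));  // prints [classic, tokenizer, impl]
    }
}
```

With the real scanner, the returned int would presumably be one of the token type constants listed in the Field Summary, and the token text would be copied out with getText(CharTermAttribute) rather than yytext().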