net.sf.saxon.regex

Class JDK14RegexTranslator

public class JDK14RegexTranslator extends Object

This class translates XML Schema regex syntax into JDK 1.4 regex syntax.

Author: James Clark. Modified by Michael Kay (a) to integrate the code into Saxon, and (b) to support XPath additions to the XML Schema regex syntax.

This version of the regular expression translator treats each half of a surrogate pair as a separate character, translating anything in an XPath regex that can match a non-BMP character into a Java regex that matches the two halves of a surrogate pair independently. This approach doesn't work under JDK 1.5, whose regex engine treats a surrogate pair as a single character.
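To illustrate what "the two halves of a surrogate pair" means, the following minimal sketch uses only standard `java.lang.Character` methods to show how a single non-BMP code point is stored as two `char` values in a Java string. The class name `SurrogateHalvesDemo` and the helper `utf16Units` are hypothetical names chosen for this illustration; they are not part of Saxon.

```java
// Illustrative sketch only: SurrogateHalvesDemo and utf16Units are
// hypothetical names, not part of the Saxon API.
final class SurrogateHalvesDemo {

    /** Returns the UTF-16 code units that encode the given Unicode code point. */
    static char[] utf16Units(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane.
        char[] units = utf16Units(0x1D11E);

        System.out.println(units.length);                           // 2
        System.out.println(Integer.toHexString(units[0]));          // d834 (high half)
        System.out.println(Integer.toHexString(units[1]));          // dd1e (low half)

        // The string holds two chars but only one code point.
        String s = new String(units);
        System.out.println(s.length());                             // 2
        System.out.println(s.codePointCount(0, s.length()));        // 1
    }
}
```

A translator that works on the two halves independently would emit a pattern that matches the high half and the low half as separate characters, which is the behaviour described above for JDK 1.4.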

The same translator is currently used for Saxon on .NET 1.1.

Nested Class Summary
static class JDK14RegexTranslator.BackReference
abstract static class JDK14RegexTranslator.CharClass
static class JDK14RegexTranslator.CharRange
static class JDK14RegexTranslator.Complement
static class JDK14RegexTranslator.Dot
static class JDK14RegexTranslator.Empty
static class JDK14RegexTranslator.Property
static class JDK14RegexTranslator.Range
abstract static class JDK14RegexTranslator.SimpleCharClass
static class JDK14RegexTranslator.SingleChar
static class JDK14RegexTranslator.Subtraction
static class JDK14RegexTranslator.Union
static class JDK14RegexTranslator.WideSingleChar
Field Summary
static int ALL
static String CATEGORY_NAMES
static int[][] CATEGORY_RANGES
static String NMCHAR_CATEGORIES
static String NMCHAR_EXCLUDE_RANGES
static String NMCHAR_INCLUDES
static String NMSTRT_CATEGORIES
static String NMSTRT_EXCLUDE_RANGES
static String NMSTRT_INCLUDES
static int NONE
static String NOT_ALLOWED_CLASS
static int SOME
static String SURROGATES1_CLASS
static String SURROGATES2_CLASS
Constructor Summary
JDK14RegexTranslator()
Method Summary
int getNumberOfCapturedGroups()
static void main(String[] args)
String translate(CharSequence regExp, boolean xpath)
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern.

Field Detail

ALL

static final int ALL

CATEGORY_NAMES

static final String CATEGORY_NAMES

CATEGORY_RANGES

static final int[][] CATEGORY_RANGES

NMCHAR_CATEGORIES

static final String NMCHAR_CATEGORIES

NMCHAR_EXCLUDE_RANGES

static final String NMCHAR_EXCLUDE_RANGES

NMCHAR_INCLUDES

static final String NMCHAR_INCLUDES

NMSTRT_CATEGORIES

static final String NMSTRT_CATEGORIES

NMSTRT_EXCLUDE_RANGES

static final String NMSTRT_EXCLUDE_RANGES

NMSTRT_INCLUDES

static final String NMSTRT_INCLUDES

NONE

static final int NONE

NOT_ALLOWED_CLASS

static final String NOT_ALLOWED_CLASS

SOME

static final int SOME

SURROGATES1_CLASS

static final String SURROGATES1_CLASS

SURROGATES2_CLASS

static final String SURROGATES2_CLASS

Constructor Detail

JDK14RegexTranslator

public JDK14RegexTranslator()

Method Detail

getNumberOfCapturedGroups

public int getNumberOfCapturedGroups()

main

public static void main(String[] args)

translate

public String translate(CharSequence regExp, boolean xpath)
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern. The translation assumes that the string to be matched against the regex uses surrogate pairs correctly. If the string comes from XML content, a conforming XML parser will automatically check this; if the string comes from elsewhere, it may be necessary to check surrogate usage before matching.

Parameters:
regExp - a String containing a regular expression in the syntax of XML Schemas Part 2
xpath - a boolean indicating whether the XPath 2.0 F+O extensions to the schema regex syntax are permitted

Returns:
a String containing a regular expression in the syntax of java.util.regex.Pattern

Throws:
RegexSyntaxException - if regExp is not a regular expression in the syntax of XML Schemas Part 2, or XPath 2.0, as appropriate

See Also:
java.util.regex.Pattern, XML Schema Part 2
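
The description of translate notes that input not coming from a conforming XML parser may need its surrogate usage checked before matching. One way such a check might look is sketched below, using only `Character.isHighSurrogate` and `Character.isLowSurrogate` from the JDK (available from Java 5). The class `SurrogateUsageCheck` and the method `hasWellFormedSurrogates` are hypothetical names for this illustration; Saxon itself is not shown to provide them.

```java
// Illustrative sketch only: SurrogateUsageCheck and hasWellFormedSurrogates
// are hypothetical names, not part of the Saxon API.
final class SurrogateUsageCheck {

    /**
     * Returns true if every high surrogate in s is immediately followed by a
     * low surrogate, and no low surrogate appears without a preceding high one.
     */
    static boolean hasWellFormedSurrogates(CharSequence s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high half must be followed by a low half.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i += 2; // skip the complete pair
            } else if (Character.isLowSurrogate(c)) {
                // A low half with no preceding high half is malformed.
                return false;
            } else {
                i++;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasWellFormedSurrogates("a\uD834\uDD1Eb")); // true
        System.out.println(hasWellFormedSurrogates("a\uD834b"));       // false
    }
}
```

A caller could run this check on the input string and reject or repair it before passing it to a matcher built from the translated regex.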