antlr

Class CodeGenerator

Known Direct Subclasses:
CppCodeGenerator, CSharpCodeGenerator, DiagnosticCodeGenerator, DocBookCodeGenerator, HTMLCodeGenerator, JavaCodeGenerator, PythonCodeGenerator

public abstract class CodeGenerator
extends Object

A generic ANTLR code generator. All code generators derive from this class.

A CodeGenerator knows about a Grammar data structure and a grammar analyzer. The Grammar is walked to generate the appropriate code for both a parser and lexer (if present). This interface may change slightly so that the lexer itself lives inside a Grammar object (in which case, this class generates only one recognizer). The main method to call is gen(), which initiates all code generation.

The interaction of the code generator with the analyzer is simple: each subrule block calls deterministic() before generating code for the block. Method deterministic() sets lookahead caches in each Alternative object. Technically, a code generator doesn't need the grammar analyzer if all lookahead analysis is done at runtime, but this would result in a slower parser.

This class provides a set of support utilities to handle argument list parsing and so on.
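
The overloaded gen(...) methods suggest a dispatch pattern in which each grammar element hands itself to the generator. The sketch below illustrates that pattern only. SketchElement, SketchLiteral, SketchBlock, SketchGenerator and TraceGenerator are hypothetical stand-ins, not ANTLR classes, and the analyzer call to deterministic() is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element type: each element dispatches to the matching gen() overload.
interface SketchElement {
    void generate(SketchGenerator g);
}

final class SketchLiteral implements SketchElement {
    final String text;
    SketchLiteral(String text) { this.text = text; }
    public void generate(SketchGenerator g) { g.gen(this); }
}

final class SketchBlock implements SketchElement {
    final List<SketchElement> children = new ArrayList<>();
    public void generate(SketchGenerator g) { g.gen(this); }
}

// Hypothetical base class with one gen() overload per element type.
abstract class SketchGenerator {
    abstract void gen(SketchLiteral lit);
    abstract void gen(SketchBlock blk);
}

// A language-specific subclass would emit real target code; this one emits a trace.
final class TraceGenerator extends SketchGenerator {
    final StringBuilder out = new StringBuilder();

    void gen(SketchLiteral lit) {
        out.append("match(").append(lit.text).append(");");
    }

    void gen(SketchBlock blk) {
        out.append("{");
        for (SketchElement e : blk.children) {
            e.generate(this); // walk the children in order
        }
        out.append("}");
    }
}
```

Walking a block that contains a literal and a nested block produces one nested trace string, which is the shape a concrete generator would fill with target-language code.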

Version:
2.00a

Author:
Terence Parr, John Lilley

See Also:
JavaCodeGenerator, DiagnosticCodeGenerator, LLkAnalyzer, Grammar, AlternativeElement, Lookahead

Field Summary

protected static int
BITSET_OPTIMIZE_INIT_THRESHOLD
If there are more than 8 long words to init in a bitset, try to optimize it; e.g., detect runs of -1L and 0L.
protected boolean
DEBUG_CODE_GENERATOR
Use option "codeGenDebug" to generate debugging output
protected static int
DEFAULT_BITSET_TEST_THRESHOLD
protected static int
DEFAULT_MAKE_SWITCH_THRESHOLD
Default values for code-generation thresholds
static String
TokenTypesFileExt
static String
TokenTypesFileSuffix
protected LLkGrammarAnalyzer
analyzer
The LLk analyzer
protected Tool
antlrTool
protected DefineGrammarSymbols
behavior
The grammar behavior
protected int
bitsetTestThreshold
This is a hint for the language-specific code generator.
protected Vector
bitsetsUsed
List of all bitsets that must be dumped.
protected CharFormatter
charFormatter
Object used to format characters in the target language.
protected PrintWriter
currentOutput
Current output stream
protected Grammar
grammar
The grammar for which we generate code
protected int
makeSwitchThreshold
This is a hint for the language-specific code generator.
protected int
tabs
Current tab indentation for code output

Constructor Summary

CodeGenerator()
Construct code generator base class

Method Summary

protected void
_print(String s)
Output a String to the currentOutput stream.
protected void
_printAction(String s)
Print an action without leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if the string is null.
protected void
_println(String s)
Output a String followed by newline, to the currentOutput stream.
static String
decodeLexerRuleName(String id)
static boolean
elementsAreRange(int[] elems)
Test if a set element array represents a contiguous range.
static String
encodeLexerRuleName(String id)
protected String
extractIdOfAction(String s, int line, int column)
Get the identifier portion of an argument-action.
protected String
extractIdOfAction(Token t)
Get the identifier portion of an argument-action token.
protected String
extractTypeOfAction(String s, int line, int column)
Get the type portion of an argument-action.
protected String
extractTypeOfAction(Token t)
Get the type string out of an argument-action token.
void
gen()
Generate the code for all grammars
void
gen(ActionElement action)
Generate code for the given grammar element.
void
gen(AlternativeBlock blk)
Generate code for the given grammar element.
void
gen(BlockEndElement end)
Generate code for the given grammar element.
void
gen(CharLiteralElement atom)
Generate code for the given grammar element.
void
gen(CharRangeElement r)
Generate code for the given grammar element.
void
gen(LexerGrammar g)
Generate the code for a lexer
void
gen(OneOrMoreBlock blk)
Generate code for the given grammar element.
void
gen(ParserGrammar g)
Generate the code for a parser
void
gen(RuleRefElement rr)
Generate code for the given grammar element.
void
gen(StringLiteralElement atom)
Generate code for the given grammar element.
void
gen(TokenRangeElement r)
Generate code for the given grammar element.
void
gen(TokenRefElement atom)
Generate code for the given grammar element.
void
gen(TreeElement t)
Generate code for the given grammar element.
void
gen(TreeWalkerGrammar g)
Generate the code for a tree walker
void
gen(WildcardElement wc)
Generate code for the given grammar element.
void
gen(ZeroOrMoreBlock blk)
Generate code for the given grammar element.
protected void
genTokenInterchange(TokenManager tm)
Generate the token types as a text file for persistence across shared lexer/parser
String
getASTCreateString(GrammarAtom atom, String str)
Get a string for an expression to generate creation of an AST node.
String
getASTCreateString(Vector v)
Get a string for an expression to generate creation of an AST subtree.
protected String
getBitsetName(int index)
Given the index of a bitset in the bitset list, generate a unique name.
String
getFIRSTBitSet(String ruleName, int k)
String
getFOLLOWBitSet(String ruleName, int k)
String
mapTreeId(String id, ActionTransInfo tInfo)
Map an identifier to its corresponding tree-node variable.
protected int
markBitsetForGen(BitSet p)
Add a bitset to the list of bitsets to be generated.
protected void
print(String s)
Output tab indent followed by a String, to the currentOutput stream.
protected void
printAction(String s)
Print an action with leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if the string is null.
protected void
printTabs()
Output the current tab indentation.
protected void
println(String s)
Output tab indent followed by a String followed by newline, to the currentOutput stream.
protected String
processActionForSpecialSymbols(String actionStr, int line, RuleBlock currentRule, ActionTransInfo tInfo)
Lexically process $ and # references within the action.
String
processStringForASTConstructor(String str)
Process a string for a simple expression for use in xx/action.g. It is used to cast simple tokens/references to the right type for the generated language.
protected String
removeAssignmentFromDeclaration(String d)
Remove the assignment portion of a declaration, if any.
static String
reverseLexerRuleName(String id)
void
setAnalyzer(LLkGrammarAnalyzer analyzer_)
void
setBehavior(DefineGrammarSymbols behavior_)
protected void
setGrammar(Grammar g)
Set a grammar for the code generator to use
void
setTool(Tool tool)

Field Details

BITSET_OPTIMIZE_INIT_THRESHOLD

protected static final int BITSET_OPTIMIZE_INIT_THRESHOLD
If there are more than 8 long words to init in a bitset, try to optimize it; e.g., detect runs of -1L and 0L.

Field Value:
8
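
As an illustration of what detecting runs of -1L and 0L could look like, here is a minimal sketch. BitsetRunSketch, describe and the "fill[...]" notation are hypothetical and are not taken from ANTLR. Only the threshold value 8 comes from the field above.

```java
import java.util.ArrayList;
import java.util.List;

final class BitsetRunSketch {
    // Mirrors the documented field value.
    static final int THRESHOLD = 8;

    /** Describes the words either one by one or, above the threshold, with runs collapsed. */
    static List<String> describe(long[] words) {
        List<String> parts = new ArrayList<>();
        if (words.length <= THRESHOLD) {
            for (long w : words) {
                parts.add(w + "L"); // small bitsets: list every word
            }
            return parts;
        }
        int i = 0;
        while (i < words.length) {
            long w = words[i];
            int j = i + 1;
            if (w == -1L || w == 0L) {
                // Extend the run while the same all-ones or all-zeros word repeats.
                while (j < words.length && words[j] == w) {
                    j++;
                }
            }
            if (j - i > 1) {
                parts.add("fill[" + i + ".." + (j - 1) + "]=" + w + "L");
            } else {
                parts.add(w + "L");
            }
            i = j;
        }
        return parts;
    }
}
```

A generator could turn each collapsed run into a loop in the target language instead of a long initializer list.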


DEBUG_CODE_GENERATOR

protected boolean DEBUG_CODE_GENERATOR
Use option "codeGenDebug" to generate debugging output


DEFAULT_BITSET_TEST_THRESHOLD

protected static final int DEFAULT_BITSET_TEST_THRESHOLD

Field Value:
4


DEFAULT_MAKE_SWITCH_THRESHOLD

protected static final int DEFAULT_MAKE_SWITCH_THRESHOLD
Default values for code-generation thresholds

Field Value:
2


TokenTypesFileExt

public static String TokenTypesFileExt


TokenTypesFileSuffix

public static String TokenTypesFileSuffix


analyzer

protected LLkGrammarAnalyzer analyzer
The LLk analyzer


antlrTool

protected Tool antlrTool


behavior

protected DefineGrammarSymbols behavior
The grammar behavior


bitsetTestThreshold

protected int bitsetTestThreshold
This is a hint for the language-specific code generator. A bitset membership test will be generated instead of an ORed series of LA(k) comparisons for lookahead sets with degree greater than or equal to this value. This is modified by the grammar option "codeGenBitsetTestThreshold".
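
The choice described above can be sketched as follows, reading "degree" as the number of token types in the lookahead set. LookaheadTestSketch and lookaheadTest are hypothetical names, and the emitted text is illustrative rather than the exact output of any ANTLR generator.

```java
final class LookaheadTestSketch {
    /**
     * Returns target-code text for testing lookahead depth k against a set of token types.
     * A bitset test is used once the set size reaches the threshold.
     */
    static String lookaheadTest(int[] tokenTypes, int k, int threshold, String bitsetName) {
        if (tokenTypes.length >= threshold) {
            return bitsetName + ".member(LA(" + k + "))";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokenTypes.length; i++) {
            if (i > 0) {
                sb.append(" || "); // OR the individual comparisons together
            }
            sb.append("LA(").append(k).append(")==").append(tokenTypes[i]);
        }
        return sb.toString();
    }
}
```

With a threshold of 4, a two-element set yields two ORed comparisons, while a four-element set yields a single membership test.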


bitsetsUsed

protected Vector bitsetsUsed
List of all bitsets that must be dumped. These are Vectors of BitSet.


charFormatter

protected CharFormatter charFormatter
Object used to format characters in the target language. Subclasses must initialize this to the language-specific formatter.


currentOutput

protected PrintWriter currentOutput
Current output stream


grammar

protected Grammar grammar
The grammar for which we generate code


makeSwitchThreshold

protected int makeSwitchThreshold
This is a hint for the language-specific code generator. A switch() or language-specific equivalent will be generated instead of a series of if/else statements for blocks whose number of non-predicated LL(1) alternatives is greater than or equal to this value. This is modified by the grammar option "codeGenMakeSwitchThreshold".
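
A minimal sketch of the decision this hint drives, assuming each alternative has already been classified. SwitchChoiceSketch and useSwitch are hypothetical names, and the boolean array stands in for whatever per-alternative information a real generator keeps.

```java
final class SwitchChoiceSketch {
    /**
     * @param nonPredicatedLL1 one flag per alternative, true when the alternative
     *                         is LL(1) and carries no predicate
     * @param threshold        the makeSwitchThreshold-style cutoff
     * @return true if a switch should be generated instead of an if/else chain
     */
    static boolean useSwitch(boolean[] nonPredicatedLL1, int threshold) {
        int count = 0;
        for (boolean eligible : nonPredicatedLL1) {
            if (eligible) {
                count++;
            }
        }
        return count >= threshold;
    }
}
```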


tabs

protected int tabs
Current tab indentation for code output

Constructor Details

CodeGenerator

public CodeGenerator()
Construct code generator base class

Method Details

_print

protected void _print(String s)
Output a String to the currentOutput stream. Ignored if string is null.

Parameters:
s - The string to output


_printAction

protected void _printAction(String s)
Print an action without leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if the string is null.

Parameters:
s - The action string to output


_println

protected void _println(String s)
Output a String followed by newline, to the currentOutput stream. Ignored if string is null.

Parameters:
s - The string to output


decodeLexerRuleName

public static String decodeLexerRuleName(String id)


elementsAreRange

public static boolean elementsAreRange(int[] elems)
Test if a set element array represents a contiguous range.

Parameters:
elems - The array of elements representing the set, usually from BitSet.toArray().

Returns:
true if the elements are a contiguous range (with two or more).
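
A minimal sketch of such a contiguity test, assuming the elements arrive in ascending order. The minimum length of two follows the wording above, and the actual implementation may treat very short arrays differently. RangeSketch and isContiguousRange are hypothetical names.

```java
final class RangeSketch {
    /** True if elems holds at least two values, each exactly one more than the previous. */
    static boolean isContiguousRange(int[] elems) {
        if (elems.length < 2) {
            return false; // too few elements to form a range
        }
        for (int i = 1; i < elems.length; i++) {
            if (elems[i] != elems[i - 1] + 1) {
                return false; // gap or repeat breaks the range
            }
        }
        return true;
    }
}
```

A generator can use a positive result to emit a single range comparison instead of a test per element.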


encodeLexerRuleName

public static String encodeLexerRuleName(String id)


extractIdOfAction

protected String extractIdOfAction(String s,
                                   int line,
                                   int column)
Get the identifier portion of an argument-action. The ID of an action is assumed to be a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.

Parameters:
s - The action text
line - Line used for error reporting.
column - Column used for error reporting.

Returns:
A string containing the text of the identifier


extractIdOfAction

protected String extractIdOfAction(Token t)
Get the identifier portion of an argument-action token. The ID of an action is assumed to be a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.

Parameters:
t - The action token

Returns:
A string containing the text of the identifier


extractTypeOfAction

protected String extractTypeOfAction(String s,
                                     int line,
                                     int column)
Get the type portion of an argument-action. The type of an action is assumed to precede a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.

Parameters:
s - The action text
line - Line used for error reporting.
column - Column used for error reporting.

Returns:
A string containing the text of the type


extractTypeOfAction

protected String extractTypeOfAction(Token t)
Get the type string out of an argument-action token. The type of an action is assumed to precede a trailing identifier. Specific code-generators may want to override this if the language has unusual declaration syntax.

Parameters:
t - The action token

Returns:
A string containing the text of the type
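
The "trailing identifier" assumption shared by extractIdOfAction and extractTypeOfAction can be illustrated with the sketch below. ArgActionSketch and its methods are hypothetical, identifier characters are judged by Java's rules as an assumption, and error reporting with line and column is omitted.

```java
final class ArgActionSketch {
    /** Index where the trailing identifier starts, or -1 if the text does not end in one. */
    static int trailingIdStart(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--; // skip trailing whitespace
        }
        int start = end;
        while (start > 0 && Character.isJavaIdentifierPart(s.charAt(start - 1))) {
            start--; // walk back over identifier characters
        }
        if (start == end || !Character.isJavaIdentifierStart(s.charAt(start))) {
            return -1;
        }
        return start;
    }

    /** The trailing identifier, or an empty string if there is none. */
    static String idOf(String s) {
        int start = trailingIdStart(s);
        return start < 0 ? "" : s.substring(start).trim();
    }

    /** Everything before the trailing identifier, trimmed. */
    static String typeOf(String s) {
        int start = trailingIdStart(s);
        return start < 0 ? s.trim() : s.substring(0, start).trim();
    }
}
```

A target language whose declarations put the name first would need a different split, which is why the methods above are documented as candidates for overriding.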


gen

public void gen()
Generate the code for all grammars


gen

public void gen(ActionElement action)
Generate code for the given grammar element.

Parameters:
action - The {...} action to generate


gen

public void gen(AlternativeBlock blk)
Generate code for the given grammar element.

Parameters:
blk - The "x|y|z|..." block to generate


gen

public void gen(BlockEndElement end)
Generate code for the given grammar element.

Parameters:
end - The block-end element to generate. Block-end elements are synthesized by the grammar parser to represent the end of a block.


gen

public void gen(CharLiteralElement atom)
Generate code for the given grammar element.

Parameters:
atom - The character literal reference to generate


gen

public void gen(CharRangeElement r)
Generate code for the given grammar element.

Parameters:
r - The character-range reference to generate


gen

public void gen(LexerGrammar g)
            throws IOException
Generate the code for a lexer


gen

public void gen(OneOrMoreBlock blk)
Generate code for the given grammar element.

Parameters:
blk - The (...)+ block to generate


gen

public void gen(ParserGrammar g)
            throws IOException
Generate the code for a parser


gen

public void gen(RuleRefElement rr)
Generate code for the given grammar element.

Parameters:
rr - The rule-reference to generate


gen

public void gen(StringLiteralElement atom)
Generate code for the given grammar element.

Parameters:
atom - The string-literal reference to generate


gen

public void gen(TokenRangeElement r)
Generate code for the given grammar element.

Parameters:
r - The token-range reference to generate


gen

public void gen(TokenRefElement atom)
Generate code for the given grammar element.

Parameters:
atom - The token-reference to generate


gen

public void gen(TreeElement t)
Generate code for the given grammar element.

Parameters:
t - The tree element to generate


gen

public void gen(TreeWalkerGrammar g)
            throws IOException
Generate the code for a tree walker


gen

public void gen(WildcardElement wc)
Generate code for the given grammar element.

Parameters:
wc - The wildcard element to generate


gen

public void gen(ZeroOrMoreBlock blk)
Generate code for the given grammar element.

Parameters:
blk - The (...)* block to generate


genTokenInterchange

protected void genTokenInterchange(TokenManager tm)
            throws IOException
Generate the token types as a text file for persistence across shared lexer/parser


getASTCreateString

public String getASTCreateString(GrammarAtom atom,
                                 String str)
Get a string for an expression to generate creation of an AST node.

Parameters:
str - The text of the arguments to the AST construction


getASTCreateString

public String getASTCreateString(Vector v)
Get a string for an expression to generate creation of an AST subtree.

Parameters:
v - A Vector of String, where each element is an expression in the target language yielding an AST node.


getBitsetName

protected String getBitsetName(int index)
Given the index of a bitset in the bitset list, generate a unique name. Specific code-generators may want to override this if the language does not allow '_' or numerals in identifiers.

Parameters:
index - The index of the bitset in the bitset list.


getFIRSTBitSet

public String getFIRSTBitSet(String ruleName,
                             int k)


getFOLLOWBitSet

public String getFOLLOWBitSet(String ruleName,
                              int k)


mapTreeId

public String mapTreeId(String id,
                        ActionTransInfo tInfo)
Map an identifier to its corresponding tree-node variable. This is context-sensitive, depending on the rule and alternative being generated.

Parameters:
id - The identifier name to map

Returns:
The mapped id (which may be the same as the input), or null if the mapping is invalid due to duplicates


markBitsetForGen

protected int markBitsetForGen(BitSet p)
Add a bitset to the list of bitsets to be generated. If the bitset is already in the list, ignore the request. Always adds the bitset to the end of the list, so the caller can rely on the position of bitsets in the list. The returned position can be used to format the bitset name, since it is invariant.

Parameters:
p - Bit set to mark for code generation

Returns:
The position of the bitset in the list.
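
The append-only list and the position-based naming described for markBitsetForGen and getBitsetName can be sketched as below. BitsetRegistrySketch is hypothetical, it uses java.util.BitSet rather than ANTLR's own BitSet class, and the "_tokenSet_" prefix is an assumed naming scheme.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class BitsetRegistrySketch {
    private final List<BitSet> bitsetsUsed = new ArrayList<>();

    /** Returns the stable position of p, appending it only if an equal set is not present. */
    int mark(BitSet p) {
        int existing = bitsetsUsed.indexOf(p); // relies on BitSet.equals
        if (existing >= 0) {
            return existing;
        }
        bitsetsUsed.add(p); // always appended, so earlier positions never shift
        return bitsetsUsed.size() - 1;
    }

    /** Builds a unique name from the position; the prefix is an assumption. */
    static String nameFor(int index) {
        return "_tokenSet_" + index;
    }
}
```

Because entries are never removed or reordered, a name built from the returned position stays valid for the whole generation run.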


print

protected void print(String s)
Output tab indent followed by a String, to the currentOutput stream. Ignored if string is null.

Parameters:
s - The string to output.


printAction

protected void printAction(String s)
Print an action with leading tabs, attempting to preserve the current indentation level for multi-line actions. Ignored if the string is null.

Parameters:
s - The action string to output


printTabs

protected void printTabs()
Output the current tab indentation. This outputs the number of tabs indicated by the "tabs" variable to the currentOutput stream.


println

protected void println(String s)
Output tab indent followed by a String followed by newline, to the currentOutput stream. Ignored if string is null.

Parameters:
s - The string to output
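
The relationship between the tabs counter, printTabs() and the print methods can be sketched as below. IndentingOutputSketch is a hypothetical stand-alone class, not the ANTLR implementation, and it writes "\n" explicitly so the output does not depend on the platform line separator.

```java
import java.io.PrintWriter;

final class IndentingOutputSketch {
    int tabs = 0;                      // current indentation depth
    final PrintWriter currentOutput;   // destination for generated code

    IndentingOutputSketch(PrintWriter out) {
        this.currentOutput = out;
    }

    /** Writes one tab character per indentation level. */
    void printTabs() {
        for (int i = 0; i < tabs; i++) {
            currentOutput.print("\t");
        }
    }

    /** Indent, then the string; null is ignored. */
    void print(String s) {
        if (s != null) {
            printTabs();
            currentOutput.print(s);
        }
    }

    /** Indent, then the string and a newline; null is ignored. */
    void println(String s) {
        if (s != null) {
            printTabs();
            currentOutput.print(s);
            currentOutput.print("\n");
        }
    }
}
```

A generator would typically increment tabs after emitting an opening brace and decrement it before the closing one.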


processActionForSpecialSymbols

protected String processActionForSpecialSymbols(String actionStr,
                                                int line,
                                                RuleBlock currentRule,
                                                ActionTransInfo tInfo)
Lexically process $ and # references within the action. This will replace #id and #(...) with the appropriate function calls and/or variables etc...


processStringForASTConstructor

public String processStringForASTConstructor(String str)
Process a string for a simple expression for use in xx/action.g. It is used to cast simple tokens/references to the right type for the generated language.

Parameters:
str - A String.


removeAssignmentFromDeclaration

protected String removeAssignmentFromDeclaration(String d)
Remove the assignment portion of a declaration, if any.

Parameters:
d - the declaration

Returns:
the declaration without any assignment portion
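
A minimal sketch of the idea, assuming the assignment portion starts at the first '=' in the declaration. DeclarationSketch and withoutAssignment are hypothetical names.

```java
final class DeclarationSketch {
    /** Returns the declaration text before the first '=', trimmed, or d unchanged if none. */
    static String withoutAssignment(String d) {
        int eq = d.indexOf('=');
        return eq >= 0 ? d.substring(0, eq).trim() : d;
    }
}
```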


reverseLexerRuleName

public static String reverseLexerRuleName(String id)


setAnalyzer

public void setAnalyzer(LLkGrammarAnalyzer analyzer_)


setBehavior

public void setBehavior(DefineGrammarSymbols behavior_)


setGrammar

protected void setGrammar(Grammar g)
Set a grammar for the code generator to use


setTool

public void setTool(Tool tool)