You can think of macros as a system of compile-time transformations and automatic code generation governed by some rules. They can be used either to automate manipulations performed on similar data types and fragments of code, or to add syntax shortcuts to the language, to optimize, and to make some computations safer by moving them from runtime to compile-time.
The idea of making simple inline operations on the code comes from preprocessor macros, which many languages (especially C and C++) have contained since the early days of compiler design. We are following them in the direction of much more powerful, and at the same time more secure (type-safe), solutions like Haskell's template metaprogramming (Template Haskell).
For the most appealing examples of macro use see this page. It describes how we incorporate methodologies such as Design by Contract, compile-time verification of SQL statements and Aspect-Oriented Programming into Nemerle.
Basically every macro is a function, which takes a fragment of code as parameter(s) and returns some other code. On the highest level of abstraction it doesn't matter if parameters are function calls, type definitions or just a sequence of assignments. The most important fact is that they are not common objects (e.g. instances of some types, like integer numbers), but their internal representation in the compiler (i.e. syntax trees).
A macro is defined in the program just like any other function, using common Nemerle syntax. The only differences are the structure of the data it operates on and the way it is used (executed at compile-time).
A macro, once created, can be used to process some parts of the code. This is done by calling it with block(s) of code as parameter(s). This operation is in most cases indistinguishable from a common function call (like f(1)), so a programmer using a macro will not be confused by unknown syntax. The main concept of our design is to make the usage of macros as transparent as possible. From the user's point of view, it is not important whether particular parameters are passed to a function that processes them at compile-time and inserts some new code in their place, or to an ordinary one.
Writing a macro is as simple as writing an ordinary function. It looks the same, except that it is preceded by the keyword macro. This tells the compiler how to use the defined method (i.e. to run it at compile-time in every place where it is used).
Macros can take zero (if we just want to generate new code) or more parameters. They are all elements of the language grammar, so their type is limited to the set of defined syntax objects. The same holds for a return value of a macro.
Example:
    macro generate_expression ()
    {
      compute_some_expression ();
    }
This example macro does not take any parameters and is used in the code by simply writing generate_expression ();. The most important point is the difference between generate_expression and compute_some_expression: the first is a function executed by the compiler during compilation, while the latter is an ordinary function that must return the syntax tree of an expression (which generate_expression then returns and inserts into the program code).
In order to create and use a macro you have to write a library, which will contain its executable form. You simply create a new file mymacro.n, which can contain for example
    macro m ()
    {
      Nemerle.IO.printf ("compile-time\n");
      <[ Nemerle.IO.printf ("run-time\n") ]>
    }
and compile it with command
    ncc -r Nemerle.Compiler.dll -tdll mymacro.n -o mymacro.dll
The library Nemerle.Compiler.dll will probably be loaded automatically in future releases.
Now you can use m() in any program, like here
    module M {
      public Main () : void {
        m ();
      }
    }
You must add a reference to mymacro.dll during compilation of this program. It might look like
    ncc -r mymacro.dll myprog.n -o myprog.exe
Write a macro which, when used, slows down the compilation by 5 seconds (use the System.Timers namespace) and prints the version of the operating system used to compile the program (use the System.Environment namespace).
Definition of function compute_some_expression might look like:
    compute_some_expression () : Expr
    {
      if (debug_on)
        <[ System.Console.WriteLine ("Hello, I'm debug message") ]>
      else
        <[ () ]>
    }
The examples above show a macro that conditionally inlines an expression printing a message. It is not very useful yet, but it introduces the idea of compile-time computations and also some new syntax used only in writing macros and functions operating on syntax trees. We have used the <[ ... ]> constructor here to build the syntax tree of an expression (e.g. '()').
<[ ... ]> is used for both construction and decomposition of syntax trees. These operations are similar to quotation of code. Simply put, everything written inside <[ ... ]> corresponds to its own syntax tree. It can be any valid Nemerle code, so a programmer does not have to learn the internal representation of syntax trees in the compiler.
    macro print_date (at_compile_time)
    {
      match (at_compile_time) {
        | <[ true ]> => print_compilation_time ()
        | _ => <[ WriteLine (DateTime.Now.ToString ()) ]>
      }
    }
Quotation alone allows using only constant expressions, which is insufficient for most tasks. For example, to write the function print_compilation_time we must be able to create an expression based on a value known at compile-time. In the next sections we introduce the rest of the macro syntax for operating on general syntax trees.
When we want to decompose some large code (or more precisely, its syntax tree), we must bind its smaller parts to variables. Then we can process them recursively or just use them in an arbitrary way to construct the result.
We can operate on entire subexpressions by writing $( ... ) or $ID inside the quotation operator <[ ... ]>. This means binding the value of ID or the interior of parenthesized expression to the part of syntax tree described by corresponding quotation.
    macro for (init, cond, change, body)
    {
      <[
        $init;
        def loop () : void {
          if ($cond) { $body; $change; loop() }
          else ()
        };
        loop ()
      ]>
    }
The above macro defines function for, which is similar to the loop known from C. It can be used like this
    for (mutable i = 0, i < 10, ++i, printf ("%d", i))
Later we show how to extend the language syntax to make the syntax of for exactly as in C.
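To make the splicing concrete, here is a sketch of the code the for macro above would produce when init is mutable i = 0, cond is i < 10, change is ++i and body is printf ("%d", i). It is obtained by substituting each argument for the corresponding $-variable in the quotation by hand; the actual compiler output may differ in details, for example in how the helper function is named.

```nemerle
// Hand-expanded result of the `for` macro (illustrative only).
// Each $init, $cond, $body and $change has been replaced by the
// syntax tree passed as the corresponding macro argument.
mutable i = 0;
def loop () : void {
  if (i < 10) { printf ("%d", i); ++i; loop () }
  else ()
};
loop ()
```

The loop is expressed as a local recursive function, so no dedicated looping construct is needed in the generated code.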
Sometimes quoted expressions have literals inside of them (like strings, integers, etc.) and we want to operate on their value, not on their syntax trees. It is possible, because they are constant expressions and their runtime value is known at the compile-time.
Let's consider the previously used function print_compilation_time.
    print_compilation_time () : Expr
    {
      <[ System.Console.WriteLine ($(DateTime.Now.ToString () : string)) ]>
    }
Here we see a new extension of the splicing syntax, where we create the syntax tree of a string literal from a known value. It is done by adding : string inside the $(...) construct. One can think of it as enforcing the type of the spliced expression to a literal (similar to common Nemerle type enforcement), but as a matter of fact something more is happening here: a real value is lifted to its representation as the syntax tree of a literal.
Other types of literals (int, bool, float, char) are treated the same. This notation can be used also in pattern matching. We can match constant values in expressions this way.
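As a minimal sketch of literal splicing used in both directions, the macro below matches an integer literal and lifts a computed value back into a literal. The macro name twice is hypothetical and serves only as an illustration; the $(... : int) notation is the one described above.

```nemerle
// Hypothetical macro: doubles its argument, folding the result at
// compile-time when the argument is an integer literal.
macro twice (e)
{
  match (e) {
    // e is an integer literal: its value n is available to the macro,
    // so the doubled value is lifted back into a literal
    | <[ $(n : int) ]> => <[ $(n * 2 : int) ]>
    // any other expression: generate code that multiplies at runtime
    | _ => <[ $e * 2 ]>
  }
}
```

Under these assumptions, a call with a literal argument is replaced by a single literal, while a call with a variable argument is replaced by a multiplication expression.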
There is also a similar schema for splicing and matching variables of a given name. $(v : name) denotes a variable, whose name is contained by object v (of special type Name). There are some good reasons for encapsulating a real identifier within this object.
You might have noticed that Nemerle has a few grammar elements which are composed of a list of subexpressions. For example, a sequence of expressions enclosed in { .. } braces may contain zero or more elements.
When splicing values of some expressions, we would like to decompose or compose such constructs in a general way, i.e. obtain all expressions in a given sequence. It is natural to think of them as a list of expressions and to bind this list to some variable in the meta-language. It is done with the special syntax ..:
    mutable exps = [ <[ printf ("%d ", x) ]>, <[ printf ("%d ", y) ]> ];
    exps = <[ def x = 1 ]> :: <[ def y = 2 ]> :: exps;
    <[ {.. $exps } ]>
We used { .. $exps } here to create the sequence of expressions from the list exps : list<Expr>. A similar syntax is used to splice the content of tuples (( .. $elist )) and other constructs, like array []:
    macro castedarray (e)
    {
      match (e) {
        | <[ array [.. $elements ] ]> =>
          def casted = List.Map (elements, fun (x) { <[ ($x :> object) ]> });
          <[ array [.. $casted] ]>
        | _ => e
      }
    }
If the exact number of expressions in tuple/sequence is known during writing the quotation, then it can be expressed with
    <[ $e_1; $e_2; $e_3; x = 2; f () ]>
The .. syntax is used when there are e_i : Expr for 1 <= i <= n, where n is not known at the time the quotation is written.
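A small sketch of the case where the number of expressions is only known when the macro runs: the macro below builds a list of n copies of its body and splices it with the .. syntax. The macro name unroll is hypothetical; Message.fatal_error is the same error-reporting call used by the Serializable macro later in this text.

```nemerle
// Hypothetical macro: emits the body `times` times in a row.
// The count must be an integer literal so that it is known at compile-time.
macro unroll (times, body)
{
  match (times) {
    | <[ $(n : int) ]> =>
      // collect n copies of the body's syntax tree in a list
      mutable exps = [];
      for (mutable i = 0; i < n; ++i)
        exps = body :: exps;
      // splice the whole list as one sequence of expressions
      <[ { .. $exps } ]>
    | _ =>
      Message.fatal_error ("unroll expects an integer literal as its first argument")
  }
}
```

Because the list is built by ordinary code running inside the compiler, the length of the generated sequence does not have to be fixed in the quotation itself.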
Write a macro rotate, which takes two parameters: a pair of floating point numbers (describing a point in 2D space) and an angle (in radians). The macro should return a new pair -- a point rotated by the given angle. The macro should use as much information as is available at the compile-time, e.g. if all numbers supplied are constant, then only the final result should be inlined, otherwise the result must be computed at runtime.
After we have written the for macro, we would like the compiler to understand some changes to its syntax. In particular, we want the C-like notation

    for (mutable i = 0; i < n; ++i) {
      sum += i;
      Nemerle.IO.printf ("%d\n", sum);
    }
In order to achieve that, we have to define which tokens and grammar elements may form a call of for macro. We do that by changing its header to
    macro for (init, cond, change, body)
    syntax ("for", "(", init, ";", cond, ";", change, ")", body)
The syntax keyword is used here to define a list of elements forming the syntax of the macro call. The first token must always be a unique identifier (from now on it is treated as a special keyword triggering parsing of the defined sequence). It is followed by tokens composed of operators or identifiers passed as string literals, or by names of the macro's parameters. Each parameter must occur exactly once.
Parsing of syntax rule is straightforward - tokens from input program must match those from definition, parameters are parsed according to their type. Default type of a parameter is Expr, which is just an ordinary expression (consult Nemerle grammar in reference.html). All allowed parameter types will be described in the extended version of reference manual corresponding to macros.
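As a further sketch of a complete syntax-extending macro, the definition below combines a syntax header of the form described above with a quotation as its body. The macro name and keyword my_unless are hypothetical, chosen only to avoid clashing with existing constructs.

```nemerle
// Hypothetical macro: runs the body only when the condition is false.
// The keyword "my_unless" starts the rule; cond and body are parsed as
// ordinary expressions (the default parameter type, Expr).
macro my_unless (cond, body)
syntax ("my_unless", "(", cond, ")", body)
{
  <[ if ($cond) () else $body ]>
}
```

With this definition compiled into a macro library, a program referencing that library could write my_unless (x > 0) printf ("x is not positive\n"), assuming the body has type void so that both branches of the generated if agree.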
Add a new syntactic construct forpermutation to your program. It should be defined as the macro
    macro forp (i, n : int, m : int, body)
and introduce syntax, which allows writing the following program
    mutable i = 0;
    forpermutation (i in 3 to 10) Nemerle.IO.printf ("%d\n", i)
It should create a random permutation p of numbers x_j, m <= x_j <= n at the compile-time. Then generate the code executing body of the loop n - m + 1 times, preceding each of them with assignment of permutation element to i.
Nemerle macros are simply plugins to the compiler. We decided not to restrict them only to operations on expressions, but allow them to transform almost any part of program.
Macros can be used within custom attributes written near methods, type declarations, method parameters, fields, etc. They are executed with those entities passed as their parameters.
As an example, let us take a look at Serializable macro. Its usage looks like this:
    [Serializable]
    class S {
      public this (v : int, m : S) { a = v; my = m; }
      my : S;
      a : int;
    }
From now on, S has additional method Serialize and it implements interface ISerializable. We can use it in our code like this
    def s = S (4, S (5, null));
    s.Serialize ();
And the output is
    <a>4</a>
    <my>
    <a>5</a>
    <my>
    <null/>
    </my>
    </my>
The macro modifies type S at compile-time and adds some code to it. The inheritance relation of the given class is also changed, by making it implement the interface ISerializable:
    public interface ISerializable {
      Serialize () : void;
    }
In general, macros placed in attributes can do many transformations and analysis of program objects passed to them. To see Serializable macro's internals and discuss some design issues, let's go into its code.
    [MacroUsage (BeforeInheritance, MacroTargets.Class)]
    macro Serializable (t : TypeBuilder)
    {
      t.AddImplementedInterface (<[ type: ISerializable ]>)
    }
First we have to add the interface that the given type is about to implement. More important, however, is the phase modifier BeforeInheritance in the macro's custom attribute. In general, we separate three stages of execution for attribute macros: BeforeInheritance, BeforeTypedMembers and WithTypedMembers.
(They are still under design.)
Now that we have added the interface to our type, we have to create the Serialize () method.
    [MacroUsage (WithTypedMembers, MacroTargets.Class)]
    macro Serializable (t : TypeBuilder)
    {
      /// here we list its fields and choose only those, which are not derived
      /// or static
      def fields = t.GetFields (BindingFlags.Instance %| BindingFlags.Public %|
                                BindingFlags.NonPublic %| BindingFlags.DeclaredOnly);

      /// now create list of expressions which will print object's data
      mutable serializers = [];

      /// traverse through fields, taking their type constructors
      foreach (x : IField in fields) {
        def tc = Tyutil.GetTypeTycon (x.GetMemType ());
        def nm = Macros.UseSiteSymbol (x.Name);
        if (tc != null)
          if (tc.IsValueType)
            /// we can safely print value types as strings
            serializers = <[
              printf ("<%s>", $(x.Name : string));
              System.Console.Write ($(nm : name));
              printf ("</%s>\n", $(x.Name : string));
            ]> :: serializers
          else
            /// we can try to check, if type of given field also implements ISerializable
            if (Tyutil.subtypes (x.GetMemType (), <[ ttype: ISerializable ]>))
              serializers = <[
                printf ("<%s>\n", $(x.Name : string));
                if ($(nm : name) != null)
                  $(nm : name).Serialize ()
                else
                  printf ("<null/>\n");
                printf ("</%s>\n", $(x.Name : string));
              ]> :: serializers
            else
              /// and finally, we encounter case when there is no easy way to serialize
              /// given field
              Message.fatal_error ("field `" + x.Name + "' cannot be serialized")
        else
          Message.fatal_error ("field `" + x.Name + "' cannot be serialized")
      };
      // after analyzing fields, we create method in our type, to execute created
      // expressions
      t.Define (<[ decl:
        public Serialize () : void
          implements ISerializable.Serialize
        {
          .. $serializers
        }
      ]>)
    }
Identifiers in quoted code (object code) must be treated in a special way, because we usually do not know in which scope they will appear. In particular, they should not mix with variables of the same name at the macro-use site.
Consider the following macro defining a local function f
    macro identity (e)
    {
      <[ def f (x) { x }; f($e) ]>
    }
Calling it with identity (f(1)) might generate confusing code like
    def f (x) { x };
    f (f (1))
To prevent name capture, all macro-generated variables should be renamed to unique counterparts, as in
    def f_42 (x_43) { x_43 };
    f_42 (f (1))
The idea of separating variables introduced by a macro from those defined in the plain code (or other macros) is called "hygiene", after the Lisp and Scheme languages. In Nemerle we define it as putting identifiers created during a single macro execution into a unique namespace. Variables from different namespaces cannot bind to each other.
In other words, a macro cannot create identifiers that capture any external variables or that are visible outside of its own generated code. This means that there is no need to care about locally used names.
Hygiene is obtained by encapsulating identifiers in a special Name class. The compiler uses it to distinguish names from different macro executions and scopes (for details of the implementation consult metaprogramming.pdf). Variables with the appropriate information are created automatically by quotation.
    def definition = <[ def y = 4 ]>;
    <[ def x = 5; $definition; x + y ]>
When a macro creates the above code, identifiers y and x are tagged with the same unique mark. Now they cannot be captured by any external variables (with a different mark). We operate on the Name class when the quoted code is composed or decomposed and we use the <[ $(x : name) ]> construct. Here x is bound to an object of type Name, which we can use elsewhere to create exactly the same identifier.
An identifier can also be created by calling the method Macros.NewSymbol (), which returns a Name with a unique identifier, tagged with the current mark.
    def x = Macros.NewSymbol ();
    <[ def $(x : name) = 5; $(x : name) + 4 ]>
Sometimes it is useful to generate identifiers, which bind to variables visible in place where a macro is used.
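A minimal sketch of such a macro, assuming that Macros.UseSiteSymbol (the call used in the Serializable macro above) takes the identifier as a string and returns a Name that binds in the scope of the macro call. The macro name bump_counter and the variable name counter are hypothetical.

```nemerle
// Hypothetical macro: increments a variable named `counter` that must be
// visible at the place where the macro is used.
macro bump_counter ()
{
  // deliberately break hygiene: this Name refers to the use-site scope,
  // not to the private namespace of this macro execution
  def counter = Macros.UseSiteSymbol ("counter");
  <[ $(counter : name) = $(counter : name) + 1 ]>
}
```

A program that declares mutable counter = 0 before calling bump_counter () would, under these assumptions, see its own variable updated, which an identifier written directly in the quotation could not achieve.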