The Axis of Eval: macros

Showing posts with label macros. Show all posts

Sunday, June 20, 2010

Refining Syntactic Sugar

Another great dissertation just came out, Ryan Culpepper's Refining Syntactic Sugar.

Over the past two decades, Scheme macros have evolved into a powerful API for the compiler front-end. Like Lisp macros, their predecessors, Scheme macros expand source programs into a small core language; unlike Lisp systems, Scheme macro expanders preserve lexical scoping, and advanced Scheme macro systems handle other important properties such as source location. Using such macros, Scheme programmers now routinely develop the ultimate abstraction: embedded domain-specific programming languages.

Unfortunately, a typical Scheme programming environment provides little support for macro development. The tools for understanding macro expansion are poor, which makes it difficult for experienced programmers to debug their macros and for novices to study the behavior of macros. At the same time, the language for specifying macros is limited in expressive power, and it fails to validate syntactic correctness of macro uses.

This dissertation presents tools for macro development that specifically address these two needs. The first is a stepping debugger specialized to the pragmatics of hygienic macros. The second is a system for writing macros and specifying syntax that automatically validates macro uses and reports syntax errors.

It describes two contributions: a debugger that's tightly integrated with macroexpansion, as well as a new expander called syntax-parse.

syntax-parse seems like it will become a standard facility for implementing macros. It lets you describe the shapes of the s-expression inputs of macros in a parser-like language. You can define new "syntax classes", that contain both information about the shape of a form, as well as additional, procedural checks.

One example of a syntax class would be the variables bound by a LET: they have to be identifier/expression pairs, and the identifiers must be unique. syntax-parse lets you describe these constraints succinctly, and facilitates useful error reporting (i.e. you don't get an error in terms of the expanded output of a macro, which would require understanding of a macro's implementation). This obviates the need for explicitly programmed checks.

syntax-parse seems like a big step forward for macro writing. The macro debugger is also impressive. Congratulations, Ryan!

Saturday, June 19, 2010

A Theory of Typed Hygienic Macros

Dave Herman's much-awaited dissertation is out!

Hygienic macro expansion is one of the crown jewels of Scheme, but to this day nobody understands just exactly what it is.

Put that in your pipe!

Unhygienic macro expansion identifies programs with their representation as trees. But because of variable scope, programs in fact have additional graph structure, with edges between variable bindings and their references. Since these edges are not explicitly represented in S-expressions, maintaining their integrity during macro expansion becomes the responsibility of programmers. Put differently, the tree representation of a program has unenforced representation invariants that are the responsibility of programmers to maintain.

A good explanation of why "code is more than data". (Though Pascal Costanza seems to have found a way to introduce hygiene in an unhygienic macro system.)

Though the motivations are clear enough, hygienic macro expansion has so far resisted a precise, formal specification. At the heart of the problem is identifying what is meant by “scope” in a language with extensible syntax. ...

The key insight of this dissertation is that by annotating all macro definitions with interfaces describing their grammar and binding structure, we can reason formally about the binding structure of Scheme programs, and without first macro-expanding. More technically, explicit annotations provide us with enough information to obtain a formal definition of α-equivalence of pre-expansion Scheme programs.

These binding specifications make use of tree addresses, a syntax for describing subtrees of a cons tree, analogous to Common Lisp's CADR, CADDR, CADAR, etc functions. The corresponding tree addresses would be AD, ADD, and ADA.

Furthermore, the specifications make use of attribute grammars. These allow synthesized attributes (passed from children upwards to their parents) and inherited attributes (passed from parents downward to their children). For example, the specification of LAMDBA would be:

(lambda xs:formals e:expr)
↪ e.imports = xs.exports :: ε

This means that the imports attribute (the bound variables) in E corresponds to the exports attribute (the introduced formals) of XS. (There are additional rules, which I'm not showing here.)

I've only read the first two chapters so far, but this work seems like a clear winner. Very readable and insightful. Congratulations, Dave!

More later.

Tuesday, May 4, 2010

Doing the Right Thing

As I've said before, the great thing about Lisp is it's age. Like a good wine, Lisp has had a long time to age.

I think one of the best examples of the ripeness is R6RS' expansion process. It's a very good specification of a quite complex process, namely the processing of a number of expressions by the compiler.

The complexity stems from the fact that Lisp is at its core a multi-phase language: source code contains runtime values, such as variables and functions, as well as compile-time values, such as macros.

In a Lisp source file, one expression may be for execution at runtime, while the next is for execution at compile-time.

The intermingling of these different phases in a single source file seems to be somewhat of a historical accident, as one day we'll all be programming in a hypercode environment, but well, that's how it is.

Anyway, if you're writing a Lisp, study R6RS' expansion process and enjoy! :)

Understanding Hygiene (part 1)

Because macroexpansion is performed before compilation, the compiler receives only core language expressions with known semantics (LAMBDA, DEFINE, SET, IF, ...).

What's important though is that the expressions received by the compiler contain a mixture of user- and macro-written code. For example, given the following hygienic OR macro:

(define-syntax or (form1 form2)
 `(let ((tmp ,form1))
    (if tmp
        tmp
        ,form2)))

The code

(or foo bar)

is translated to:

(let ((tmp foo))
 (if tmp
     tmp
      bar))

The blue parts are user-written, the white parts are macro-written.

The important thing is that parts with different colors do not interact.

So even if we use the variable name TMP, the code still works as expected because a white TMP doesn't interact with a blue TMP.

(or foo tmp)

is translated to:

(let ((tmp foo))
 (if tmp
     tmp
      tmp))

Without the colors, the code would always return the value of FOO.

This is one part of the equation: the macro-written white TMP doesn't shadow the user-written blue TMP.

The other part is that user-written code mustn't shadow macro-written variables. If you have a (stupid) macro:

(define-syntax my-list (&rest args)
 `(list ,@args))

and a user uses it thusly:

(let ((list 12))
 (my-list 1 2 3))

This expands to:

(let ((list 12))
  (list 1 2 3))

Again, this causes no problem, because different colors don't interact.

Everytime you create a piece of code with quasiquote, the system generates a new unique color.

Tuesday, April 20, 2010

Code is (more than) data

In Lisps like Common Lisp, code is data.

(In CL, code is represented as cons lists, which totally sucks, because you can't attach any metadata (such as line number information) to the code.)

One of the insights of the "Scheme community" is that code is more than data. They found this out by investigating hygiene.

In essence, code has some features that aren't easily captured by simply using s-expressions.

Namely, code has lexical structure. Code containing LET for example introduces new bindings, and if you simply render that as lists, you're losing important information.

If you want hygiene (hint: you want it), simply using s-expressions isn't enough. You need syntax objects.

More later.