The Axis of Eval: hygiene

Showing posts with label hygiene. Show all posts

Saturday, May 14, 2011

Secret toplevel

A somewhat arcane aspect of hygienic macro systems is the notion that macro-introduced toplevel identifiers are secret.

An example of this in EdgeLisp:

The macro FOO expands to code that defines a variable X using DEFVAR, and prints its value:

(defmacro foo () #'(progn (defvar x 1) (print x)))

(#' is EdgeLisp's code quotation operator.)

Using the macro has the expected effect:

(foo)

1

But X isn't actually a global variable now:

x

; Condition: The variable x is unbound.

; Restarts:

; 1: #[handler [use-value]]

; 2: #[handler [retry-repl-request]]

; 3: #[handler [abort]]

The name X only makes sense inside the expansion of (foo). Outside, it gets transparently renamed by the hygienic macro system.

(In EdgeLisp, a UUID is attached as color (or hygiene context) to the identifier.)

Why the secrecy of toplevel variables? Well, it's simply an extension of the notion that identifiers that introduce new variable bindings (such as those in a LET) need to be treated specially to prevent unhygienic name clashes between macro- and user-written code. This secrecy completely frees macro writers from having to care about the identifiers they choose.

(Discussion of this topic on the R7RS list: Are generated toplevel definitions secret?)

Monday, March 7, 2011

Understanding Hygiene (part 2) -- SRFI 72

My critically acclaimed Understanding Hygiene (part 1) contains the absolute baseline necessary for understanding hygienic macros.

This post is about my understanding of André van Tonder's SRFI 72 - a specific way of implementing hygiene, which differs from more widely used hygienic systems such as the venerable syntax-case.

It must be said that hygiene is still an area of active research. There be dragons. Nevertheless, van Tonder is quite confident of his design, going so far as calling it "improved hygiene". David Herman, who wrote the ~~book~~dissertation on the theory of hygienic macros is skeptical. Anton van Straaten makes some mediatory points. Matthew Flatt raises even more mind-bending questions.

That said, for a hygiene noob like me, SRFI 72 seems to be an elegant and workable design. I wrote a half-assed Lisp->C compiler that contains a SRFI 72 inspired macro system (here's a test from SRFI 72 in my Lisp that works -- phew) and the experience was quite pleasurable. Of the couple thousand lines of C in the system, only a couple dozen or so deal with hygiene.

Basically, what SRFI 72 says is this: when you create a piece of code using the code constructors syntax or quasisyntax, e.g.

(syntax (let ((x 1)) (foo x)))

the identifiers in there (let, x, foo) get a fresh hygiene context (or "color"). Thus they won't conflict with identifiers of another color -- such as those created by somebody else (e.g. another macro, or the user's source). That was the easy part.

Additionally, SRFI 72 makes such pieces of code obey a discipline that is similar to lexical scoping. That is the meaning of 72's money quote that van Tonder repeats multiple times, in the hope of hammering his point home:

the evaluation of any nested, unquoted syntax or quasisyntax forms counts as part of the evaluation of an enclosing quasisyntax.

Say we have some piece of silly code, like:

(quasisyntax (bla bla bla ,(some-unquoted-stuff (lambda () (quasisyntax (bla bla bla))))))

The outer quasisyntax constructs/quotes the piece of code. Then the "," unquotes and we call some contrived function that happens to take a lambda as argument. Then, inside the lambda there's another quasisyntax, so we're quoted again. The contrived unquoted stuff and the lambda don't matter. What matters is that there's an inner quasisyntax lexically embedded in the outer quasisyntax.

Now, SRFI 72's money quote tells us that the inner quasisyntax's bla's have the same color as the outer quasisyntax's bla's. Why? Because the inner quasisyntax is lexically nested in the outer quasisyntax, and like a lexical variable binding, the outer quasisyntax's color is "inherited" down to the inner quasisyntax.

This took me a looooooooooooooooooong while to figure out, and I hope that this post will help those trying to understand SRFI 72.

Questions?

Saturday, June 19, 2010

A Theory of Typed Hygienic Macros

Dave Herman's much-awaited dissertation is out!

Hygienic macro expansion is one of the crown jewels of Scheme, but to this day nobody understands just exactly what it is.

Put that in your pipe!

Unhygienic macro expansion identifies programs with their representation as trees. But because of variable scope, programs in fact have additional graph structure, with edges between variable bindings and their references. Since these edges are not explicitly represented in S-expressions, maintaining their integrity during macro expansion becomes the responsibility of programmers. Put differently, the tree representation of a program has unenforced representation invariants that are the responsibility of programmers to maintain.

A good explanation of why "code is more than data". (Though Pascal Costanza seems to have found a way to introduce hygiene in an unhygienic macro system.)

Though the motivations are clear enough, hygienic macro expansion has so far resisted a precise, formal specification. At the heart of the problem is identifying what is meant by “scope” in a language with extensible syntax. ...

The key insight of this dissertation is that by annotating all macro definitions with interfaces describing their grammar and binding structure, we can reason formally about the binding structure of Scheme programs, and without first macro-expanding. More technically, explicit annotations provide us with enough information to obtain a formal definition of α-equivalence of pre-expansion Scheme programs.

These binding specifications make use of tree addresses, a syntax for describing subtrees of a cons tree, analogous to Common Lisp's CADR, CADDR, CADAR, etc functions. The corresponding tree addresses would be AD, ADD, and ADA.

Furthermore, the specifications make use of attribute grammars. These allow synthesized attributes (passed from children upwards to their parents) and inherited attributes (passed from parents downward to their children). For example, the specification of LAMDBA would be:

(lambda xs:formals e:expr)
↪ e.imports = xs.exports :: ε

This means that the imports attribute (the bound variables) in E corresponds to the exports attribute (the introduced formals) of XS. (There are additional rules, which I'm not showing here.)

I've only read the first two chapters so far, but this work seems like a clear winner. Very readable and insightful. Congratulations, Dave!

More later.

Sunday, May 9, 2010

Ell Kernel Language

I'm nearing completion of the kernel language of my new Lisp, the Executable and Linkable Lisp, ell(1).

It's basically what I want from a modern Lisp, as simple as possible, but not simpler. The language will be translated to straightforward C (or C++, don't know yet), and enable true Lisp dynamicity and very deep Unix integration.

The major goal of the language is to make programming delightful. Of the "right" kinds of programs that is, since a language can't be delightful for everything.

Dave Moon's Programming Language for Old Timers also proves a great inspiration again. Am I an old-timer already? Dunno.

Specifically, Moon gives a no-nonsense description of hygiene, something which the Schemers have slight trouble with, as they seem to be a bit high on their own kleverness.

Tuesday, May 4, 2010

Understanding Hygiene (part 1)

Because macroexpansion is performed before compilation, the compiler receives only core language expressions with known semantics (LAMBDA, DEFINE, SET, IF, ...).

What's important though is that the expressions received by the compiler contain a mixture of user- and macro-written code. For example, given the following hygienic OR macro:

(define-syntax or (form1 form2)
 `(let ((tmp ,form1))
    (if tmp
        tmp
        ,form2)))

The code

(or foo bar)

is translated to:

(let ((tmp foo))
 (if tmp
     tmp
      bar))

The blue parts are user-written, the white parts are macro-written.

The important thing is that parts with different colors do not interact.

So even if we use the variable name TMP, the code still works as expected because a white TMP doesn't interact with a blue TMP.

(or foo tmp)

is translated to:

(let ((tmp foo))
 (if tmp
     tmp
      tmp))

Without the colors, the code would always return the value of FOO.

This is one part of the equation: the macro-written white TMP doesn't shadow the user-written blue TMP.

The other part is that user-written code mustn't shadow macro-written variables. If you have a (stupid) macro:

(define-syntax my-list (&rest args)
 `(list ,@args))

and a user uses it thusly:

(let ((list 12))
 (my-list 1 2 3))

This expands to:

(let ((list 12))
  (list 1 2 3))

Again, this causes no problem, because different colors don't interact.

Everytime you create a piece of code with quasiquote, the system generates a new unique color.

Tuesday, April 20, 2010

Code is (more than) data

In Lisps like Common Lisp, code is data.

(In CL, code is represented as cons lists, which totally sucks, because you can't attach any metadata (such as line number information) to the code.)

One of the insights of the "Scheme community" is that code is more than data. They found this out by investigating hygiene.

In essence, code has some features that aren't easily captured by simply using s-expressions.

Namely, code has lexical structure. Code containing LET for example introduces new bindings, and if you simply render that as lists, you're losing important information.

If you want hygiene (hint: you want it), simply using s-expressions isn't enough. You need syntax objects.

More later.