Not only do I use multiple languages professionally, I don't know some of them. – anamax
Friday, July 30, 2010
So true
Great response to Ask HN: Do you use more than one programming language?
Monday, July 26, 2010
Concurrency's Shysters
Excellent article by Bryan Cantrill, Concurrency’s Shysters:
[C]oncurrency is still being used to instill panic in the uninformed. This time, it is chip-level multiprocessing (CMP) instead of SMP that promises to be the End of Days — and the shysters have taken a new guise in the form of transactional memory. The proponents of this new magic tonic are in some ways darker than their forebears: it is no longer enough to warn of Judgement Day — they must also conjure up notions of Original Sin to motivate their perverted salvation. “The heart of the problem is, perhaps, that no one really knows how to organize and maintain large systems that rely on locking” admonished Nir Shavit recently in CACM. (Which gives rise to the natural follow-up question: is the Solaris kernel not large, does it not rely on locking or do we not know how to organize and maintain it? Or is that we do not exist at all?)
I know much too little about the subject to have an informed opinion, but I think countering hype in all its forms is important.
Sunday, July 25, 2010
Happenings in GCC-land
Ian Lance Taylor is working on split stacks. This will permit the stacks of gccgo goroutines, which are mapped to native threads, to start small and grow as needed. This will of course be usable from C as well, and should be welcome news for the millions-of-threads!!!1! faction.
Tom Tromey is working on making GCC an incremental compiler: GCC will run as a server and maintain a model of the user's program. When a translation unit is recompiled, GCC will re-compile the minimum necessary. One of the goals is to make GCC a backend for IDEs that do stuff like autocompletion based on the program model (Interview, paper from GCC Summit).
People from Intel and others are working on transactional memory. Sections of code can be marked as atomic, and their temporary changes will be saved in thread-local storage. The semantics will make it appear as if transactions were protected by a single global lock.
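For intuition, here's a Lisp sketch (mine, using the bordeaux-threads library) of a construct with those observable semantics — except that it really does take one global lock, whereas the GCC work only has to behave as if it did:

;; Sketch only: models the "as if protected by a single global lock"
;; semantics, not the actual implementation, which buffers writes in
;; thread-local storage.  Requires the bordeaux-threads library (bt).
(defvar *global-transaction-lock* (bt:make-lock "tx"))

(defmacro atomic (&body body)
  `(bt:with-lock-held (*global-transaction-lock*)
     ,@body))

;; Usage: the two increments appear indivisible to other threads.
(defvar *counter* 0)
(defun bump-twice ()
  (atomic
    (incf *counter*)
    (incf *counter*)))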
It's a great time to be writing a Lisp->C compiler! :)
Efficient Method Dispatch in PCL
Efficient Method Dispatch in PCL by Gregor J. Kiczales and Luis H. Rodriguez Jr. is an all-out awesome yet compact paper about generic functions in their portable CLOS implementation.
Some salient points:
They add a level of indirection between objects and their classes, called wrappers. An object points to a wrapper and that points to the class of the object. Inside each wrapper they store a random value for use by the memoization tables of generic functions.
Each generic function has a small memoization table that maps the hashed random value of a wrapper to the program counter (PC) of the corresponding method defined for the wrapper. The memoization tables use a simple random % memotablesize computation, and linear scanning after that.
How they handle class redefinition is swell: they simply set the wrapper's random value to zero, thereby invalidating the wrapper. They always leave the first entry of generic functions' memoization tables empty, and because an invalidated wrapper's random value will now hash to that empty entry, they only need to check for an invalid wrapper after a memoization miss occurs. Genius.
Generic functions run through a kind of state machine, depending on how they're used. (Similar to how polymorphic inline caches switch between mono-, poly-, and mega-morphic states, but with additional states tuned to CLOS semantics.) For example, if a generic function is only ever used with two different classes it can use a simple IF to test for these two classes. Here's another kicker: if a generic function's state machine detects that it's used as a slot accessor, it can directly store the slot offset instead of the method's PC in the memoization table. AWESOME.
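Here's a toy Common Lisp rendition of the core mechanism (my own sketch, not the paper's code), showing the wrapper indirection, the random-value hashing, and the zero-the-wrapper invalidation trick:

;; Each object points to a wrapper, which points to its class and
;; carries a random value used by generic functions' memo tables.
(defstruct wrapper
  class
  (hash (1+ (random 1000))))          ; 0 is reserved to mean "invalid"

(defconstant +memo-size+ 8)

(defstruct memo-table
  ;; entries hold (wrapper . pc-or-slot-offset) pairs
  (entries (make-array +memo-size+ :initial-element nil)))

;; Entry 0 is kept empty, so an invalidated wrapper (hash = 0) hashes
;; straight to an empty slot and is only noticed as a memoization miss.
(defun memo-lookup (table wrapper)
  (let ((entries (memo-table-entries table))
        (start (mod (wrapper-hash wrapper) +memo-size+)))
    (loop for i from 0 below +memo-size+
          for idx = (mod (+ start i) +memo-size+)   ; linear scan after hashing
          for entry = (aref entries idx)
          when (null entry) return nil              ; miss
          when (eq (car entry) wrapper) return (cdr entry))))

;; Class redefinition: just zero the wrapper's random value.
(defun invalidate-wrapper (w)
  (setf (wrapper-hash w) 0))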
Thursday, July 22, 2010
Piet
I think the use of color in syntax is very interesting. I first heard of this idea in ColorForth.

Sorting in ColorForth.
Piet is awesome and takes this to a whole new level! Here are some sample programs:

Prints "Piet".

A random number generator.

Tuesday, July 20, 2010
Proof that parsing is evil
  total    blank   lines w/   nb, nc    semi-   preproc.   file
  lines    lines   comments    lines   colons   direct.
--------+--------+----------+--------+--------+----------+----
   1339      182        242     1072      666         25   lua-5.1.4/src/lparser.c
As DJB sez, don't parse.
This is getting silly

Guess what's a goal for C# 5.0?
Tadaaaaaaaaaaaaaa....
Compiler as a service. Which is code for eval. Anders Hejlsberg shows it off in the great The Future of C# at 59:30.
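For reference, here's what "compiler as a service" has looked like in Lisp all along — standard Common Lisp, nothing exotic:

;; Compile a freshly constructed function at runtime and call it.
(funcall (compile nil '(lambda (x) (* x x))) 7)   ; => 49

;; Or, for arbitrary expressions, plain old EVAL:
(eval '(+ 1 2 3))                                 ; => 6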
Unbeknownst until just now to your correspondent, the JVM actually has a similar facility, but as Earl says, it's over-engineered to the brink of spontaneous self-combustion. (Love that quote.)
(I don't want to diss Java and the JVM too much, btw. I think it's a great platform for when you need to get something done. It's just not inspiring.)
CaaS is great on so many levels! Lispers will be able to claim another first on a vital technology. No future languages will be REPL-less.
My prediction for C# 6.0: Hygienic macros, quasisyntax, and phase separation. Seriously.
Monday, July 19, 2010
On Understanding Types, Data Abstraction, and Polymorphism
Cardelli and Wegner's On Understanding Types, Data Abstraction, and Polymorphism from 1985 is a treat. No wonder it has more than 2000 citations on Google Scholar.
It starts off with a simple functional language, and then adds universal and existential quantification. Universal quantification (∀) brings parametric polymorphism ("length returns the length of any list, whatever the type of its contents are"). Existential quantification (∃) lets you hide the details of a data type ("there exists some data type that implements a stack, but I'm not telling you more"). All of this is described in terms of a set-based description of types, which clears up a lot of issues.
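Roughly, in type notation (my paraphrase, not the paper's exact syntax):

length : ∀a. List(a) → Int                                    -- one function, usable at every element type a
stack  : ∃t. { empty: t, push: t × Int → t, top: t → Int }    -- some representation type t exists, but stays hidden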
It seems that this paper is the granddaddy of papers about polymorphism. Its expository style is unmatched.
P.S. I discovered this paper while skimming Adriaan Moors' thesis Type Constructor Polymorphism for Scala: Theory and Practice. Its final chapter gives a great overview of where type systems are going.
Thursday, July 15, 2010
What's phase separation and when do you want it? (Part 1)
One thing I've learned, now that I'm working on my third attempt at implementing an Acceptable Lisp™ (after an abandoned project years ago, and the JavaScript-based, also abandoned, but at least somewhat interesting and remotely complete CyberLisp) is that there's probably no way to understand Lisp, unless you implement one. (So I don't expect you to understand any of the following wisecracks, unless and until you've written your own Lisp, or spent some quality time thinking this whale of an issue through. It's like the masters of old said: you can only explain something the listener already knows. Or as Perlis said, you can't communicate complexity, only an awareness of it.)
My tip: if you want some Serious Insight™, write a compiler and follow in the footsteps of modern Lisp compilation managers such as PLT Scheme's or XCVB. What's the difference between those and the rest of the pack?
Phase separation.
That probably doesn't ring a bell with you unless a) you've followed Scheme recently, or b) are writing your own Scheme, or c) are simply an all-around wizard. All of which are good for you!
Let me explain... Lisp, because it has macros that are written in a Turing-complete language (Lisp), is fundamentally a multi-phase language. We could also call it multi-time. Why? Because there's your ordinary runtime runtime, but before that happens, there's also a compile-time runtime. The right-hand sides of macro definitions run in the compile-time runtime. (I'll just call the runtime runtime runtime and the compile-time runtime compile-time from now on. (Yeah I had to do that thing with the three "runtime"s in a row there.))
This is another concept, like hygiene, that is only now (or just recently) becoming understood by the peoples. Where by the peoples I mean your correspondent. (The concept has been debated at length on the wonderful r6rs-discuss list (sorry, I'm too lazy to dig up the threads). It's been discussed almost more than case sensitivity, which should give you an indication of how novel and controversial the topic is.) This is deep stuff. At first. And later, too.
Compile-Time
One fundamental feature of Lisp is that you get to extend the compile-time with your own code, macros. (Now, Tom Lord would say that's totally the wrong way to do it, but hey, that's the way it's been done the last couple decades, so I'm covering that.)
There's a bit of sneakiness in how macro calls look like function calls in Lisp when they're a totally different thing, and there are languages that make a syntactic difference between the two, but Lisp doesn't, and for the better.
The first fundamental thing to understand is that macros don't exist at runtime.
When you do a (defmacro foo () ...) in a Lisp with phase separation, there will be no object called FOO in the runtime. But where is it? It's in the compile-time.
Yeaaah.
I know, it's a seriously weird concept, but it's absolutely fundamental to grasp this: macros are compile-time entities, they don't exist at runtime.
So when and where is this compile-time? The when is easy: when you compile the file. (Of course it's not that easy, but let's leave it at that for now.)
The where is more tricky. A Lisp source file contains two kinds of stuff that are fundamentally different: macro definitions exist at compile-time, while the other stuff such as global variables and functions (obviously) exist at runtime. (That a source file describes multiple runtimes, the runtime runtime and the compile-time runtime, is one of those subtle, small things that add up to make Lisp the powerhouse that it is.)
That's where XCVB-and-SBCL (and PLT Scheme, and probably Chez, and probably others) come in with a shattering device: the CFASL.
CFASLs
Let's take a look at non-C, plain FASLs first.
A FASL (fast-load file) is the output that a Lisp compiler produces from a .lisp input file. If your compiler compiles to bytecode, the FASL contains bytecode. If your compiler compiles to machine code, the FASL contains machine code. If your compiler compiles to JavaScript, the FASL contains JavaScript. FASLs are fundamentally implementation-specific, but always contain some form of executable code which can be run on a (real or virtual) machine.
What's in a FASL? Say we have a Lisp file containing the following expressions:
(defun foo ()
  (print "hello world!"))
(foo)
Then, the FASL will contain executable code that a) defines a new function called FOO, and b) calls that function, printing "hello world!" to the console. Simple.
Now what's in the FASL for the following Lisp file? Hint: not so simple.
(defmacro bar ()
  `(print "hello meta-world!"))
(bar)
The FASL for that file will (gasp!) only contain executable code for (print "hello meta-world!"). The reason is that we're defining the macro BAR, and the (macro) call to BAR in the third line gets expanded at compile-time to the PRINT expression.
And that's where CFASLs come in. (The C stands for compiler, or compile-time.) The CFASL of the above Lisp file will contain (defmacro bar () `(print "hello meta-world!")).
See? The FASL contains the runtime expressions of the Lisp file, and the CFASL contains the compile-time expressions of the Lisp file, and never the twain shall meet.
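To make the split concrete, here's a sketch for a file containing both kinds of expressions (file names and S-expression notation mine; real FASLs of course contain compiled code, not source):

;; foo.lisp -- the source file
(defmacro bar () `(print "hello meta-world!"))
(defun foo () (bar))

;; foo.fasl -- the runtime expressions, with macro calls already expanded
(defun foo () (print "hello meta-world!"))

;; foo.cfasl -- the compile-time expressions, and nothing else
(defmacro bar () `(print "hello meta-world!"))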
Why, oh why?
Now, this is where it gets tricky. >:->
We have to distinguish the two ways in which an Acceptable Lisp™ needs to work: interactive REPL mode, and static file-compilation mode.
Let's start with the file-compilation mode, because it's simpler, and fachrissakes, let's not discuss module dependencies for now.
In file-compilation mode, the Lisp's job is dead simple: slurp in the source file, process it, and produce a FASL from the runtime expressions and a CFASL file from the compile-time expressions.
If we want to execute the resulting program, all we have to do is load the FASL. Loading means simply to execute it, and since it contains the runtime expressions of the input file, it will do whatever we told it to do. Note that we're not using the CFASL! Since the CFASL contains compile-time expressions, there's no need to even touch the CFASL when we want to run our program.
Now, for something a bit more challenging, the good ol' REPL mode.
So, we're at the REPL, feeling free, taking a sip, and sneaking a sneaky (defmacro quux () `(print "hello")) in there.
WTF? What's the Lisp supposed to do? After all, a macro definition is not a runtime expression!
Now, older Lisps, which are not so big on following the latest trends in the scene, or newer Lisps that just don't care, will get this wrong. They'll intermingle runtime and compile-time in the REPL, leading to all kinds of undesirable effects, as Matthew Flatt will be pleased to point out to you. This means you'll have the compile-time entity in your runtime, which is simply sick. (In this post's model; Tom Lord and John Shutt will tell you something totally different.) One of the bad results of that is that code that works at the REPL won't necessarily work when put into a file. But don't take my blahger's word for it, take Professor Flatt's peer-reviewed one.
So what's an Acceptable Lisp™ to do, at the REPL, when the luser enters a DEFMACRO? An Acceptable Lisp™ has to keep the runtime runtime and the compile-time runtime strictly separated. This means it will need one environment for the runtime, that contains the global variables and functions, and another, completely separate environment for the compile-time, that contains macro definitions.
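In pseudo-Lisp, with entirely hypothetical helpers (make-env, env-eval, and env-macroexpand are made up for this sketch), the REPL's dispatch might look like this:

;; make-env, env-eval, env-macroexpand: hypothetical helpers, not real APIs.
(defvar *runtime-env* (make-env))        ; global variables and functions
(defvar *compile-time-env* (make-env))   ; macro definitions live here

(defun repl-eval (form)
  (if (and (consp form) (eq (car form) 'defmacro))
      ;; macro definitions go into the compile-time environment only
      (env-eval form *compile-time-env*)
      ;; everything else: expand using the compile-time environment,
      ;; then evaluate in the runtime environment
      (env-eval (env-macroexpand form *compile-time-env*)
                *runtime-env*)))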
Let's try to wrap this up...
Wrap-up
We've seen that a Lisp file contains code for two different times: the runtime runtime and the compile-time runtime. If we want to make our users' lives easier, we as Lisp implementors have to keep these times separate.
In Part II of this post, I'll write about how PLT Scheme handles module dependencies in such a phase-separated model.
Now I gotsta hack.
Wednesday, July 14, 2010
semantic, adj.
In English, mental refers both to things involving the mind in general and to crazy in particular.
I propose to use the word semantic the same way for PLs.
If a language has wack semantics, you'd say ``Woah, that language is totally semantic!''
Monday, July 12, 2010
I, for one, welcome our new app-inventing grandchildren of Interlisp overlords

Outlining! 2D forms!
App Inventor for Android is based on OpenBlocks, which looks very much like Scratch, but uses Kawa instead of Squeak.
Sunday, July 11, 2010
Optimism
Nice and thought-provoking comment from one markt on LtU:
One of the things that stands out to me about APL, and CS in general is the pure, unfettered optimism concerning the human condition. What other field of human endeavor seeks to provide tools to enhance human productivity and intelligence, to augment the faculties of the mind? The implicit assumption that this can be done is a leap of hope in the purest form.
"Tools for thinking" I mean who would think of such a thing? Seems like this goes unstated a lot of the time, I imagine most of us here take it for granted. But it really is an amazing proposition (imo).
Thursday, July 8, 2010
Using multiple kinds of parentheses in Lisp considered silly
Complaining about parentheses in Lisp is like complaining about the heat in the kitchen—as Jamie Oliver said, the cocktail of hot oil, sharp knives and cocaine is fucking lethal. Wait, what am I trying to say? Anyway, I'm not going there.
I'm stealing a picture from that post though, to show something that bothers me:

Yeah, that WTF? is spot on, but for a different reason.
Note how a couple of unshapely ]'s have snuck their way into the midst of that lovely column of gorgeous )'s.
Now why is that WTF?-worthy? Well, one of the Big Benefits of using just two characters for all punctuation is the editing convenience you get from that. After you've moved code around in Lisp, you just keep your finger on the delete key until the superfluous parens are gone for good, or else you just keep your finger on the )-key until the right amount of parens are where they should be.
This is completely broken when you use, for some irrational reason, a second or – gasp! – third kind of parenthesis. Doing that brings you back to the laughable world of lots of irrational and silly punctuation. You have to manually match up a [ with a ]. Think about that! It defeats one of the Big Benefits of using S-expressions, the only sane, rational syntax there is.
Silly.
(Of course, a lot of the time you use Emacs's sexp-aware killing and stuff, and then you can get away with multiple kinds of parens. But still, that doesn't work in the general case.)
Wednesday, July 7, 2010
C# 4.0: The industrial response to Lisp?
I just skimmed the C# 4.0 specification, and I'm quite fascinated by it. The spec itself is extremely well-written and readable, as far as language specs go. Anyone who's ever tried to read ECMA-262 or the JLS will agree.
But more importantly, the features of the language really raise the bar for all dynlangs out there. With dynamic, C# effectively usurps untyped programming, while maintaining static types elsewhere. With LINQ's expression trees, they seem to tackle the same problem as Lisp does with S-expressions. And MSR's work on evolving generics, a lot of which seems to find its way into C#, is highly interesting.
That C# qua language is way ahead of Java is clear. What's more worrying is that all dynlangs, including the few acceptable ones, like Common Lisp and Factor, are starting to look horribly dated against C#. Seriously.
Note the following quote by grandmaster Tim Sweeney:
In Bracha's work here, and in Microsoft's work on C# 3.0, I sense an undercurrent dragging the language model toward the LISP/Smalltalk "ideal" of metadata-intensive, introspective, dynamic, loosely-typeable programming. ... If you go this route, one day you'll realize you evolved the ultimate hacker language, and it became a godawful mess for writing real programs. (My emphasis)
Evolving the ultimate hacker language, and making it a godawful mess for writing real programs, is of course the topic of this blog. Note also the following quote on C#'s type system, again by Sweeney:
These [Variance and Generalized Constraints for C# Generics] extensions stretch the C language family to an impressive new local optima.
Now, I must say that I haven't ever used C#, so I don't know how it stacks up ITRW. I'm assuming, based on long observation, that Microsoft with high likelihood fucked up the pragmatics completely, and that programming in C#, as opposed to reading about it, is deeply depressing.
But what I'm saying is that the Perl-Python-Ruby-PHP-Tcl-Lisp-language has to catch up, maybe for the first time since 1959!
Tuesday, July 6, 2010
Manifesto on JAR's Next Language
I just had a good laugh reading this, so I thought I'd share:
Reasons to design a programming language:
- All current languages are terrible.
- It's fun, easy, and satisfying.
- You get to choose a silly name for it.
- Languages are games, and people like games.
- Everyone else is doing it.
- There's a chance it might be useful for something.
- By doing it once again, perhaps some people will learn something new.
Tuesday, June 29, 2010
Weekly notes
Fascinosum & Tremendum
OMG, first IDEs took programming, now they're trying to take programming languages. Really, ditch your IDE. Stomp on it. Burn it.
I'm afraid of what will happen when PHBs start to demand DSLs from their Maven-toting programming morons. And it's happening right now! However, since this can only further the Lisp cause in the longer run, I'm quite happy.
GADTs
GADTs are teh hotness. As you know, all Haskellers' favorite pastime is to implement an evaluator for the addition of numbers. There are literally a thousand papers on this topic. With GADTs, they can now exploit the host language type system to make sure that their petty additions are correctly typed. Of course, geeks of other languages are trembling in the face of a number addition gap, so they want to typecheck GADTs in their languages, too. And maybe even throw in a Visitor, or two.
One way of bringing GADT-like power to lesser languages is via type constructor polymorphism, as discussed on The Axis of Eval previously. However, the hordes from the Evil Northwest are proposing a different solution. Instead of higher-kinding type bounds, they are proposing a rather humble where clause. Be assured that your correspondent will report back with further information.
The mother lode of pointer tagging schemes
Nuff said.
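(If you haven't met such schemes: here's the flavor of one common low-tag encoding, simulated in Lisp arithmetic — purely my illustration, and details vary wildly between implementations.)

;; 3 low tag bits per word; fixnums get tag 0, so that adding two
;; tagged fixnums is just a machine add, no untagging needed.
(defconstant +tag-bits+ 3)
(defconstant +fixnum-tag+ #b000)
(defconstant +cons-tag+   #b001)

(defun tag-fixnum (n) (ash n +tag-bits+))
(defun untag-fixnum (w) (ash w (- +tag-bits+)))
(defun tag-of (w) (logand w (1- (ash 1 +tag-bits+))))

(untag-fixnum (+ (tag-fixnum 2) (tag-fixnum 3)))   ; => 5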
Relatedly, from Stress-testing Control Structures for Dynamic Dispatch in Java ('02):
[O]ur results show that strength reduction of control structures is likely to be beneficial regardless of the hardware and JVM, when the number of possible receiver types can be determined to be small. For numbers of possible types up to 4, if sequences are most efficient. Between 4 and 10, binary tree dispatch is generally preferable. For more types, the best implementation is a classical table-based implementation such as currently provided by most JVMs for virtual calls. These are safe, conservative bets, that generally provide a significant improvement and, when not optimal, result only in a small decrease in performance.
And of course, Agner's The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers.
Hack we must!
Saturday, June 26, 2010
Compile-Time Resolution
David Barbour has explored some highly interesting issues over on the C2 wiki, under the title of CompileTimeResolution. (The page is several years old, though.)
I just wrote him a mail with some of my thoughts, and maybe it's useful to some of you, so I'm replicating it here.
Hi David,
I just saw your C2 page on Compile-Time Resolution.
Highly interesting stuff, and the example of embedding JPEG data right into the code clarified a lot for me.
Even more so, your statement that compile-time code should be idempotent and side-effect-free rang a big bell with me.
In Scheme, there is current work regarding exactly this topic in the context of separate compilation and modularity, under the moniker of "phase separation".
Classic Lisps didn't know or care much about this, and EVAL-WHEN, their tool for handling this, is seriously broken [1].
Schemers have started a rather rigorous investigation of this issue, probably starting with Matthew Flatt's work on modularity [2] (although Queinnec's earlier work on reflective towers [3] tackles similar issues.)
In work based on Flatt's, such as R6RS, it's required to explicitly demarcate the phase(s) in which an identifier is needed -- Schemes not only have a single compile-time, they may have a nested tower of compile-times, in case of lexically nested macros (LET-SYNTAX).
One of the latest contributions to the issue is the Implicit Phasing framework [4]. It shows that the programmer can be relieved from the burden of manually having to put code into a particular phase, and let the system infer the compile-times (phases) in which a given identifier is really used. Since that code should be idempotent and side-effect-free anyway, evaluating it multiple times, or not at all, isn't a problem.
Implicit phasing however, is (rightfully, IMO) critiqued for applying this principle even to runtime code. In a Scheme with implicit phasing, REQUIRE'ing a package for runtime will only load that package if, in fact, one of its exported identifiers is accessed, thereby violating a traditional Lisp assumption, namely that simply loading [edit: requiring] a module may have (important) side-effects.
However, for compile-time code, implicit phasing seems to be an advanced solution to the issue of compile-time resolution, so you might want to look at it.
In fact, in the new Lisp I am working on, a "strict" model is used for runtime dependencies (i.e. when you require a package for runtime it will be loaded no matter what), and a "lazy" model is used for compile-time dependencies (i.e. when you require a module "for compile-time", or "for-syntax" as Schemers call it, it will get loaded automatically, in whatever compile-time it is actually used, or not at all if it isn't actually used. After all, compile-time should be idempotent and side-effect free.)
Best regards,
Manuel
P.S. Even old Common Lisp is now getting pushed towards such a saner model by Faré Rideau's XCVB build system [5].
Thursday, June 24, 2010
What Makes Lisp Great
Faré: What Makes Lisp Great:
[U]nlike most other languages, Lisp is not an arbitrary design based on the whims of some author, but it is a discovery produced by a long evolution and a deep tradition. It has seen several revolutions: transition from M-expression to S-expression, from EVALQUOTE to EVAL or from dynamic binding to static binding, introduction of data-structures, of object-oriented programming and of a meta-object protocol, the invention and latter standardization of many many individual features. Many dialects of Lisp also introduce their own revolutions, such as continuations and single name space in Scheme, the graphical user interface and dreaded DWIM features in InterLISP, etc. If anything remains among all these revolutions and variants, it is the ability to evolve. Lisp is a language fit for evolution -- not just by a small group of wizards who control the language implementation, but by every single programmer who uses the language. And this wasn't a conscious intended part of the original design: this was the great discovery of Lisp. (My emphasis.)
Now I understand better why I'm wary of Clojure: Clojure is big on revolution and small on evolution. And in a revolution, people get killed. Or, in Clojure's case, language features that have evolved over decades. Maybe devolution is the correct term.
Wednesday, June 23, 2010
Quo Vadis, Cons?
Everybody seems to agree that we have to get rid of the cons, if for different reasons.
Guy Steele proposes an incremental extension, the conc, with an eye towards losing the accumulator idiom, and emphasizing divide-and-conquer. Andrei Alexandrescu proposes the radical range, for enhanced generic algorithm programming and general awesomeness. Dylan, Goo, and PLOT have stepper-based iteration protocols, that were designed so that the compiler can erase them in some cases. Oleg proposes a left-fold operator with premature termination as the best collection API, and turns it inside-out if a stream-based interface is needed. Clojure uses the cons's first/rest interface as the generic interface for all sequences, which is kinda unbelievable.
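To give a flavor of the left-fold style, here's a toy formulation of mine (much simpler than Oleg's actual enumerator interface): the collection exposes a single fold operator, and the folded function signals whether to keep going.

;; FN receives the seed and an element and returns two values:
;; the new seed, and whether to continue folding.
(defun fold-left-list (fn seed list)
  (loop for x in list
        do (multiple-value-bind (new-seed continue?) (funcall fn seed x)
             (setf seed new-seed)
             (unless continue? (return seed)))
        finally (return seed)))

;; Find the first element greater than 10, stopping early.
(fold-left-list (lambda (acc x)
                  (if (> x 10)
                      (values x nil)    ; found it, stop
                      (values acc t)))  ; keep going
                nil
                '(3 7 12 5 99))
;; => 12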
The jury is still out, but I think ranges are the most interesting solution for stateful languages, whereas conc lists seem to have their charms in a poorly applicable, I mean, purely applicative setting. Oleg's API probably only works for people with a brain the size of Oleg's.
Sunday, June 20, 2010
Refining Syntactic Sugar
Another great dissertation just came out, Ryan Culpepper's Refining Syntactic Sugar:
Over the past two decades, Scheme macros have evolved into a powerful API for the compiler front-end. Like Lisp macros, their predecessors, Scheme macros expand source programs into a small core language; unlike Lisp systems, Scheme macro expanders preserve lexical scoping, and advanced Scheme macro systems handle other important properties such as source location. Using such macros, Scheme programmers now routinely develop the ultimate abstraction: embedded domain-specific programming languages.
Unfortunately, a typical Scheme programming environment provides little support for macro development. The tools for understanding macro expansion are poor, which makes it difficult for experienced programmers to debug their macros and for novices to study the behavior of macros. At the same time, the language for specifying macros is limited in expressive power, and it fails to validate syntactic correctness of macro uses.
This dissertation presents tools for macro development that specifically address these two needs. The first is a stepping debugger specialized to the pragmatics of hygienic macros. The second is a system for writing macros and specifying syntax that automatically validates macro uses and reports syntax errors.
It describes two contributions: a debugger that's tightly integrated with macroexpansion, as well as a new expander called syntax-parse.
syntax-parse seems like it will become a standard facility for implementing macros. It lets you describe the shapes of the s-expression inputs of macros in a parser-like language. You can define new "syntax classes", that contain both information about the shape of a form, as well as additional, procedural checks.
One example of a syntax class would be the variables bound by a LET: they have to be identifier/expression pairs, and the identifiers must be unique. syntax-parse lets you describe these constraints succinctly, and facilitates useful error reporting (i.e. you don't get an error in terms of the expanded output of a macro, which would require understanding of a macro's implementation). This obviates the need for explicitly programmed checks.
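For contrast, here's roughly what such explicitly programmed checks look like when written by hand in a plain Lisp macro (my sketch, not Racket code):

(defmacro my-let (bindings &body body)
  ;; Hand-rolled validation that syntax-parse would express declaratively.
  (dolist (b bindings)
    (unless (and (consp b) (symbolp (car b)) (consp (cdr b)) (null (cddr b)))
      (error "MY-LET: binding ~S is not an (identifier expression) pair" b)))
  (let ((vars (mapcar #'car bindings)))
    (unless (= (length vars) (length (remove-duplicates vars)))
      (error "MY-LET: duplicate identifiers in ~S" vars)))
  `((lambda ,(mapcar #'car bindings) ,@body)
    ,@(mapcar #'cadr bindings)))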
syntax-parse seems like a big step forward for macro writing. The macro debugger is also impressive. Congratulations, Ryan!
Saturday, June 19, 2010
A Theory of Typed Hygienic Macros
Dave Herman's much-awaited dissertation is out!
Hygienic macro expansion is one of the crown jewels of Scheme, but to this day nobody understands just exactly what it is.
Put that in your pipe!
Unhygienic macro expansion identifies programs with their representation as trees. But because of variable scope, programs in fact have additional graph structure, with edges between variable bindings and their references. Since these edges are not explicitly represented in S-expressions, maintaining their integrity during macro expansion becomes the responsibility of programmers. Put differently, the tree representation of a program has unenforced representation invariants that are the responsibility of programmers to maintain.
A good explanation of why "code is more than data". (Though Pascal Costanza seems to have found a way to introduce hygiene in an unhygienic macro system.)
Though the motivations are clear enough, hygienic macro expansion has so far resisted a precise, formal specification. At the heart of the problem is identifying what is meant by “scope” in a language with extensible syntax. ...
The key insight of this dissertation is that by annotating all macro definitions with interfaces describing their grammar and binding structure, we can reason formally about the binding structure of Scheme programs, and without first macro-expanding. More technically, explicit annotations provide us with enough information to obtain a formal definition of α-equivalence of pre-expansion Scheme programs.
These binding specifications make use of tree addresses, a syntax for describing subtrees of a cons tree, analogous to Common Lisp's CADR, CADDR, CADAR, etc. functions. The corresponding tree addresses would be AD, ADD, and ADA.
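A tiny illustration of that correspondence (example mine):

(cadr  '(a b c))     ; => B   -- tree address AD
(caddr '(a b c))     ; => C   -- tree address ADD
(cadar '((a b) c))   ; => B   -- tree address ADA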
Furthermore, the specifications make use of attribute grammars. These allow synthesized attributes (passed from children upwards to their parents) and inherited attributes (passed from parents downward to their children). For example, the specification of LAMBDA would be:
(lambda xs:formals e:expr)
↪ e.imports = xs.exports :: ε
This means that the imports attribute (the bound variables) in E corresponds to the exports attribute (the introduced formals) of XS. (There are additional rules, which I'm not showing here.)
I've only read the first two chapters so far, but this work seems like a clear winner. Very readable and insightful. Congratulations, Dave!
More later.