The Axis of Eval: types

Showing posts with label types. Show all posts

Saturday, November 9, 2024

An alternative idea for a typed language living alongside/inside JavaScript

What if instead of trying to type existing JS code, like TypeScript, we made objects of a new typed language completely separate from JS objects?

The TypeScript idea is to be able to type any old JavaScript that's out there. On the one hand this is admirable, but OTOH it leads to a very complicated type system.

Here's an alternative idea: instead of typing existing, legacy JavaScript, treat it as completely unsafe and outside of the system. The only safe part will be that written in the new language, let's call it Qwax for now.

Qwax will have a very simple object-functional type system like Ben Titzer's Virgil: classes, functions, type parameters, nothing else.

Qwax will also have full RTTI (so you can distinguish e.g. an array of strings from an array of numbers at runtime). This is a big difference from TS, which uses type erasure.

RTTI will also make it possible to distinguish Qwax objects from JS objects. (Essentially, every Qwax object will have a special symbolic property that points to its Qwax type information, or be an instance of a special QwaxObject class.)

JS builtins like booleans, numbers, and strings will be special cased, so they are also native Qwax booleans, numbers, and strings.

Other than RTTI for all its objects, Qwax will have exactly the same semantics as JS and transpile to JS in a straightforward way.

What are the benefits of this?

A very simple type safe language for the JS ecosystem. You could write type safe code with roughly the same performance and code size as JS, without needing the complex TS type system. (If I had to guess I'd say you can probably write a type checker for a language like this in under 2000 LOC, but I'll have to check the Virgil source, and hope Ben doesn't do a big eyeroll if reads this.)

How would you interop with JS?

JS would essentially be an unsafe "foreign" language for Qwax code. There would absolutely be no integration between JS types and Qwax types. For Qwax all JS objects (other than builtins) are of type ForeignObject. Due to RTTI you could do a safe cast from a ForeignObject to a Qwax object (the cast may fail, but will never produce an invalid result).

So there would be much more friction when calling JS from Qwax than there is when calling JS from TypeScript. But many systems have been written against such foreign function interfaces, and maybe it's not such a big deal.

JS could call all Qwax functions normally, and work with Qwax objects like normal JS objects.

How would this be different from languages like Elm or ReScript?

Unlike those languages, and like TS, it would not be a completely new language, it would be "(a subset of) JS with types". It would reuse existing JS syntax. Some programs could be ported to it simply by adding type declarations.

Furthermore, via RTTI, it would allow safely casting back Qwax objects that have been passed to unsafe JS code. I don't think Elm or ReScript support that.

Downsides

Every new language is a lot of work and introduces a lot of friction. Maybe we could reuse existing TypeScript syntax (but obviously not semantics) to get some mileage out of LLMs for this new language.

It's unclear how much overhead the RTTI introduces. I think there would essentially be one more superclass in the chain for each object due to generics. E.g. a list of numbers would be of class List<Number> which would be represented by a JS class that's a subclass of the representation of List<T>. But I could be completely wrong and missing something.

Status

I am not working on this, but I keep thinking about it, so who knows? Today I had the idea of completely separating Qwax objects from JS objects so I wanted to blog about it and invite comments.

(Lobsters)

Tuesday, August 7, 2012

Scheme's missing ingredient: first-class type tags

I've always felt that Scheme does data and types wrong, but I couldn't really say what bothered me until yesterday. After all, R7RS now has define-record-type for creating disjoint types. But that ain't enuff!

In my new language, Wat, I've found a satisfying solution, inspired by Kernel's encapsulation types, but going beyond them slightly. There are two procedures:

(make-type) returns a list of three elements: 1) a first-class type 2) a tagger function for tagging a value with that type, creating a tagged object and 3) an untagger function for extracting the value of tagged objects of that type.
(type-of obj) returns the first-class type of an object. The crucial point is that type-of not only works for tagged objects of user-defined types created by make-type, but also for all built-in types. E.g. (type-of 12) will return the number first-class type.

This system has the benefits of Kernel's encapsulated types: only someone with access to the tagger function (capability) may create new instances of a type. Only someone with access to the untagger function may access the contents of tagged objects. So object contents are potentially fully encapsulated.

But at the same time, the fact that every object, including built-in ones, has a first-class type makes it possible to efficiently program generically. E.g. one may create Smalltalk-like dynamic dispatch by using the first-class types of objects as indexes into a virtual lookup table of a generic function. This is not possible in either Scheme or Kernel. In both languages, programming generically requires one to use a cascade of type predicates (e.g. number?).

Example in Wat:

; destructuringly bind the three elements
; returned by make-type
(def (person-type person-tagger person-untagger) (make-type))

(define (make-person name email)
  (person-tagger (list name email)))

; untagger also performs a type check that
; person is in fact of type person
(define (get-name person)
  (car (person-untagger person)))

(define (get-email person)
  (cadr (person-untagger person)))

(define p1 (make-person "Quux" "quux@example.com"))
(get-name p1) --> "Quux"
(get-email p1) --> "quux@example.com"

Thursday, November 18, 2010

Foundations for Structured Programming with GADTs

Makes me wish I understood category theory: Foundations for Structured Programming with GADTs (neelk's LtU post) by Patricia Johann and Neil Ghani @ POPL 2008.

GADTs are at the cutting edge of functional programming and become more widely used every day. Nevertheless, the semantic foundations underlying GADTs are not well understood. In this paper we solve this problem by showing that the standard theory of datatypes as carriers of initial algebras of functors can be extended from algebraic and nested data types to GADTs. We then use this observation to derive an initial algebra semantics for GADTs, thus ensuring that all of the accumulated knowledge about initial algebras can be brought to bear on them. Next, we use our initial algebra semantics for GADTs to derive expressive and principled tools -- analogous to the well-known and widely-used ones for algebraic and nested data types -- for reasoning about, programming with, and improving the performance of programs involving, GADTs; we christen such a collection of tools for a GADT an initial algebra package. Along the way, we give a constructive demonstration that every GADT can be reduced to one which uses only the equality GADT and existential quantification. Although other such reductions exist in the literature, ours is entirely local, is independent of any particular syntactic presentation of GADTs, and can be implemented in the host language, rather than existing solely as a metatheoretical artifact. The main technical ideas underlying our approach are (i) to modify the notion of a higher-order functor so that GADTs can be seen as carriers of initial algebras of higher-order functors, and (ii) to use left Kan extensions to trade arbitrary GADTs for simpler-but-equivalent ones for which initial algebra semantics can be derived.

Mentioned by Oleg in GADTs in OCaml.

Thursday, November 11, 2010

Dependent types linkdump

This dependent typing thing has me. For the first time in years I find I'm actually learning something about programming. There is life after the Observer pattern.

Here's a list of some papers I've found worthwhile or just found, and not read yet...

Fun with Type Functions (slides). Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan. 2010.

How to fake it with Haskell's recently added type families. But Haskell increasingly looks horribly impure: it's not just that every type's inhabited by bottom, type-level programming is untyped! I think they really should work on their purity! ;) Mentions Ωmega's kind system - my prediction: typed type-level programming is coming to Haskell soon.

Proof-assistants using dependent type systems. Henk Barendregt, Herman Geuvers. 2001.

Recommended as a great intro to all things dependent typing. Not by me though, as I still have to read it.

ΠΣ: Dependent Types without the Sugar. Thorsten Altenkirch, Nils Anders Danielsson, Andres Löh, Nicolas Oury. 2009ish.

What's under the sugary hood of dependent programming languages. For when I develop a dependent language - because I'll have to.

A Tutorial Implementation of a Dependently Typed Lambda Calculus. Andres Löh, Conor McBride and Wouter Swierstra. 2009.

Also frequently mentioned as an intro to dependent types.

Verified Programming in GURU. Aaron Stump. 2009.

A book on dependent programming with a focus on verification.

Generic Programming with Dependent Types (part II, part III). Stephanie Weirich. 2010.

Slides focussing on the extra expressivity afforded by dependent types. Agda.

An Epigram Implementation. Edwin Brady, James Chapman, Pierre-Evariste Dagand, Adam Gundry, Conor McBride, Peter Morris, Ulf Norell, Nicolas Oury. 2010.

The 300-page literate Epigram (2?) implementation.

Monday, October 25, 2010

Notes on Dependent Types

Since discovering Ωmega, I've been drawn into the fantastic world of dependent typing. Dependent types allow the expression of static invariants that your program is then guaranteed to respect dynamically. This works up to computation: proofs of static invariants are programs, so arbitrary invariants may be expressed by the programmer, and verified by the type system. I repeat, arbitrary invariants. If you're not salivating yet, you might want to skip this post.

A couple of systems I've (superficially, so far) looked at are Ωmega, Epigram, Coq, and Agda.

Ωmega is closest to existing FP languages like Haskell, which makes it easier to grok than the others for newbies - it's not a full-spectrum dependent type system by design. Technically speaking, Ωmega is "faking it" by maintaining a strict phase distinction between the value and the type level (and the kind level ...). Proofs are represented as runtime instances of GADT datatypes.

GADTs, or generalized algebraic datatypes, are a more powerful (generalized) form of ordinary ADTs, such as ML's 'a list, that are used as datatypes (or "object system") in many dependently typed systems. GADTs are very closely related to generics in Java and C#; understanding generics and their proposed higher-level extensions thus provides a gentle route into dependent types.

Java and C#'s object systems already have all of the power of GADTs, only they're not as easy to use, and they are not as typesafe. What separates GADTs from ADTs can be seen in this Java code:

class A<T> {}
class B extends A<Foo> {}
B extends A with T fixed to Foo, so instances of B have type A<Foo>, by subtyping. With ordinary ADTs, B would also need to be parameterized by T - individual type constructors of an ADT, which we've expressed here as Java classes, cannot fix a type parameter to a specific type. GADT is attractive because it's a modest extension of existing ADT theory, and yet enables a wide range of data structuring styles used in fancypants typeful programming.

Epigram is programmatically aligned with Ωmega - both are passionate about the advances in programming made possible by dependent types, and vocally call programmers and researchers to join them on their quest. Epigram, in contrast to Ωmega, goes whole hog dependent typing. This includes arbitrary type-level computation, a succinct 2D-layouted syntax, and interactive support by the development environment, that goes beyond what IDEs for ordinary languages offer (and can offer).

Coq is different from Ωmega and Epigram in that it's designed as a proof assistant, and not a dependent programming language. Adam Chlipala maintains that Coq is the most useful implementation to date for expressing complex proofs, and that systems like Epigram are more useful when proofs arise very naturally from the structure of the program and/or its data. Coq programs, such as Xavier Leroy's verified C compiler, have a different flavor from ordinary and dependent functional programs. Concoqtion has an O'Caml value level, and a Coq type level.

One problem with all dependently typed systems is that you have to deal with non-terminating type-level programs somehow, otherwise your logic becomes unsound. There are different strategies: Ωmega type-level programs are written as term rewriting systems, that have to terminate; Epigram and Coq programs are written in total languages - every program terminates; in Agda, programs are interrupted when they get stuck in a loop - soundness is regained through a separate termination check (I think). In practice, nontermination doesn't seem to be a big problem.

Some of the most interesting applications of dependent types are in the area of metaprogramming by program generation: you prove interesting static properties of your program, and then spit out a program in another language that is guaranteed to obey them dynamically. Which means that you can throw away and erase a lot of stuff at compile-time, that other languages have to remember and/or check at runtime. As Epigram implementers state, they have more information available about their programs statically, than they know what to do with. Consider: in a dependent language, once a program typechecks, you may not ever need to run it!

The existing systems for doing dependent programming all seem very closely related, and "conceptual epiphanies" can be translated between them. Dependent types are an amazing area of research & development, and I urge you to join it.

Update:

Stephanie Weirich maintains that dependent types not only aid verification, but they also increase expressiveness of a language, and that generic programming is a killer-app for dependently-typed languages.

Also: what are dependent types actually good for? I think a good example is found in Why Dependent Types Matter: it shows how to write a list-sorting function, whose type proves that the output lists are always correctly sorted. I think a lot of software will be written this way in the near future.

Monday, September 13, 2010

The far side of the lambda cube

Of course, once you investigate higher-order types, you begin to think about unifying the type and term levels.

Reading a bit about this I found some interesting links:

Sage: Unified Hybrid Checking for First-Class Types, General Refinement Types, and Dynamic. A recent paper I found via Ahmed, Findler, Siek, and Wadler's new Blame for All. I find the use of a counter-example database in a compiler a bit frightening, though.

Henk: A Typed Intermediate Language, from 1997, by SPJ and Erik Meijer:

There is growing interest in the use of richly-typed intermediate languages in sophisticated compilers for higher-order, typed source languages. These intermediate languages are typically stratified, involving terms, types, and kinds. As the sophistication of the type system increases, these three levels begin to look more and more similar, so an attractive approach is to use a single syntax, and a single data type in the compiler, to represent all three.

The theory of so-called pure type systems makes precisely such an identification. This paper describes Henk, a new typed intermediate language based closely on a particular pure type system, the lambda cube. On the way we give a tutorial introduction to the lambda cube.

Languages of the Future, by Tim Sheard on Omega, on which I'll post something soon.

And, Your lambda-cube is puny.

A moment of respectful silence, please.

Sunday, September 12, 2010

Higher kinds are sexier

In the process of formalizing generics for my Lisp, I'm also studying FGJ_ω again, the extension of Featherweight Generic Java with higher kinds. Instead of just parameterizing types with first-order types, such as T, you can parameterize types with type-level "functions", such as F<T>. One example of such a function is the type List<T> - you give it a T, and it "returns" the concrete type of lists containing that type.

And it turns out, the theory behind that looks simpler, and thus sexier:

(types)             T ::= (K K*)
(type constructors) K ::= X | C | (P* => T)
(type param defs)   P ::= (X P*) | (<: (X P*) (C K*))
(class defs)        D ::= DEFCLASS (C P*) ((C K*)*)
(method defs)       M ::= DEFMETHOD (m P*) ((param T)* -> T)

I like the definition of types: a type's the application of a constructor to zero or more other constructors. Constructors may be variables (X), classes (C), or functions taking zero or more parameters and returning a type. (So you basically have a lambda calculus at the type-level.) A type parameter is a variable constructor with zero or more parameters, optionally bounded to a class type with zero or more type arguments. Class definitions are parameterized over zero or more type parameters, and extend zero or more existing classes. (I'm taking some liberties with adding multiple inheritance, hoping it doesn't mess up the theory. :P) Method definitions are, as usual, parameterized on the type- and term-level and have a result type.

It's kinda embarrassing to post this stuff here, since I'm so bad at type theory, but I think I can make it work. One insight giving me this faith is that none of this stuff survives till runtime: at runtime, only concrete instantiations, such as List<String> remain. Or so I hope.

Friday, September 10, 2010

Formalizing generics

In the previous post, Minimal generics for an untyped language, I've discussed some basic requirements for the implementation of parametric polymorphism in the Lisp I'm working on.

I've now made some progress in formalizing this, based on Featherweight Generic Java. Here's a basic overview of the type system in the FGJ style:

(types)       T ::= C | X | Top
(classes)     C ::= (klass T*)
(class defs)  D ::= DEFCLASS (klass (<: X C)*) (C*) ((slot T)*)
(method defs) M ::= DEFMETHOD (method (<: X C)*) ((param T)* -> T)

Types are either class types (C), type parameters (X), or Top. Class types have a name (klass) and zero or more types as parameters. A class definition introduces a named class, parameterized over zero or more type parameters that have class types as bounds, zero or more superclass types, and zero or more named and typed slots. A method definition introduces a named method, parameterized over zero or more type parameters that have class types as bounds, with zero or more named and typed parameters and a result type.

Type parameters in a class definition or method definition (X) are scoped over the entire definition, including the bounds (in the style of F-bounded polymorphism).

All of this is pretty much equivalent to FGJ. The only difference is that classes do not contain methods.

[Updates: I've added type parameterization to method definitions. Added Top. Added some parentheses to make the syntax unambiguous.]

Wednesday, September 8, 2010

Minimal generics for an untyped language

So, I'm developing a Lisp, which means I have to support untyped, dynamically type-checked code in any case. But I still like typeful programming.

Generics, or parametric polymorphism, are jolly useful, even on a totally limited scale: all I want from the first round of this endeavor is to support something like the following:

(defclass (list T))
(defmethod add ((list (list T)) (element T))
...)

Here, I define the class (list T) - a list parameterized over the type of its elements, T. That's the same as Java's List<T>. The method ADD takes such a list and an element, and adds it to the list.

To create an instance of a parameterized class, you have to pass a concrete type argument to use for the element type, as in:

(defvar *my-list* (make (list string)))

This creates a list that accepts strings (and instances of subclasses of string) as elements.

The instance has to remember that it's a list of strings for type safety. When ADD is called on a list, the runtime needs to check that the element type matches:

(add *my-list* "foo") ;; OK
(add *my-list* 12) ;; runtime error

One more thing I'd also like is the following;

(defclass (foo (<: X bar)))

That's a class foo that has one parameter, that must be a subtype of bar, analogous to Java's Foo<X extends Bar>. Being able to give such a bound to a type parameter seems quite essential.

Furthermore, it has to be possible to pass on type parameters to superclasses, as in:

(defclass (super-1 T))
(defclass (super-2 T))
(defclass (klass X Y) ((super-1 X) (super-2 Y)))

(klass X Y) is parameterized over two types, X and Y, and passes them on to its two superclasses, each of which is parameterized over one type.

Of course, it also has to be possible to remove parameterization, as in:

(defclass string-list ((list string)))

This defines an unparameterized class string-list, which is a subclass of (list string).

Another requirement is that it has to be possible to create instances of type arguments, as in:

(defclass (klass T))
(defmethod make-a-new-t ((k (klass T)))
(make T))

That's just cute, and may have some applications to dependency injection.

I'd also like to be able to write polymorphic functions:

(defun identity ((a T) -> T) a)

The arrow indicates the result type of the function. In this case it takes an instance of any type and returns it.

It should also be possible to give slots (member variables) of classes the types of type parameters:

(defclass (klass X Y) ()
((slot-1 X)
(slot-2 Y)))

This defines a class with two type parameters X and Y, (no superclasses,) and two slots slot-1 and slot-2 with the types X and Y, respectively.

These are the basic requirements. Now I need a plan! ;)

Saturday, September 4, 2010

The many forms of polymorphism

It's important to realize that different kinds of polymorphism in type systems can be quite orthogonal.

Here are some forms of polymorphism, from Types and Polymorphism in Persistent Programming Systems by R.C.H. Connor:

ad hoc: An operation is defined over a number of different types, and its semantics may depend upon the type of its operands. An example is the operator "+", which may be defined over integers and reals, and has a different interpretation for each type.

parametric: Instances of the same type within a type description may be abstracted over by an implicit or explicit type parameter. An example is an identity function, where although the parameter and result may be of any type, they are statically known to be the same type.

inclusion: The type of a value may be partially abstracted over, so that unnecessary type information need not be stated. An example is a function which is defined over any record value which has an age field of type integer.

He further writes:

All of these language concepts have evolved independently from each other. Languages such as early assemblers contained no polymorphism whatsoever. Ad hoc polymorphism was the first to appear, in languages such as Fortran, which defined overloaded operators, such as "+", along with the ability to coerce values from integer to real according to the use of such operators. Parametric polymorphism appeared in ML, and inclusion polymorphism in Simula. Existentially quantified types as described by Mitchell and Plotkin is a model of abstract data types which introduces abstraction over a type description, and such types are included in the definition of parametric polymorphism given above.

The motivation for all of these diverse language models is the same: the need to abstract over type. As type systems include more static constraints, then flexibility is lost as well as safety gained. Some of this flexibility, however, may be regained without the loss of safety by the introduction of type abstraction. Polymorphism is viewed here not as some theoretical attribute of type systems but as a solution to a class of practical problems which require type abstraction.

Saturday, August 7, 2010

You know you're reading LtU when...

I believe the preferable solution is typeclasses as record types, instances as records, scoped implicit parameters for propagating them around, associated types as record members, and a backtracking logic solver for instantiation. But that just solves the immediate problem; on top of that I'd like a dependent type system with staging to enable type dependency on values (e.g. arrays-with-length), and a higher-order logic solver so simple logic-programming idioms like let zip(as,bs)=f(x) in ... can be expressed and translated via instantiation into efficient runtime code. And a total (non-partial) function subset of the language for reasoning with proofs as programs. — Tim Sweeney

Monday, July 19, 2010

On Understanding Types, Data Abstraction, and Polymorphism

Cardelli and Wegner's On Understanding Types, Data Abstraction, and Polymorphism from 1985 is a treat. No wonder it has more than 2000 citations on Google Scholar.

It starts off with a simple functional language, and then adds universal and existential quantification. Universal quantification (∀) brings parametric polymorphism ("length returns the length of any list, whatever the type of its contents are"). Existential quantification (∃) lets you hide the details of a data type ("there exists some data type that implements a stack, but I'm not telling you more"). All of this is described in terms of a set-based description of types, which clears up a lot of issues.

It seems that this paper is the granddaddy of papers about polymorphism. It's expository style is unmatched.

P.S. I discovered this paper while skimming Adrian Moors' thesis Type Constructor Polymorphism for Scala: Theory and Practice. Its final chapter gives a great overview of where type systems are going.

Saturday, June 19, 2010

A Theory of Typed Hygienic Macros

Dave Herman's much-awaited dissertation is out!

Hygienic macro expansion is one of the crown jewels of Scheme, but to this day nobody understands just exactly what it is.

Put that in your pipe!

Unhygienic macro expansion identifies programs with their representation as trees. But because of variable scope, programs in fact have additional graph structure, with edges between variable bindings and their references. Since these edges are not explicitly represented in S-expressions, maintaining their integrity during macro expansion becomes the responsibility of programmers. Put differently, the tree representation of a program has unenforced representation invariants that are the responsibility of programmers to maintain.

A good explanation of why "code is more than data". (Though Pascal Costanza seems to have found a way to introduce hygiene in an unhygienic macro system.)

Though the motivations are clear enough, hygienic macro expansion has so far resisted a precise, formal specification. At the heart of the problem is identifying what is meant by “scope” in a language with extensible syntax. ...

The key insight of this dissertation is that by annotating all macro definitions with interfaces describing their grammar and binding structure, we can reason formally about the binding structure of Scheme programs, and without first macro-expanding. More technically, explicit annotations provide us with enough information to obtain a formal definition of α-equivalence of pre-expansion Scheme programs.

These binding specifications make use of tree addresses, a syntax for describing subtrees of a cons tree, analogous to Common Lisp's CADR, CADDR, CADAR, etc functions. The corresponding tree addresses would be AD, ADD, and ADA.

Furthermore, the specifications make use of attribute grammars. These allow synthesized attributes (passed from children upwards to their parents) and inherited attributes (passed from parents downward to their children). For example, the specification of LAMDBA would be:

(lambda xs:formals e:expr)
↪ e.imports = xs.exports :: ε

This means that the imports attribute (the bound variables) in E corresponds to the exports attribute (the introduced formals) of XS. (There are additional rules, which I'm not showing here.)

I've only read the first two chapters so far, but this work seems like a clear winner. Very readable and insightful. Congratulations, Dave!

More later.

Thursday, June 17, 2010

Some more notes on Newspeak

While Bracha does nothing but diss type systems in his talk on Newspeak, he's actually working on optional type systems (see Gilad is Right on LtU).

He says that type systems should be purely optional (can I getta yes please), and mustn't interfere with the runtime semantics of the language (he gives Java's evil overloading as an example). But he doesn't seem to know Cecil or Diesel, both of which already do this (and with interesting type systems). Anyway, I think that optional type systems are clearly the future (of Lisp-like languages).

A second, mildly interesting thing is mirror-based reflection. Instead of being able to call getClass() on any object, you need a special capability that can then be used to reflect upon the metaobjects. Which sounds agreeable, if somewhat software-engineerey (OMG these evil, unwashed other programmers will reflect on my private objects).

Finally, his demo again shows the failure of the antiquated image-based paradigm, or so it seems to me. Yes, he can produce a 12KB object file, but to actually run it, it needs the 7MB development environment. I can only chant, destroy the one true Lisp world, long live the multiverse!

Wednesday, June 2, 2010

What's that ATS thing?

ATS rocks the shootout, and now grandmaster Chris Double shows how to achieve remarkable feats using its (dependent and linear) type system:

By defining ‘curl_easy_cleanup’ in this way we enforce a contract which states that the programmer must call this function if they have a live ‘CURLptr l’ object. You can see this in practice if you remove the cleanup call and try to compile. A compile error will result. A compile error will also occur if you try to use the curl handle after the cleanup call.

You get a compile error if you don't dispose of our objects correctly or use them after they've been disposed of! Now that sounds like progress.

Friday, May 28, 2010

Type Constructor Polymorphism

I'm starting to dig type systems, which I have ignored for years. I'm still not able to make it through a single page of TAPL in one sitting, so my learning here is purely intuitional.

I think generics in Java are quite nice, surface issues aside. (In theory, that is. In practice, "Java seems to have been designed to strike a careful balance between making the type system as obstructive as possible while minimizing any actual guarantees of correctness" →. I think this is mostly due to Java's lack of RTTI for generics. C# doesn't have the same problems, except in places where they intentionally copied Java.)

My first nice discovery was that there's a beautifully simple theory behind generics: F-bounded polymorphism. (Reassuringly, it wasn't developed in academia, but by practitioners at HP Labs.)

In ordinary bounded polymorphism, you can give bounds to your type parameters: A <: Foo. The bound here is Foo, which means that all actual As must be subtypes of Foo.

F-bounded polymorphism is a simple extension of ordinary bounded polymorphism, in which the bound may recursively contain the type it bounds: B <: Bar<B>. It turns out that with this simple (but at first somewhat confusing) concept, we can statically typecheck the expression problem, which is a standard measure of the worth of a type system.

Type constructor polymorphism is another beautifully simple theory. In type constructor polymorphism, the bounds of your types may not only be concrete types, but also functions from types to types: C <: F<D> says that Cs must be subtypes of Fs, which are type-level functions that take a D as input. (Type constructor polymorphism also works naturally with F-bounded polymorphism.) It turns out that with this simple concept, we can statically typecheck GADTs, an advanced issue in typeful programming.

My take-away: so far studying type systems has been a lot of fun, though extremely challenging, and the concepts are simpler and more profound than I thought. As long as we keep type systems optional, and never let them get in the way, I think we should strive to understand and apply them in new languages.

Wednesday, May 19, 2010

Type systems for dynamic languages

In the past days I've studied type systems for OOP, with an eye towards making them work in a Lisp.

Cecil and its successor Diesel already do this. There, type systems are purely optional:

Diesel supports the view that static type checking is a useful tool for programmers willing to add extra annotations to their programs, but that all static efficiently-decidable checking techniques are ultimately limited in power, and programmers should not be constrained by the inherent limitations of static type checking. ... Accordingly, error reports do not prevent the user from executing the suspect code; users are free to ignore any type checking errors reported by the system, relying instead of dynamic type checks. Static type checking is a useful tool, not a complete solution

I think that's a good spirit.

Both languages support F-bounded polymorphism, which is a fancy way of saying that you allow recursive types, like T extends Foo<T>. With this simple mechanism, the expression problem can be typed, and that's something.

(Scala and O'Caml extend upon this, each in their own way (here and here), but I think they're much too complicated for what I have in mind.)

Saturday, May 15, 2010

Typeful Dynamic Programming, or, Java/C# Generics and what does F-bounded polymorphism have to do with it?

On LtU, Anton van Straaten repeatedly and vehemently made the point that users of dynamically type-checked languages don't program without types. Instead, in my interpretation, they do they type checking and inference in their heads.

Now, Java can in fact be used as a dynamically type-checked language: just declare all your variables, parameters, and return values as Object, and use instanceof to dispatch on types or reflection to invoke methods. Runtime type information (RTTI) saves the day. But, Java with generics also offers a quite potent type system. (Even if it's hampered by type erasure for backwards compatibility. C# keeps RTTI about polymorphic type variables around (the actual class of T in List<T>).)

And Java's generics are in turn based on F-bounded polymorphism for object-orientation, a nice and rather simple way for type checking parameterized classes. F-bounded polymorphism can do cool stuff like solve the expression problem (if heavy-handedly), and type families of mutually recursive types.

* * *
I recently had the case of a Listener, that's parameterized over its Broadcaster, that again is parameterized over the listener. But I didn't know Java generics well by then, and just gave up on typing it. With F-bounded polymorphism, the two types could be defined like this:

It's weird, but I'm developing a bit of intuition about it. And I have to say, the expressivity is nice, even if the syntax is bad. But my hope is that macros can help factor out some common use cases of F-bounded polymorphism.

* * *
One of the goals of typesystems like O'Caml's is to never have to store runtime type information (RTTI). But in a Lisp or other dynamic languages, users are already prepared to pay for RTTI anyway, so type systems can be layered on top of the completely dynamically type-checked language. What does this say to us as people looking for the ultimate dynamic language?

For one, there's a vista that an advanced type system like F-bounded polymorphism and a at its core dynamically type-checked language can work well together. (Cecil does exactly this.) O'Caml and Haskell programmers routinely write down the exact types of variables and functions. And anyone who has ever experienced these languages knows about the value of their type systems (no pun intended). Let's try out advanced types in dynamic languages!

Second, in a completely dynamically type-checked language augmented with an advanced type-system, compiler type errors become warnings about potential runtime type errors. Why? Because it's guaranteed that you'll get a runtime error for type violations from the dynamically type-checking core language anyhow. This means, the error listing from the compiler changes to a dynamic console of current warnings about expressions that may yield a runtime type error, if executed. But the compiler lets you start your program anyway. Until you hit the first actual error.