Sunday, October 31, 2010

Concurrency as Basis for Scalable Parallelism

David Barbour's Concurrency as basis for Scalable Parallelism may change the way you think about programming:

It has been observed that many 'concurrent' applications fail to scale beyond a few parallel components. In many cases, some sort of data-parallelism is feasible (e.g. parmap, SIMD, GPU) and we should certainly be able to leverage those opportunities! But I'd like to address just the concurrency aspect - the coordination of diverse systems and interests - and argue that even that is a sufficient basis for scalable parallelism, assuming we leverage it properly.

The scalability of concurrency as a basis for parallelism comes in realizing that the number of relationships in a system can rise O(N^2) with the number of components. Thus, we need to embed most significant computation into the relationships, rather than the components. ...

When most computation is moved to the relationships, the resources used by a service will scale commensurately with the number of clients it directly services - and vice versa; resources used by a client will be commensurate with the number of services it directly uses.

It changed my thinking: from "ouch, this is a lot of data to process" to "wow, lots of opportunities for concurrency, and thus parallelism".

Monday, October 25, 2010

Notes on Dependent Types

Since discovering Ωmega, I've been drawn into the fantastic world of dependent typing. Dependent types allow the expression of static invariants that your program is then guaranteed to respect dynamically. And this extends all the way up to computation: proofs of static invariants are themselves programs, so arbitrary invariants may be expressed by the programmer and verified by the type system. I repeat: arbitrary invariants. If you're not salivating yet, you might want to skip this post.

A couple of systems I've (superficially, so far) looked at are Ωmega, Epigram, Coq, and Agda.

Ωmega is closest to existing FP languages like Haskell, which makes it easier to grok than the others for newbies - it's not a full-spectrum dependent type system by design. Technically speaking, Ωmega is "faking it" by maintaining a strict phase distinction between the value and the type level (and the kind level ...). Proofs are represented as runtime instances of GADT datatypes.
GADTs, or generalized algebraic datatypes, are a more powerful, generalized form of ordinary ADTs (such as ML's 'a list), and they serve as the datatype mechanism (or "object system") in many dependently typed systems. GADTs are very closely related to generics in Java and C#; understanding generics and their proposed higher-level extensions thus provides a gentle route into dependent types.
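
To make that concrete, here is the textbook Haskell example of a GADT (my illustration, not code from any of the systems above): an expression type whose type parameter records what the expression evaluates to.

{-# LANGUAGE GADTs #-}

-- Each constructor picks its own result type, so ill-typed expressions
-- (e.g. adding a Bool to an Int) cannot even be constructed.
data Expr a where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

-- Because the invariant lives in the type, eval needs no runtime tag checks.
eval :: Expr a -> a
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add x y)  = eval x + eval y
eval (If c t e) = if eval c then eval t else eval e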

Java's and C#'s object systems already have all the power of GADTs - they're just not as easy to use, and not as typesafe. What separates GADTs from ADTs can be seen in this Java code:

class Foo {}  // added here so the snippet compiles
class A<T> {}
class B extends A<Foo> {}
B extends A with T fixed to Foo, so instances of B have type A<Foo>, by subtyping. With ordinary ADTs, B would also need to be parameterized by T: individual data constructors of an ADT, which we've expressed here as Java classes, cannot fix a type parameter to a specific type. GADTs are attractive because they're a modest extension of existing ADT theory, and yet they enable a wide range of data structuring styles used in fancypants typeful programming.
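
For comparison, here is roughly the same situation written as a Haskell GADT (my sketch, not code from the post): one constructor stays parametric, while the other fixes the parameter to Foo, just as B fixes A's T above.

{-# LANGUAGE GADTs #-}

data Foo = Foo

data A t where
  MkA :: t -> A t   -- ordinary, parametric constructor
  MkB :: A Foo      -- "B extends A<Foo>": the parameter is fixed

-- Pattern matching refines the type: in the MkB branch the compiler
-- knows that t is Foo.
describe :: A t -> String
describe (MkA _) = "some A t"
describe MkB     = "an A Foo"
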
Epigram is aligned with Ωmega in its programme: both are passionate about the advances in programming made possible by dependent types, and both vocally call on programmers and researchers to join them on their quest. In contrast to Ωmega, though, Epigram goes whole hog on dependent typing. This includes arbitrary type-level computation, a succinct two-dimensional syntax, and interactive support from the development environment that goes beyond what IDEs for ordinary languages offer (or can offer).

Coq is different from Ωmega and Epigram in that it's designed as a proof assistant, not a dependently typed programming language. Adam Chlipala maintains that Coq is the most useful implementation to date for expressing complex proofs, and that systems like Epigram are more useful when proofs arise very naturally from the structure of the program and/or its data. Coq programs, such as Xavier Leroy's verified C compiler, have a different flavor from ordinary and dependent functional programs. Concoqtion has an O'Caml value level and a Coq type level.

One problem every dependently typed system has to face is non-terminating type-level programs: if they are allowed unchecked, the logic becomes unsound. There are different strategies: Ωmega's type-level programs are written as term rewriting systems that have to terminate; Epigram and Coq programs are written in total languages, in which every program terminates; in Agda, programs are interrupted when they get stuck in a loop - soundness is regained through a separate termination check (I think). In practice, nontermination doesn't seem to be a big problem.
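
As a rough analogy in Haskell terms (my sketch, not tied to any of the systems above): a GHC type family defined by structural recursion is a tiny terminating type-level program - each rewrite step strictly shrinks its argument, so type-level evaluation cannot loop.

{-# LANGUAGE TypeFamilies, EmptyDataDecls #-}

-- Type-level naturals and addition as a GHC type family (my example).
-- The recursion is structural - the first argument shrinks on every
-- step - so reduction always terminates.
data Z
data S n

type family Add m n
type instance Add Z     n = n
type instance Add (S m) n = S (Add m n)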

Some of the most interesting applications of dependent types are in the area of metaprogramming by program generation: you prove interesting static properties of your program, and then spit out a program in another language that is guaranteed to obey them dynamically. This means you can throw away and erase, at compile time, a lot of information that other languages have to remember and/or check at runtime. As the Epigram implementers state, they have more information available about their programs statically than they know what to do with. Consider: in a dependent language, once a program typechecks, you may not ever need to run it!
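
A small Haskell sketch of the erasure idea (mine, not from the post): for length-indexed vectors, the length and the non-emptiness check exist only at the type level - once safeHead typechecks, nothing about that check remains to be done at runtime.

{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- Length-indexed vectors: the length is a type-level natural.
data Z
data S n

data Vec n a where
  Nil  :: Vec Z a
  Cons :: a -> Vec n a -> Vec (S n) a

-- Only provably non-empty vectors can be passed in, so no runtime
-- emptiness check (and no error case) is needed.
safeHead :: Vec (S n) a -> a
safeHead (Cons x _) = x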

The existing systems for doing dependent programming all seem very closely related, and "conceptual epiphanies" can be translated between them. Dependent typing is an amazing area of research & development, and I urge you to join it.

Update:

Stephanie Weirich maintains that dependent types not only aid verification, but also increase the expressiveness of a language, and that generic programming is a killer app for dependently typed languages.

Also: what are dependent types actually good for? I think a good example is found in Why Dependent Types Matter: it shows how to write a list-sorting function, whose type proves that the output lists are always correctly sorted. I think a lot of software will be written this way in the near future.
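
To give a flavor of that sorted-list idea, here is a Haskell sketch using GADTs (my own names and encoding, not the notation used in the paper): every cons carries a proof object stating that the new element respects the ordering, so only correctly ordered lists can be built at all. The LE values below are exactly the kind of runtime proof instances mentioned in the Ωmega paragraph above.

{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- Type-level naturals.
data Z
data S n

-- Runtime witnesses of type-level naturals.
data SNat n where
  SZ :: SNat Z
  SS :: SNat n -> SNat (S n)

-- A proof that m <= n, built like a derivation in a logic.
data LE m n where
  LeZ :: LE Z n
  LeS :: LE m n -> LE (S m) (S n)

-- Non-decreasing lists of naturals, indexed by a lower bound on the
-- head element; each OCons demands a proof that the bound is respected,
-- and the tail is bounded by the element just added.
data OList bound where
  ONil  :: OList bound
  OCons :: SNat n -> LE bound n -> OList n -> OList bound

-- The list [0, 2], together with its ordering evidence.
example :: OList Z
example = OCons SZ LeZ (OCons (SS (SS SZ)) LeZ ONil)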

Wednesday, October 20, 2010

Keeping up with the GNUses

Just wanted to mention that for keeping up with GNU/Linux progress, the following two sources are very good:

Tuesday, October 19, 2010

Split stacks!

This could be big - in the October 2010 GNU Toolchain Update (subscribe!), Nick Clifton reports that split stacks have been added to GCC. With split stacks, thread stacks can grow (and shrink (?)) automatically.

Monday, October 11, 2010

al-Khwārizmī

[Image: statue of al-Khwārizmī]

I think that's a great statue showing the act of algorithm design.