Friday, December 31, 2010

Nice quotations

Although I'm not sure I get most of them - Tim Chevalier's Quotations File:
[Satanism and Wicca] are to world religions what JavaScript is to programming languages. -- Juli Mallett

...Debugging-by-printf is a universal theme that transcends cultures and concrete syntaxes. This shit is Joseph Campbell, yo. -- Jason Reed

...real programmers often wear climbing boots to work in case a mountain should suddenly spring up in the middle of the machine room. -- anonymous
Happy new year!

Wednesday, December 29, 2010

PL Predictions for 2011

  • JSON will be the default format for new internet APIs.
  • As more people use JSON, we'll see a XML renaissance, as we - for the first time - discover that XML actually gets some things right. No really.
  • GHC will get typed type-level programming and end this highly embarrassing state of untypedness. ;)
  • We'll have a verified compiler-and-OS toolchain for compiling and running some kinds of apps. It won't be x86-based.
  • All kinds of stuff targetting JavaScript.
  • Split stacks and maybe some scheduler improvements will be shown competitive with green threads in the millions-of-threads!!1! (anti-)scenario.
I'd really like to predict some insanely great new PL for 2011...

Sunday, December 19, 2010

Quick, what does this machine do?


From Dieter Rams, ten principles for good design.

HXA just published A comment adding to Dijkstra on natural language programming:
Software must always be an ‘un-natural language’ because its (ultimate, essential) purpose is different (and particular): it is not communication, it is design. ...

When you look at software, what you see is not a language, it is a
machine.
I'll have to come back to this topic at another time, for now take it as a fine weekend inspiration.

[Update: Of course, Lev Manovich comes to mind immediately. From his Info-aesthetics:
Never before a single machine was an engine of economy -- AND the main tool for representation.


[Further riffing off on this idea reveals that our major platform for delivering software is called a (markup) language.]]

Thursday, December 16, 2010

On Git

Giles Thomas commenting on A Guide to GIT using spatial analogies:

A much simpler view IMO is to consider the Git repositories as narratives, or meta-narratives, each committer being seen as an author of one or more. This is made particularly clear in the git man pages, where a predominant concept is the distinction between closing and opening — in a sense, it promotes the use of structural deappropriation to challenge the linear view of “revisions”. As Torvalds says, “[l]inearity is fundamentally impossible”, so the repository — or at least, a branch — is interpolated into a realism that includes the “current” revision (tip or head, as you prefer) as a reality. To put it in political terms, Mercurial deconstructs Marxist socialism, while Git analyses capitalist construction.

Tuesday, December 14, 2010

HTTP request visualization


You can clearly see the handshake, slow-start ramp-up and full bandwidth phases.
Packet Flight: HTTP request @ 40X from Carlos Bueno on Vimeo. (via Russ Cox)

Saturday, December 4, 2010

objectophilia alert!

Here is our latest minicluster design. (Ron Minnich on 9fans (hat tip to Chris))



Plan 9 is in the works. This is a nice test and development platform
for software. We're going to make the design available in a way that
people can easily build their own.

Wednesday, December 1, 2010

User space? Almost certainly not.

> Should this kind of thing be done in user space?

Almost certainly not.

First off, user-space is a fragmented mess. Just from a "let's get it
done" angle, it just doesn't work. There are lots of different thing
that create new tty's, and you can't have them all fixed. Plus it
would be _way_ more code in user space than it is in kernel space.

Secondly, user-space daemons are a total mess. We've tried it many
many times, and every time the _intention_ is to make things simpler
to debug and deploy. And it almost never is. The interfaces end up
being too painful, and the "part of the code is in kernel space, part
of it is in user space" means that things just break all the time.

Finally, the whole "user space is more flexible" is just a lie. It
simply doesn't end up being true. It will be _harder_ to configure
some user-space daemon than it is to just set a flag in /sys or
whatever. The "flexibility" tends to be more a flexibility to get
things wrong than any actual advantage.

Just look at the patch in question. It's simple, it's clean, and it
"just works". Doing the same thing in user space? It would be a total
nightmare, and exactly _because_ it would be a total nightmare, the
code would never be that simple or clean.

Linus
(from LWN)

Ghosts of Unix Past

One of the most awesome collection of articles on software design is Neil Brown's

Nulla Salus Extra Ecclesiam

There is always a well-known solution to every human problem - neat, plausible, and wrong. — H. L. Mencken
In software design, there often appears to be a situation where you either get a concept exactly right, or it will just be plain wrong and mess up your entire downstream development. (Maybe sometimes there are multiple good ways to solve a problem, instead of just one, but the wrong ones will always outnumber them.)

I think a good example is the difference in lexical scoping in Scheme vs Python. For my tastes, Scheme gets it exactly right, and harvests a lot of benefits from its clean design, whereas Python is just plain wrong, and harvests a lot of pain, everywhere. [In my limited experience.]

How can we avoid making systemic design errors? One way is to stick to what works. If you're designing a new PL, and your lexical scoping is different from Scheme or Haskell, I think you have a lot of explaining to do.

To know what actually works, you need taste. To create new designs without systemic error, you need a lot of midnight oil.

Which brings us back to:
Why do you glorify doing something new and stupid, when doing good things well is what people really should be admiring. — Linus Torvalds

Tuesday, November 30, 2010

Wow

A Turing Machine in the classic style:



Here's the awesome video:



(via LtU)

Systems programmers relax, most advice isn't for you

I'm an incredibly slow programmer (in the hobby projects I do for the love of it, that is). My ratio of thinking to typing must be around 10,000 : 1. Just to give you an indication of how slow I am: there's this one nagging project on which I've been working more or less continuously since... 2001.

Thus, programming advice of the DTSTTCPW variety (i.e. 99% of programming advice) always makes me a bit uneasy. Instead of thinking - shouldn't I just be crankin' out code, putting it online, and blogging about it on LJ?

But a couple of days ago I made an observation: almost all programming advice applies to applications, but what I'm mostly working on are systems. Since then, I'm sleeping better.

As Dr. Theodor Holm Nelson says, you can't get to the moon by piling up chairs. You may be able to learn something about an application by cranking out a quick and dirty prototype. But for systems, prototyping seems futile. Either your ideas work, or they don't, and a quick gedankenexperiment will usually be enough to find out.

Of course, I don't want to denounce experimentation as worthless. You have to know whether your ideas will be useful and work on really existing computers. But if your system is so removed from reality or complex that you're unsure about that, you're probably doing it wrong.

Take Unix pipes. There's an early memo about the need for pipes from 1964. And in 1972, pipes appeared in Unix. Do you think they spent 8 years DTSTTCPWing? I don't think so. I think that in these 8 years they spent a lot of quality time thinking about the issue, and when they had the right idea, they just knew it was right, and went ahead and implemented it.

So, to all systems programmers out there: most programming advice isn't for you. Relax, and take the time to find the right ideas.

That looks like one fine book

(via proggit)

Thursday, November 18, 2010

Numbers Everybody Should Know

Julian Hyde graciously transcribed the following table from Jeff Dean's Stanford talk:

L1 cache reference0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1K bytes w/ cheap algorithm 3,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from disk 20,000,000 ns
Send packet CA->Netherlands->CA150,000,000 ns


For more details: Agner's Software optimization resources.

Foundations for Structured Programming with GADTs

Makes me wish I understood category theory: Foundations for Structured Programming with GADTs (neelk's LtU post) by Patricia Johann and Neil Ghani @ POPL 2008.
GADTs are at the cutting edge of functional programming and become more widely used every day. Nevertheless, the semantic foundations underlying GADTs are not well understood. In this paper we solve this problem by showing that the standard theory of datatypes as carriers of initial algebras of functors can be extended from algebraic and nested data types to GADTs. We then use this observation to derive an initial algebra semantics for GADTs, thus ensuring that all of the accumulated knowledge about initial algebras can be brought to bear on them. Next, we use our initial algebra semantics for GADTs to derive expressive and principled tools -- analogous to the well-known and widely-used ones for algebraic and nested data types -- for reasoning about, programming with, and improving the performance of programs involving, GADTs; we christen such a collection of tools for a GADT an initial algebra package. Along the way, we give a constructive demonstration that every GADT can be reduced to one which uses only the equality GADT and existential quantification. Although other such reductions exist in the literature, ours is entirely local, is independent of any particular syntactic presentation of GADTs, and can be implemented in the host language, rather than existing solely as a metatheoretical artifact. The main technical ideas underlying our approach are (i) to modify the notion of a higher-order functor so that GADTs can be seen as carriers of initial algebras of higher-order functors, and (ii) to use left Kan extensions to trade arbitrary GADTs for simpler-but-equivalent ones for which initial algebra semantics can be derived.
Mentioned by Oleg in GADTs in OCaml.

Wednesday, November 17, 2010

Boltzmann Samplers for the Random Generation of Combinatorial Structures

To create complex random data objects and shapes like these, use Boltzmann sampling.


From the paper Boltzmann Samplers for the Random Generation of Combinatorial Structures.

While we're at it, Z-order space-filling curves are hot:



In 3D:

Why are objects so unintuitive

In the LtU permathread Why are objects so unintuitive?, Jonathan Shapiro adds another piece to the puzzle:
All of the mappings we have so far from real-world problems to software realizations are either bad (in the sense of imprecise and convoluted) mappings or map to pretty bad models (in the sense that the models are either devoid of semantics or have semantics too complex for the working programmer to really understand). OO "works" because it is the least-bad and least-incomprehensible mapping that has been identified to date.
The thread is also notable for David Barbour's treatise Being poor at modeling is essential to OOP:
The poor modeling facilities of OOP are caused by essential properties of OOP. These essential properties include encapsulation and message-passing polymorphism. These are the same properties that make OOP powerful for configurable modularity, scalability, and object capability security. ...

I do not believe that modeling 'tax reports' in OOP is something that we would 'want' to do. Ever. We might need a tax-report value object as an artifact of the dataflow system (e.g. if tax-reports are dumped into the system as complex form objects), but that's just plain-old-data and could be done just as easily in functional or procedural, and is really not 'OOP'. ...

Your comments (modeling the situation, wanting to model the 'tax reports' explicitly even when it is unnecessary, etc.) and those like them (being extremely prevalent in OO books and the education), are what lead people to start each program by writing a business simulator when what they want are business data processors.

Thursday, November 11, 2010

Dependent types linkdump

This dependent typing thing has me. For the first time in years I find I'm actually learning something about programming. There is life after the Observer pattern.

Here's a list of some papers I've found worthwhile or just found, and not read yet...

Fun with Type Functions (slides). Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan. 2010.
How to fake it with Haskell's recently added type families. But Haskell increasingly looks horribly impure: it's not just that every type's inhabited by bottom, type-level programming is untyped! I think they really should work on their purity! ;) Mentions Ωmega's kind system - my prediction: typed type-level programming is coming to Haskell soon.
Proof-assistants using dependent type systems. Henk Barendregt, Herman Geuvers. 2001.
Recommended as a great intro to all things dependent typing. Not by me though, as I still have to read it.
ΠΣ: Dependent Types without the Sugar. Thorsten Altenkirch, Nils Anders Danielsson, Andres Löh, Nicolas Oury. 2009ish.
What's under the sugary hood of dependent programming languages. For when I develop a dependent language - because I'll have to.
A Tutorial Implementation of a Dependently Typed Lambda Calculus. Andres Löh, Conor McBride and Wouter Swierstra. 2009.
Also frequently mentioned as an intro to dependent types.
Verified Programming in GURU. Aaron Stump. 2009.
A book on dependent programming with a focus on verification.
Generic Programming with Dependent Types (part II, part III). Stephanie Weirich. 2010.
Slides focussing on the extra expressivity afforded by dependent types. Agda.
An Epigram Implementation. Edwin Brady, James Chapman, Pierre-Evariste Dagand, Adam Gundry, Conor McBride, Peter Morris, Ulf Norell, Nicolas Oury. 2010.
The 300-page literate Epigram (2?) implementation.

Sunday, October 31, 2010

Concurrency as Basis for Scalable Parallelism

David Barbour's Concurrency as basis for Scalable Parallelism may change the way you think about programming:

It has been observed that many 'concurrent' applications fail to scale beyond a few parallel components. In many cases, some sort of data-parallelism is feasible (e.g. parmap, SIMD, GPU) and we should certainly be able to leverage those opportunities! But I'd like to address just the concurrency aspect - the coordination of diverse systems and interests - and argue that even that is a sufficient basis for scalable parallelism, assuming we leverage it properly.

The scalability of concurrency as a basis for parallelism comes in realizing that the number of relationships in a system can rise O(N^2) with the number of components. Thus, we need to embed most significant computation into the relationships, rather than the components. ...

When most computation is moved to the relationships, the resources used by a service will scale commensurately with the number of clients it directly services - and vice versa; resources used by a client will be commensurate with the number of services it directly uses.
It changed my thinking: from "ouch, this is a lot of data to process" to "wow, lots of opportunities for concurrency, and thus parallelism".

Monday, October 25, 2010

Notes on Dependent Types

Since discovering Ωmega, I've been drawn into the fantastic world of dependent typing. Dependent types allow the expression of static invariants that your program is then guaranteed to respect dynamically. This works up to computation: proofs of static invariants are programs, so arbitrary invariants may be expressed by the programmer, and verified by the type system. I repeat, arbitrary invariants. If you're not salivating yet, you might want to skip this post.

A couple of systems I've (superficially, so far) looked at are Ωmega, Epigram, Coq, and Agda.

Ωmega is closest to existing FP languages like Haskell, which makes it easier to grok than the others for newbies - it's not a full-spectrum dependent type system by design. Technically speaking, Ωmega is "faking it" by maintaining a strict phase distinction between the value and the type level (and the kind level ...). Proofs are represented as runtime instances of GADT datatypes.
GADTs, or generalized algebraic datatypes, are a more powerful (generalized) form of ordinary ADTs, such as ML's 'a list, that are used as datatypes (or "object system") in many dependently typed systems. GADTs are very closely related to generics in Java and C#; understanding generics and their proposed higher-level extensions thus provides a gentle route into dependent types.

Java and C#'s object systems already have all of the power of GADTs, only they're not as easy to use, and they are not as typesafe. What separates GADTs from ADTs can be seen in this Java code:

class A<T> {}
class B extends A
<Foo> {}
B extends A with T fixed to Foo, so instances of B have type A<Foo>, by subtyping. With ordinary ADTs, B would also need to be parameterized by T - individual type constructors of an ADT, which we've expressed here as Java classes, cannot fix a type parameter to a specific type. GADT is attractive because it's a modest extension of existing ADT theory, and yet enables a wide range of data structuring styles used in fancypants typeful programming.
Epigram is programmatically aligned with Ωmega - both are passionate about the advances in programming made possible by dependent types, and vocally call programmers and researchers to join them on their quest. Epigram, in contrast to Ωmega, goes whole hog dependent typing. This includes arbitrary type-level computation, a succinct 2D-layouted syntax, and interactive support by the development environment, that goes beyond what IDEs for ordinary languages offer (and can offer).

Coq is different from Ωmega and Epigram in that it's designed as a proof assistant, and not a dependent programming language. Adam Chlipala maintains that Coq is the most useful implementation to date for expressing complex proofs, and that systems like Epigram are more useful when proofs arise very naturally from the structure of the program and/or its data. Coq programs, such as Xavier Leroy's verified C compiler, have a different flavor from ordinary and dependent functional programs. Concoqtion has an O'Caml value level, and a Coq type level.

One problem with all dependently typed systems is that you have to deal with non-terminating type-level programs somehow, otherwise your logic becomes unsound. There are different strategies: Ωmega type-level programs are written as term rewriting systems, that have to terminate; Epigram and Coq programs are written in total languages - every program terminates; in Agda, programs are interrupted when they get stuck in a loop - soundness is regained through a separate termination check (I think). In practice, nontermination doesn't seem to be a big problem.

Some of the most interesting applications of dependent types are in the area of metaprogramming by program generation: you prove interesting static properties of your program, and then spit out a program in another language that is guaranteed to obey them dynamically. Which means that you can throw away and erase a lot of stuff at compile-time, that other languages have to remember and/or check at runtime. As Epigram implementers state, they have more information available about their programs statically, than they know what to do with. Consider: in a dependent language, once a program typechecks, you may not ever need to run it!

The existing systems for doing dependent programming all seem very closely related, and "conceptual epiphanies" can be translated between them. Dependent types are an amazing area of research & development, and I urge you to join it.

Update:

Stephanie Weirich maintains that dependent types not only aid verification, but they also increase expressiveness of a language, and that generic programming is a killer-app for dependently-typed languages.

Also: what are dependent types actually good for? I think a good example is found in Why Dependent Types Matter: it shows how to write a list-sorting function, whose type proves that the output lists are always correctly sorted. I think a lot of software will be written this way in the near future.

Wednesday, October 20, 2010

Keeping up with the GNUses

Just wanted to mention that for keeping up with GNU/Linux progress, the following two sources are very good:

Tuesday, October 19, 2010

Split stacks!

This could be big - in the October 2010 GNU Toolchain Update (subscribe!), Nick Clifton reports that split stacks have been added to GCC. With split stacks, thread stacks can grow (and shrink (?)) automatically.

Monday, October 11, 2010

al-Khwārizmī



I think that's a great statue showing the act of algorithm design.

Monday, September 27, 2010

Program Your Size

When programming, I often find myself striving for minimalism. Trying to make the program as succinct as possible, sweating over details like how to arrange the sections in a file, or the number of files to create.

But minimalism in programming may cause programs to appear cramped, may persuade you to choose forms of expression that are unneededly terse, and may also lead to programs that are hard to refactor or extend.

Point is, when you start programming, you don't know how big the program's gonna be, so starting with minimalism may be a bad choice. When looking at the sources of some huge programs, I found that some of them have found very relaxed ways to deal with their sprawling size. Gnus for example comprises dozens of large source files, containing hundreds or thousands of operational definitions, and yet remains very readable.

I don't want to get into the details of architecting large programs - I'd rather like to focus on a way to write programs, from the start, so that the source then has a "density" that's adequate to its size. Gnus for example is meandering, and that's perfectly adequate - Gnus being a mail reader for the most customizable OS in existence. That's a job that's never done, so the source should reflect that - you can't write a cute blog post about a program like Gnus, there's no single overarching, minimalist design principle.

Maybe, as Perlis said, every program needs to be written at least twice. After the first run, we know the approximate size of the program, and can then find a code density/terseness that's adequate to that size.

I find minimalism in source code is often limiting. In my next program, I'll write it from the start as if it were really huge.