Monday, July 20, 2015

What I learned about Urbit so far

[Updated, see comment and this Reddit AMA with the Urbit team, that clarifies a lot of issues.]

Urbit is some kind of new operating system design thingy, that is kinda hard to categorize.

Some interesting design points are:
  • Urbit restricts the number of identities in the system to 232. This means Urbit doesn't have enough identities even for currently living humans. In line with the usual obfuscation going on in Urbit, such an identity is called destroyer.
  • Urbit is programmed in a weird programming language called Hoon. Hoon's (only) definition is a 10KLOC file hoon.hoon, written in itself. It uses practically random variable names (such as nail for "parse input", or vase for a "type-value pair"), not to speak of the "nearly 100 ASCII digraph 'runes'". The author acknowledges that the parser is "very intricate".
  • Efficiency-wise, Hoon is impractical as a programming language, so in the real world, the VM will recognize Hoon fragments, and replace them with C code (called jets).
This brings us to the question: why would anybody actually design an OS like that? The best explanation I've seen so far is by Kragen, who writes:


Manuel Simoni said...

Philip Monk sent the following email which clarifies some things (quoted with permission):

It's important to emphasize that anonymity is absolutely possible: just create your own self-signed comet. We're definitely not out to destroy anonymity; on the contrary, there's all kinds of great use cases for it. The thing about anonymous machines, though, is that they cannot have a default positive reputation. You shouldn't trust anything coming from someone anonymous without some amount of crypto establishing their identity. What we provide is a way to be *not* anonymous, which is hard to do in the current unix/internet system. On the internet, every app has its own concept of identity and reputation, and these don't tie together at all. Unix usernames, ip addresses, dns names, facebook accounts, reddit handles -- these are all identities, and they're almost totally unconnected. In urbit, identity is a unified concept built into the os and network.

As for hoon, I was definitely initially put off by it as well. It turns out, though, that after you get past the (admittedly significant) apparent complexity, it's an eminently practical langauge, and a real joy to use. I've never used a functional programming language -- even an impure one -- that felt so much like an imperative one. hoon is what you get when a systems guy needs a purely functional programming language to build an operating system. It's a lot like Haskell without the math, and without the unix stuff.

Since hoon is homoiconic, the syntax isn't that bad. There's a fair number of gotchas that should be smoothed out, but it's all pretty localized. There may be cases where you're not sure what it's going to do, but it's pretty rare for their to be a case where you think it's going to do one thing and it does another. We do apologize for a lot of the nonsense names. Humans are actually quite good at associative memory, and you learn them pretty quickly, but in user-level code we use long-form descriptive names. It's only in the kernel that we use the lapidary style. Also, we generally use only about 30 or so of the runes, and you stop noticing them pretty quickly. It's a rather pleasant experience when punctuation is structure and alphabetic text is content.

The jet system is unique, for sure, but it actually buys you quite a lot. In some sense, jets are just a part of a "sufficiently smart" interpreter. Most of the jets are for fairly low-level operations -- from bit manipulations to fold to map operations. Nothing in the kernel modules, for example, is jetted. What jets buy you is simplicity. You don't ever have to actually break the abstraction, so unless you have a malfunctioning jet you can pretend you're doing it all in pure nock. Without jets you can't use something as simple as nock/hoon to build something as complex and practical as urbit.

The claim that urbit is designed to disempower users is completely backwards, and can only be attributed to gossip passed around by people who never gave a serious attempt to understand the system. On the current SAAS-dominated internet, we are all digital serfs to whoever we have an account with. If some service decides to shut down your account -- or, indeed, their whole service -- then all your data is gone and you can't use it anymore. Megacorps mine and sell our personal data to advertisers as their business model. In a personal cloud computer like urbit, the user controls their own online presence (including their identity), and nobody can take that away from them.

dmbarbour said...

A purely functional base language (like Nock) and use of 'jets' to recognize and accelerate common functions is a fine idea. It really allows low level use of a purely functional language. I was planning something similar with my Awelon Bytecode (ABC) even before reading about Urbit, and I was happy to see the idea validated in practice.

Not sure how I feel about Hoon. Seems difficult to judge something like that without really diving in and trying it.

Bayle Shanks said...

Here's my take on the motivations for Hoon (not necessarily the rest of Urbit). I don't know Hoon, although i'm learning it, so this may be wrong.

The reason for Hoon is to have a language with the properties: (a) based on a core with a tiny spec (Nock) (because Urbit hot-patches itself with code over the network, so in order to specify Urbit's network protocol completely, the complete operational semantics of Nock must be included in the protocol spec, and so an extremely small spec was desired), (b) the core can 'eval' without overhead (because hot-patching), (c) purely functional, (d) does not (natively) support cyclic data (because cyclic data is harder to serialize), (e) good at validating untyped data; defining a custom type in Hoon is the same as defining a validation/normalization function that attempts to cast untyped data into the type.

About Nock: Nock is a purely functional 'assembly language'. One of its design goals is to have an extremely short specification; its spec fits on one page. Nock may be seen as elaboration of SK combinator calculus in the same way that traditional assembly language may be seen as an elaboration of Turing machines. Nock has two data types (integers and binary trees) and 7 primitive instructions: S and K from SK combinator calculus; integer successor; cons; test for structural equality; test whether a value is an integer or a binary tree; and selection of one node from out of a binary tree (given the tree and an address).

Hoon is intended to be the 'C' to Nock's 'assembly language'. It is supposed to be a thin layer above Nock, 'a glorified macro assembler' rather than a fundamentally higher level of abstraction. So Hoon is an answer to the question "In a world where our machine model was SK combinator calculus instead of Turing machines, what would be the analog of C?".

As for the crazy names: orthogonal to the above, the creator of Hoon made a design choice w/r/t naming things. He believes that (a) it's confusing to have two things which are close but not quite the same be given the same name; and (b) the analogs to Turing-machine-land programming constructs in SK-combinator-machine-land tend to be close but not quite the same; therefore he invents an entirely new vocabulary for Hoon. Furthermore, he thinks (c) names in programming languages should be short, one syllable, natural language words; and (d) it's unimportant if a newbie has to memorize many such words; therefore the new vocabulary has words like 'arm', 'leg', 'kelp', rather than longer descriptive names. Furthermore, he chose to (e) reject alphanumeric keywords; Hoon's built-in keywords should all be punctuation. The creator claims that these design choices are validated by the evidence that he's talked to a number of people who have learned Hoon, and who after learning it, agree that it is neither crazy nor difficult.

My personal opinion is that Nock is beautiful, and I'm interested in learning Hoon just to see what 'the C of combinator calculus' might look like. I'm not (yet) a fan of the naming choices in Hoon.