Monday, May 10, 2010

Concurrency FUD

A recent comment — and a golden Czech pilsner — puts me in the right mood to comment on one of my pet peeves: the fallacy of an urgent need for in-language concurrency:
"few people realize that stateful objects are dead in the water right now"
Yeah, if you use them from multiple threads. Sure. But you don't have to do that, unless you're very special, or you're writing very special programs. Like say, you're a scientific programmer raised on a diet of FORTRAN and C. Then of course, poor you need all the language support for concurrency you can get.

For the kind of programming I'm interested in (internet apps and servers), do like DJB told ya and spawn a friggin' process. Really, for servers, creating many more threads than you have hyperthreads is a design mistake. That's my current thinking, YMMV. Linux offers such great new tools (like eventfd, signalfd, and timerfd) that you don't have to leave the cosy confines of your epoll-loop ever again.
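To make the epoll-loop idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how any particular server does it: the names `run_event_loop` and `demo` are made up, and plain socket pairs stand in for the eventfd/timerfd-style descriptors, so the sketch stays portable. On Linux, `selectors.DefaultSelector` is backed by epoll.

```python
import selectors
import socket


def run_event_loop(sources, max_events, timeout=1.0):
    """Stop once at least max_events readable events have been handled."""
    selector = selectors.DefaultSelector()  # picks epoll on Linux
    handled = []
    for name, sock in sources.items():
        sock.setblocking(False)
        selector.register(sock, selectors.EVENT_READ, data=name)
    try:
        while len(handled) < max_events:
            ready = selector.select(timeout)
            if not ready:
                break  # nothing happened within the timeout, stop waiting
            for key, _mask in ready:
                # One thread, one loop: each event is handled to completion.
                handled.append((key.data, key.fileobj.recv(1024)))
    finally:
        selector.close()
    return handled


def demo():
    # Hypothetical event sources: socket pairs standing in for a client
    # connection and a timer descriptor.
    client_in, client_out = socket.socketpair()
    timer_in, timer_out = socket.socketpair()
    try:
        client_out.sendall(b"ping")
        timer_out.sendall(b"tick")
        sources = {"client": client_in, "timer": timer_in}
        return sorted(run_event_loop(sources, max_events=2))
    finally:
        for sock in (client_in, client_out, timer_in, timer_out):
            sock.close()


if __name__ == "__main__":
    print(demo())
```

Everything the program reacts to is just another readable descriptor, so there is no shared state to lock. To use more cores, you run more copies of the process.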

Should all programs be event-driven and written as small Unix processes? Of course not. But the paradigm is applicable to so many server-style programs. And it forces you to communicate via protocols, which may lead you to a better design to begin with. And it also enables managing and upgrading clusters by simply restarting the process (or starting a new version). All of which in-language concurrency doesn't do.
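Here is a rough sketch of what "communicate via protocols" can look like in Python: a parent spawns a worker process and talks to it over a pipe using a tiny line-based protocol. The protocol itself (`UPPER`, `QUIT`, the `OK`/`ERR`/`BYE` replies) and the helper names are invented for this example.

```python
import subprocess
import sys

# Source of the worker process. It knows nothing about the parent except
# the (made-up) line protocol: one command per line, one reply per line.
WORKER_SOURCE = r'''
import sys
for line in sys.stdin:
    command, _, argument = line.rstrip("\n").partition(" ")
    if command == "UPPER":
        print("OK " + argument.upper(), flush=True)
    elif command == "QUIT":
        print("BYE", flush=True)
        break
    else:
        print("ERR unknown command", flush=True)
'''


def spawn_worker():
    """Start the worker as a separate OS process connected by pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def request(worker, line):
    """Send one protocol line and return the worker's one-line reply."""
    worker.stdin.write(line + "\n")
    worker.stdin.flush()
    return worker.stdout.readline().rstrip("\n")


def demo():
    worker = spawn_worker()
    try:
        replies = [request(worker, text) for text in ("UPPER hello", "NOPE", "QUIT")]
    finally:
        worker.stdin.close()
        worker.wait()
        worker.stdout.close()
    return replies, worker.returncode


if __name__ == "__main__":
    print(demo())
```

Because the parent only depends on the protocol, upgrading the worker amounts to stopping it and spawning a newer program that speaks the same lines.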

I'm not against threads or whatever in the language, they have their uses. I just think that most of the time, another solution is actually better. If I were into non-event-driven design, I'd probably look at goroutines, which communicate via channels.

All in all, I don't think we have to redesign our languages for manycore. Keep on using them good old objects in your (hopefully) dynamic PL, and just spawn more of your apps. Quite likely, your architecture will become better for it.

6 comments:

Anonymous said...

Well ... you're right, assuming, of course, that Windows portability isn't a hard requirement. ;-) But seriously, who is actually writing massively concurrent code in a language these days? Who *isn't* using Hadoop, MapReduce, NoSQL databases, PVM, MPI, etc.

Paul Smith said...

At PyCon this year, Joe Gregorio gave a talk along these lines: basically, threads are the wrong level of abstraction for most concurrency problems.

http://bitworking.org/news/2010/02/pycon

His point is that there are really only two concurrency models, CSP and actors; threads are too primitive.

Both CSP and actors map well to the Unix process model and concurrency tools (Joe didn't say this, I'm extrapolating.)

swannodette said...

Heh. Your argument sums up to: "I don't know about this single machine concurrency thing. Sure *some* people seem interested in lockless concurrency. But I mean those Haskellers, Apple Computer (Grand Central Dispatch), and Clojure, and others, who cares about that, that's just crazy talk. Yeah, yeah, I know, all those consumer grade laptops with the i5 and i7 processors that reveal 4 cores to the OS. Whatever. I just want to build web apps ok? Basically, you know it's like India. Sure it's amazing and nothing like I've ever seen before, but I'd have to walk out my front door first. Who wants to do that? I can watch it on the History channel just like I did last week about China."

Who's spreading the FUD? Just sayin ;)

Stateful objects cause plenty of other problems. Concurrency is only one advantage. But trivializing the benefits seems short sighted to me. The new tools mentioned above allow you to write concurrent code *naturally*:

(defn update-flock [flock current]
  (into [] (pmap #(subflock-run % current) flock)))

pmap will scale with the # of cores that I have on my machine. No passing messages, no receiving messages. Just go over my concurrency safe data structure with multiple threads of execution and give me the result. Thanks, bye.

Finally, distributed concurrency + single machine concurrency sure is fun: http://github.com/amitrathore/swarmiji

Manuel Simoni said...

@swannodette Your first paragraph sure is a fun read, but I don't think I follow.

I'm not against research into tackling concurrency better, heck, I find e.g. Data Parallel Haskell extremely exciting, it's just that I'm more interested in making do with the current tools. And in a quite constrained setting (internet servers).

Unknown said...

You are equating concurrency and parallel execution with each other. You are also equating concurrency with threads to some extent. I'd argue that concurrency and parallel execution are different beasts. Same with concurrency and threads.

The lure of concurrency is that certain programs become easier to write. The lure of parallelism is that it seems the only way to fully utilize a modern CPU and get a nice speedup of your program.

If your concurrency model is based on message passing I'll argue that the programs it leads to are usually without deadlocks. In fact, avoiding a deadlock in these programs is as easy as avoiding an infinite loop. Interestingly however, the programs also tend to be smaller and easier to maintain. Often the programs are not faster than ones based on another architecture. There is an overhead in context switches to be paid, even though it can be extremely small.

Manuel Simoni said...

No, I'm not conflating concurrency and parallelism. I'm saying that event loops are a good way to write a lot of (internet) apps.

See this post by Linus re message passing and deadlocks.