Showing posts with label performance. Show all posts
Showing posts with label performance. Show all posts

Thursday, November 18, 2010

Numbers Everybody Should Know

Julian Hyde graciously transcribed the following table from Jeff Dean's Stanford talk:

L1 cache reference0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1K bytes w/ cheap algorithm 3,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from disk 20,000,000 ns
Send packet CA->Netherlands->CA150,000,000 ns


For more details: Agner's Software optimization resources.

Tuesday, September 7, 2010

Fast dynamic casting

Fast dynamic casting, a cute paper by Michael Gibbs and Bjarne Stroustrup:
We have demonstrated that it is possible for a linker to generate integer type IDs for classes such that it may be verified by a simple integer modulo computation that one class derives from another in an object-oriented language. When combined with a suitable way of adjusting offsets, this method provides for a fast, constant-time dynamic casting algorithm. A 64-bit type ID is capable of representing quite large class hierarchies containing thousands of classes and at least nine levels deep.
I knew that prime numbers could be used for this somehow, but these guys have already worked it out.

Relatedly, I've also been thinking about using perfect hashing to implement multiple dispatch. It should work just fine, and actually reduce the number of type tests, compared to single dispatch. Unforch, I'm still in the make-it-work phase, which is followed by the make-it-correct phase. Only then does the glorious make-it-fast phase start.

Sunday, July 25, 2010

Efficient Method Dispatch in PCL

Efficient Method Dispatch in PCL by Gregor J. Kiczales and Luis H. Rodriguez Jr. is an all-out awesome yet compact paper about generic functions in their portable CLOS implementation.

Some salient points:

They add a level of indirection between objects and their classes, called wrappers. An object points to a wrapper and that points to the class of the object. Inside each wrapper they store a random value for use by the memoization tables of generic functions.

Each generic function has a small memoization table that maps the hashed random value of a wrapper to the program counter (PC) of the corresponding method defined for the wrapper. The memoization tables use a simple random % memotablesize computation, and linear scanning after that.

How they handle class redefinition is swell: they simply set the wrapper's random value to zero, thereby invalidating the wrapper. They always leave the first entry of generic functions' memoization tables empty, and because an invalidated wrapper's random value will now hash to that empty entry, they only need to check for an invalid wrapper after a memoization miss occurs. Genius.

Generic functions run through a kind of state machine, depending on how they're used. (Similar to how polymorphic inline caches switch between mono-, poly-, and mega-morphic states, but with additional states tuned to CLOS semantics.) For example, if a generic function is only ever used with two different classes it can use a simple IF to test for these two classes. Here's another kicker: if a generic function's state machine detects that it's used as a slot accessor, it can directly store the slot offset instead of the method's PC in the memoization table. AWESOME.