Thursday, October 13, 2011

How to beat the CAP theorem

Nathan Marz has an interesting proposal, How to beat the CAP theorem, that should resonate with all you FP shills out there:
  • make all your data immutable
  • index snapshots of your data using "purely functional" batch computing (e.g. MapReduce)
  • index the realtime data arriving after the last snapshot using an incremental computing system
  • for querying, merge the index of the last batch job with that of the incremental system
As Marz explains, incremental computing has a much higher complexity than batch computing. Using it only for the last few hours of data removes a huge complexity burden from your shoulders:
Since the realtime layer only compensates for the last few hours of data, everything the realtime layer computes is eventually overridden by the batch layer. So if you make a mistake or something goes wrong in the realtime layer, the batch layer will correct it. All that complexity is transient.

No comments: