Sunday, August 2, 2015

RESTful RDP with big values

What if you want to use a big value, like a whole database table or weblog, as a Reactive Demand Programming signal value? This would make it possible to use RDP to orchestrate things like incremental MapReduce pipelines. Here's one weird trick to make it work.

In effect, each RDP signal becomes a RESTful server, speaking an HTTP-like protocol. Clients of a signal remember the ETag of the last version of the signal they've processed, and send it to the server on subsequent requests.

The protocol to retrieve the value of a signal may work like this:
  • If a client hasn't seen a previous version of the signal, it sends no ETag. The server replies with the complete value. The value may be split up into multiple pages, using something like an Atom paged feed. (Other non-sequential kinds of splits are possible: for example, tree-like data like a filesystem can be fetched in parallel using hierarchical splits.)
  • If a client has seen and processed a previous version of the signal, it sends the stored ETag. There are three possibilities:
    • The content hasn't changed (i.e. the server's ETag matches the client-sent ETag), so the server replies with a status code that indicates that there is no new content (HTTP's 304 Not Modified).
    • The content has changed, so the server replies with a diff of the client's version versus the server's version.
    • The content has changed, but the server is unable to provide a diff against the client's version. This can happen when the server does not keep a complete history, or when it determines that having the client retrieve the whole value again is more efficient than sending a diff. The client then has to re-fetch the whole signal value, as if it had seen no previous version.
I haven't worked out all the details, but I think this scheme could be made to work.
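
To make the three cases concrete, here is a rough in-memory sketch in Python. It is only an illustration under several assumptions of mine, not a worked-out design: the signal value is a flat dict (say, table rows keyed by ID), the ETag is a content hash, the diff format (`set`/`delete`) is made up, a changed signal is answered with a plain 200, and the names `SignalServer`, `SignalClient`, and `history_limit` are hypothetical. Paging and the actual HTTP transport are left out.

```python
import hashlib
import json
from collections import OrderedDict


def etag_of(value):
    # Assumption: the ETag is a SHA-1 hash of the canonical JSON encoding.
    data = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def make_diff(old, new):
    # Hypothetical diff format: keys to set and keys to delete.
    return {
        "set": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "delete": [k for k in old if k not in new],
    }


def apply_diff(old, diff):
    new = dict(old)
    new.update(diff["set"])
    for k in diff["delete"]:
        new.pop(k, None)
    return new


class SignalServer:
    def __init__(self, value, history_limit=2):
        self.history_limit = history_limit
        self.history = OrderedDict()  # etag -> value, oldest first
        self.update(value)

    def update(self, value):
        self.current_etag = etag_of(value)
        self.history[self.current_etag] = dict(value)
        self.history.move_to_end(self.current_etag)
        # Forget old versions; clients still on them must re-fetch.
        while len(self.history) > self.history_limit:
            self.history.popitem(last=False)

    def get(self, client_etag=None):
        current = self.history[self.current_etag]
        if client_etag == self.current_etag:
            # Case 1: nothing new (HTTP's 304 Not Modified).
            return 304, self.current_etag, "none", None
        if client_etag in self.history:
            # Case 2: the server still knows the client's version.
            diff = make_diff(self.history[client_etag], current)
            return 200, self.current_etag, "diff", diff
        # Case 3, or no ETag sent at all: reply with the whole value.
        return 200, self.current_etag, "full", dict(current)


class SignalClient:
    def __init__(self):
        self.etag = None
        self.value = None

    def poll(self, server):
        status, etag, kind, body = server.get(self.etag)
        if status == 304:
            return "unchanged"
        if kind == "diff":
            self.value = apply_diff(self.value, body)
        else:
            self.value = body
        self.etag = etag
        return kind
```

A client that polls a fresh server first gets the full value, then "unchanged" until the server's value is updated, then a diff. If the server has meanwhile dropped the client's version from its history, the client falls back to a full fetch.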


Daniel Yokomizo said...

Looks good. 304 should be used instead of 204 in that case though. RFC 7232 may provide some food for thought.

dmbarbour said...

Leveraging the HTTP protocol for RDP is something I've planned to do. Though, my work on RDP was tabled to work on other aspects of my Awelon project. My sketched designs for RDP over HTTP had pushed differences into the URL query string. The reply could include information such as stability of the signal and a URL to query again in the near future.

I was planning to leverage this together with exponential decay, i.e. such that users can examine the deep past (albeit at diminishing detail).

With the e-tag approach, I'd recommend use of `accept` (on GET) and `content-type` fields (on the result) to indicate diffs vs. whole-value signals.