RavenDB Multi GET support
One of the annoyances of HTTP is that it is not really possible to make complex queries easily. To be more exact, you can make a single complex query fairly easily, but at some point you'll hit the URI length limit, and worse, there is no easy way to make multiple queries in a single round trip.
I have been thinking about this a lot lately, because it is a stumbling block for a feature that is near and dear to my heart, the Future Queries feature that is so useful when using NHibernate.
The problem was that I couldn’t think of a good way of doing this. Well, I could think of how to do this quite easily, to be truthful. I just couldn’t think of a good way to make this work nicely with the other features of RavenDB.
In particular, it was hard to figure out how to deal with caching. One of the really nice things about RavenDB's RESTful nature is that caching is about as easy as it can be. But since we need to tunnel requests through another medium for this to work, I couldn't figure out how to make it play nicely. And then I remembered that REST didn't actually have anything to do with HTTP itself; you can do REST on top of any transport protocol.
Let us look at how requests are handled in RavenDB over the wire:
GET http://localhost:8080/docs/bobs_address

HTTP/1.1 200 OK
{
  "FirstName": "Bob",
  "LastName": "Smith",
  "Address": "5 Elm St."
}

GET http://localhost:8080/docs/users/ayende

HTTP/1.1 404 Not Found
As you can see, we have two request/reply calls.
What we did in order to make RavenDB support multiple requests in a single round trip was to build on top of this exact nature:
POST http://localhost:8080/multi_get
[
  { "Url": "http://localhost:8080/docs/bobs_address", "Headers": {} },
  { "Url": "http://localhost:8080/docs/users/ayende", "Headers": {} }
]

HTTP/1.1 200 OK
[
  { "Status": 200, "Result": { "FirstName": "Bob", "LastName": "Smith", "Address": "5 Elm St." } },
  { "Status": 404, "Result": null }
]
Using this approach, we can handle multiple requests in a single round trip.
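To make the wire format concrete, here is a minimal sketch in Python of how a client might build the batched request body and split the batched reply apart again. This is purely illustrative; the function names are my own, not part of any RavenDB client API:

```python
import json

def build_multi_get_payload(requests):
    """Serialize (url, headers) pairs into the multi_get POST body."""
    return json.dumps(
        [{"Url": url, "Headers": headers or {}} for url, headers in requests]
    )

def parse_multi_get_response(body):
    """Split the batched reply into per-request (status, result) pairs."""
    return [(item["Status"], item.get("Result")) for item in json.loads(body)]

# One POST carries both requests from the example above:
payload = build_multi_get_payload([
    ("http://localhost:8080/docs/bobs_address", None),
    ("http://localhost:8080/docs/users/ayende", None),
])

# Simulated server reply, matching the example response:
reply = json.dumps([
    {"Status": 200, "Result": {"FirstName": "Bob", "LastName": "Smith"}},
    {"Status": 404, "Result": None},
])
results = parse_multi_get_response(reply)
```

Note that each entry in the batch keeps its own status code, so a missing document (404) does not fail the whole batch.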
You might not be surprised to learn that it was actually very easy to do; we just needed to add an endpoint and a way of executing the request pipeline internally.
The really hard part was with the client, but I’ll touch on that in my next post.
Comments
Out of interest, does this gain enough compared to multiple pipelined http 1.1 requests to be worth the "cost" of losing upstream and/or reverse proxy caching that you'd get with GETs?
Oren, any inspiration taken from https://developers.facebook.com/docs/reference/api/batch/ ? Not that it is a bad thing really.
With this approach you cannot rely on any built-in HTTP caching functionality (if you are using it), because the framework does not understand your custom protocol. It is basically like using a custom binary protocol - nothing built-in.
When/if there's broader support for it, perhaps Raven could utilize Google's SPDY protocol in addition to HTTP. I believe multiple calls per connection are one of its main benefits.
Here's the link: http://www.chromium.org/spdy/spdy-whitepaper
Steven, That assumes that you have a proxy there. You don't usually have a proxy between the app server and the db server. We actually handle caching of this internally pretty well, so I don't think that is an issue. If you do have a proxy somewhere, you can make that decision on your own, based on real perf numbers.
Njy, No, this is the first time I've heard about it, but it is really about the only thing that makes sense, to be truthful.
Tobi, Not really, no. If you have a proxy involved, that would bypass it, yes. But RavenDB handles the protocol at both ends and makes it appear just like HTTP, and that includes caching. A child request can return a 304, for example, which will be correctly processed.
SPDY is interesting, but as this is the very first time I heard about it, I guess it isn't really ready to be used yet.
@Ayende no it doesn't, it just assumes you're not stopping HTTP's "built in" caching support from working, should you want to use it.
You may have excellent caching in RavenDb, but that's missing the point - this breaks the semantics of the web and my question was is that "cost" worth it for whatever gains you are getting?
The answer may well be "yes", but that doesn't make the question "pointless" if you don't have a proxy :-)
Steven, I don't really understand what you mean here. When I am talking about RavenDB caching, I am talking about the low level HTTP cache, we extended that to also support multi_get, but that is it. All the semantics are still the same. And the cache level is on the request, not on the batch.
Sure, you can't do that through a proxy, but that is the only limitation.
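A rough sketch of what per-request (rather than per-batch) caching could look like on the client side of a multi_get. This is my own illustration of the idea, not RavenDB's actual implementation; the class and method names are hypothetical:

```python
class MultiGetCache:
    """Hypothetical client-side cache, keyed per URL, not per batch."""

    def __init__(self):
        self._cache = {}  # url -> (etag, result)

    def prepare(self, url):
        # Attach If-None-Match so the server can answer 304 per sub-request.
        headers = {}
        if url in self._cache:
            headers["If-None-Match"] = self._cache[url][0]
        return {"Url": url, "Headers": headers}

    def resolve(self, url, status, etag, result):
        if status == 304:                 # not modified: serve the cached copy
            return self._cache[url][1]
        if status == 200 and etag is not None:
            self._cache[url] = (etag, result)
        return result

cache = MultiGetCache()
cache.prepare("docs/bobs_address")                    # first request: no ETag yet
first = cache.resolve("docs/bobs_address", 200, '"v1"', {"FirstName": "Bob"})
req = cache.prepare("docs/bobs_address")              # now carries If-None-Match
second = cache.resolve("docs/bobs_address", 304, None, None)
```

The key point is that the conditional-request machinery (ETag in, 304 out) travels inside each batch entry, so the usual HTTP caching semantics survive the batching.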
SPDY was announced a while ago and in fact Google is already using it when you use Chrome and their services. Check out this thread:
http://groups.google.com/group/spdy-dev/browse_thread/thread/4c2396ecbc36b1c4
Two points against SPDY for RavenDB: 1. No .NET client. Ayende probably doesn't want to take the time to implement a new protocol stack. 2. The name makes me think "spidey", not "speedy".
I am wondering the same thing as Steven, how is this better than Persistent/Pipelined connection that HTTP 1.1 introduced?
I still see two requests and two responses; it's just the order of events that differs. Either way, with HTTP 1.1 only one TCP connection will be used.
SPDY allows for multiple concurrent requests per TCP connection, but this implementation of "http in http" encoded as json looks to be a FIFO just like HTTP 1.1 pipelining.
Perhaps I am just missing something.
Justin, While you have a single TCP connection in pipelined mode, it is still a request/reply mode. So you have to send a request, wait for the reply, and so on. In this mode, you send a single request to the server and get a single reply back, all at once.
This means that you save on round trips, not on connections
Looks to me like you're sending two requests, and receiving two replies either way:
request1
request2
reply1
reply2
instead of:
request1
reply1
request2
reply2
Either way the same amount of data is sent and received, a single TCP connection is used, and the same total time is spent. So where do the efficiency gains come from? Are the serialization/deserialization parts expensive to set up and tear down on the client/server?
The way it would be exposed in the client API could still look like a single request/reply; it's really just a question of how the request/reply is encoded on the single TCP connection at that point.
It would be interesting to see how much faster your implementation is over just pipelined HTTP 1.1, and where the extra overhead is coming from.
When rhino receives the batch does it send off the contained http requests?
Ayende, I seemingly misunderstood your implementation. If you own the caching layer, you can of course interpret your response as you see fit. I thought you were using the caching layer of something else.
But if you own the caching layer, what was the problem to begin with? HTTP is just an implementation detail in this case.
Justin, The problem is saving the round trip. Let us take grocery shopping as an example: you need to buy milk & sugar.
HTTP 1.x (making two requests using two separate tcp connections) means that you have to leave the house, get to the store, get the milk, pay, go home, put the milk in the fridge, go back to the store, get the sugar, pay, and put the sugar in the cupboard.
HTTP 1.1 (making two requests using a single tcp connection) means that you have to leave the house, go to the store, use the drive-through window to get the milk, drive home, drop the milk off without getting out of the car, drive back to the store, get the sugar, and drive home again.
Multi GET approach (making a single request) means that you go to the store, pick up milk & sugar and go home.
The major difference is that we only have to go to the server _once_. Whereas even with HTTP 1.1, using a single tcp connection, you have to go to the db multiple times.
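The savings are easy to put rough numbers on. A back-of-the-envelope sketch (the latency figures are invented for illustration, not measurements):

```python
def total_time_ms(n_requests, rtt_ms, server_ms, batched):
    """Rough model: sequential requests pay the network round trip n times,
    a multi_get batch pays it once (server-side work is the same either way)."""
    round_trips = 1 if batched else n_requests
    return round_trips * rtt_ms + n_requests * server_ms

# 10 document loads, 30 ms network round trip, 2 ms server work each:
sequential = total_time_ms(10, 30, 2, batched=False)  # 10 * 30 + 10 * 2 = 320
batched = total_time_ms(10, 30, 2, batched=True)      #  1 * 30 + 10 * 2 = 50
```

Under these assumed numbers the batch is dominated by server work rather than network latency, which is the whole point of collapsing the round trips.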
Colin, There are no rhinos involved :-) And I don't understand the question
Tobi, The major difference is _how you detect changes_. We use the HTTP mechanisms to do that: 304, etags, etc. We basically implemented HTTP caching as part of the RavenDB client API.
I really really think HTTP calls at the db level do not make sense for a high performance application, and I guess everyone wants their app to perform well.
One other problem is -- code gets a lot messier here, and abstraction is tough to achieve.
I don't really follow you here. It isn't HTTP calls at the db level; HTTP is merely a transport for the calls, nothing more. And the question is how we can take advantage of that and make the most performant db/app using it.
Pedro, Can you show me how this can be done in .NET ?
Ayende,
With HTTP 1.1 pipelining, the second request can be sent without waiting for the first reply [RFC 2616, section 8.1.2.2]. Pipelining is more than only sharing the same connection.
-- GET #1 request -->
-- GET #2 request -->
<-- GET #1 response --
<-- GET #2 response --
1) On the client side, HttpWebRequest supports pipelining (see http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.pipelined.aspx). Note however, that this pipelining only starts after the client is assured that the server is HTTP/1.1 compliant, that is, after the first response.
2) HTTP.SYS and HttpListener also support pipelining. However, from my observations, it appears that the requests are delivered to the handlers (BeginGetContext callback) in sequential order. This means that the Nth request starts processing only after the (N-1)th request is completed.