RavenDB Multi GET support
One of the annoyances of HTTP is that it is not really possible to make complex queries easily. To be more exact, you can make a single complex query fairly easily, but at some point you'll hit the URI length limit, and worse, there is no easy way to make multiple queries in a single round trip.
I have been thinking about this a lot lately, because it is a stumbling block for a feature that is near and dear to my heart, the Future Queries feature that is so useful when using NHibernate.
The problem was that I couldn’t think of a good way of doing this. Well, I could think of how to do this quite easily, to be truthful. I just couldn’t think of a good way to make this work nicely with the other features of RavenDB.
In particular, it was hard to figure out how to deal with caching. One of the really nice things about RavenDB's RESTful nature is that caching is about as easy as it can be. But since we need to tunnel requests through another medium for this to work, I couldn't figure out how to make it play nicely. And then I remembered that REST didn't actually have anything to do with HTTP itself; you can do REST on top of any transport protocol.
Let us look at how requests are handled in RavenDB over the wire:
GET http://localhost:8080/docs/bobs_address

HTTP/1.1 200 OK
{
  "FirstName": "Bob",
  "LastName": "Smith",
  "Address": "5 Elm St."
}

GET http://localhost:8080/docs/users/ayende

HTTP/1.1 404 Not Found
As you can see, we have two request/reply calls.
What we did in order to make RavenDB support multiple requests in a single round trip was to build on top of this exact nature:
POST http://localhost:8080/multi_get
[
  { "Url": "http://localhost:8080/docs/bobs_address", "Headers": {} },
  { "Url": "http://localhost:8080/docs/users/ayende", "Headers": {} }
]

HTTP/1.1 200 OK
[
  { "Status": 200, "Result": { "FirstName": "Bob", "LastName": "Smith", "Address": "5 Elm St." } },
  { "Status": 404, "Result": null }
]
Using this approach, we can handle multiple requests in a single round trip.
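To make the wire format concrete, here is a minimal sketch in Python of how a client might build the batched request body and split the batched reply apart again. This is purely illustrative; the function names are my own, not part of any RavenDB client API:

```python
import json

def build_multi_get_payload(requests):
    """Serialize (url, headers) pairs into the multi_get POST body."""
    return json.dumps(
        [{"Url": url, "Headers": headers or {}} for url, headers in requests]
    )

def parse_multi_get_response(body):
    """Split the batched reply into per-request (status, result) pairs."""
    return [(item["Status"], item.get("Result")) for item in json.loads(body)]

# One POST carries both requests from the example above:
payload = build_multi_get_payload([
    ("http://localhost:8080/docs/bobs_address", None),
    ("http://localhost:8080/docs/users/ayende", None),
])

# Simulated server reply, matching the example response:
reply = json.dumps([
    {"Status": 200, "Result": {"FirstName": "Bob", "LastName": "Smith"}},
    {"Status": 404, "Result": None},
])
results = parse_multi_get_response(reply)
```

Note that each entry in the batch keeps its own status code, so a missing document (404) does not fail the whole batch.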
You might not be surprised to learn that it was actually very easy to do; we just needed to add an endpoint and a way of executing the request pipeline internally.
The really hard part was with the client, but I’ll touch on that in my next post.
Comments
Out of interest, does this gain enough compared to multiple pipelined http 1.1 requests to be worth the "cost" of losing upstream and/or reverse proxy caching that you'd get with GETs?
Oren, any inspiration taken from https://developers.facebook.com/docs/reference/api/batch/ ? Not that it is a bad thing really.
With this approach you cannot rely on any built-in HTTP caching functionality (if you are using it), because the framework does not understand your custom protocol. It is basically like using a custom binary protocol - nothing built-in.
When/if there's broader support for it, perhaps Raven could utilize Google's SPDY protocol in addition to HTTP. I believe multiple calls per connection are one of its main benefits.
Here's the link: http://www.chromium.org/spdy/spdy-whitepaper
Steven, That assumes that you have a proxy there. You don't usually have a proxy between the app server and the db server. We actually handle caching of this internally pretty well, so I don't think that is an issue. If you do have a proxy somewhere, you can make that decision on your own, based on real perf numbers.
Njy, No, this is the first time I've heard about it, but it is really about the only thing that makes sense, to be truthful.
Tobi, Not really, no. If you have a proxy involved, that would bypass it, yes. But RavenDB handles the protocol at both ends and makes it appear just like HTTP, and that includes caching. A child request can return a 304, for example, which will be correctly processed.
SPDY is interesting, but as this is the very first time I heard about it, I guess it isn't really ready to be used yet.
@Ayende no it doesn't, it just assumes you're not stopping HTTP's "built in" caching support from working, should you want to use it.
You may have excellent caching in RavenDb, but that's missing the point - this breaks the semantics of the web and my question was is that "cost" worth it for whatever gains you are getting?
The answer may well be "yes", but that doesn't make the question "pointless" if you don't have a proxy :-)
Steven, I don't really understand what you mean here. When I am talking about RavenDB caching, I am talking about the low level HTTP cache, we extended that to also support multi_get, but that is it. All the semantics are still the same. And the cache level is on the request, not on the batch.
Sure, you can't do that through a proxy, but that is the only limitation.
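A rough sketch of what per-request (rather than per-batch) caching could look like on the client side of a multi_get. This is my own illustration of the idea, not RavenDB's actual implementation; the class and method names are hypothetical:

```python
class MultiGetCache:
    """Hypothetical client-side cache, keyed per URL, not per batch."""

    def __init__(self):
        self._cache = {}  # url -> (etag, result)

    def prepare(self, url):
        # Attach If-None-Match so the server can answer 304 per sub-request.
        headers = {}
        if url in self._cache:
            headers["If-None-Match"] = self._cache[url][0]
        return {"Url": url, "Headers": headers}

    def resolve(self, url, status, etag, result):
        if status == 304:                 # not modified: serve the cached copy
            return self._cache[url][1]
        if status == 200 and etag is not None:
            self._cache[url] = (etag, result)
        return result

cache = MultiGetCache()
cache.prepare("docs/bobs_address")                    # first request: no ETag yet
first = cache.resolve("docs/bobs_address", 200, '"v1"', {"FirstName": "Bob"})
req = cache.prepare("docs/bobs_address")              # now carries If-None-Match
second = cache.resolve("docs/bobs_address", 304, None, None)
```

The key point is that the conditional-request machinery (ETag in, 304 out) travels inside each batch entry, so the usual HTTP caching semantics survive the batching.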
SPDY was announced a while ago and in fact Google is already using it when you use Chrome and their services. Check out this thread:
http://groups.google.com/group/spdy-dev/browse_thread/thread/4c2396ecbc36b1c4
Two points against SPDY for RavenDB: 1. No .NET client. Ayende probably doesn't want to take the time to implement a new protocol stack. 2. The name makes me think "spidey", not "speedy".
I am wondering the same thing as Steven, how is this better than Persistent/Pipelined connection that HTTP 1.1 introduced?
I still see two requests and two responses; it's just the order of events that differs. Either way, with HTTP 1.1 only one TCP connection will be used.
SPDY allows for multiple concurrent requests per TCP connection, but this implementation of "http in http" encoded as json looks to be a FIFO just like HTTP 1.1 pipelining.
Perhaps I am just missing something.
Justin, While you have a single TCP connection in pipelined mode, it is still a request/reply mode. So you have to send a request, wait for the reply, and so on. In this mode, you send a single request to the server and get a single reply back, all at once.
This means that you save on round trips, not on connections
Looks to me like you're sending two requests, and receiving two replies either way:
request1
request2
reply1
reply2
instead of:
request1
reply1
request2
reply2
Either way the same amount of data is sent and received, a single TCP connection is used, and the same total time is spent. So where do the efficiency gains come from? Are the serialization/deserialization parts expensive to set up and tear down on the client/server?
The way it would be exposed in the client API could still look like a single request/reply; it's really just a question of how the request/reply is encoded on the single TCP connection at that point.
It would be interesting to see how much faster your implementation is over just pipelined HTTP 1.1, and where the extra overhead is coming from.
When rhino receives the batch does it send off the contained http requests?
Ayende, I seemingly misunderstood your implementation. If you own the caching layer, you can of course interpret your response as you see fit. I thought you were using the caching layer of something else.
But if you own the caching layer, what was the problem to begin with? HTTP is just an implementation detail in this case.
Justin, The problem is saving the round trip. Let us take grocery shopping as an example: you need to buy milk & sugar.
HTTP 1.x (making two requests using two separate tcp connections) means that you have to leave the house, get to the store, get the milk, pay, go home, put the milk in the fridge, go back to the store, get the sugar, pay, and put the sugar in the cupboard.
HTTP 1.1 (making two requests using a single tcp connection) means that you have to leave the house, go to the store, use the drive-through window to get the milk, drive home, drop the milk off without getting out of the car, drive back to the store, get the sugar, and drive home again.
Multi GET approach (making a single request) means that you go to the store, pick up milk & sugar and go home.
The major difference is that we only have to go to the server _once_. Whereas even with HTTP 1.1, using a single tcp connection, you have to go to the db multiple times.
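The savings are easy to put rough numbers on. A back-of-the-envelope sketch (the latency figures are invented for illustration, not measurements):

```python
def total_time_ms(n_requests, rtt_ms, server_ms, batched):
    """Rough model: sequential requests pay the network round trip n times,
    a multi_get batch pays it once (server-side work is the same either way)."""
    round_trips = 1 if batched else n_requests
    return round_trips * rtt_ms + n_requests * server_ms

# 10 document loads, 30 ms network round trip, 2 ms server work each:
sequential = total_time_ms(10, 30, 2, batched=False)  # 10 * 30 + 10 * 2 = 320
batched = total_time_ms(10, 30, 2, batched=True)      #  1 * 30 + 10 * 2 = 50
```

Under these assumed numbers the batch is dominated by server work rather than network latency, which is the whole point of collapsing the round trips.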
Colin, There are no rhinos involved :-) And I don't understand the question
Tobi, The major difference is _how you detect changes_. We use the HTTP mechanisms to do that: 304, etags, etc. We basically implemented HTTP caching as part of the RavenDB client API.
I really really think HTTP calls at the db level do not make sense for a high performance application, and I guess everyone wants their app to perform well.
One other problem is -- code gets a lot messier here, and abstraction is tough to achieve.
I don't really follow you here. It isn't HTTP calls at the db level; HTTP is merely a transport for the calls, nothing more. And the question is how we can take advantage of that and make the most performant db/app using it.
Pedro, Can you show me how this can be done in .NET ?
Ayende,
With HTTP 1.1 pipelining, the second request can be sent without waiting for the first reply [RFC 2616, section 8.1.2.2]. Pipelining is more than only sharing the same connection.
-- GET #1 request -->
-- GET #2 request -->
<-- GET #1 response --
<-- GET #2 response --
1) On the client side, HttpWebRequest supports pipelining (see http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.pipelined.aspx). Note however, that this pipelining only starts after the client is assured that the server is HTTP/1.1 compliant, that is, after the first response.
2) HTTP.SYS and HttpListener also support pipelining. However, from my observations, it appears that the requests are delivered to the handlers (BeginGetContext callback) in sequential order. This means that the Nth request starts processing only after the (N-1)th request is completed.