Our booth at VS Live!
And you can tell that it isn't me who is taking the picture because it looks awesome:
Our second RavenDB conference is now open. You can visit the conference site to see our speakers and topics, but in general, we are going to talk about the new RavenDB 3.5 release, including some extremely exciting features that we kept as a surprise. We are going to hear from customers and users about their experiences in utilizing RavenDB in demanding and high performance environments, and we are going to unveil our plans for RavenDB 4.0.
Come and join us.
Today we found a bug in the way our connection pool works. The code in question?
public HttpClient GetClient(TimeSpan timeout, OperationCredentials credentials, Func<HttpMessageHandler> handlerFactory)
{
    var key = new HttpClientCacheKey(timeout, credentials);
    var queue = cache.GetOrAdd(key, i => new ConcurrentQueue<Tuple<long, HttpClient>>());

    Tuple<long, HttpClient> client;
    while (queue.TryDequeue(out client))
    {
        if (Stopwatch.GetTimestamp() - client.Item1 >= _maxIdleTimeInStopwatchTicks)
        {
            client.Item2.Dispose();
            continue;
        }
        client.Item2.CancelPendingRequests();
        client.Item2.DefaultRequestHeaders.Clear();
        return client.Item2;
    }

    return new HttpClient(handlerFactory())
    {
        Timeout = timeout
    };
}

public void ReleaseClient(HttpClient client, OperationCredentials credentials)
{
    var key = new HttpClientCacheKey(client.Timeout, credentials);
    var queue = cache.GetOrAdd(key, i => new ConcurrentQueue<Tuple<long, HttpClient>>());
    queue.Enqueue(Tuple.Create(Stopwatch.GetTimestamp(), client));
}
Can you see the bug?
I'll let you think about it for a while.
Done?
Okay, here we go!
The basic idea is that we'll keep a bunch of clients connected, and free them if they are too old.
The issue is with the use of the concurrent queue. Consider the case of a sudden spike in traffic, where we suddenly need 100 connections. This code will generate them, and then put them back in the pool.
We then go back to normal levels, handling only 20 concurrent requests, but because we are using a queue, we'll keep cycling all 100 connections through the queue, so none of them ever sits idle long enough to be disposed, and all of them stay open.
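One possible fix, sketched here under the assumption that the cache field is changed to hold stacks and that the same HttpClientCacheKey and _maxIdleTimeInStopwatchTicks pieces are reused (this is not necessarily the fix we shipped), is to make the pool LIFO instead of FIFO. With a stack, steady traffic keeps reusing the same few clients at the top, while the surplus clients from the spike sit at the bottom and get disposed the next time they are popped:

public HttpClient GetClient(TimeSpan timeout, OperationCredentials credentials, Func<HttpMessageHandler> handlerFactory)
{
    var key = new HttpClientCacheKey(timeout, credentials);
    // LIFO instead of FIFO: the most recently released client is reused first
    var stack = cache.GetOrAdd(key, i => new ConcurrentStack<Tuple<long, HttpClient>>());

    Tuple<long, HttpClient> client;
    while (stack.TryPop(out client))
    {
        // a client that sat idle past the threshold is thrown away
        if (Stopwatch.GetTimestamp() - client.Item1 >= _maxIdleTimeInStopwatchTicks)
        {
            client.Item2.Dispose();
            continue;
        }
        client.Item2.CancelPendingRequests();
        client.Item2.DefaultRequestHeaders.Clear();
        return client.Item2;
    }

    return new HttpClient(handlerFactory()) { Timeout = timeout };
}

public void ReleaseClient(HttpClient client, OperationCredentials credentials)
{
    var key = new HttpClientCacheKey(client.Timeout, credentials);
    var stack = cache.GetOrAdd(key, i => new ConcurrentStack<Tuple<long, HttpClient>>());
    stack.Push(Tuple.Create(Stopwatch.GetTimestamp(), client));
}

Note that the stale clients are only disposed when they are actually popped, so a complete fix would probably also want a periodic sweep over the pool; the point of the sketch is just that LIFO lets the pool shrink back down after a spike, while FIFO never does.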
I was talking with a few of the guys at work, and the concept of rate limiting came up. In particular, being able to limit the number of operations a particular user can do against the system. The actual scenario isn't really important, but the idea kept bouncing in my head, so I sat down and wrote a quick test case.
Nitpicker corner: This is scratchpad code; it isn't production worthy, tested, or validated.
The idea is probably best explained in code, like so:
private SemaphoreSlim _rateLimit = new SemaphoreSlim(10);

public async Task HandlePath(HttpContext context, string method, string path)
{
    if (await _rateLimit.WaitAsync(3000) == false)
    {
        context.Response.StatusCode = 429; // Too many requests
        return;
    }
    // actually process requests
}
Basically, we define a semaphore with the maximum number of operations that we want to allow, and we wait on the semaphore when we start an operation.
However, there is nothing that actually releases the semaphore. Here we get into design choices.
We can release the semaphore when the request is over, which effectively gives us rate limiting in terms of concurrent requests.
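A minimal sketch of that first option, assuming the same _rateLimit semaphore and HandlePath signature as above, would just wrap the processing in a try / finally:

public async Task HandlePath(HttpContext context, string method, string path)
{
    if (await _rateLimit.WaitAsync(3000) == false)
    {
        context.Response.StatusCode = 429; // Too many requests
        return;
    }
    try
    {
        // actually process requests
    }
    finally
    {
        // free the slot as soon as this request completes,
        // so we are limiting concurrency, not throughput
        _rateLimit.Release();
    }
}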
The more interesting approach from my perspective was to use this:
_timer = new Timer(state =>
{
    var currentCount = 10 - _rateLimit.CurrentCount;
    if (currentCount == 0)
        return;
    _rateLimit.Release(currentCount);
}, null, 1000, 1000);
Using this approach, we are actually limited to 10 requests a second.
And yes, this actually allows more concurrent requests than the previous option, because if a request takes more than one second, we'll reset its count on the timer's tick.
I tested this using gobench, and it confirmed that we are indeed serving exactly 10 requests / second.
This is from today, taken at the RavenDB booth at the Basta conference.
You might have noticed that there were a few errors on the blog recently.
That is related to a testing strategy that we employ for RavenDB.
In particular, part of our test scenario is to shove a RavenDB build onto our own production systems, to see how it works in a live environment with real workloads.
Our production systems run entirely on RavenDB, and we have been playing with all sorts of configuration and deployment options recently. The general idea is to give us a good indication of RavenDB's performance and make sure that we don't have any performance regressions.
Here is an example taken from our tracking system, for this blog:
You can see that we had a few periods with longer than usual response times. The actual reason was that we had some code throwing a tremendous amount of work at RavenDB, which is really the noisy neighbor syndrome (that is, the blog itself is behaving fine, but the machine it is on is very busy). That gave us an indication about a possible optimization and some better internal metrics.
At any rate, the downside of living on the bleeding edge in production is that we sometimes get a lemon build.
That is the cost of dogfooding; sometimes you need to clean up the results.