Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 382 words

This post is what the entire series has been building building toward to. In the previous post we have refactored our code to make it easier to build additional behaviors. Here is one such behavior, which uses the Esprima project to parse the code and see if it uses a pattern that we can optimize for. Let us see the code, then discuss what it does.

The new stuff is mostly in the TryProjectOptimized method. This method is making heavy use of the new C# features to do easy parsing of ASTs, and it really shows that the C# team is using C# to build the compiler (that is one of the common things that compiler writers do, make their own lives easier). Take this code even a couple of language versions back, and it would be much hard to read and work with.

At any rate, what this code is doing is search for an object literal that does simple assignment of properties. If it detect that, instead of calling into the JS engine, it will produce a delegate that will do just that directly.

The result is pretty much the same, but we have to do so much less. If we can’t detect a pattern that we recognize, we just fall back to the JS engine again, and get the pervious behavior.

This is pretty awesome thing to have, because if we detect a particular common pattern, we can optimize that quite easily. The code really lend itself to that approach now.

What about the performance? Well…

And this runs in 0.31 seconds on the 10K, 5K, 1K run.

  • 10K in 83 ms
  • 5K in 200 ms
  • 1K in 31 ms

So that is two orders of magnitude for the 10K run.

An an exercise, try taking this code and teaching it how to recognize simple expressions like + and –. The second projection example is meant to give you some idea about that.

Oh, and one last thought, we are actually biased against the optimization. The Jint code produce results in formats that we need to pull the data from, and we aren’t accounting for that here. The optimize code can generate the output to be already in the format we need it to be ( the difference between the Dictionary and the JsValue).  

time to read 2 min | 256 words

So we got an order of magnitude performance boost, without really doing much, to be honest. But the code is starting to look really ugly, and future optimizations are out until we fix it.

Since we don’t really care about the internal details, we can do two things in one move. First, we’ll cleanup the code by introducing a delegate that will abstract the handling and the second is that this delegate will also allow us to handle caching of additional values simple by way of capturing the delegate state.

Here is what the code looks like:

Note that we have it broken apart into distinct steps, and the ProjectViaJint method is self contained and any caching it needs is done internally. This is much needed refactoring if we want to be able to continue to optimize this code, and it runs in about the same speed as the previous version, so we didn’t hurt anything there.

With this in place, we can see how much better we can make things.

Before we go on, I want to emphasis that this code is actually making a lot of assumptions. The passed dictionary must always have the same set of fields, for example, otherwise we may see a value from a previous iteration. No null handling or error handling is in place, we aren’t showing the unpacking of the values from the JS env to our code, and plenty of other stuff. That isn’t that important at this point, because I’m showing the progression of optimizations, rather then production code.

time to read 2 min | 229 words

After optimizing the projection performance by so much by just caching the engine, I got to thinking, we are also creating a new JS object to pass the arguments every single time. What would happen if we’ll cache that?

Here is what I ended up with, again, all of this is pretty brute force (mainly because I’m writing these posts while the baby is asleep, and I’m trying to get as many of them out before she wakes up):

I wouldn’t have expected this to be a dramatic boost, but we got:

And this runs in 0.57 seconds on the 10K, 5K, 1K run.

  • 10K in 413 ms
  • 5K in 110 ms
  • 1K in 45 ms

That is also a third of the cost that we just saved.

This is interesting, so it is worth another check, there are other two potentially expensive operations here, the invocation of the method and the sending of arguments.

Trawling of the Jint code shows that we can remove some abstract by using ICallable directly, and we can cache the array of arguments, this all leads to:

And this runs in 0.37 seconds on the 10K, 5K, 1K run.

  • 10K in 279 ms
  • 5K in 59  ms
  • 1K in 32 ms

And that is a wow. Because right now, without really doing much at all, we area already an order of magnitude higher then our first attempt, and we can do more.

time to read 2 min | 236 words

In the previous post, I showed how we can handle select projections and setup a perf test. The initial implementation we had run in about 4.4 seconds for our test. But it didn’t give any thought to performance.

Let us see if we can do better. The first thing to do is to avoid building the Jint engine and parsing the code all the time. The way we set things up, we wrap the actual object literal in a function, and there is no state, so we can reuse the previous engine instance without any issues. ;:

That means that we don’t need to pay the cost of creating a new engine, parsing the code, etc. Here is what this looks like:

Note that the cache key here is the raw projection, not the function we send to the entire, this allows us to avoid any string allocations in the common (cached) path.

And this runs in 0.75 seconds on the 10K, 5K, 1K run.

  • 10K in 574 ms
  • 5K in 137 ms
  • 1K in 51 ms

Just this small change boosted our performance by a major margin.

Note that because the engines are not thread safe, to use that in a real application we’ll need to ensure thread safety. The way I did that is to have a pool of these engines for each projection and just use that, so an engine is always access in a single threaded mode.

time to read 2 min | 240 words

I spoke about designing for performance in my previous post and I thought it would be in interesting series  of blog posts. The task I have is that we have a data source of some kind, in our case, I decided to make the task as simple as possible and define the source data as a Dictionary<string, object>, without allowing nesting of data.

We want the user to be able to provide a custom way to project data from that dictionary. Here are a few ways to do that:

And here is the data source in question:

Obviously the user can make the select projections as complex as they wish. But I think that these three sample give us a good representation of the complexity. We’ll also treat them a bit differently, with regards to their likelihood. So when testing, we’ll check that the first projection is run 10,000 times, then second projection is run 5,000 times and the last projection is run 1,000 times.

With the layout of the land settled, let us see how we can get this actually working. I’m going to use Jint and solve the issue in a very brute force manner.

Here is the initial implementation:

And this runs in 4.4 seconds on the 10K, 5K, 1K run.

  • 10K in 2,756 ms
  • 5K in 1,310 ms
  • 1K in 314 ms

I’m pretty sure that we can do better, and we’ll look at that in the next post.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 4 days from now
  3. What happens when a sparse file allocation fails? - 6 days from now
  4. NTFS has an emergency stash of disk space - 8 days from now
  5. Challenge: Giving file system developer ulcer - 11 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}