Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 4 min | 771 words

This question was raised on Twitter, and I thought it was quite interesting. In SQL, you can use the rank() function to generate this value, but if you are working on a large data set, and especially if you are sorting, you will probably want to avoid this.

Microsoft has a reference architecture for the leader board problem where they recommend running a separate process to recompute the ranking every few minutes and cite about 20 seconds to run the query on a highly optimized scenario (with 1.6 billion entries in a column store).

RavenDB doesn’t have a rank() function, but that doesn’t mean that you cannot implement a leader board. Let’s see how we can build one, shall we? We’ll start by looking at the document representing a single game:

image

You’ll probably have a lot more data in your use case, but that should be sufficient to create the leader board. The first step is to create an index that aggregates those values into a total score for each gamer. Here is what the index looks like:

This is a fairly trivial index, which will allow us to compute the total score of a gamer across all games. You might want to also add score per year / month / week / day, etc. I’m not going to touch that since this is basically the same thing.

RavenDB’s map/reduce indexes will process the data and aggregate it across all games. A new game coming in will not require us to recompute the whole dataset, only the gamer that was changed will be updated, and even so, RavenDB can optimize it even further in many cases and touch only some of the data for that gamer to compute the new total.
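The incremental aggregation described above can be modeled outside RavenDB. What follows is a minimal sketch in plain Python, not RavenDB's index syntax; the field names `gamer` and `score` and the helper names are assumptions made for illustration only.

```python
from collections import defaultdict


def map_game(game):
    """Map step: emit one (gamer, score) entry per game document."""
    return game["gamer"], game["score"]


def apply_game(totals, game):
    """Reduce step, applied incrementally.

    Only the total of the gamer named in the new game is touched;
    every other gamer's aggregate is left alone.
    """
    gamer, score = map_game(game)
    totals[gamer] += score
    return gamer  # the single aggregate that changed


# Hypothetical game documents.
games = [
    {"gamer": "gamers/1", "score": 120},
    {"gamer": "gamers/2", "score": 300},
    {"gamer": "gamers/1", "score": 80},
]

totals = defaultdict(int)
for game in games:
    apply_game(totals, game)
```

A new game only changes one entry in `totals`, which is the property that lets the index avoid recomputing the whole data set.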

However, there is a problem here. How do we generate a leader board here? To find the top 20 gamers is easy:

image

That is easy enough, for sure. But a leader board has more features that we want to have. For example, if I’m not in the top 20, I might want to see other gamers around my level. How can we do that?

We’ll need to issue a few queries for that. First, we want to find what the gamer’s actual score is:

image

And then we will need to get the gamers that are a bit better than the current gamer:

image

This asks for the 4 players that are better than the current gamer. And we can do the same for those that are a bit worse:

image

Note that we are switching the direction of the filter and the order by direction in both queries. That way, we’ll have a list of ten players that are ranked higher or lower than the current gamer, with the current one strictly in the middle.

I’m ignoring the possibility of multiple gamers with the same score, but you can change the > to >= to take them into account. Whether this is important or not depends on how you want to structure your leader board.
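The pair of queries above, with the filter and the sort direction flipped between them, can be sketched in plain Python. This is an illustration of the logic only, not RavenDB query syntax; `totals` is assumed to be a mapping from gamer id to total score, and the function name is hypothetical.

```python
def neighbors(totals, gamer, count=4):
    """Return (better, worse): up to `count` gamers on each side of `gamer`.

    Strict comparisons are used, so gamers tied with `gamer` are excluded.
    """
    mine = totals[gamer]
    # Filter on score > mine, order ascending, take the closest `count`.
    better = sorted((s, g) for g, s in totals.items() if s > mine)[:count]
    # Flip both the filter and the sort direction for the other side.
    worse = sorted(
        ((s, g) for g, s in totals.items() if s < mine), reverse=True
    )[:count]
    # Present the better gamers from highest to lowest score.
    return [g for _, g in reversed(better)], [g for _, g in worse]
```

Placing the `better` list, then the current gamer, then the `worse` list gives the "around my level" view.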

The final, and most complex, part of the leader board is finding the actual rank of any arbitrary gamer. How can we do that? We don’t want to go through the entire result set to get it. The answer to this question is to use facets, like so:

image

What this will do is ask RavenDB to provide a count of all the gamers whose scores are higher or lower than the current gamer’s. The output looks like this:

image

And you can now compute the rank of the gamer: it is the number of gamers with a higher score, plus one.

Facets are optimized for this kind of query and operate over the results of the index, so they are already working with aggregated data. Internally, the data is actually stored in a columnar format, so you can expect very quick replies.
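The arithmetic behind the rank can be sketched in plain Python. This only models what is done with the two counts; it does not show how RavenDB's facets compute them, and the names used are hypothetical.

```python
def rank_of(totals, gamer):
    """Derive a gamer's rank from counts of higher and lower scores."""
    mine = totals[gamer]
    higher = sum(1 for s in totals.values() if s > mine)
    lower = sum(1 for s in totals.values() if s < mine)
    # Rank 1 is the top of the board; tied gamers share the same rank.
    return {"higher": higher, "lower": lower, "rank": higher + 1}
```

With strict comparisons, gamers that share a score share a rank; whether that is what you want depends on how the leader board treats ties.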

There you have it, all the components required to create a leader board in RavenDB.

time to read 2 min | 235 words

A user reported an issue with RavenDB. They got unexpected results in their production database, but when they imported the data locally and tested things, everything worked.

Here is the simplified version of their index:

This is a multi map index that covers multiple collections and aggregates data across them. In this case, the issue was that in production, for some of the results, the CompanyName field was null.

The actual index was more complex, but once we trimmed it down in size to something more manageable, it became obvious what the problem is. Let’s look at the problematic line:

CompanyName = g.First().CompanyName,

The problem is with the First() call. There is no promise of ordering in the grouping results, and you are getting the first item there. If the item happens to be the one from the Company map, the index will appear to work and you’ll get the right company name. However, if the result from the User map shows up first, we’ll have null in the CompanyName.

We don’t make any guarantees about the order of elements in the grouping, but in practice it often (don’t rely on it) depends on the order of updates to the documents. So you can update the user after the company and see the changes in the index.

The right way to index this data is to do so explicitly, like so:

CompanyName = g.First(x => x.CompanyName != null).CompanyName,
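The difference between the two lines can be modeled in plain Python. This is a sketch of the ordering problem only, not the index itself; the entry shapes and the `CompanyId` key are assumptions for illustration.

```python
def first_company_name(group):
    """Return the first non-null CompanyName in the group, whatever the order.

    Like First(predicate) in C#, this fails if no entry qualifies
    (here by raising StopIteration).
    """
    return next(e["CompanyName"] for e in group if e["CompanyName"] is not None)


# One entry per map: only the Company map knows the name.
company_entry = {"CompanyId": "companies/1", "CompanyName": "Acme"}
user_entry = {"CompanyId": "companies/1", "CompanyName": None}

# The same group in the two possible orders.
company_first = [company_entry, user_entry]
user_first = [user_entry, company_entry]

# Taking element zero depends on the order; the explicit filter does not.
naive_results = [g[0]["CompanyName"] for g in (company_first, user_first)]
explicit_results = [first_company_name(g) for g in (company_first, user_first)]
```

Here `naive_results` differs between the two orders while `explicit_results` does not, which matches the production versus local behavior described above.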

time to read 2 min | 233 words

Our internal ERP system is configured to send an email when we have orders that cross a certain threshold. For obvious reasons, I like knowing when a big deal has closed. For the most part, for large orders, there is a back and forth process with the customer, so we know to expect the order. In most cases, the process involves talking to legal, purchasing agents and varying degrees of bureaucracy.

Sometimes we have a customer that comes out of the blue, places a large order and never even talks to a person throughout the whole thing. I love those cases, as you can imagine, because it means that there was no friction anywhere that we needed to deal with.

Today, I got notified about an order with a large enough sum that I had to stop and count the number of zeroes. I was quite happy with the number, but was somewhat concerned that I would have to pay enough taxes to close down the national debt. I checked a bit further with the accounting department what was going on there.

Sadly, we won’t be able to purchase a large island from pocket change. This was a manual order and instead of putting the amount of the order, they put in the invoice id.

Cue sad trombone sound.

If you need me, I’m currently busy trying to cancel my tickets to the moon.

time to read 1 min | 65 words

Next week I’m going to be talking with Ryan Rounkles about his use of RavenDB in Tended App. The Tended App is a medical services app for parents with a child that has medical needs. It can be something as simple as the sniffles or a 24-hour virus, to special needs kids which require constant attention.

Join us on Aug 25, 2020 10:30 AM EST.

time to read 5 min | 970 words

A customer reported that on their system, they suffered from frequent cluster elections in some cases. That is usually an indication that the system resources are hit in some manner. From experience, that usually means that the I/O on the machine is capped (very common in the cloud) or that there is some network issue.

The customer was able to rule these issues out. The latency to the storage was typically less than a millisecond and the maximum latency never exceeded 5 ms. The network monitoring showed that everything was fine, as well. The CPU was hovering around 7% and there was no apparent reason for the issue.

Looking at the logs, we saw very suspicious gaps in the servers’ activity, but with no good reason for them. Furthermore, the customer was able to narrow the issue down to a single scenario. Resetting the indexes would cause the cluster state to become unstable. And it did so with very high frequency.

“That is flat out impossible”, I said. And I meant it. Indexing workload is something that we have a lot of experience managing and in RavenDB 4.0 we have made some major changes to our indexing structure to handle this scenario. In particular, this meant that indexing:

  • Runs in dedicated threads.
  • Is scoped to run outside certain cores, to leave the OS capacity to run other tasks.
  • Self monitors and knows when to wind down to avoid impacting system performance.
  • Runs its threads with lower priority.
  • The cluster state, on the other hand, is run with high priority.

The entire thing didn’t make sense. However… the customer did a great job in setting up an environment where they could show us: click on the reset button, and the cluster becomes unstable. So it is impossible, but it happens.

We explored a lot of stuff around this issue. The machine is big and running multiple NUMA nodes, maybe it was some interaction with that? It was highly unlikely, and eventually didn’t pan out, but that is one example of the things that we tried.

We set up a similar cluster on our end and gave it 10x more load than what the customer did, on a set of machines that had a fraction of the customer’s power. The cluster and the system state remained absolutely stable.

I officially declared that we were in a state of perplexation.

When we ran the customer’s own scenario on our system, we saw something, but nothing like what we saw on their end. One of the things that we usually do when we investigate resource constraint issues is to give the machines under test a lot less capability. Less memory and slower disks, for example, make it much easier to surface many problems. But the worse we made the situation for the test cluster, the better the results became.

We changed things up. We gave the cluster machines with 128 GB of RAM and fast disks and tried it again. The situation immediately reproduced.

Cue facepalm sound here.

Why would giving more resources to the system cause instability in the cluster? Note that the other metrics also suffered, which made absolutely no sense.

We started digging deeper and we found the following index:

It is about as simple an index as you can imagine it would be and should cause absolutely no issue for RavenDB. So what was going on? We then looked at the documents…

image

I would expect the State field to be a simple enum property. But it is an array that describes the state changes in the system. This array also holds complex objects. The size of the array is on average about 450 items and I saw it hit a max of 13,000 items.

That helped clarify things. During indexing, we have to process the State property, and because it is an array, we index each of the elements inside it. That means that for each document, we’ll index between 400 – 13,000 items for the State field. What is more, we have a complex object to index. RavenDB will index that as a JSON string, so effectively the indexing would generate a lot of strings. These strings are going to be held in memory until the end of the indexing batch. So far, so good, but the problem in this case was that there were enough resources to have a big batch of documents.

That means that we would generate over 64 million string objects in one of those batches.

Enter GC, stage left.

The GC will be invoked based on how many allocations you have (in this case, a lot) and how many live objects you have. In this case, also a lot, until the index batch is completed. However, because we run GC multiple times during the indexing batch, we had promoted significant numbers of objects to the next generation, and Gen1 or Gen2 collections are far more expensive.

Once we knew what the problem was, it was easy to find a fix. Don’t index the State field. Given that the values that were indexed were JSON strings, it is unlikely that the customer actually queried on them (later confirmed by talking to the customer).

On the RavenDB side, we added monitoring for the allocation frequency and will close the indexing batch early to prevent handing the GC too much work all at once.

The reason we failed to reproduce that on lower end machines was simple: RavenDB had already used enough memory that we closed the batch early, before we could gather enough objects to cause the GC to really work hard. When running on a big machine, it had time to get the ball rolling and hand the whole big mess to the GC for cleanup.
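The idea of closing a batch early based on allocations, rather than only on document count, can be sketched in plain Python. This is not RavenDB's implementation; the function name, the budgets and the one-allocation-per-array-element accounting are all assumptions made for illustration.

```python
def index_in_batches(docs, max_docs, max_allocations):
    """Yield batches, closing each one when either budget is exhausted."""
    batch, allocations = [], 0
    for doc in docs:
        batch.append(doc)
        # Assumption: each State element becomes one indexed string,
        # so it is counted as one allocation.
        allocations += len(doc["State"])
        if len(batch) >= max_docs or allocations >= max_allocations:
            yield batch
            batch, allocations = [], 0
    if batch:
        yield batch  # flush whatever is left


# Ten hypothetical documents with 450 state entries each.
docs = [{"State": [0] * 450} for _ in range(10)]

# A generous allocation budget lets the whole set land in one batch.
big_machine = [len(b) for b in index_in_batches(docs, 100, 10**9)]
# A tight budget closes batches early, capping live objects per batch.
with_budget = [len(b) for b in index_in_batches(docs, 100, 1000)]
```

With the tight budget, each batch is closed after three documents, so far fewer strings are alive at the same time when a collection runs.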

time to read 1 min | 65 words

I had the pleasure of talking with Vaishali Lambe in SoLeadSaturday this week.

We talked about various topics: building a business around open source software, working in distributed teams, growing a company from 1 employee to 30+, and the non technical details that are important to understand to run a company. We also discussed the importance of blogging.

You can check it here.

time to read 3 min | 475 words

This post talks about Python, but it generalizes well to other programming languages and environments. I’m using Python and my own experience here to make a point; this isn’t really a post about Python itself.

I know how to read and write code in Python. By that I mean that I understand the syntax and how to do things in Python. I wouldn’t say that I’m an expert in all the myriad of ways that you can make Python jump on command, but I’m comfortable reading non trivial code bases and I like to use Python for small scripting jobs. I’m also maintaining (personally) the RavenDB Python Client which is just over 11K lines of code and decidedly non trivial.

But I don’t know Python.

Those two statements may seem to contradict one another, but I don’t really think so.

To this day, I find packaging code in Python to be unfamiliar territory. I don’t have a good feel for it and even when I follow the guides exactly, something doesn’t work properly more often than not. I also have only the vaguest idea about the Python virtual machine and the internals of the GC.

My few attempts to build Python interfaces on top of ctypes have been… painful. And a task such as creating an application or a package that would embed a native component in Python is likely beyond me without investing a significant amount of time and effort.

This is interesting, because my threshold for understanding a language or a platform means that I should have the ability to do non trivial things with the environment, not just with the code.

Packaging is an obvious problem, once you go beyond the simplest of scripts. But the detailed knowledge on debugging, troubleshooting and analysis of the system is also what I would expect to have before I could claim that I’m familiar with a particular environment.

There is often a lot of derision for job requirements such as “Minimum 5 years experience with Xyz”, even leaving aside cases where Xyz is younger than five years. In many cases this is a requirement that came from the HR department and not the technical team.

But when I read a requirement like that, I translate it to the difference between knowing how the code works and grokking how the whole environment operates.

Note that there is nothing really insurmountable with Python, per se. If I dedicated enough time (probably on the order of weeks, not months) to study it properly, I would have most of the knowledge that I need. But whenever I run into a stumbling block when using Python, it is always easier to simply forgo using Python and go use something that I am more familiar with. Hence, there has never been enough of a reason to make the jump to really understand the platform.

time to read 2 min | 213 words

I was asked what the meaning of default(object) is in the following piece of code:

The code is something that you’ll see a lot in RavenDB indexes, but I understand why it is a strange construct. The default(object) is a way to say null. It asks the C# compiler to use the default value of the object type, which is null.

So why not simply say null there?

Look at the code: we aren’t setting a field here, we are creating an anonymous object. When we set a field to null, the compiler can tell what the type of the field is from the class definition and check that the value is appropriate. You can’t set a Boolean property to null, for example.

With anonymous objects, the compiler needs to know what the type of the field is, and null doesn’t provide this information. You can use the (object)null construct, which has the same meaning as default(object), but I find the latter to be syntactically more pleasant to read.

It may make more sense if you’ll look at the following code snippet:

This technique is probably only useful if you deal with anonymous objects a lot. That is something that you do frequently with RavenDB indexes, which is how I ran into this syntax.
