The inconsistent index results
A user reported an issue with RavenDB. They got unexpected results in their production database, but when they imported the data locally and tested things, everything worked.
Here is the simplified version of their index:
This is a multi-map index that covers multiple collections and aggregates data across them. In this case, the issue was that in production, for some of the results, the CompanyName field was null.
The actual index was more complex, but once we trimmed it down to something more manageable, it became obvious what the problem was. Let’s look at the problematic line:
CompanyName = g.First().CompanyName,
The problem is with the First() call. There is no guarantee of ordering in the grouping results, and you are taking the first item there. If that item happens to be the one from the Company map, the index will appear to work and you’ll get the right company name. However, if a result from the User map shows up first, we’ll have null in the CompanyName.
We don’t make any guarantees about the order of elements in the grouping, but in practice it often (don’t rely on it) depends on the order of updates to the documents. So you can update the user after the company and see the changes in the index.
The right way to index this data is to do so explicitly, like so:
CompanyName = g.First(x => x.CompanyName != null).CompanyName,
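To make the ordering dependency concrete, here is a minimal sketch in plain LINQ, outside of RavenDB. The `MapResult` class, the `CompanyId` key, and the sample values are hypothetical stand-ins, not the user's actual index; only the two `First` expressions come from the lines quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one map result. Only CompanyName appears in the post;
// CompanyId is an assumed grouping key.
public class MapResult
{
    public string CompanyId { get; set; }
    public string CompanyName { get; set; }
}

public static class FirstSketch
{
    // Mirrors the problematic line: takes whichever entry happens to come first.
    public static string Naive(IEnumerable<MapResult> g) => g.First().CompanyName;

    // Mirrors the corrected line: takes the first entry that actually has a name.
    // In plain LINQ this throws if no entry has a name.
    public static string Explicit(IEnumerable<MapResult> g) =>
        g.First(x => x.CompanyName != null).CompanyName;

    public static void Demo()
    {
        var fromCompany = new MapResult { CompanyId = "companies/1", CompanyName = "Acme" };
        var fromUser = new MapResult { CompanyId = "companies/1", CompanyName = null };

        // The same group, in the two orders it might arrive in.
        var companyFirst = new[] { fromCompany, fromUser };
        var userFirst = new[] { fromUser, fromCompany };

        Console.WriteLine(Naive(companyFirst) ?? "(null)");  // Acme
        Console.WriteLine(Naive(userFirst) ?? "(null)");     // (null)
        Console.WriteLine(Explicit(companyFirst));           // Acme
        Console.WriteLine(Explicit(userFirst));              // Acme
    }
}
```

Calling `FirstSketch.Demo()` shows that the naive version changes its answer with the order of the group, while the explicit version does not.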
Comments
Why not just group by the data you want, like you would in SQL? For example:
Eluvatar,
The issue is that for some of those documents, the CompanyName is null, since you don't have it for results that come from Users.
Why would
work?
What if, for example, I don't create a new company but I create 200 new users under an existing company? Wouldn't I then still get a null as the CompanyName, as there are no new companies to reduce and only new users?
I can see how it would work if I have a procedure which says that you always have to create a company and a user, but I don't understand why this would work if I don't create a company but only users.
Technically I could have a system which has no documents in the companies "table" but only a bunch of users in the users "table". Wouldn't the reduce part have to work in such a case too? That is the only way I can see how the fix would work in multiple scenarios.
Christiaan,
Let's take things in order. When you reduce, you are going to reduce the new items with the existing ones. That means that if you write new user docs, we'll take the already reduced value in storage and reduce it with the users' map results. That will give us the right value.
As for the scenario where you have no companies, or no matching company id for the users, we'll get null in the CompanyName, yes. Note that you won't get a NullReferenceException, because we rewrite the code to be null coalescing behind the scenes.
Ok, I understand that, and I didn't see that the initial issue was a NullReferenceException. The article says that the problem is: "In this case, the issue was that in production, for some of the results, the CompanyName field was null." But basically your solution is imho just hiding the issue a few levels deeper and making it a much harder bug to track, as I think it will still occur in a normal environment. In my experience you usually have lots more users than companies, so there is a real possibility of having only users in a reduce step. How big that chance is depends on your users/company ratio, but the smaller the chance, the harder the bug is to track: one day you see two results with a null CompanyName field, and the next day you can only reproduce it on one result (as the other one has been eliminated through new data and map-reduce steps).
Imho you have just introduced a very rare bug which sometimes occurs and later disappears...
Just as a question, why wouldn't you use, for example, the following:
Or would that create an infinite loop somehow?
Christiaan,
There can't be just users in the reduce step. One of the things that we'll do in the reduce is to add the _already reduced value_, so you have that there as well.
And you cannot call LoadDocument from the reduce; there is no way to track what should cause that to be recomputed.
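The re-reduce behavior described in the replies above can be sketched in plain LINQ. This is an illustration under stated assumptions, not RavenDB's implementation: `ReducedEntry`, `UserCount`, and the sample values are hypothetical, and `FirstOrDefault` stands in for the null-safe rewrite mentioned in the replies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entry shape shared by map results and reduce results.
// UserCount is an assumed aggregate; the real index was not shown.
public class ReducedEntry
{
    public string CompanyId { get; set; }
    public string CompanyName { get; set; }
    public int UserCount { get; set; }
}

public static class ReduceSketch
{
    // One output per CompanyId: keep the first non-null name, sum the counts.
    // FirstOrDefault returns null, instead of throwing, when no entry has a name.
    public static List<ReducedEntry> Reduce(IEnumerable<ReducedEntry> inputs) =>
        inputs.GroupBy(x => x.CompanyId)
              .Select(g => new ReducedEntry
              {
                  CompanyId = g.Key,
                  CompanyName = g.Select(x => x.CompanyName).FirstOrDefault(n => n != null),
                  UserCount = g.Sum(x => x.UserCount)
              })
              .ToList();

    public static void Demo()
    {
        var company = new ReducedEntry { CompanyId = "companies/1", CompanyName = "Acme" };
        ReducedEntry User() => new ReducedEntry { CompanyId = "companies/1", UserCount = 1 };

        // First pass: the company and one user are reduced and "stored".
        var stored = Reduce(new[] { company, User() });

        // Later, only users are written. The new map results are reduced
        // together with the stored value, which still carries the name.
        var updated = Reduce(new[] { User(), User() }.Concat(stored));
        Console.WriteLine($"{updated[0].CompanyName}, {updated[0].UserCount}"); // Acme, 3

        // Users with no matching company at all: the name stays null.
        var orphans = Reduce(new[] { new ReducedEntry { CompanyId = "companies/2", UserCount = 1 } });
        Console.WriteLine(orphans[0].CompanyName ?? "(null)"); // (null)
    }
}
```

The point of the sketch is that a reduce over "only new users" never sees only users, because the previously reduced entry for that company is part of the input.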