The inconsistent index results

A user reported an issue with RavenDB. They got unexpected results in their production database, but when they imported the data locally and tested things, everything worked.

Here is the simplified version of their index:

This is a multi-map index that covers multiple collections and aggregates data across them. In this case, the issue was that in production, for some of the results, the CompanyName field was null.

The actual index was more complex, but once we trimmed it down to something more manageable, it became obvious what the problem was. Let’s look at the problematic line:

CompanyName = g.First().CompanyName,

The problem is the First() call. The grouping makes no promise about ordering, and this code takes whatever item happens to be first. If that item happens to come from the Company map, the index will appear to work and you’ll get the right company name. However, if a result from the User map shows up first, CompanyName will be null.
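The ordering dependence is easy to reproduce outside of RavenDB. Below is a minimal sketch that simulates the reduce step with plain LINQ to Objects; it is not RavenDB's indexing engine. The map output shape (`CompanyId`, `CompanyName`, `UserCount`) is an assumption based on the fields discussed in this post and its comments, and the ID `companies/1` and the name `Acme` are made up. The same group is reduced twice, once with the company's entry first and once with a user's entry first.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical map output: the Company map emits the name, the User map emits a count.
var fromCompany = (CompanyId: "companies/1", CompanyName: (string?)"Acme", UserCount: 0);
var fromUser = (CompanyId: "companies/1", CompanyName: (string?)null, UserCount: 1);

// The problematic reduce: takes whatever entry happens to be first in the group.
string? ReduceWithFirst(IEnumerable<(string CompanyId, string? CompanyName, int UserCount)> results) =>
    results.GroupBy(r => r.CompanyId)
           .Select(g => g.First().CompanyName)
           .Single();

var companyFirst = ReduceWithFirst(new[] { fromCompany, fromUser });
var userFirst = ReduceWithFirst(new[] { fromUser, fromCompany });

Console.WriteLine(companyFirst ?? "null"); // Acme
Console.WriteLine(userFirst ?? "null");    // null
```

Nothing about the data changed between the two calls; only the order of the elements in the group did.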

We don’t make any guarantees about the order of elements in the grouping, but in practice it often depends (don’t rely on it) on the order in which the documents were updated. So you can update the user after the company and see CompanyName in the index change.

The right way to index this data is to select the value explicitly, like so:

CompanyName = g.First(x => x.CompanyName != null).CompanyName,
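Here is a LINQ-to-Objects sketch of the fixed reduce, not RavenDB's engine, assuming a map output of `CompanyId`, `CompanyName` and `UserCount` (the `UserCount` sum is borrowed from the reduce shown in the comments; the ID and name are made up). One deliberate deviation: in plain LINQ to Objects, `First(predicate)` throws `InvalidOperationException` when no element matches, so the sketch uses `FirstOrDefault`, which on a value tuple yields a default tuple whose `CompanyName` is null. Both input orders now produce the same result.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical map output: the Company map emits the name, the User map emits a count.
var fromCompany = (CompanyId: "companies/1", CompanyName: (string?)"Acme", UserCount: 0);
var fromUser = (CompanyId: "companies/1", CompanyName: (string?)null, UserCount: 1);

// The fixed reduce: explicitly pick the entry that actually carries a name.
(string CompanyId, string? CompanyName, int UserCount) Reduce(
    IEnumerable<(string CompanyId, string? CompanyName, int UserCount)> results) =>
    results.GroupBy(r => r.CompanyId)
           .Select(g => (
               CompanyId: g.Key,
               CompanyName: g.FirstOrDefault(x => x.CompanyName != null).CompanyName,
               UserCount: g.Sum(x => x.UserCount)))
           .Single();

var companyFirst = Reduce(new[] { fromCompany, fromUser });
var userFirst = Reduce(new[] { fromUser, fromCompany });

Console.WriteLine($"{companyFirst.CompanyName} {companyFirst.UserCount}"); // Acme 1
Console.WriteLine($"{userFirst.CompanyName} {userFirst.UserCount}");       // Acme 1
```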

Comments

25 Aug 2020
10:58 PM

Eluvatar

Why not just group by the data you want, like you would in SQL? For example:

from result in results
group result by new { result.CompanyId, result.CompanyName } into g
select new
{
    CompanyId = g.Key.CompanyId,
    CompanyName = g.Key.CompanyName,
}

26 Aug 2020
09:41 AM

Oren Eini

Eluvatar,

The issue is that for some of those documents, the CompanyName is null, since you don't have it for results that come from Users.
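To make that reply concrete: grouping by the pair `(CompanyId, CompanyName)` treats the company's entry and the users' entries as different keys, because the users' entries carry a null name. The sketch below is plain LINQ to Objects, not RavenDB, and assumes a simplified map output of `CompanyId`, `CompanyName` and `UserCount` with made-up values.

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical map output: one company entry and two user entries for the same company.
var results = new[]
{
    (CompanyId: "companies/1", CompanyName: (string?)"Acme", UserCount: 0),
    (CompanyId: "companies/1", CompanyName: (string?)null, UserCount: 1),
    (CompanyId: "companies/1", CompanyName: (string?)null, UserCount: 1),
};

// Grouping by the composite key splits one company into two groups.
var groups = results
    .GroupBy(r => new { r.CompanyId, r.CompanyName })
    .Select(g => new { g.Key.CompanyId, g.Key.CompanyName, UserCount = g.Sum(x => x.UserCount) })
    .ToList();

foreach (var g in groups)
    Console.WriteLine($"{g.CompanyId} | {g.CompanyName ?? "null"} | {g.UserCount}");
// companies/1 | Acme | 0
// companies/1 | null | 2
```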

26 Aug 2020
11:44 PM

Christiaan Siebeling

Why would

CompanyName = g.First(x => x.CompanyName != null).CompanyName,

work?

What if, for example, I don't create a new company but instead create 200 new users under an existing company? Wouldn't I then still get a null CompanyName, since there are no new companies to reduce, only new users?

I can see how it would work if you have a procedure which says that you always have to create a company and a user, but I don't understand why it would work if I don't create a company, only users.

Technically, I could have a system with no documents in the companies "table" and only a bunch of users in the users "table". Wouldn't the reduce part work in such a case? That is the only way I can see the fix working in multiple scenarios.

27 Aug 2020
08:30 AM

Oren Eini

Christiaan,

Let's take things in order. When you reduce, you reduce the new items together with the existing ones. That means that if you write new user documents, we'll take the already reduced value in storage and reduce it with the users' map results. That will give us the right value.

As for the scenario where you have no companies, or no matching company ID for the users: yes, we'll get null in the CompanyName. Note that you won't get a NullReferenceException, because we rewrite the code to be null-coalescing behind the scenes.
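Both points in this reply can be sketched with a simplified model. This is not RavenDB's actual storage or batching: a hypothetical `stored` value stands in for the previously reduced result, and 200 new user entries are reduced together with it (users first, on purpose). The map output shape and all values are assumptions, and `FirstOrDefault` stands in for the null-coalescing rewrite mentioned above, since plain LINQ's `First(predicate)` would throw when nothing matches.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// The fixed reduce, applied both to fresh map output and to re-reduce input.
(string CompanyId, string? CompanyName, int UserCount) Reduce(
    IEnumerable<(string CompanyId, string? CompanyName, int UserCount)> results) =>
    results.GroupBy(r => r.CompanyId)
           .Select(g => (
               CompanyId: g.Key,
               CompanyName: g.FirstOrDefault(x => x.CompanyName != null).CompanyName,
               UserCount: g.Sum(x => x.UserCount)))
           .Single();

// Hypothetical previously reduced value already in storage: the company plus one user.
var stored = Reduce(new[]
{
    (CompanyId: "companies/1", CompanyName: (string?)"Acme", UserCount: 0),
    (CompanyId: "companies/1", CompanyName: (string?)null, UserCount: 1),
});

// 200 new users arrive, with no new company document.
var newUsers = Enumerable.Range(0, 200)
    .Select(_ => (CompanyId: "companies/1", CompanyName: (string?)null, UserCount: 1));

// Re-reduce: the stored result is part of the input, so its name survives.
var updated = Reduce(newUsers.Append(stored));
Console.WriteLine($"{updated.CompanyName} {updated.UserCount}"); // Acme 201

// Users whose company does not exist: nothing carries a name, so it stays null.
var orphans = Reduce(new[] { (CompanyId: "companies/2", CompanyName: (string?)null, UserCount: 3) });
Console.WriteLine(orphans.CompanyName ?? "null"); // null
```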

27 Aug 2020
11:49 PM

Christiaan Siebeling

OK, I understand that, and I didn't see that the initial issue was a NullReferenceException. The article says the problem is: "In this case, the issue was that in production, for some of the results, the CompanyName field was null." But basically, imho, your solution just hides the issue a few levels deeper and makes it a much harder bug to track, because I think it will still occur in a normal environment. In my experience you usually have many more users than companies, so there is a real possibility of having only users in a reduce step. How big that chance is depends on your users/company ratio, but the smaller the chance, the harder the bug is to track: one day you see two results with a null CompanyName, and the next day you can only reproduce it on one result (as the other one has been eliminated through new data / map-reduce steps).

Imho you have just introduced a very rare bug which sometimes occurs and later disappears...

Just as a question, why wouldn't you use something like the following?

Reduce = results =>
    from result in results
    group result by result.CompanyId
    into g
    let CompName = LoadDocument<Company>("Companies/" + g.Key)?.CompanyName ?? "Unknown at this time"
    select new
    {
        CompanyId = g.Key,
        CompanyName = CompName,
        UserCount = g.Sum(dto => dto.UserCount)
    };

Or would that create an infinite loop somehow?

02 Sep 2020
11:59 AM

Oren Eini

Christiaan,

There can't be just users in the reduce step. One of the things that we'll do in the reduce is to add the _already reduced value_, so you have that there as well.

And you cannot call LoadDocument from the reduce; there is no way to track what should cause it to be recomputed.

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB