RavenDB C++ client: Laying the ground work

The core concept underlying the RavenDB client API is the notion of Unit of Work. It provides core features such as change tracking and an identity map. In all our previous clients, that was pretty easy to deal with, because the GC solved memory ownership and reflection gave us a lot of stuff basically for free.

Right now, what I want to achieve is the following:
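A minimal sketch of that goal, under assumptions: the User fields, the stripped-down Session that only tracks User objects, and the run_demo helper are all hypothetical stand-ins rather than the actual client code.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entity type; the field names are assumptions.
struct User {
    std::string name;
    int age = 0;
};

// Hypothetical, deliberately tiny session: it tracks only User entities
// and renders them as "name:age" strings instead of real documents.
class Session {
    std::vector<std::shared_ptr<User>> tracked_;

public:
    // The session takes a share of the ownership; the caller keeps its own.
    void store(std::shared_ptr<User> entity) {
        tracked_.push_back(std::move(entity));
    }

    // Documents are built when save_changes() runs, not when store() runs.
    std::vector<std::string> save_changes() const {
        std::vector<std::string> docs;
        for (const auto& u : tracked_)
            docs.push_back(u->name + ":" + std::to_string(u->age));
        return docs;
    }
};

std::vector<std::string> run_demo() {
    Session session;
    auto user = std::make_shared<User>();
    user->name = "Oren";
    user->age = 1;

    session.store(user);
    user->age = 2;   // modified after store(), before save_changes()
    user.reset();    // the caller lets go; the session still owns the User

    return session.save_changes();
}
```

Calling run_demo() returns a single entry, "Oren:2": the session and the caller held the same object, so the change made after store() is what gets written.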

It seems pretty simple, right? Both the session and the calling code share ownership of the passed User. Notice that we modify the user after we call store() but before save_changes(). We expect to see the modification in the document that is generated.

Memory ownership is handled here by using shared_ptr as the underlying mode in which we accept and return data to the session. Now, let’s see how we actually deal with serialization, shall we? I have chosen to use nlohmann’s json for the project, which means that the provided API is quite nice. As a consumer, you’ll need to write your own JSON serialization code, but it is fairly obvious how to do so. Check this out:
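A rough sketch of the pattern, with a loud assumption: to stay free of third-party headers, JsonDoc below is a hypothetical stand-in for nlohmann::json (a flat map of field name to already-encoded value). What it keeps is the shape of nlohmann's hook, a free to_json(json&, const T&) function placed next to the entity and found through argument-dependent lookup. The app namespace and the serialize helper are likewise made up for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for nlohmann::json: field name -> encoded value.
using JsonDoc = std::map<std::string, std::string>;

namespace app {

struct User {
    std::string name;
    int age = 0;
};

// The consumer-written hook: a free function in the entity's namespace.
// No string escaping is done here; this is a sketch only.
void to_json(JsonDoc& j, const User& u) {
    j["name"] = '"' + u.name + '"';
    j["age"] = std::to_string(u.age);
}

}  // namespace app

// Generic code can serialize any T that has a matching to_json overload.
template <typename T>
std::string serialize(const T& value) {
    JsonDoc doc;
    to_json(doc, value);  // unqualified call, resolved via ADL on T

    std::string out = "{";
    bool first = true;
    for (const auto& field : doc) {  // std::map iterates in key order
        if (!first) out += ",";
        out += '"' + field.first + "\":" + field.second;
        first = false;
    }
    return out + "}";
}
```

With a User named "Oren" aged 2, serialize(user) yields {"age":2,"name":"Oren"}; the keys come out sorted because the stand-in is a std::map.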

Given that C++ doesn’t have reflection, I think that this represents a really nice handling of the issue. Now, how does this play with everything else? Here is what the skeleton of the session looks like:
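A rough reconstruction of such a skeleton, not the actual client code. IEntityDetails and EntityDetails are the names used in this post; the string-returning to_json, the User fields, and the member names are assumptions made to keep the sketch dependency-free.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct User {
    std::string name;
    int age = 0;
};

// Stand-in for the user-written serialization hook (hypothetical: a plain
// string is used here in place of a JSON library value).
std::string to_json(const User& u) {
    return "{\"name\":\"" + u.name + "\",\"age\":" + std::to_string(u.age) + "}";
}

// Non-generic view of a tracked entity; this is all the session needs.
class IEntityDetails {
public:
    virtual ~IEntityDetails() = default;
    virtual std::string serialize() const = 0;
};

// Generic implementation: remembers the concrete type T, so it can call
// the right to_json overload later.
template <typename T>
class EntityDetails : public IEntityDetails {
    std::shared_ptr<T> entity_;

public:
    explicit EntityDetails(std::shared_ptr<T> entity)
        : entity_(std::move(entity)) {}

    std::string serialize() const override { return to_json(*entity_); }
};

class Session {
    // Pointers to the abstract base; the concrete type is erased here.
    std::vector<std::shared_ptr<IEntityDetails>> entities_;

public:
    // The only generic entry point: captures T, then forgets it.
    template <typename T>
    void store(std::shared_ptr<T> entity) {
        entities_.push_back(
            std::make_shared<EntityDetails<T>>(std::move(entity)));
    }

    // Works purely through the non-generic interface.
    std::vector<std::string> save_changes() const {
        std::vector<std::string> docs;
        for (const auto& e : entities_) docs.push_back(e->serialize());
        return docs;
    }
};
```

In this sketch save_changes() returns the generated documents instead of sending them anywhere, which keeps the type-erasure part easy to see.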

There is a whole bunch of stuff that is going on here that we need to deal with.

First, we have the IEntityDetails interface, which is non-generic. It is implemented by the generic class EntityDetails, which knows the actual type that we are using and can then use the JSON serialization we defined to convert the entity to JSON. The rest are just details. We need to use a vector of shared_ptr rather than a vector of the abstract class itself, because an abstract class cannot be instantiated or stored by value.

The generic store() method just captures the generic type and stores it, and the rest of the code can work with the non-generic interface.

I’m not sure how idiomatic this code is, or how performant, but at least as a proof of concept, it works to show that we can get a really good interface to our users in C++.

Tags:

design

Comments

16 Oct 2018
13:08 PM

Mark Boyall

It's always difficult to comment on parts of code without knowing the whole thing, but randomly assuming there's nothing in the other stuff that's material, I've noticed a few things you could improve. Idiomatically it's not bad, just a few missing details.

struct User
{
    std::string name;
    int age;

    User(std::string n) : name(std::move(n)) {}
};

Notice the extra std::move - this eliminates a pointless copy of the string. There's quite a few other places. The general rule is that if you don't need the source object anymore, std::move the last usage. This is particularly good for functions which only reference things one time (as in the above).

For the entity details, you can make life considerably simpler on yourself by using std::function. One of the things people often miss when coming from other languages is how much C++ lambdas can really simplify things. Consider for example:

class Session
{
    std::vector<std::function<std::string()>> items;

public:
    template <typename T>
    void store(std::shared_ptr<T> entity) {
        items.push_back([=] {
           return to_json(*entity);  // assumes a to_json(const T&) that returns std::string
        });
    }

    void save_changes() {
        for (auto&& it : items) {
            auto json = it();
            std::cout << json << std::endl;
        }
    }
};

In this case the std::function can perform small buffer optimisation, eliminating the need for a heap reference. Another benefit is that it's a lot less verbose and keeps everything by value, which is more idiomatic. It's a bit less pleasant if you're trying to keep multiple functions that way though. Even if you decide to stick with the interface (which is not an invalid decision; depends a lot on details not shown here) use a unique_ptr, not a shared_ptr. There's no need for refcounting here.

In fact, the details of keeping a reference to the entity are arguably poor here as well. I've been thinking about how to handle serialising the entity, and I think it would be best to just take the serialisation function as the argument to store in the first place. Then the user doesn't need to keep their entities on the heap if they don't need to, and you don't have to worry about ADL or interfaces for figuring out how to serialise.

16 Oct 2018
13:11 PM

Mark Boyall

For example:

class Session
{
    std::vector<std::function<std::string()>> items;

public:
    // Not a template: the serialisation function is the only thing stored.
    void store(std::function<std::string()> serialise) {
        items.push_back(std::move(serialise));
    }

    void save_changes() {
        for (auto&& it : items) {
            auto json = it();
            std::cout << json << std::endl;
        }
    }
};

int main()
{
    User user;  // assumes a default-constructible User and a to_json(const User&) returning std::string

    Session session;
    session.store([&] { return to_json(user); });

    user.age = 2;  // user is a value here, so member access uses '.'

    session.save_changes();
}

16 Oct 2018
13:39 PM

Oren Eini

Mark, Thank you very much for your feedback. The std::function is very nice, although I have to admit that, coming from a C# background, I'm very wary of capturing lambdas. This looks like it would be a really nice behavior here.

With regards to the shared_ptr, I'm using it there explicitly because the lifetime of the entity is not scoped to either the session or the caller, but to the longer-lived of the two. Since unique_ptr owns the reference exclusively, but I need at least two parties to share ownership, I don't think it would be appropriate.

16 Oct 2018
13:41 PM

Oren Eini

Mark, Something like void store(std::function<std::string()> serialise) is explicitly not something that I want to do. In particular, because it messes up the interface that we expose to the user significantly. Consider: session.store([&] { return to_json(user); }); vs. session.store(user); I would much rather have the latter.

16 Oct 2018
13:45 PM

Mark Boyall

If you used an interface for the contents of items, those objects being held there are not the lifetime of the entity, though. That is only the lifetime of the Session, so unique_ptr can be appropriate there.

The advantage of the approach I have suggested, where you simply take the serialisation function directly, is that the user can choose how the lifetime works. They can use a shared approach as you have used here, but they can also use a fixed lifetime approach as I have shown, or indeed any other lifetime that works for them.

One of the things I found in the C++ community is that there is always a part who will go nuts over the slightest heap allocation or any perceived inefficiency, even if in reality it's totally meaningless (e.g. a couple extra allocations per entity). APIs that allow them to avoid allocating on the heap if possible tend to be more successful.

16 Oct 2018
13:48 PM

Oren Eini

Mark, I'm actually fine with having overloads of this function. So you can either pass the entity, or pass the serialization function :-)

16 Oct 2018
18:30 PM

Oren Eini

Mark, Actually, things are a bit more complex. Consider the following scenario:

session.store(user, "users/1");
auto user2 = session.load<User>("users/1");

In this case, user and user2 must point to the same location. So we do need the actual instance.
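A minimal sketch of that requirement, assuming a hypothetical session that tracks only User objects in a map keyed by document id. The real load<User> is generic; the non-template load here, the User fields, and the member names are simplifications made up for illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct User {
    std::string name;
    int age = 0;
};

// Hypothetical identity map: one tracked instance per document id.
class Session {
    std::map<std::string, std::shared_ptr<User>> by_id_;

public:
    void store(std::shared_ptr<User> entity, const std::string& id) {
        by_id_[id] = std::move(entity);
    }

    // Returns the already-tracked instance, or nullptr if the id is unknown.
    // A real client would fall back to fetching from the server instead.
    std::shared_ptr<User> load(const std::string& id) const {
        auto it = by_id_.find(id);
        if (it == by_id_.end()) return nullptr;
        return it->second;
    }
};
```

Because the session keeps the shared_ptr it was given, load() hands back a pointer to the very same object, which would not be possible if store() had only received a serialization function.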

Oren Eini

CEO of RavenDB