Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 352 words

The question came up in the Mailing list, and I thought it would be a good thing to post about.

How do you handle reference data in RavenDB? The typical example in most applications would be something like the list of states, their names and abbreviations. You might want to refer to a state by its abbreviation, but still allow for something like this:

How do you handle this in RavenDB? Using a relational database, you would probably have a separate table just for states, and it is certainly possible to create something similar to that in RavenDB:

image

The problem with this approach is that is it a very wrong headed approach one for a document database. In a relational database, you have no choice but to threat anything that has many items as a table with many rows. In a document database, we have much better alternatives.

Instead of threating each state as an individual document, we can treat them as a whole value, like this:

image

In the sample, I included just a couple of states, but I think that you get the idea. Note the name of this document “Config/States”.

What are the benefits of this approach?

  • We only have to retrieve a single document.
  • We are almost always going to treat the states as a list, never as individual items.
  • There is never a transactional boundary for modifying just a single state.

In fact, I would usually say that this isn’t good enough. I would try to centralized any/all of the reference data that we use in the application into a single document. There is a high likelihood that it would be a fairly small document even after doing so, and it would be insanely easy to cache.

time to read 2 min | 366 words

image

I mentioned in passing that I didn’t like that in the blog post about the MVC Profiler Support for RavenDB.

There were three major reasons for that. First, I don’t like code like this:

image

This seems to be incredibly invasive, it is really not something that I want to force on my users. Considering how much time I spent building and designing profilers, I have strong opinions about them. And one of the things that I consider as huge selling points for my profilers is that they are pretty much fire and forget. Once you initialize them, you can forget about all the rest.

Obviously, a general purpose profiler like this can’t do that, but I still don’t like it very much. And since we were talking about profiling RavenDB, I saw no reason to go to something like this.

Next, there is a pretty big focus on profiling SQL:

image

That is pretty meaningless for us when we talk about Profiling RavenDB.

And finally, there was the Stack Trace management. Providing Stack Trace information is incredibly useful for a profiler, and the MVC Mini Profiler supports that, but I didn’t like how it did it. Calling new StackTrace() is pretty expensive, and there are better ways of doing that.

One of the goals for the RavenDB profiler is to allow people to run it in production, same as I am doing right now in this blog, after all.

Mostly, the decision to go and write it from scratch was based on making it easier for us to build something that made sense for our purposes, not try to re-use something that was built for a completely different purpose.

time to read 2 min | 224 words

One of the horrible things about SecurityException is just how little information you are given.

For example, let us take a look at:

image

Yes, that is informative. The real problem is that most security analysis is done at the method level, which means that _something_ in this method caused this problem. Which meant that in order to debug this, I have to change my code to be like this:

image

Which gives me this:

image

Just to point out, the first, second, etc are purely artificial method names, meant just to give me some idea about the problem areas for this purpose only.

Then we need to go into the code in the First method and figure out what the problem is there, and so on, and so forth.

Annoying, and I wish that I knew of a better way.

time to read 1 min | 178 words

I think that the NHibernate and RavenDB workshops in DevLink has been awesome, and I have been asked to do more of those.

One thing that I learned in particular from the RavenDB workshop is that just one day is just not enough, and we expanded the single day workshop into a two days course. I’ll be giving this course in November, 3 – 4 2011, in New York.

And, of course, my old companion, the Advanced NHibernate Course, for the people who wants to grok NHibernate and really understand what is going on under the cover, and maybe even more important, why? That course will run on October 31 – November 2, 2011, also in New York.

And if you are from the other side of the pond, I’ll be giving my Advanced NHibernate Course in Warsaw, Poland on October 17 – 19, 2011.

Register now for early bird pricing:

time to read 5 min | 903 words

There are several ways to handle schema changes in RavenDB. When I am talking about schema changes, I am talking about changing the format of documents in production. RavenDB doesn’t have a “schema”, of course, but if your previous version of the application had a Name property for customer, and your new version have FirstName and LastName, you need to have some way of handling that.

Please note that in this case I am explicitly talking about a rolling migration, not something that you need to do immediately.

We will start with the following code bases:

Version 1.0 Version 2.0
public class Customer
{
    public string Name {get;set;}
    public string Email {get;set;}
    public int NumberOfOrders {get;set;}
}
public class Customer
{
    public string FirstName {get;set;}
    public string LastName {get;set;}
    public string CustomerEmail {get;set;}
    public bool PreferredCustomer {get;set;}
} 

As I said, there are several approaches, depending on exactly what you are trying to do. Let us enumerate them in order.

Removing a property – NumberOfOrders

As you can see, NumberOfOrders was removed from v1 to v2. In this case, there is absolutely no action required of us. The next time that this customer will be loaded, the NumberOfOrders property will not be bound to anything, RavenDB will note that the document have changed (missing a property) and save it without the now invalid property. It is self cleaning Smile.

Adding a property – PreferredCustomer

In this situation, what we have is a new property, and we need to provide a value for it. If there isn’t any value for the property in the stored json, it won’t be set, which means that the default value (or the one set in the constructor) will be the one actually set. Again, RavenDB will note that the document have changed, (have an extra property) and save it with the new property. It is self healing Smile.

Modifying properties – Email –> CustomerEmail, Name –> FirstName, LastName

This is where things gets annoying. We can’t rely on the default behavior for resolving this. Luckily, we have the extension points to help us.

public class CustomerVersion1ToVersion2Converter : IDocumentConversionListener
{
    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;

        metadata["Customer-Schema-Version"] = 2;
        // preserve the old Name proeprty, for now.
        document["Name"] = c.FirstName + " " + c.LastName;
        document["Email"] = c.CustomerEmail;
    }

    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;
        if (metadata.Value<int>("Customer-Schema-Version") >= 2)
            return;

        c.FirstName = document.Value<string>("Name").Split().First();
        c.LastName = document.Value<string>("Name").Split().Last();
        c.CustomerEmail = document.Value<string>("Email");
    }
}

Using this approach, we can easily convert between the two version, including keeping the old schema in place in case we still need to be compatible with the old schema.

Pretty neat, isn’t it?

time to read 2 min | 250 words

When I am talking about migrations, I am talking about changing the format of documents in production. RavenDB doesn’t have a “schema”, of course, but if your previous version of the application had a Name property for customer, and your new version have FirstName and LastName, you need to have some way of handling that.

Before getting around to discussing how to do this, I want to discuss when. There are basically two types of migrations when you are talking about RavenDB.

Immediate, shutdown – there is a specific cutoff point, in which we will move from v1 to v2. That includes shutting down the application for the duration of the update. The shutdown is required because the v1 application expects to find Name property, the v2 application expects to first FirstName, LastName properties. If you try to run them both at the same time, you can get into a situation where each version of the application is fighting to enforce its own view. So the simplest solution is to shut down the application, convert the data and then start the new version.

Rolling update – your v2 application can handle the v1 format, which means that you can switch from the v1 app to v2 app fairly rapidly, and whenever the v2 application encounters the old format, it will convert it for your to the new one, potentially conserving the old format for a certain period.

I’ll discuss exactly how to handle those in my next post.

time to read 3 min | 465 words

Let us say that we have the following server code:

public JsonDocument[] GetDocument(string[] ids)
{
  var  results = new List<JsonDocument>();
  foreach(var id in ids)
  {
    results.Add(GetDocument(id));
  }
  return result.ToArray();  
}

This method is a classic example of batching requests to the server. For the purpose of our discussion, we have a client proxy that looks like this:

public class ClientProxy : IDisposable
{
  MemoryCache cache = new MemoryCache();
  
  public JsonDocument[] GetDocument(params string[] ids)
  {
    
    // make the request to the server, for example
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
    using(var stream = request.GetResponse().GetResposeStream())
    {
      return  GetResults(stream);
    }
  }
  
  public void Dispose()
  {
    cache.Dispose();
  }
}

Now, as you can probably guess from the title and from the code above, the question relates to caching. We need to make the following pass:

using(var proxy = new ClientProxy())
{
  proxy.GetPopularity("ayende", "oren"); // nothing in the cahce, make request to server
  proxy.GetPopulairty("rhino", "hippo"); // nothing in the cahce, make request to server
  proxy.GetPopulairty("rhino", "aligator"); // only request aligator, 'rhino' is in the cache
  proxy.GetPopulairty("rhino", "hippo");  // don't make any request, serve from cache
  proxy.GetPopulairty("rhino", "oren");   // don't make any request, serve from cache
}

The tricky part, of course, is to make this elegant. You can modify both server and client code.

I simplified the problem drastically, but one of the major benefits in the real issue was reducing the size of data we have to fetch over the network even for partial cached queries.

time to read 2 min | 292 words

My company is hiring again, and since I had so much success in our previous hiring round, I decided to use this blog for the next one as well.

Some information that you probably need to know:

  • You have to know to program in C#.
  • If you have OSS credentials, that is all for the better, but it is not required.
  • It is not a remote position, our offices are located in Hadera, Israel.

We are looking for developers, and are open for people with little to no commercial experience. I am looking for people who are passionate about programming, and I explicitly don’t list the whole alphabet soup that you usually see in a hiring notice. I don’t really care if you have / don’t have 5 years experience in WCF 4.1 update 3. I care that you can think, that you can program your way out of problems and also know when it is not a good idea to throw code at a problem.

If you are a reader of my blog, you probably have a pretty good idea about the sort of things that we are doing. But to be clear, we are talking about working on our flock of ravens (RavenDB, RavenFS and RavenMQ), our suite of profilers as well as some other things that I won’t discuss publically at this point Smile.

What is required:

Commercial experience is not required, but I would want to see code that you wrote.

Additionally, we are still looking for interns, so if you are a student who would like to gain some real world commercial experience (and get paid for it, naturally), contact me.

time to read 3 min | 575 words

We were asked to implement comments RSS for this blog, and many people asked about the recent comments widget. That turned out to be quite a bit more complicated than it appeared on first try, and I thought it would make a great challenge.

On the face of it, it looks like a drop dead simple feature, right? Show me the last 5 comments in the blog.

The problem with that is that it ignores something that is very important to me, the notion of future posts. One of the major advantages of RaccoonBlog is that I am able to post stuff that would go on the queue (for example, this post would be scheduled for about a month from the date of writing it), but still share that post with other people. Moreover, those other people can also comment on the post. To make things interesting, it is quite common for me to re-schedule posts, moving them from one date to another.

Given that complication, let us try to define how we want the Recent Comments feature to behave with regards to future posts. The logic is fairly simple:

  • Until the post is public, do not show the post comments.
  • If a post had comments on it while it was a future post, when it becomes public, the comments that were already posted there, should also be included.

The last requirement is a bit tricky. Allow me to explain. It would be easier to understand with an example, which luckily I have:

image_thumb[2]

As you can see, this is a post that was created on the 1st of July, but was published on the 12th. Mike and me have commented on the post shortly after it was published (while it was still hidden from the general public). Then, after it was published, grega_g and Jonty have commented on that.

Now, let us assume that we query at 12 Jul, 08:55 AM, we will not get any comments from this post, but on 12 Jul, 09:01 AM, we should get both comments from this post. To make things more interesting, those should come after comments that were posted (chronologically) after them. Confusing, isn’t it? Again, let us go with a visual aid for explaining things.

In other words, let us say that we also have this as well:

image

Here is what we should see in the Recent Comments:

12 Jul, 2011 – 08:55 AM 12 Jul, 2011 – 09:05 AM
  1. 07/07/2011 07:42 PM – jdn
  2. 07/08/2011 07:34 PM – Matt Warren
  1. 07/03/2011 05:26 PM – Ayende Rahien
  2. 07/03/2011 05:07 PM - Mike Minutillo
  3. 07/07/2011 07:42 PM – jdn
  4. 07/08/2011 07:34 PM – Matt Warren

Note that the 1st and 2snd location on 9:05 are should sort after 3rd and 4th, but are sorted before them, because of the post publish date, which we also take into account.

Given all of that, and regardless of the actual technology that you use, how would you implement this feature?

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}