Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 1 min | 69 words

Carl and Richard talk to Oren Eini, aka Ayende Rahein, about RavenDB. RavenDB is a NoSQL JSON document database. Oren explains how he came to the realization that he needed to build his own data store, and the advantages of document databases over relational databases. Is SQL dead? Not hardly, but RavenDB is an interesting addition to your data solution!

You can listen to it here.

time to read 1 min | 86 words

My NHibernate Course (4 days of digging into the heart & core of NHibernate) is arriving to Paris.

On the 18th April Sebastien Lambla will be teaching this course (in French).

During the course we build a practical application together, that demonstrates all important data management patterns in Nhibernate. You will learn how to use this O/R mapping tool efficiently in your applications to save time and effort on communicating with database storage.

Click here to register.

time to read 8 min | 1585 words

When I sat down over some years ago to actually decide how to go about dealing with that document database idea that wouldn’t let me sleep, I did several things. One of them was draw down the architecture of the system, and how to approach the technical challenges that we were seeing. But a much more interesting aspect was deciding to make this into a commercial product. One of the things that I did there was build a business plan for RavenDB. According to that business plan, we were supposed to hit the first production deployment of RavenDB round about now.

In practice, adoption of RavenDB has far outstripped my hopes, and we actually had production deployments for RavenDB within a few months of the public release. What actually took longer than deploying RavenDB was getting the stories back about it Smile. I am going to start posting those as soon as I get the authorization for the customers to do so.

The following is from the first production deployment of RavenDB…


1.        Who are you? (Name, company, position)

Henning Christiansen, Senior Consultant at Webstep Fokus, Norway (www.webstep.no) I am working on a development team for a client in the financial sector.

2.        In what kind of project / environment did you deploy RavenDB?

The client is, naturally, very focused on delivering solutions and value to the business at a high pace. The solutions we build are both for internal and external end users. On top of the result oriented environment, there is a heavy focus on building sustainable and maintainable solutions. It is crucial that modules can be changed, removed or added in the future.

In this particular project, we built a solution based on NServiceBus for communication and RavenDB for persistence. This project is part of a larger development effort, and integrates with both old and new systems. The project sounds unimpressive when described in short, but I'll give it a go:

The project's main purpose was to replace an existing system for distribution of financial analysis reports. Analysts/researchers work on reports, and submit them to a proprietary system which adds additional content such as tables and graphs of relevant financial data, and generates the final report as XML and PDF. One of the systems created during this project is notified when a report is submitted, pulls it out of the proprietary system, tags it with relevant metadata and stores it in a RavenDB instance before notifying subscribing systems that a new report is available. The reports are instantly available on the client's website for customers with research access.

One of the subscribing systems is the distribution system which distributes the reports by email or sms depending on the customer's preferences. The customers have a very fine-grained control over their subscriptions, and can filter them on things such as sector, company, and report type among other things. The user preferences are stored in RavenDB. When a user changes preferences, notifications are given to other systems so that other actions can be performed based on what the customers are interested in.

3.        What made you select a NoSQL solution for your project?

We knew the data would be read a lot more than it would be written, and it needed to be read fast. A lot of the team members were heavily battle-scarred from struggling with ORMs in the past, and with a very tight deadline we weren't very interested in spending a lot of time maintaining schemas and mappings.

Most of what we would store in this project could be considered a read model (à la CQRS) or an aggregate root (DDD), so a NoSQL solution seemed like a perfect fit. Getting rid of the impedance mismatch couldn't hurt either. We had a lot of reasons that nudged us in the direction of NoSQL, so if it hadn't been RavenDB it would have been something else.

4.        What made you select RavenDB as the NoSQL solution?

It was the new kid on the block when the project started, and had a few very compelling features such as a native .NET API which is maintained and shipped with the database itself. Another thing was the transaction support. A few of us had played a bit with RavenDB, and compared to other NoSQL solutions it seemed like the most hassle-free way to get up and running in a .NET environment. We were of course worried about RavenDB being in an early development stage and without reference projects, so we had a plan B in case we should hit a roadblock with RavenDB. Plan B was never put into action.

5.        How did you discover RavenDB?

I subscribe to your blog (http://ayende.com/Blog/) :)

There was a lot of fuss about NoSQL at the time, and RavenDB received numerous positive comments on Twitter and in blog posts.

6.        How long did it take to learn to use RavenDB?

I assume you're asking about the basics, as there's a lot to be learnt if you want to. What the team struggled with the most with was indexes. This was before dynamic indexes, so we had to define every index up front, and make sure they were in the database before querying. Breaking free from the RDBMS mindset and wrapping our heads around how indexes work, map/reduce, and how and when to apply the different analyzers took some time, and the documentation was quite sparse back then. The good thing is that you don't really need to know a lot about this stuff to be able to use RavenDB on a basic level.

The team members were differently exposed to RavenDB, so guessing at how long it took to learn is hard. But in general I think it's fair to say that indexes was the team's biggest hurdle when learning how to use RavenDB.

7.        What are you doing with RavenDB?

On this particular project we weren't using a lot of features, as we were learning to use RavenDB while racing to meet a deadline.

Aside from storing documents, we use custom and dynamic indexes, projections, the client API, and transactions. We're also doing some hand-rolled Lucene queries.

On newer projects however, with more experience and confidence with RavenDB, and as features and bundles keep on coming, we're doing our best to keep up with the development and making the best use of RavenDB's features to solve our problems.

8.        What was the experience, compared to other technologies?

For one thing, getting up and running with RavenDB is super-easy and only takes a couple of minutes. This is very different from the RDBMS+ORM experience, which in comparison seems like a huge hassle. Working with an immature and rapidly changing domain model was also a lot easier, as we didn't need to maintain mappings. Also, since everything is a document, which in turn easily maps to an object, you're sort of forced to always work through your aggregate roots. This requires you to think through your domain model perhaps a bit more carefully than you'd do with other technologies which might easier allow you to take shortcuts, and thus compromise your domain model.

9.        What do you consider to be RavenDB strengths?

It's fast, easy to get started with, and it has a growing community of helpful and enthusiastic users. Our support experience has also been excellent, any issues we've had have usually been fixed within hours. The native .NET API is also a huge benefit if you're working in a .NET environment

10.        What do you consider to be RavenDB weaknesses?

If we're comparing apples to apples, I can't think of any weaknesses compared to the other NoSQL solutions out there aside from the fact that it's new. Hence it's not as heavily tested in production environments as some of the older NoSQL alternatives might be. The relatively limited documentation, which admittedly has improved tremendously over the last few months, was also a challenge. The community is very helpful, so anything you can't find in the documentation can normally be answered by someone on the forum. There's also a lot of blog posts and example applications out there.

I find the current web admin UI a bit lacking in functionality, but hopefully the new Silverlight UI will take care of that.

11.        Now that you are in production, do you think that choosing RavenDB was the right choice?

Yes, definitely. We've had a few pains and issues along the way, but that's the price you have to pay for being an early adopter. They were all quickly sorted out, and now everything's been ticking along like clockwork for months. I'm confident that choosing RavenDB over another persistence technology has allowed us to develop faster and spend more time on the problem at hand.

12.        What would you tell other developers who are evaluating RavenDB?

I have little experience with other document databases, but obviously tested a bit and read blog posts when evaluating NoSQL solutions for this project. Since we decided to go with RavenDB there's been a tremendous amount of development done, and at this time none of the competitors are even close featurewise.

time to read 1 min | 98 words

Originally posted at 3/15/2011

I am trying to explain how RavenDB is structured, and we finally generated the following diagram:

image

I was actually surprised, because I don’t generally think about it this way, but having the diagram in front of us was really helpful in explain where everything goes.

And, of course, the inevitable nitpick: OMG it takes a loooong time to layout the graph.

time to read 2 min | 213 words

One of the major complaints about my recent series of code reviews is that I only say what you shouldn’t do, not what you should.

Well, this blog currently have 4,782 posts, many of which are actually talking about how you should design systems. I am sorry, but I really can’t repeat everything for each post.

I can specifically recommend the following series:

But really you are going to have to go over the archives and look. I am pretty good at assigning categories, and you might want to start with the Design category.

If you want to see how I write code, at last count, I had quite a few projects here: https://github.com/ayende

And last, but certainly not least, I can’t really tell you what you should do. There are too many factors involved to be able to do that honestly. People who assume that there is one correct way to do something also assumes that all projects are the same, with the same requirements, constraints and resources. That is not the case, and I try very hard not to provide any prescriptive design out of thin air.

time to read 2 min | 289 words

Originally posted at 3/11/2011

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it.

You might have wondered why I named this blog series the way I did, I named it because of the method outline below. Please note that I had to invent a new system to visualize the data access behavior on this system:

image

  • In red, we have queries that are executed once: 3 queries total.
  • In aqua, we have queries that are executed once for each item in the order: 2 queries per each product in the order.
  • In purple, we have queries that are executed once for each attribute in each of the products in the order: 1 query per attribute per product in the order.

Now, just to give you some idea, let us say that I order 5 items, and each item have 5 attributes…

We get the following queries:

  • 3 queries – basic method cost
  • 10 queries – 2 queries per each product
  • 25 queries – 1 query for each attribute for each product

Totaling in 38 queries for creating a fairly simple order. After seeing this during the course review of the application, I have used that term for this method, because it is almost too easy to make those sort of mistakes.

time to read 2 min | 321 words

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it.

As you might have noticed, I am pretty big fan of the NHibernate Futures feature (might be because I wrote it Smile), so I was encouraged when I saw it used in the project. It can drastically help the performance of the system.

Except, that then I saw how it is being used.

image

Which is then called from CurrencyService:

image

Which is then called…

image

Do you see that? We take the nice future query that we have, and turn it into a standard query, blithely crashing any chance to actually optimize the data access of the system.

The problem is that from the service API, there is literally nothing to tell you that you need to order your calls so all the data access happens up front, and it is easy to make mistakes such as this. The problem is that there is meaning lost at any additional abstraction layer, and pretty soon whatever good intention you had when you wrote a particular layer, 7 layers removed, you can’t really remember what is the best way to approach something.

time to read 6 min | 1193 words

Originally posted at 3/11/2011

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it.

In this case, I want to focus on the ProductRepository:

image

In particular, those methods also participate in Useless Abstraction For The Sake Of Abstraction Anti Pattern. Here is how they are implemented:

public AttributeItem GetAttributeItem(int attributeItemId)
{
    return Session.Get<AttributeItem>(attributeItemId);
}

public Attribute GetAttribute(int attrbuteId)
{
    return Session.Get<Attribute>(attrbuteId);
}

public IEnumerable<Attribute> GetAllAttributes()
{
    return Session.QueryOver<Attribute>()
        .Future<Attribute>();
}

public void SaveOrUpdate(Attribute attribute)
{
    Session.SaveOrUpdate(attribute);
}

And here is how they are called (from ProductService):

public AttributeItem GetAttributeItem(int attributeItemId)
{
    return productRepository.GetAttributeItem(attributeItemId);
}

public Attribute GetAttribute(int attrbuteId)
{
    return productRepository.GetAttribute(attrbuteId);
}

public void SaveAttribute(Attribute attribute)
{
    productRepository.SaveOrUpdate(attribute);
}

 public IList<Product> GetProducts()
 {
     return productRepository.GetAll();
 }

 public Product GetProduct(int id)
 {
     return productRepository.Get(id);
 }

 public void SaveOrUpdate(Product product)
 {
     productRepository.SaveOrUpdate(product);
 }

 public void Delete(Product product)
 {
     productRepository.Delete(product);
 }

 public IEnumerable<Attribute> GetAllAttributes()
 {
     return productRepository.GetAllAttributes();
 }

Um… why exactly?

But as I mentioned, this post is also about the proper usage of abstracting the OR/M. A repository was originally conceived as a to abstract away messy data access code into nicer to use code. The product repository have one method that actually do something meaningful, the Search method:

public IEnumerable<Product> Search(IProductSearchCriteria searchParameters, out int count)
{
    string query = string.Empty;
    if (searchParameters.CategoryId.HasValue && searchParameters.CategoryId.Value > 0)
    {
        var categoryIds = (from c in Session.Query<Category>()
                           from a in c.Descendants
                           where c.Id == searchParameters.CategoryId
                           select a.Id).ToList();

        query = "Categories.Id :" + searchParameters.CategoryId;
        foreach (int categoryId in categoryIds)
        {
            query += " OR Categories.Id :" + categoryId;
        }
    }

    if (!string.IsNullOrEmpty(searchParameters.Keywords))
    {
        if (query.Length > 0)
            query += " AND ";

        query += string.Format("Name :{0} OR Description :{0}", searchParameters.Keywords);
    }

    if (query.Length > 0)
    {
        query += string.Format(" AND IsLive :{0} AND IsDeleted :{1}", true, false);

        var countQuery = global::NHibernate.Search.Search.CreateFullTextSession(Session)
            .CreateFullTextQuery<Product>(query);

        var fullTextQuery = global::NHibernate.Search.Search.CreateFullTextSession(Session)
            .CreateFullTextQuery<Product>(query)
            .SetFetchSize(searchParameters.MaxResults)
            .SetFirstResult(searchParameters.PageIndex * searchParameters.MaxResults);

        count = countQuery.ResultSize;

        return fullTextQuery.List<Product>();
    }
    else
    {
        var results = Session.CreateCriteria<Product>()
            .Add(Restrictions.Eq("IsLive", true))
            .Add(Restrictions.Eq("IsDeleted", false))
            .SetFetchSize(searchParameters.MaxResults)
            .SetFirstResult(searchParameters.PageIndex * searchParameters.MaxResults)
            .Future<Product>();

        count = Session.CreateCriteria<Product>()
            .Add(Restrictions.Eq("IsLive", true))
            .Add(Restrictions.Eq("IsDeleted", false))
            .SetProjection(Projections.Count(Projections.Id()))
            .FutureValue<int>().Value;

        return results;
    }
}

I would quibble about whatever this is the best way to actually implement this method, but there is little doubt that something like this is messy. I would want to put this in a very distant corner of my code base, but it does provides a useful abstraction. I wouldn’t put it in a repository, though. I would probably put it in a Search Service instead, but that isn’t that important.

What is important is to understand where there is actually a big distinction between code that merely wrap code for the sake of increasing the abstraction level and code that provide some useful abstraction over an operation.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}