Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 576 words

Recently, we added a way to track alerts across all the sessions in a request. This alert detects when you are making too many database calls in the same request.

But wait, don’t we already have that?

Yes, we do, but that was limited to the scope of one session. There is a very large set of codebases where the usage of OR/Ms is… suboptimal (in other words, they could take the most advantage of the profiler’s ability to detect issues and suggest solutions to them), but because of the way they are structured, their problems weren’t previously detected.

What is the difference between a session and a request?

Note: I am using NHibernate terms here, but naturally this feature is shared among all the profilers:

A session is the NHibernate session (or the data/object context in LINQ to SQL / Entity Framework), and the request is the HTTP request or the WCF operation. Suppose you had code such as the following:

public T GetEntity<T>(int id)
{
    using (var session = sessionFactory.OpenSession())
    {
         return session.Get<T>(id);
    }
}

This code is bad: it micro manages the session, it uses too many connections to the database, it … well, you get the point. The real problem is the code that calls it:

public IEnumerable<Friend> GetFriends(int[] friends)
{
    var results = new List<Friend>();
    // Each iteration opens its own session and makes its own database call
    foreach (var id in friends)
        results.Add(GetEntity<Friend>(id));

    return results;
}

The code above would look like the following in the profiler:

Image1

As you can see, each call is in a separate session, and previously, we wouldn’t have been able to detect that you have too many calls (because each call is a separate session).

Now, however, we will alert the user with a “too many database calls in the same request” alert.

Image2
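
To make the difference concrete, here is a minimal sketch of counting per request instead of per session. It is not how the profiler is implemented; the Statement class, the KeysOverLimit helper and the limit of 30 are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One recorded database call. Both ids are hypothetical labels for this sketch.
public class Statement
{
    public string RequestId { get; set; }
    public string SessionId { get; set; }
}

public static class AlertSketch
{
    // Returns the keys (session ids or request ids) that have more statements than the limit.
    public static List<string> KeysOverLimit(
        IEnumerable<Statement> statements, Func<Statement, string> key, int limit)
    {
        return statements.GroupBy(key)
                         .Where(g => g.Count() > limit)
                         .Select(g => g.Key)
                         .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // GetFriends called with 50 ids: one request, 50 sessions, one statement each.
        var statements = Enumerable.Range(1, 50)
            .Select(i => new Statement { RequestId = "request-1", SessionId = "session-" + i })
            .ToList();

        var perSession = AlertSketch.KeysOverLimit(statements, s => s.SessionId, 30);
        var perRequest = AlertSketch.KeysOverLimit(statements, s => s.RequestId, 30);

        Console.WriteLine(perSession.Count); // 0: no single session goes over the limit
        Console.WriteLine(perRequest.Count); // 1: the request as a whole does
    }
}
```

The only thing that changes between the two checks is the grouping key, which is why the per session alert could never see this problem.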

time to read 2 min | 213 words

We have recently been doing some work on Uber Prof, mostly in the sense of a code review, and I wanted to demonstrate how easy it was to add a new feature. The problem is that we couldn’t really think of a nice feature to add that we didn’t already have.

Then we started thinking about features that weren’t there and that nothing in Uber Prof could yet enable, and we reached the conclusion that one limitation we have right now is the inability to analyze your application’s behavior beyond the level of a single session. But there is actually a whole set of bad practices that only show up when you are using multiple sessions.

That led to the creation of a new concept, the Cross Session Alert. Unlike the alerts we had so far, these alerts look at the data stream with a much broader scope, and they can analyze and detect issues that we previously couldn’t detect.

I am going to be posting extensively on some of the new features in just a bit, but in the meantime, why don’t you tell me what sort of features you think this new concept enables?

And just a reminder, my architecture is based around Concepts & Features.

time to read 3 min | 588 words

Originally posted at 2/17/2011

In a recent codebase, I had to go through the following steps to understand how a piece of data in the database got to the screen:

  • Visit Presenter needs to show the most recent visit
    • It calls VisitationService
      • It calls PatientsService
        • It calls PatientDataProvider
          • It calls Repository<Patient>
            • It uses NHibernate
      • It calls VisitDataProvider
        • It calls Repository<Visit>
          • It uses NHibernate

All of that in order to just grab some data, but you won’t really grasp why this is bad until you realize that you need to change something in the way you load stuff from the database.

A common example (where I usually come in) is when you have a performance problem and need to optimize the way you access the database.

The problem with this type of architecture is that it looks good. You have good separation, there are usually tests for it, and it matches every rule in the SOLID rule book. Except that it is horrible to actually try to make changes in such a system. Oh, you can easily replace the way you handle patients, for example, because that has an interface and you can switch that.

But the problem that I usually run into in those projects is that the things that I want to change aren’t along the axis of expected change, and the architecture is usually working directly against my ability to make a meaningful modification.

Guys, we aren’t talking about rocket science here, we are talking about loading some crap from the database. And for the most part, the way I like to see it is:

  • Visit Presenter needs to show the most recent visit
    • It uses NHibernate

Basically, we want to make it so that reading from the database has as few frills as possible, because it is taking too much effort otherwise.

Writing is usually when we have to apply things like validation, business logic, rules and behaviors. Put that in a service and run with that, but for reads? Reads should be simple, and close to where they are needed, otherwise you are opening yourself to a world of trouble.

Oh, I just realized that I am describing something quite similar to the CQRS model, although I think that I got to it from a different angle.

time to read 1 min | 67 words

On Thursday, I’ll be giving a Webinar on Building Document Based Systems.

In this webcast we will explore building document based systems on top of the RavenDB document database for .NET. We will explore the different modeling requirements, the tradeoffs and the benefits of using a document based approach for modeling our systems.

You can register for the Webinar using the following link.

time to read 3 min | 460 words

I recently went over a codebase and found something like this:

public interface IAuditable
{
  DateTime UpdatedAt {get;set;}
  string UpdatedBy {get;set;}
  DateTime CreatedAt {get;set;}
  string CreatedBy {get;set;}
}

public interface IEntity
{
  int Id {get;set;}
}

public class Entity : IEntity
{
  public int Id { get;set; }
}

public class AuditableEntity : Entity, IAuditable
{
  public DateTime UpdatedAt {get;set;}
  public string UpdatedBy {get;set;}
  public DateTime CreatedAt {get;set;}
  public string CreatedBy {get;set;}
 
}

public class Visit : AuditableEntity
{
  // stuff
}

I look at code like that, and it is more than a bit painful. It is painful, because this sort of code is badly abusing inheritance.

The problem is that this is mostly intended to save on typing, but with things like automatic properties, there isn’t really much point here. What it does produce is code that seems to be more complicated than it is, because now we have those classes in the middle that do nothing but provide properties for you to use. Worse than that, they take up the only base class slot that you have, and they force you to think in a way that isn’t always natural.

It is just as easy, and much clearer to use:

public interface IAuditable
{
  DateTime UpdatedAt {get;set;}
  string UpdatedBy {get;set;}
  DateTime CreatedAt {get;set;}
  string CreatedBy {get;set;}
}

public interface IEntity
{

}

public class Visit : IAuditable, IEntity
{
  public int Id {get;set;}
  public DateTime UpdatedAt {get;set;}
  public string UpdatedBy {get;set;}
  public DateTime CreatedAt {get;set;}
  public string CreatedBy {get;set;}
 
}

And hey, you can now have an auditable that has a composite key, something that you used to need a completely separate inheritance hierarchy to deal with.
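
As a rough sketch of that last point, here is an auditable with a composite key. The VisitNote class, its two key properties and the Audit.Stamp helper are hypothetical names that I made up for the example; only IAuditable and IEntity come from the code above.

```csharp
using System;

public interface IAuditable
{
    DateTime UpdatedAt { get; set; }
    string UpdatedBy { get; set; }
    DateTime CreatedAt { get; set; }
    string CreatedBy { get; set; }
}

public interface IEntity
{
}

// Hypothetical entity identified by a pair of values rather than a single int Id.
public class VisitNote : IAuditable, IEntity
{
    public int VisitId { get; set; }
    public int NoteNumber { get; set; }
    public DateTime UpdatedAt { get; set; }
    public string UpdatedBy { get; set; }
    public DateTime CreatedAt { get; set; }
    public string CreatedBy { get; set; }
}

public static class Audit
{
    // Works against the interface only, so it does not care how the entity is keyed.
    public static void Stamp(IAuditable target, string user, DateTime now)
    {
        if (target.CreatedBy == null)
        {
            target.CreatedAt = now;
            target.CreatedBy = user;
        }
        target.UpdatedAt = now;
        target.UpdatedBy = user;
    }
}

public static class Program
{
    public static void Main()
    {
        var note = new VisitNote { VisitId = 7, NoteNumber = 2 };
        Audit.Stamp(note, "oren", DateTime.UtcNow);
        Console.WriteLine(note.CreatedBy + " / " + note.UpdatedBy);
    }
}
```

Nothing here needed a base class, so VisitNote is still free to inherit from whatever actually makes sense for it.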

time to read 3 min | 501 words

The question came up in a somewhat unrelated discussion, about the RavenDB authorization bundle usage:

I have an 'Account' service which is responsible for managing all things 'user'.
I have a 'Messaging' service which is responsible for all things 'messaging' i.e. wall posts, conversations etc.

My question is this:

  • Should the account service store the master User with Roles and Permissions - when it is asked for a user it can send back a dto with the roles and permissions (could get chunky)
  • Should the Messaging Service maintain it's own copy of a User - with it's own set of roles and permissions?

I wasn’t sure what to answer, because a lot depended on the actual physical infrastructure of the system. But after some back & forth, it turned out that those were true services, in other words, they were independent from one another and each had its own data store.

That completely ruled out the first possibility; we don’t want to have to rely on another service for something that is as central to our service as authorization. The other option, having each service (there are currently 5 in total) maintain its own users, is fraught with the potential for disaster.

Instead, a better option is to simply replicate the relevant parts from the Account service’s database to the related services. The authorization bundle records information about users, roles & permissions, which allows us to create the following data storage scheme for the Account database. Actually, we are talking about two different databases in the Account database instance:

  • Accounts – All the application specific account information
  • Permissions – All the authorization information

We set up RavenDB replication from the Account.Permissions database to each of the services’ databases, which means that any change to permissions will be replicated to all the related databases.

For each service, we treat the authorization information as usual, and we get a cross service, background replicated, fully distributed authorization system that can make authorization decisions without touching any external data source.

Let us take the example of viewing a message:

  • Jane sends a message on Joe’s wall (which should only be visible to Joe’s friends). The new message is written to the Messages database.
  • Drew then befriends Joe. That means that we set up the friendships on the Accounts database and the permissions on the Permissions database.
  • The information on the Permissions database then replicates to the Messages database.
  • The next query to the Messages database will make the authorization decisions locally, against its own copy, but it will get the new permissions and show Jane’s message to Drew.

That is quite elegant, even if I say so myself.
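
The flow above can be sketched in a few lines of plain C#. This is only a toy model of the idea and does not use RavenDB or the authorization bundle at all; the ServiceDatabase and PermissionsDatabase classes and all of their members are hypothetical, and the explicit Replicate() call stands in for replication that would really happen in the background.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a service database holding a replicated copy of the permissions.
public class ServiceDatabase
{
    private readonly HashSet<string> friendships = new HashSet<string>();

    // Called by replication, never by the request path.
    public void ApplyReplicated(string user, string friendOf)
    {
        friendships.Add(user + "->" + friendOf);
    }

    // The authorization decision is made only against the local copy.
    public bool CanViewWall(string user, string wallOwner)
    {
        return user == wallOwner || friendships.Contains(user + "->" + wallOwner);
    }
}

// Hypothetical stand-in for the Permissions database in the Account instance.
public class PermissionsDatabase
{
    private readonly List<ServiceDatabase> targets = new List<ServiceDatabase>();
    private readonly Queue<Tuple<string, string>> pending = new Queue<Tuple<string, string>>();

    public void ReplicateTo(ServiceDatabase target)
    {
        targets.Add(target);
    }

    public void AllowFriend(string user, string friendOf)
    {
        pending.Enqueue(Tuple.Create(user, friendOf));
    }

    // Stands in for background replication catching up.
    public void Replicate()
    {
        while (pending.Count > 0)
        {
            var change = pending.Dequeue();
            foreach (var target in targets)
                target.ApplyReplicated(change.Item1, change.Item2);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var messages = new ServiceDatabase();
        var permissions = new PermissionsDatabase();
        permissions.ReplicateTo(messages);

        permissions.AllowFriend("drew", "joe");
        Console.WriteLine(messages.CanViewWall("drew", "joe")); // False: not replicated yet

        permissions.Replicate();
        Console.WriteLine(messages.CanViewWall("drew", "joe")); // True: decided locally
    }
}
```

Note that CanViewWall never reaches out to the Permissions database, so the Messages service keeps making decisions even if the Account service is down. The price is the short window before replication catches up.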

time to read 1 min | 66 words

I’ll be giving my Advanced NHibernate course in March 2011 in Dallas. We are talking about a 3 day intensive dive into NHibernate: how it works, how to fully utilize its capabilities, and how to actually grok the NHibernate zen.

You can register for the course here: http://dallas-nhibernate.eventbee.com

Registration will end next week, so if you are thinking about showing up, you had better hurry and register.

time to read 2 min | 332 words

The next build of RavenDB is going to include a major new feature, a Silverlight API. That comes in addition to our REST, .NET and JavaScript APIs.

What it means is that you can now just download RavenDB (the new bits are in the unstable fork right now) and start using it from Silverlight. Here is an example of what the code looks like:

var documentStore = new DocumentStore { Url = "http://localhost:8080" };
documentStore.Initialize();

var entity = new Company { Name = "Async Company #1", Id = "companies/1" };
using (var session = documentStore.OpenAsyncSession(dbname)) // dbname: the name of the database to use
{
    session.Store(entity);
    session.SaveChangesAsync(); // returns a task that completes asynchronously

    var query = session.Query<Company>()
        .Where(x => x.Name == "Async Company #1")
        .ToListAsync(); // returns a task that will execute the query
}

In order to handle the Silverlight asynchronous requirement, we have taken a dependency on the Async CTP, which brings us support for TPL on Silverlight.

Overall, this makes things a lot simpler all around, I think.

time to read 3 min | 442 words

I got the following question (originally about RavenDB, but I generalized it a bit):

I'm currently working on a open source project where I need background processing. The main scenarios are:

  • Processing data from a queue of incoming messages, like processing incoming mail that's put in a queue.
  • Processing data from a lot of different web services.

I've worked with scheduling frameworks like quartz.net before to schedule processing but in this case I'm looking at much bigger amounts of processing. It would be nice to add more workers depending on the load like raven db.

I think my main question is what's your experience when building background workers? What should I think about? Is there any framework that can help me?

The first thing to understand is that for data processing, actually implementing queuing is going to be a losing proposition. By far the biggest cost for most data processing tasks is IO, and the best way to handle that is via batching. Queues don’t really work for this scenario because they make it hard to process a batch of changes in one shot. Queues are natural for “pull from queue, process, move to next message”, which isn’t good when you are processing large amounts of information.

The way this is implemented in RavenDB is that I have ensured that there is a cheap way to query by “last updated timestamp”. That means that I am able to issue queries such as:

Give me the next batch of updated documents since update point 121.

Those queries are very cheap (they are fully indexed queries at the storage level).

Following that, each data processing task merely needs to keep track of the last update point that it processed. Things get a little more complex when you assume that there can be periods of time where no activity happens, since you want to avoid polling in that scenario.

With RavenDB, if a processing task doesn’t find anything to process, it goes to sleep, and we ensured that this can work by raising a notification whenever the database changes, in which case we can wake the waiting tasks. This approach allows us to efficiently process data without waiting for scheduled tasks (which results in update delays), without polling (which consumes additional resources) and without complex logic (scheduling, determining what changed, queues, etc).

I find this to be quite an elegant solution.
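
Here is a minimal sketch of that approach, using an in-memory store instead of RavenDB. It is not RavenDB’s actual code; SketchStore, BatchWorker and all of their members are hypothetical names. A real store would seek to the update point through an index, while this one simply scans a sorted dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// In-memory stand-in for a store that stamps every write with an increasing update point.
public class SketchStore
{
    private readonly object gate = new object();
    private readonly SortedDictionary<long, string> byUpdatePoint = new SortedDictionary<long, string>();
    private long lastUpdatePoint;

    public void Put(string document)
    {
        lock (gate)
        {
            lastUpdatePoint++;
            byUpdatePoint[lastUpdatePoint] = document;
            Monitor.PulseAll(gate); // wake any task sleeping in WaitForChange
        }
    }

    // "Give me the next batch of updated documents since update point N."
    public List<KeyValuePair<long, string>> BatchSince(long updatePoint, int batchSize)
    {
        lock (gate)
        {
            return byUpdatePoint.Where(p => p.Key > updatePoint).Take(batchSize).ToList();
        }
    }

    // Blocks until something newer than updatePoint exists, or the timeout passes.
    public bool WaitForChange(long updatePoint, TimeSpan timeout)
    {
        lock (gate)
        {
            if (lastUpdatePoint > updatePoint) return true;
            Monitor.Wait(gate, timeout);
            return lastUpdatePoint > updatePoint;
        }
    }
}

public class BatchWorker
{
    private readonly SketchStore store;
    public long LastProcessed { get; private set; }
    public List<string> Seen { get; } = new List<string>();

    public BatchWorker(SketchStore store) { this.store = store; }

    // Drains everything currently available, one batch at a time.
    public void ProcessPending(int batchSize)
    {
        while (true)
        {
            var batch = store.BatchSince(LastProcessed, batchSize);
            if (batch.Count == 0) return;
            foreach (var item in batch) Seen.Add(item.Value);
            LastProcessed = batch[batch.Count - 1].Key; // the only state the task has to keep
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new SketchStore();
        var worker = new BatchWorker(store);
        for (var i = 1; i <= 5; i++) store.Put("doc-" + i);

        worker.ProcessPending(2);
        Console.WriteLine(worker.LastProcessed); // 5, reached in batches of 2, 2 and 1

        // Nothing left to do: sleep until a write wakes us instead of polling.
        var writer = new Thread(() => { Thread.Sleep(50); store.Put("doc-6"); });
        writer.Start();
        if (store.WaitForChange(worker.LastProcessed, TimeSpan.FromSeconds(5)))
            worker.ProcessPending(2);
        writer.Join();
        Console.WriteLine(worker.Seen.Count); // 6
    }
}
```

The worker holds a single number, the last update point it processed, and that is enough to resume after a restart if you persist it.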
