Batch processing with subscriptions in RavenDB 4.0
Subscriptions are a somewhat neglected feature in RavenDB. They were created to handle a specific customer need and grew from there, but they had relatively little traction and were a bit of a pain to use. When we looked at the things we wanted to do in RavenDB 4.0, reworking how people use subscriptions was high enough on the list that it got a dedicated dev for about a year.
Here is what a subscription looks like in RavenDB 3.x.
It is only available from code, and the model used is heavily influenced by Reactive Extensions. It gives you a reliable subscription to data: even if the client or server went down, it could recover on restart. But it was complex to do the more advanced things. There are events that you can register to respond to things that are happening, but there isn’t a complete story. Other things, such as automatic failover or responding to deletes, were flat out impossible.
With RavenDB 4.0, we decided to do things differently. I have talked about this several times before, but recently we completed a major restructuring and simplification of the user-visible behavior that I’m really happy about. To start with, we ditched the Reactive Extensions and IObservable model. It is just not the right fit for the kind of things we want to do. Instead, we are going with full-blown batch processing.
Instead of being called once per item, we are going to call you once per batch. This is actually how things go over the wire, and exposing it directly to the user makes our lives a lot easier. It also means that you have a much better model for actually doing things in batch mode, such as applying a modification to all the items in the batch and saving them back in a single operation.
Subscriptions in RavenDB 4.0 are also fault tolerant and highly available (on both the client and the server), allow access to versioned and deleted snapshots, allow complex filtering and transformations to be applied on the server side, and are in general a lot more suitable for the task we intend them for.
Perhaps what is more exciting is that subscriptions are available to all the clients, and in some cases it just makes more sense to write them as a batch processing script. Consider:
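To make the difference between per-item and per-batch handling concrete, here is a minimal in-memory sketch. It is not the RavenDB client API: InMemoryStore, batches, run_per_item, run_per_batch and mark_processed are all hypothetical names made up for illustration, and the only thing it tries to show is that a batch handler can modify every item and then save them back in one operation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a document store; NOT the RavenDB client."""
    docs: Dict[str, dict] = field(default_factory=dict)
    save_calls: int = 0  # counts save operations so the two styles can be compared

    def save_many(self, changed: Dict[str, dict]) -> None:
        # One call persists every document in `changed`.
        self.docs.update(changed)
        self.save_calls += 1


def batches(store: InMemoryStore, size: int) -> Iterator[List[str]]:
    """Yield document ids in fixed-size groups (assumed batching scheme)."""
    ids = sorted(store.docs)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def run_per_item(store: InMemoryStore, handler: Callable[[Dict[str, dict]], None]) -> None:
    """Per-item style: the handler sees one document at a time, one save each."""
    for doc_id in sorted(store.docs):
        single = {doc_id: dict(store.docs[doc_id])}
        handler(single)
        store.save_many(single)


def run_per_batch(store: InMemoryStore, size: int,
                  handler: Callable[[Dict[str, dict]], None]) -> None:
    """Per-batch style: the handler sees the whole batch, then one save per batch."""
    for batch_ids in batches(store, size):
        batch = {doc_id: dict(store.docs[doc_id]) for doc_id in batch_ids}
        handler(batch)          # modify every item in the batch
        store.save_many(batch)  # save them all back in a single operation


def mark_processed(batch: Dict[str, dict]) -> None:
    # Example modification applied to every document handed to the handler.
    for doc in batch.values():
        doc["processed"] = True


if __name__ == "__main__":
    store = InMemoryStore({f"orders/{i}": {"total": i * 10} for i in range(1, 6)})
    run_per_batch(store, size=2, handler=mark_processed)
    print(store.save_calls, "save operations for", len(store.docs), "documents")
```

With five documents and a batch size of two, the batch runner issues three saves where the per-item runner would issue five. The real gain in a database comes from fewer round trips, which this toy counter only approximates.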
This is the kind of thing that can really make the operations team happy, because they can do targeted jobs with very little friction. I spend the whole of Chapter 5 talking about subscriptions, and I think it is well worth it.
Comments
In the second snippet it probably is Customer customer, not Order customer.
njy, Yes, thanks, fixed
Spell mistake in "Chapter 5" link: An extra a and no ":"
Carsten, Thanks, fixed
Interesting. We use the changes API for listening on document updates in the Foopipes RavenDB plugin. If the new API gives reliable data changes, that would indeed improve things. If the operations team would like to do targeted jobs with even less friction, take a look at https://foopipes.com/
Niclas, Yes, using subscriptions for this is probably much better.
Assuming 4.0 will still have Versioned<T> support, I'm really digging subscriptions. Versioned<T> combined with batch items will improve the architecture of my web apps.
In several projects at my day job, I've been witnessing my controllers grow fat as new business logic is added over time and as our customers ask for new features. I've been thinking about refactoring the controllers into slim controllers that call out to fat services. It's better, but the controllers still have to know about what services to call.
But with this subscriptions feature, with its Versioned<T> support and batched items, I could rework it so subscriptions handle the service invocation: subscribe to changes to Products, look at the pre-update object and post-update object thanks to Versioned<T>, determine whether a service needs invoking, then fire it off. Basically, a bunch of small services that operate on Versioned<T>. Controllers don't need to be involved at all.
Subscriptions with batch items and Versioned<T> is a very powerful feature. I'm excited to try it out.
Judah, Yes, Versioned<T> is still very much in place, and that is exactly the kind of feature this is meant to enable.
This is looking great. I could never get batch processing of documents working nicely with the current API.
Does using batch.OpenSession() mean that the batch.Items returns documents that are tracked by the session?
Andrew, No, the batch documents are not tracked by the session; you need to associate them directly. That is an interesting feature to consider, but the problem is that in many cases you aren't working with documents. You are working with revisions, projections, etc.