Automatic subscription retries with RavenDB


RavenDB’s subscriptions give you the ability to run batch processing easily and robustly. In other words, you specify a query and subscribe to its results. RavenDB will send you all the documents matching the query. So far, that is pretty obvious, but what is important with subscriptions is that RavenDB will keep sending you results. As long as your subscription is open, you’ll get any changed document that matches your query. That gives you a great way to implement event pipelines and batch processes, and in general opens up some interesting options.

In this case, I want to talk about failures with subscriptions. Not failure in the sense of a server going down, or a client crashing. These are already handled by the subscription mechanism itself. A server going down will cause the cluster to change the ownership of the subscription, and your client code will not even notice. When a client goes down, the subscription can either fail over to another client or, upon restart of the client, pick up right where it left off. No, this is handled.

What requires attention is what happens if there is an error during the processing of a batch of documents. Imagine that we want to do some background processing. We could do that in many ways, such as introducing a queuing system and task queues, but in many cases, the overhead of that is quite high. A simpler approach is to just write the tasks out as documents and use a subscription to process them. In this case, let’s imagine that we want to send emails. A subscription will run over the entire EmailToSend collection, doing whatever processing is required to actually send each email. Once we are done processing a batch, we’ll delete all the items that we processed. Whenever there are new emails to send, the subscription will get them for us immediately.

But what happens if there is a failure to send one particular email in a batch? Well, we can ignore this (and not delete the document), but that will require some admin involvement to resolve. Subscriptions will not revisit documents that they have already seen, unless those documents have changed. We can use that exception to handle this scenario.

In short, we’ll try to process each document, sending the email, etc. If we fail to do so, we won’t delete the document. Instead, we’ll patch it to increment a Retries property in the metadata. This operation has two interesting effects. First, it means that we can keep track of how often we retried a particular document. Second, as a side effect of modifying the document, we’ll get it back in the subscription again. Capping the counter at 5 gives each document 5 retries before we give up on it.
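
To make the retry flow above concrete, here is a minimal sketch in Python. It does not use the RavenDB client at all: `Doc`, `FakeSubscription`, `process_batch` and `run_until_idle` are hypothetical names for an in-memory model, and the `version` field merely stands in for whatever the server uses to detect that a document changed. The only things taken from the text are the Retries metadata property, the limit of 5, and the rule that a changed document is delivered again.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_RETRIES = 5  # the retry limit mentioned in the text


@dataclass
class Doc:
    id: str
    body: dict
    metadata: dict = field(default_factory=dict)
    version: int = 1  # bumped on every change (assumption: models change tracking)


class FakeSubscription:
    """In-memory stand-in for a subscription (not the RavenDB API).

    It only hands out documents that are new or have changed since
    they were last delivered.
    """

    def __init__(self, store: Dict[str, Doc]):
        self.store = store
        self.seen: Dict[str, int] = {}  # doc id -> last delivered version

    def next_batch(self) -> List[Doc]:
        batch = [d for d in self.store.values()
                 if self.seen.get(d.id) != d.version]
        for d in batch:
            self.seen[d.id] = d.version
        return batch


def process_batch(store: Dict[str, Doc], batch: List[Doc],
                  send: Callable[[Doc], None]) -> None:
    for doc in batch:
        try:
            send(doc)
        except Exception:
            retries = doc.metadata.get("Retries", 0)
            if retries < MAX_RETRIES:
                # "Patching" the document changes it, so the subscription
                # will deliver it again in a later batch.
                doc.metadata["Retries"] = retries + 1
                doc.version += 1
            # Otherwise leave it untouched: it stays in the store for an
            # admin to inspect and is not delivered again.
        else:
            del store[doc.id]  # processed successfully, remove the task


def run_until_idle(store: Dict[str, Doc],
                   send: Callable[[Doc], None]) -> int:
    """Pump batches until nothing is left to deliver; return the batch count."""
    sub = FakeSubscription(store)
    rounds = 0
    while True:
        batch = sub.next_batch()
        if not batch:
            return rounds
        process_batch(store, batch, send)
        rounds += 1
```

Calling `run_until_idle(store, send)` with a `send` function that raises for some documents leaves only the permanently failing ones in `store`, each with `Retries` set to 5 in its metadata. Anything that fails a few times and then succeeds is deleted like any other processed task.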

As an admin, you can then peek into your database, see all the documents that have exceeded the allowed retries, and make a decision on what to do with them. But anything that failed because of some transient failure will just work.