The design of RavenDB 4.0: Making Lucene reliable
I don’t like Lucene. It is an external dependency that works in somewhat funny ways, and the version we use is a relatively old one that has been mostly ported as-is from Java. This leads to some design decisions that are questionable (for example, using exceptions for control flow in parsing queries), or just awkward (by default, an error in merging segments will kill your entire process). Getting Lucene to run properly in production takes quite a bit of work and effort. So I don’t like Lucene.
We have spiked various alternatives to Lucene multiple times, but it is a hard problem, and most of the solutions we looked at led toward pretty much the same approach that Lucene takes. By now, we have been working with Lucene for over eight years, so we have gotten good at managing it, but there is still quite a bit of code in RavenDB that is dedicated to managing Lucene's state, figuring out how to recover in case of errors, etc.
Just off the top of my head, we have code to recover from aborted indexing, background processes that take regular backups of the indexes so we'll be able to restore them in the case of an error, etc. At some point we had a lab of machines dedicated to testing that our code was able to manage Lucene properly in the presence of hard resets. We got it working, eventually, but it was hard. And we still get issues from users who run into trouble because Lucene can tie itself into knots (for example, a disk full error midway through indexing can corrupt your index and require us to reset it). And that is leaving aside the joy of what I/O re-ordering does to you when you need to ensure reliability.
So the problem isn't with Lucene itself; the problem is that it isn't reliable. That led us to look at the Lucene persistence format. While Lucene's persistence mechanism is technically pluggable, in practice it isn't really replaceable. The file format and the way it works are very closely tied to the idea of files, or rather to the idea of processing data as a stream of bytes. At some point we thought it would be good to implement a Transactional NTFS Lucene directory, but that idea isn't really viable, since Transactional NTFS is going away.
It was at this point that we realized we were barking up the wrong tree entirely. We already have the technology in place to make Lucene reliable: Voron!
Voron is a low level storage engine that offers ACID transactions. All we need to do is develop a VoronLuceneDirectory, and that should handle the reliability part of the equation. There are a couple of details that need to be handled, in particular that Voron needs to know, upfront, how much data you want to write, and that a single value in Voron is limited to 2GB. But that is fairly easily done. We write to a temporary file from Lucene until it tells us to commit, at which point we can write the data to Voron directly (potentially breaking it into multiple values if needed).
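To make that concrete, here is a minimal sketch of the commit path. The IVoronTransaction, IVoronTree and WriteFileToVoron names and the "lucene-files" tree are hypothetical stand-ins for this sketch, not the real Voron or Lucene.NET APIs:

```csharp
using System;
using System.IO;

// Hypothetical stand-ins for Voron; the real Voron API differs.
public interface IVoronTree
{
    // Reads exactly 'length' bytes from 'source' and stores them under 'key';
    // the length is supplied upfront, which is what Voron needs.
    void Add(string key, Stream source, long length);
}

public interface IVoronTransaction : IDisposable
{
    IVoronTree GetTree(string name);
    void Commit();
}

public static class VoronDirectoryCommit
{
    // Stay comfortably below the 2GB per-value limit.
    private const long MaxChunkSize = 1024L * 1024 * 1024;

    // Called when Lucene commits: the temporary file Lucene has been writing to
    // is copied into Voron, chunked so that no single value exceeds the limit.
    public static void WriteFileToVoron(IVoronTransaction tx, string luceneFileName, string tempFilePath)
    {
        using (var tempFile = File.OpenRead(tempFilePath))
        {
            var tree = tx.GetTree("lucene-files");
            long remaining = tempFile.Length;
            var chunkIndex = 0;

            while (remaining > 0)
            {
                long chunkSize = Math.Min(remaining, MaxChunkSize);
                tree.Add(luceneFileName + "/" + chunkIndex, tempFile, chunkSize);
                remaining -= chunkSize;
                chunkIndex++;
            }
        }
    }
}
```

A caller would open a single write transaction, call WriteFileToVoron once for each file produced by the Lucene commit, and then commit the transaction, so the whole Lucene commit either becomes visible or doesn't.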
Voila, we have got ourselves a reliable mechanism for storing Lucene’s data. And we can do all of that in a single atomic transaction.
When reading the data, we can skip all of the hard work and file I/O and serve it directly from Voron's memory map. And having everything inside a single Voron file means that we can skip doing things like the compound file format that Lucene uses, and choose a more optimal approach.
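As a rough illustration of the read side (again, not RavenDB's actual code), this is what serving reads straight out of a memory-mapped region looks like. In the real implementation the mapping would come from Voron itself rather than from a standalone file, and MappedChunkReader is a made-up name:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

// Serves sequential and random reads straight from a memory-mapped view,
// with no explicit file I/O in the read path.
public sealed class MappedChunkReader : System.IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;
    private long _position;

    public MappedChunkReader(string path)
    {
        _file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        _view = _file.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read); // map the whole file, read-only
    }

    public byte ReadByte()
    {
        return _view.ReadByte(_position++);
    }

    public void ReadBytes(byte[] buffer, int offset, int count)
    {
        // Copies straight out of the mapped region; the OS pages the data in as needed.
        _view.ReadArray(_position, buffer, offset, count);
        _position += count;
    }

    public void Seek(long position)
    {
        _position = position;
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

Because the reads go through the OS memory map, hot parts of the index are served from the page cache without explicit read calls in the hot path.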
And with a reliable way to handle indexing, quite large swaths of code can just go away. We can now safely assume that indexes are consistent, so we don’t need to have a lot of checks on that, startup verifications, recovery modes, online backups, etc.
Improvement by omission indeed.
More posts in "The design of RavenDB 4.0" series:
- (26 May 2016) The client side
- (24 May 2016) Replication from server side
- (20 May 2016) Getting RavenDB running on Linux
- (18 May 2016) The cost of Load Document in indexing
- (16 May 2016) You can’t see the map/reduce from all the trees
- (12 May 2016) Separation of indexes and documents
- (10 May 2016) Voron has a one track mind
- (05 May 2016) Physically segregating collections
- (03 May 2016) Making Lucene reliable
- (28 Apr 2016) The implications of the blittable format
- (26 Apr 2016) Voron takes flight
- (22 Apr 2016) Over the wire protocol
- (20 Apr 2016) We already got stuff out there
- (18 Apr 2016) The general idea
Comments
Lucene.Net is a fairly complex piece of software but not overly complex. Why not build something custom for Raven? Using the same approach as for Voron.
When Octopus used Raven, Lucene was at the centre of most of our production issues too. Raven has features to cover Lucene's warts - we had to build features into our own product to cover those!
I really think you should go all the way, and build indexing yourself. As a database company, indexing should be one of your core competencies, it makes so much sense to really invest in building that yourself.
I even think you would have been better off sticking with ESENT (it only caused us a few issues) and concentrating on removing Lucene instead of switching to Voron, given the choice.
Paul
Pop Catalin, We looked into what this would take (see the posts about Corax). But it is a very big field, and pretty complex. We decided to hold off on this for now, just fix what was the worst offender and move on. Maybe we'll be able to get to it on the 5.0 release.
Paul, A large problem is related to the type of machine you are running on. Commodity hardware sucks in many cases, and you can't rely on what the hardware will tell you.
We solved most of those problems with Voron, so just by putting Lucene on top of it we gain much better safety guarantees. I agree that indexing is something that we want to own, but that isn't very simple. Along with the other changes we are doing / intend to do in 4.0, there just isn't enough space to also replace Lucene. Esent is problematic for other reasons (it doesn't run on Linux, we don't control it, and it blocks a lot of opportunities like the blittable format optimizations).
There is no need to re-invent the wheel. Lucene is an awesome piece of software; I would only port a newer version and optimize it.
One advantage to using Lucene is that anyone with previous experience has a head start, and that knowledge is usable across a lot of other stacks (elasticsearch, solr, direct lucene). Removing the warts would be great, but keeping the "api" would be better, IMO :)
Bruno Lopes, Yes, if/when this happens, we are going to have to maintain a lot of backward compatibility.
I wonder: have you tried using Lucene (the Java version) through IKVM? I used this approach, since I needed some features which weren't available in Lucene.NET, which is 3 major versions behind and hasn't had a new release in over 3 years. I suppose you could run into performance issues though, but I'm curious if this option has been considered/profiled.
Lucas, That isn't something that would be viable for us, no. We need to be able to properly support it, and adding IKVM is just too complex for our operational requirements. We are going to be focusing on helping the next version of Lucene once we free up the capacity to do so
RavenDB queries are BASE because you build your indexes asynchronously. Why not add RDBMS-like synchronous indexes to RavenDB? I mean B+ trees for range queries. You already have B+ trees in Voron, so implementing them would not be a big deal. So, certain critical queries would never return stale info, and index based updates would be reliable. This could be your first step toward Lucene independence.
Jesus, Sync indexes or not doesn't actually matter for the implementation. We could implement sync indexes now with Lucene. And B+ tree indexes have severe limitations (you can only do queries on the specified key, in the order specified; so OrderBy FName, LName works with the index, but OrderBy LName, FName doesn't; see the sketch after this comment). Lucene handles that much more nicely.
And we considered ACID indexes, but the problem is that it would bring all the usual pain of those, and result in "all my indexes are ACID because of course I need to do everything ACID".
We added support for waiting for non-stale results when querying for a reason.
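A small illustration of the composite-key ordering limitation mentioned in the reply above (hypothetical data, not RavenDB code): an index whose key is (FName, LName) yields results in FName-then-LName order, so it can serve OrderBy FName, LName directly, while OrderBy LName, FName still needs a separate sort or a second index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeKeyOrderingDemo
{
    public static void Main()
    {
        // Simulates a B+ tree style index keyed on the composite (FName, LName).
        var index = new SortedDictionary<(string FName, string LName), int>
        {
            [("Arava", "Eini")] = 1,
            [("Oscar", "Smith")] = 2,
            [("Arava", "Smith")] = 3,
            [("Oscar", "Eini")] = 4,
        };

        // Walking the index in key order is already "OrderBy FName, LName": no extra sort needed.
        foreach (var entry in index)
            Console.WriteLine($"{entry.Key.FName} {entry.Key.LName}");

        // "OrderBy LName, FName" cannot be answered by walking this index in order;
        // the results have to be re-sorted (or a second index on (LName, FName) maintained).
        foreach (var entry in index.OrderBy(e => e.Key.LName).ThenBy(e => e.Key.FName))
            Console.WriteLine($"{entry.Key.LName}, {entry.Key.FName}");
    }
}
```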
Yes, of course. But most queries don't need all the fancy things Lucene does. B+ tree indexes are far simpler than Lucene indexes, a lot easier to implement, and much more efficient.
Jesus, Sure, they are easier to implement, in fact, we already have them (in 4.0 we moved things like Raven/DocumentsByEntityName to this), but changing something as fundamental as sorting is not something that can easily be done. Leaving aside the fact that we still have to resolve the issue of ACID indexes and what that would do to the kind of optimizations that we can give by not doing them.
Why not help the guys that develop lucene.net instead of fixing what's wrong on your side?
Franck, The issues with Lucene are fundamental to the way it works. It isn't something that can be fixed short of a complete rewrite of the code. To make things concrete, consider the fact that all the I/O in Lucene is buffered, and trying to go to unbuffered I/O (which would allow safety / ACID) has an extremely high cost for the kind of usage Lucene is doing.
Ok, I see what's wrong, thanks for this short and clear answer!