Lucene.NET is UGLY
If you have ever had to go through the Lucene.NET code base, I am sure you’ll agree that it is quite ugly. It does a lot of low-level stuff, which is almost always nasty; it is a port of code from another language and framework, which means that it isn’t idiomatic; and it has a lot of… strange things going on.
- Exceptions are used far too often.
- There is a strong tendency to delegate things in such a way that makes it hard to figure out where things are actually happening.
- The big stick approach to thread safety (slap a lock on it).
- Some really horrible things with regard to mutable shared state with IndexInputs.
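To make the "slap a lock on it" point concrete, here is a minimal sketch in Java, the language the code was ported from. It is not taken from Lucene or Lucene.NET: the class names `BigStickCache` and `PerKeyCache` are hypothetical, and the loader function stands in for whatever expensive work a real cache would do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache, not Lucene code: a single lock guards every lookup.
class BigStickCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;

    BigStickCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Every caller, for every key, waits on the same monitor,
    // even while a slow load for an unrelated key is in progress.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }
}

// Same contract, narrower locking. Also hypothetical.
class PerKeyCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    PerKeyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // ConcurrentHashMap locks only the bin that holds the key, so loads
    // for keys in other bins can proceed in parallel. The loader still
    // runs at most once per key.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }
}
```

Both classes return the same values. The difference only shows up under contention, which is exactly why the big stick is tempting: it is obviously correct and the cost is invisible until many threads hit the cache at once.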
Here is a good example of many of the issues that I talk about:
https://github.com/apache/lucene.net/blob/trunk/src/core/Search/FieldCacheImpl.cs#L207
Read this method, and I think you’ll understand.
Then again, you can see methods of similar or greater complexity in RavenDB. For example, see here:
https://github.com/ayende/ravendb/blob/1.2/Raven.Database/Indexing/IndexingExecuter.cs#L60
My main problem with the Lucene.NET codebase is that it feels alien. It isn’t .NET code, and it shows.
Then again, Lucene is also quite beautiful, but I’ll talk about this in my next post.
Comments
Well, that alienness was the reason why Lucene.NET almost disappeared a few years ago: nobody wanted to work on it because it was too Java-like. And it still is that way: it's an automated port from the Java version of Lucene... There was a lot of discussion about making it really .NET-like, but I'm not sure whether the .NET-ification is just something built on top of it or has deeper roots.
Gotta love the next method down in the Lucene.NET code: PrintNewInsanity.
"There is a strong tendency to delegate things in such a way that make it hard to figure out where things are actually happening."
Right at this point it became clear that Lucene.NET was ported from Java.
What Simone said -- staying as close as possible to the Java provenance was held as holy by the original crew. Also, I think you said yourself that low-level code that does real things tends towards ugly, as you are handling strange edge cases that shouldn't bubble up to the caller.
One solution, although not an overall "simple" one, would be to create a Kickstarter project for a .NET-ification of such an important project. That way we could all have a really idiomatic .NET version of Lucene in a relatively small amount of time.
The problem, then, would be to keep it updated with all the features and bug fixes of the original Java version, but that is another story...
@njy -- that is the real challenge. That said, the guys have been doing an amazing job of catching up. A little more than a year ago it was stuck at kinda sorta Lucene 2.9.4; now they have a 3.0.3 release in the works. The project has moved from the incubator to being a real Apache project. Plans are in place to push forward to parity with Java Lucene.
@njy that's the ENTIRE story
As a committer to 2 separate Lucene ports (CLucene and Lucene.NET), I can assure you that a complete .NET version of Lucene.NET is a means with no end. If you ever complete the huge task of writing a fresh port, you'll be left with a non-maintainable project, because maintainable in this context means keeping up with the original codebase.
There was one such project, btw, called Lucy, and it wasn't able to keep up at all.
There are some isolated places, the FieldCache for example, where a custom .NET implementation could easily be added without fear of a notable fork. As for the rest of those comments, I'm sure the Java Lucene guys would love to hear insights about their code, and they'll most probably act on them.
Other than that, I'm afraid we will have to stick with the current approach. A line-by-line port is the only way we can guarantee full maintainability, and let's not forget that the bottom line is whether we have Lucene for the .NET platform or not, not code readability.
Oren, your future posts list does not show anymore (at least not for me). Is this on purpose?
@Gilligan: it is strange to think of a "no more posts by Oren" future, isn't it :) ?
@Itamar: I agree 100%. One crazy idea I had some time ago was to build a kind of very thin idiomatized layer on top of Lucene, to expose just the basic functionality in a cleaner .NET way. But overall it didn't feel like that smart of an idea to me, more of a gimmick than anything else.
Well, something like this http://code.google.com/p/lucene-dotnet-api/
Again, I'm not that sure it is worth the effort.
@njy that's one thing that I really can't understand. People (not necessarily you) have time on their hands to go and write idiomatizing layers or frameworks, but they won't engage with the actual community.
If someone feels so strongly about something, he should really come forward with his concerns. I assure you no good donated piece of code will be left untouched.
@Itamar: you're kind of right. Some possible reasons are that they don't feel comfortable showing their code to the community, or that they have "this freaking super cool idea" and don't want to hear from others whether it is good or not; they just want to do it and make it real... and... obviously... 9 times out of 10 the project gets abandoned and the grandiose dream falls off the cliff.
Is anyone else not able to see the last 4 months of posts? The most recent one I can see is from 5/22/2012. I only got to this one by clicking on one of the recent comments.
Chris, yes, it is a bug that I just found. It will be fixed soon.
The same could be said about NHibernate.
The same could be said about any real-world code.