RavenDB on Linux – Status Update
Several months ago we decided to ramp up the RavenDB on Linux migration effort, and hired a full-time developer to do just that.
We started this with great hopes, mostly because we were able to get Voron to run on Linux in a reasonable amount of time. But RavenDB is several orders of magnitude bigger than Voron, and we ran into a lot more complexity along the way.
In particular, and I am not quite sure how to put it nicely, the entire environment is pretty unstable. Our target was Mono 4.3.0, MonoDevelop 5.10 and Ubuntu 14.04, and it takes really no effort at all to break pretty much everything there.
For example, SLEEP_DURATION_BEFORE_ABORT means that if you are running inside a debugger, or just in the middle of a GC, Mono will sometimes intentionally crash the process when certain operations take longer than 200ms.
But in general, it feels like Mono just isn't nearly stable enough for a production platform. Sometimes it would work fine; other times, you get horrible crashes in the process that required us to debug the Mono runtime to figure out what is going on. Sometimes it was us doing stupid things, other times it was real bugs in the Mono runtime. Other issues related to the missing WebSockets implementation (it appears to have been there and then removed, for some reason), missing chunked encoding support, etc.
As an aside, MonoDevelop in particular is… quite an uncomfortable IDE, and it doesn't compare well on pretty much any level to the experience you get from other IDEs. That just exacerbates the problem, to be frank.
In other words, porting RavenDB to Mono is a lot of hard work. But that was pretty much expected. What I didn't really expect was how much work it would be not to port it, but to actually fill in missing or incomplete parts of the framework itself.
Now, there are plenty of other problems in there even without Mono: the typical Windows to Linux porting issues, anything from file paths to case sensitivity to finding different ways to do various things (from finding out how busy the CPU is to getting low memory notifications to… well, you get the drift). Those are the kinds of problems that we expected, and the kinds that, frankly, we wanted to be solving.
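To make that concrete, here is a minimal sketch (not actual RavenDB code, the paths are made up) of the kind of assumptions that silently break when you move from Windows to Linux:

using System;
using System.IO;

class PortabilitySketch
{
    static void Main()
    {
        // Hard-coded backslashes work on Windows, but they are not directory
        // separators on Linux; letting the framework build the path is portable.
        var fragile  = "Data\\Indexes\\users_by_name";
        var portable = Path.Combine("Data", "Indexes", "users_by_name");

        // NTFS is case-insensitive by default, so "Data" and "data" point to the
        // same directory on Windows; on typical Linux file systems they are two
        // different directories, so this prints False there if only "Data" exists.
        Console.WriteLine(Directory.Exists("Data") == Directory.Exists("data"));

        Console.WriteLine(fragile);
        Console.WriteLine(portable);
    }
}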
But the instability of the Mono runtime environment makes it really expensive to port any non-trivial software without spending most of the time debugging Mono. Even if the error is in our code, the only way to verify that is to actually debug Mono itself, and that puts a much higher cost on the actual porting effort. More to the point, this also means frightening things for trying to support this in production. I don't relish the thought of having to go to a customer and tell them that a particular issue is caused by a GC bug and that fixing it will require a new custom runtime, leaving aside the actual cost of such a support call.
So we stopped and looked at the CoreCLR. I have a much higher expectation of quality from Microsoft, given the track record of the .NET Framework. The problem is that the CoreCLR, while it is supposed to have an RC out in November, is still quite problematic on Linux. For example, you can't use Unix paths in Uris, which broke us pretty early in the process. That was annoying, but expected for the time being, and while this issue (and other stuff we ran into; I'm only pointing this out as a simple example, nitpickers, don't get hung up on it) is surmountable, the major issue is that the CoreCLR requires a lot more than just a few #ifdefs, it requires quite a lot of work: probably restructuring the project (some dependencies are now NuGet packages, a different structure for projects and the build system, etc.).
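To show the flavor of the Uri issue, here is a hedged sketch (the paths are hypothetical, and the exact failure mode varied between builds) of the difference we hit:

using System;

class UriSketch
{
    static void Main()
    {
        // A rooted Windows path is recognized and turned into a file:// URI.
        Console.WriteLine(new Uri(@"C:\RavenDB\Data"));   // file:///C:/RavenDB/Data

        // The equivalent rooted Unix path was rejected by the CoreCLR builds we
        // tried at the time, which is what broke us early in the process.
        Console.WriteLine(new Uri("/var/lib/ravendb/data"));
    }
}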
Therefore, what we intend to do now is wait a bit for the next CoreCLR release, probably in September, and then start seeing what it takes to run RavenDB (both server & client) under the CoreCLR on Windows. The idea is that we have much better tooling for working with .NET code on Windows, so hopefully we can do the bulk of the work to actually run on the CoreCLR runtime there, then just deal with everything we need to run on Linux, and not have to worry about debugging the runtime as well.
Now, to deal with the nitpickers:
Mono is Open Source, you can fix any issues you find and submit pull requests.
That is correct, but let me talk about a few of the issues we have run into.
The class library is partial / bad. I think that we would have done much better if we had been building on Mono from the get-go; we would know which pieces to avoid. But when a call to ZipFile.Open crashes with SIGSEGV and no indication of why that is happening (it ended up being a stack overflow in the Mono BCL implementation), the fix was a single line, but finding the root cause took a long while.
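For context, the call in question was nothing exotic. A minimal sketch (with a hypothetical file name) looks like this; on the Mono build we were using, this kind of code could take the whole process down with SIGSEGV instead of surfacing a managed exception:

using System;
using System.IO.Compression; // ZipFile comes from System.IO.Compression.FileSystem

class ZipSketch
{
    static void Main()
    {
        // Open an existing archive for reading; nothing here hints that the BCL
        // implementation underneath might blow the stack.
        using (var archive = ZipFile.Open("backup.ravendump", ZipArchiveMode.Read))
        {
            foreach (var entry in archive.Entries)
            {
                Console.WriteLine("{0} ({1} bytes)", entry.FullName, entry.Length);
            }
        }
    }
}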
With the actual runtime, a GC issue caused a segmentation fault that took about a month to figure out.
Now, we aren't experts in Mono. So when we got into trouble we reached out to members of the Mono team, and were recommended a company that had the required expertise. I'm very happy with the service they provided, but the fact of the matter is, it took a month to narrow down a big problem to something that would be fixed by a couple of lines of code. Not because they weren't good, but because the problem was pretty tough to figure out.
So we contributed some stuff back, and we would be happy to continue doing so if the process weren't so hard (and quite expensive). Even relatively simple issues had a very high threshold.
The RavenDB codebase is close to a million lines of code. It is doing some pretty advanced stuff, and changing the foundation underneath it means that we invalidate a lot of silent assumptions. If the trivial stuff breaks so badly, I don’t want to think about the cost of the really complex stuff. The GC issue was bad enough.
If we were working from scratch, we would know what to avoid, but with such a big codebase, that isn't feasible. And again, even if we did manage to get it working properly, we still have the issue of how to support it. We provide production support to our customers, and the ability to accurately and quickly pinpoint problems and troubleshoot them is key. Taking on the support burden for Mono is very risky.
Note that the worry I have isn't based on vague fears. The GC issue we had would cause random crashes, typically long after the relevant code had run, and only under very specific scenarios. In 99% of cases, it would run just fine (well, not fine, but it wouldn't be crashing). A production system with this type of behavior would be a nightmare; even escalating to the Mono core team, figuring out where and how this is happening is nasty, and it won't happen in the time frames that we want to provide to customers who have production support contracts with us.
What about the CoreCLR? It is Open Source too, why not contribute to it?
The current plan, as I said, is to first see what it takes to move to the CoreCLR on Windows. That is a much smaller step, but it will let us get familiar both with the way the CoreCLR is structured and with the new dependencies. We are actively looking at the project, and to say that I'm excited is not nearly enough. This requires a lot of work on our side before we get to the point where we can actually figure out whether there is stuff that needs fixing there too.
Once we have that, we are going to go back to Linux and get to working on running on a different OS, with all the implications of that. I’m actually looking forward to this. That is the kind of problem that I want to tackle.
If you want to look at the current state of our work (RavenDB on Linux using Mono), you can check it out here. We still have a full-time developer for this; we are just going to divert him until the next beta of the CoreCLR is out, and then we are going to restart the process as I described above.
Comments
I'm with you on the bad WebSockets support in Mono: https://bugzilla.xamarin.com/show_bug.cgi?id=32575
I hate to say I told you so :-) http://ayende.com/blog/170241/buffer-managers-production-code-and-alternative-implementations
For GC issues, try setting the clear-at-gc Mono debug flag; it got rid of most of the issues we saw.
Also, I would look into, say, vim or emacs + OmniSharp as opposed to MonoDevelop, for everything aside from step debugging.
Great post. If this is the case, I would wait for the CoreCLR.
This is a database, it should work 100 percent of the time. Reliability and performance are the keys for servers. Seems like things have not matured enough.
Linux is nice to have, but not a must
We started some projects on iOS/Android (Mono/Xamarin and MonoDevelop) and I can really relate to the MonoDevelop pain. We left their .NET mobile strategy because of Mono bugs, Xamarin Forms bugs, PCL version crap, etc. Fortunately there are other mobile options out there.
(Bit off topic here.) It all made me wonder about the whole .NET dev mobile strategy. MS does not have a decent one, or they do and are not telling it yet. At least not for .NET.
When I read such (great) posts about Linux etc. I know what I pay for with Windows Server. The licensing costs now seem insignificant.
I never have really left the Microsoft .NET ecosystem. Do you think that most other open source systems that are developed by the community are in a similar state of quality?
"Do you think that most other open source systems that are developed by the community are in a similar state of quality?" That's sort of like looking at Comcast and asking "Do you think that most other companies run by the business world are in a similar state of quality?"
@Rob, this is an honest question. The answer might be yes or no. Both are, I think, plausible.
Greg, Most of the work so far has actually been debugging, trying to figure out why something doesn't work :-)
Also, this seems to indicate that clear-at-gc shouldn't be there: https://github.com/EventStore/EventStore/issues/225
Oren,
Thanks for linking back to my own project :). And yes as of now it still needs to be there.
Cheers,
Greg
Mark, A lot of that depends on the culture of the project, regardless of whether it is OSS or not. Microsoft is pretty fanatical about stability, to the point where it can really hinder innovation at times. Mono is much faster at getting things out the door, but with a very different bar of quality.
OSS projects, like commercial projects, run the gamut of great to garbage.
Greg, What exactly does this do? And I know this is your project, you were saying that it isn't needed.
@Oren: can you please tell us more about that GC issue you encountered? One of our guys is struggling with what seems to be a Mono GC bug, and he's having a hard time tracking it down. I believe the way you got to the root of this could be not only an interesting story to read, but also a big help for people dealing with similar issues.
HellBrick, The fix is here: https://github.com/mono/mono/commit/41c1a773f95c5cc05c8350e8fbeba144341780e2
I talked about the issue here: http://ayende.com/blog/169729/current-status-voron-on-linux
Thanks, I guess I missed that post ;)
I love Windows/.NET and run a large development team using it (and RavenDB) for large-scale e-commerce. However, I have been picking up quite a bit more Linux love these days. Lots of the best tools and platforms out there run better on Linux. In general, Linux requires fewer resources and scales better too. I find the DevOps easier on Linux as well mostly thanks to more varied and mature tooling. The communities around things like Java and Node are full of awesome people and amazing projects. It's a very developer-friendly world. That doesn't mean .NET is mature there yet or that Xamarin has done a great job of making it so. Just be careful about using problems with Mono as an excuse to call Windows clearly superior.
Oren - I reread Greg's comment and I think it was just a misunderstanding, he suggested you try enabling that setting to see if that helps.
Interesting. I remember some of the painful things you ran into with Mono.
Moving to CoreCLR sounds like a reasonable step. Let me know if I can help out with that.
Shalom to RavenDB developers and especially Oren Eini,
My name is Jan Sichula and I live in the Slovak Republic, where I pastor an independent Baptist church located in the capital city, Bratislava. We need to develop an all-new church web site from scratch, as the old one from some 15 years ago is not worth the investment to modify. I personally have some essential experience with .NET and have worked on some hobbyist WebForms projects in the past. The plan is to go with ASP.NET 5, as this is clearly the future and it looks like Microsoft was able to recast the platform for the era of the cloud. As far as the data store goes, after an extensive search, I am inclined to go with RavenDB, as I like the architecture and believe that it is best for our project to be in full charge of the data store, in contrast to PaaS solutions like Cloudant. I expect to finalize the decision around November, which is when ASP.NET 5 RC 1 should arrive. All of this leads me to the following questions, which I would kindly ask.
The announcement of a CoreCLR port for RavenDB is important to us, since it will be good to have the choice to move between Windows and Linux. I understand that extensive porting work will probably require a lot of time, but are you irrevocably committed to such a port?
May I kindly suggest that you first focus on porting the client library, as this should be easier to port. This way many can move their client apps to .NET Core and drop the dependence on the full .NET Framework, while servers would continue to be operated on full .NET for the near future.
Now I also wonder how you will handle the lucene.net part of your product, as I was not able to find any mention of lucene.net being ported to CoreCLR. I would sincerely be interested to hear what your strategy with lucene.net will be as you realize the port of RavenDB to CoreCLR.
Also, tangentially to the question above, is there any plan B for RavenDB's future should lucene.net eventually be abandoned and no new .NET port of Lucene ever produced? Or, in other words, are there any alternatives being considered beyond lucene.net for the search functionality?
Now, the last question may sting a bit, and yet it is a respectful enquiry like all the others. Why is it that this very blog site is using Google custom search instead of demonstrating search delivered by RavenDB? Is this a sign of significant limitations in RavenDB's search functionality, or not?
I like and enjoy your product so far, and yet I would first of all appreciate brutally honest answers to the questions above. Thank you in advance.
Best regards, Jan Sichula
https://www.facebook.com/jan.sichula https://twitter.com/JanSichula
Ján,
1) We fully intend to have RavenDB running within the CoreCLR, yes.
2) The client library should be working on the CoreCLR now, there isn't major stuff there.
3) We'll probably need to port it as well.
4) There is active development of Lucene by the community, see http://code972.com/blog/2015/01/79-lucene-nets-new-future-and-status-update
5) It was easier to just wire Google search for already indexed data than write everything from scratch. I want to write database software, not blog engines.
ערב טוב or good evening (at least in my country it is evening now :-))
Thank you for your prompt answers
@1 I am very happy to hear that, as it looks like CoreCLR is not only the future of .NET on Linux but also the future of .NET on Windows, which can be seen from the fact that the upcoming Windows Nano Server will only support CoreCLR and not the “legacy” .NET Framework.
@2 Then would you please prepare a build for those of us who have not yet gained enough expertise in this field?
@3 May I suggest that you consider hiring Itamar Syn-Hershko to help you with that task.
@4 I am very pleased to learn this great piece of news.
@5 Well, this is somewhat of a pity, as your text-heavy blog would be a great demonstration of RavenDB's search capabilities for potential adopters like us who are considering utilizing it in a very similar text-heavy scenario.
Overall I am very encouraged by your answers and will all the more eagerly continue to evaluate RavenDB until the final decision is reached sometime in the coming weeks.
Best regards, Jan Sichula
Hello,
Here are three excellent articles on how to target .NET Core when porting libraries that formerly targeted .NET Framework 4.x. Maybe these resources will come in handy in the effort to port the RavenDB client library to .NET Core.
http://oren.codes/2015/07/29/targeting-net-core/ http://oren.codes/2015/06/16/demystifying-pcls-net-core-dnx-and-uwp-redux/ http://oren.codes/2015/06/09/pcls-net-core-dnx-and-uwp/