NoSQL without web-scale
Application data is one of the most precious assets we have, and for a long time there wasn't any question about where we were going to put it: the RDBMS was the only game in town. The initial move away from the RDBMS was indeed driven by the need to scale, but that was just the original impetus to start developing NoSQL solutions. Once those solutions came into being and matured, it isn't just the "we need web-scale" players that benefit.
Proven & Mature NoSQL solutions aren't applicable only at the high end of scaling. They provide a lot of benefits even for applications that will never need to scale beyond a single machine. Document databases drastically simplify things like user-defined fields, or working with Aggregates. The performance of a NoSQL solution can often exceed that of a comparable RDBMS solution, because the NoSQL solution will usually focus on a very small subset of the feature set that an RDBMS has.
Comments
That's why CouchDB is targeting mobile devices, which are anything but web-scale...
BTW: What's the state of the managed file storage for RavenDB? ;)
Louis,
We have a sort of working impl in a branch.
"Proven & Mature NoSQL solutions"? Come on, you know better...
Another NoSQL-is-better-than-RDB post...
Let's try a real solution.
A web site: a StackOverflow clone.
Entities:
- Questions
- Answers
- Users
Questions can be tagged with zero or more tags. Questions and Answers can be commented on. Users can vote (+1 or -1) on a Question/Answer/Comment.
Use cases:
1) Show recent Questions
2) Show popular Questions (with most votes)
3) Show a Question with its Answers and Comments
4) Show Questions by Tag (sorted by date or popularity)
5) Show a "tag cloud"
6) Show a user's Questions, Answers, Comments
7) Show a user's votes
8) Show comments on a user's Questions and Answers
9) Create/Update/Delete Question
10) Create/Update/Delete Answer
11) Add comments
12) Each Question/Answer/Comment should be displayed with its author and sum of votes.
In the RDB "world" the solution is trivial.
Create tables for Questions/Answers/Comments/Users/Tags/Votes.
Create junction tables for many-to-many relations where needed.
Use joins to query data. Use indexed/materialized views for the vote, answer, comment and questions-per-tag counts.
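A minimal sketch of that relational approach, using Python's built-in sqlite3 module. The table and column names are hypothetical, only users, questions, tags and question votes are shown (answers and comments would follow the same pattern), and because SQLite has no indexed or materialized views, the vote sum is computed with a join at query time instead.

```python
import sqlite3

# Hypothetical, trimmed-down schema for the StackOverflow-clone example.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(id),
    title TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table for the many-to-many Question <-> Tag relation.
CREATE TABLE question_tags (
    question_id INTEGER NOT NULL REFERENCES questions(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (question_id, tag_id)
);
-- One row per (question, user); vote is +1 or -1.
CREATE TABLE question_votes (
    question_id INTEGER NOT NULL REFERENCES questions(id),
    user_id INTEGER NOT NULL REFERENCES users(id),
    vote INTEGER NOT NULL CHECK (vote IN (-1, 1)),
    PRIMARY KEY (question_id, user_id)
);
"""

# Use case 4 (questions by tag, sorted by popularity) combined with
# use case 12 (author and sum of votes), computed with joins at query time.
QUESTIONS_BY_TAG = """
SELECT q.id, q.title, u.name AS author, COALESCE(SUM(v.vote), 0) AS score
FROM questions q
JOIN question_tags qt ON qt.question_id = q.id
JOIN tags t ON t.id = qt.tag_id
JOIN users u ON u.id = q.author_id
LEFT JOIN question_votes v ON v.question_id = q.id
WHERE t.name = ?
GROUP BY q.id, q.title, u.name
ORDER BY score DESC, q.created_at DESC
"""


def questions_by_tag_demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(1, "alice"), (2, "bob"), (3, "carol")])
    db.executemany("INSERT INTO questions VALUES (?, ?, ?, ?)",
                   [(10, 1, "Why NoSQL?", "2010-09-01"),
                    (11, 2, "Joins in RavenDB?", "2010-09-02")])
    db.execute("INSERT INTO tags VALUES (1, 'nosql')")
    db.executemany("INSERT INTO question_tags VALUES (?, 1)", [(10,), (11,)])
    db.executemany("INSERT INTO question_votes VALUES (?, ?, ?)",
                   [(11, 1, 1), (11, 3, 1), (10, 2, -1)])
    rows = db.execute(QUESTIONS_BY_TAG, ("nosql",)).fetchall()
    db.close()
    return rows
```

On a server that supports indexed or materialized views, the `SUM` over votes is the part you would precompute.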
What about NoSQL?
It is true that NoSQL is less flexible in the case of heavily interconnected data. I suspect, without having tried it, that using Raven indexes will help greatly, because you do not have to maintain all the different representations you need for querying yourself.
gandjustas,
Now scale your solution...
It is actually very easy to handle each of your scenarios with RavenDB.
They pretty much translate directly to an index.
@gandjustas
I have some documentation on how you would build a simple blog using Redis here:
code.google.com/.../DesigningNoSqlDatabase (There's also a refactored version that puts all redis access behind a repository pattern: http://bit.ly/9niEHU).
This essentially mimics what ayende is doing with RavenDB in his series of blog posts here:
ayende.com/.../...al-modeling-anti-pattern-in.aspx
Using Redis Sets / Sorted Sets takes care of voting in a single, super-fast operation. In fact, Jeff Atwood (@codinghorror, of StackOverflow fame) has said that they are now using Redis in StackOverflow and all their StackExchange sites: http://twitter.com/#!/codinghorror/status/22417440038
At the moment it looks like it's only used for their shared-caching solution, although that is just another scenario where NoSQL DBs provide superior solutions to RDBMSs.
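To make the sorted-set voting idea concrete, here is a small in-memory sketch. `SortedSetStore` is a hypothetical stand-in, not a Redis client: it only mimics the semantics of the Redis `SADD`, `ZINCRBY` and `ZREVRANGE ... WITHSCORES` commands, and the key names are made up for illustration.

```python
from collections import defaultdict


class SortedSetStore:
    """Hypothetical in-memory stand-in for a few Redis set commands (NOT a Redis client)."""

    def __init__(self):
        self._sets = defaultdict(set)    # key -> {member}
        self._zsets = defaultdict(dict)  # key -> {member: score}

    def sadd(self, key, member):
        # Like SADD: returns 1 if the member was added, 0 if it was already there.
        if member in self._sets[key]:
            return 0
        self._sets[key].add(member)
        return 1

    def zincrby(self, key, increment, member):
        # Like ZINCRBY: adds `increment` to the member's score (starting from 0).
        zset = self._zsets[key]
        zset[member] = zset.get(member, 0) + increment
        return zset[member]

    def zrevrange_withscores(self, key, start, stop):
        # Like ZREVRANGE key start stop WITHSCORES: highest score first, equal
        # scores in descending member order, inclusive range, -1 means "last".
        items = sorted(self._zsets[key].items(),
                       key=lambda kv: (kv[1], kv[0]), reverse=True)
        end = None if stop == -1 else stop + 1
        return items[start:end]


def vote(store, user, post, value):
    """Record one +1/-1 vote per user per post; the key names are made up."""
    if value not in (1, -1):
        raise ValueError("vote must be +1 or -1")
    if store.sadd(f"user:{user}:voted", post) == 0:
        return False  # this user already voted on this post
    store.zincrby("posts:by-votes", value, post)
    return True
```

Against a real Redis server the check-then-increment pair would be two commands, so making it atomic would need something like a Lua script (`EVAL`); the single-threaded stand-in sidesteps that.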
@Ayende Rahien
Scaling is not necessary. A database with 1M Questions and 10M Answers, Comments and Votes will fit into 50 GB. That's not a large database; one server can easily handle this data.
I want to see a NoSQL solution for the StackOverflow cases.
PS. StackOverflow has fewer than 1M Questions.
@Demis Bellot
Cache is not a primary data store. For caching there is no requirements for Consistency and Durability.
Funny how a blog post about how NoSQL can also be a good choice besides scaling requirements turns into comments about 'scaling stackoverflow.com'.
@gandjustas
Just because you find a case where a document database is not the best fit for a problem from a modeling perspective doesn't invalidate the claim that some problems are better modeled with it.
With the little I know, I would take a look into graph databases for highly connected data like stackoverflow.
And there are consistency requirements for a caching solution: you don't want a user to miss the update he just made. Remember that eventual consistency doesn't mean no consistency; it results in delays at computer scale. For most scaling databases (be that NoSQL or YesSQL), human reaction time is already considered laughably slow.
NoSQL isn't about scaling, it's about NoACID. dbmsmusings.blogspot.com/.../...w-to-fix-them.html
And before that guy is butchered to death, read his resume.
What I also find funny is the remark "NoSQL DBs are easier with aggregates"; I think you really mean "storing read-only denormalized data". Aggregates calculated on the fly really require a set-based language, not a document graph.
Btw, it's not that RDBMSs are always the best choice; it's just that claiming RDBMSs are 'old news because something better came along', as if NoSQL solved all the problems, is simply naive.
@gandjustas
I provided those examples because it is existing documentation available that closely matches what you want to achieve. You should be able to extrapolate based on those approaches to meet the other requirements.
There is nothing inherently difficult about creating a StackOverflow using a NoSQL database. Of the features mentioned, which do you think would be the most difficult to maintain in a NoSQL DB?
IMHO the most unnatural part of building a DB with NoSQL is identifying your querying requirements upfront so you can maintain indexes on them. You can always add indexes afterwards, and document DBs like RavenDB and MongoDB also allow ad hoc querying after the fact. So it takes a little different thinking, and the first link in my previous comment should hopefully help with designing a NoSQL DB from an RDBMS background.
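A minimal sketch of what "maintaining indexes on your querying requirements" can mean when the store does not do it for you: a hypothetical in-memory document store that updates a questions-by-tag index on every save, including removing stale entries when a question's tags change. The class and field names are illustrative assumptions.

```python
from collections import defaultdict


class QuestionStore:
    """Hypothetical document store whose query index is maintained by hand on write."""

    def __init__(self):
        self.docs = {}                  # question id -> question document
        self.by_tag = defaultdict(set)  # tag -> ids; decided up front for "questions by tag"

    def save(self, doc):
        old = self.docs.get(doc["id"])
        if old is not None:
            # On update, drop index entries for tags the question no longer has.
            for tag in old["tags"]:
                self.by_tag[tag].discard(doc["id"])
        self.docs[doc["id"]] = doc
        for tag in doc["tags"]:
            self.by_tag[tag].add(doc["id"])

    def questions_by_tag(self, tag):
        # Touches only the documents listed in the index, newest first.
        ids = self.by_tag.get(tag, set())
        return sorted((self.docs[i] for i in ids),
                      key=lambda d: d["created"], reverse=True)
```

Every new query shape means another structure like `by_tag` that each write path has to keep in sync, which is the upfront thinking described above.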
As test data for my Redis Admin UI demo, I've imported the entire Northwind relational database, which you can see here: www.servicestack.net/RedisAdminUI/AjaxClient/
From a list of POCOs, importing the entire Northwind DB literally took around 11 lines of code, 1 LOC per table (see the bottom of code.google.com/.../ServiceStackRedis).
@Patrick Huizinga
Can you tell me which problems are better modeled with a document database?
Frans,
RavenDB is fully ACID.
There is nothing in NoSQL that says that you need to lose ACID.
Huh?!
Let us talk about the Order aggregate. What do you mean, taking it as the example?
@Demis Bellot:
I've already identified the querying requirements in my first comment, but no one has offered a NoSQL solution (I've posted these requirements on other blogs and forums too).
@gandjustas
I would say anything that is non-relational would be a candidate.
I actually view NoSQL DBs as a complementary rather than a replacement technology. It definitely isn't the right choice in all cases, although there are clear scenarios where it holds advantages over an RDBMS; I list a few of them in my blog post here: http://www.servicestack.net/mythz_blog/?p=129
As is the case with any new technology, there is sometimes a fear of the unknown when dealing with NoSQL DBs. I would approach them like learning a new language: once you spend some time getting familiar with one, you will learn different approaches to solving the same problem, which at the very least will make you a better all-round developer. It will also give you a better idea of where it makes sense to use it and where it doesn't.
The beauty of NoSQL is that most of the options are free and very easy to get started with (typically you just download the server and run it). For a quick taste, Google App Engine provides a free, general-purpose web development and hosting environment in Python or Java. It uses BigTable as its primary storage, and I think you will be surprised how quick and frictionless it is to develop and deploy apps on it.
@gandjustas
I'm asking you which feature you think is the most difficult so I can explain in detail how you would achieve that particular functionality. Providing a complete Stack Overflow solution is not the best use of our time, especially since the existing documentation I provided should give you a general idea of how you would use NoSQL to model the solution.
How does the failover story read in RavenDB?
Frank,
If you have replication set up, you have automatic failover.
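Assuming the client knows the list of replicas, here is a hypothetical sketch of client-side read failover: try each server in order and fall back to the next one when it is unreachable. This is not RavenDB's client code; `ServerUnavailable`, `with_failover` and `load_question` are made-up names, and writes would need more care than reads.

```python
class ServerUnavailable(Exception):
    """Raised by an operation when the server it targets cannot be reached."""


def with_failover(servers, operation):
    # Try the primary first, then each replica in order.
    errors = []
    for server in servers:
        try:
            return operation(server)
        except ServerUnavailable as exc:
            errors.append(f"{server}: {exc}")
    raise ServerUnavailable("all servers failed: " + "; ".join(errors))


def load_question(server):
    # Hypothetical read: pretend the primary is down.
    if server == "primary":
        raise ServerUnavailable("connection refused")
    return {"id": 10, "served_by": server}
```

Calling `with_failover(["primary", "replica-1"], load_question)` returns the document served by `replica-1`.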
@gandjustas:
I believe the NoSQL option has its benefits, but also its drawbacks. One of the big benefits for me is the fact that you no longer need complex mapping schemes (NHibernate) or ugly ActiveRecord-style code (or plain old SQL in your code, ugh).
NoSQL does force you to construct your domain model nicely, with aggregate roots and such, but that can be seen as something positive.
You will have to solve problems where one aggregate needs to reference another, or part of another.
But even then, you get to focus on your domain, business and UI logic, which should be your core business.
The most difficult feature is implementing ALL the cases with good-enough performance without hand-written denormalization, aggregation, etc.
gandjustas,
Not at all. For that matter, take a look at Raven's MVC Music Store example.
@Peter Morlion,
It will be the biggest problem, with the most performance impact.
Oh, MVC Music Store is a good example.
Its data access with EF isn't good; it lacks projections.
The RavenDB version is completely inextensible. What if I want to display a "top sales artist" widget on pages? Or I need to create a Rating for Artists: users rate artists, and the rating affects catalog sorting, etc.?
Tobi mentioned "It is true that NoSQL is less flexible in the case of heavily interconnected data". But data can become "heavily interconnected" after the application first ships. That completely kills NoSQL for the majority of applications.
Create an index, query the index, done.
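A sketch of what such an index could look like for the "top sales artist" widget, written as plain Python map and reduce functions rather than as a RavenDB index definition. The order and order-line shapes are assumptions. The reduce output has the same shape as the map output, so it can be re-applied to partial results as new orders come in.

```python
from collections import defaultdict


def map_order(order):
    # Map: one entry per order line, shaped like the reduce output.
    for line in order["lines"]:
        yield {"artist": line["artist"], "sold": line["quantity"]}


def reduce_entries(entries):
    # Reduce: group by artist and sum. Input and output share a shape,
    # so partial results can be reduced again later.
    totals = defaultdict(int)
    for entry in entries:
        totals[entry["artist"]] += entry["sold"]
    return [{"artist": artist, "sold": sold} for artist, sold in totals.items()]


def top_sellers(orders, n):
    mapped = [entry for order in orders for entry in map_order(order)]
    ranked = sorted(reduce_entries(mapped), key=lambda e: (-e["sold"], e["artist"]))
    return ranked[:n]
```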
Creating an index for each query prevents dynamic query composition in application code, e.g. no LINQ.
gandjustas,
No, it doesn't.
And RavenDB certainly supports LINQ.
@Ayende
Does Raven DB support queries defined at run time?
One advantage relational systems have over NoSQL databases is that you can create queries at run time, and the database can build a query plan that uses existing indexes.
It seems that with RavenDB you need a predefined index; without that index it is difficult to get a decently performing query defined at runtime.
So with NoSQL databases, it seems to be a must to know your query needs up front. But query needs change over time, and queries defined at runtime cannot be predicted.
I can see NoSQL databases helping these days, but they cannot replace the RDBMS for now.
Jesus,
That is a relatively new feature, but yes, it supports that.
Ayende,
yes, you wrote about that, but you also said that a runtime query is the equivalent of a full table scan. That is different from an RDBMS's ability to reuse existing indices.
btw, at some point all this stuff shouldn't be called NoSQL anymore. For all I know you could introduce a SQL parser to RavenDB to define your indices, and what then? Subsequent renaming of hundreds of blog posts! When you are young it's nice to differentiate yourself by saying what you are not, but that can't be the end of the road :)
gandjustas, I think it is easy to query the same table in different ways with Raven (just add an index), and I am in favor of that approach. But the problems start to arise (IMHO) when you need a new query that joins two tables. Then you are forced to do manual maintenance, because indexes in Raven do not support the join clause.
In Raven you can create an index that reformats an existing table, but you cannot combine data from different tables automatically. I would be so happy if this feature were in the product. Manual maintenance of data structures _sucks_.
Seemingly, the only support for this scenario, to some extent, is the "include" feature, but that always does a nested loops join; you cannot get a hash or merge join with it. I believe join indexes can be maintained efficiently as well, because SQL Server can do it. Ayende, why do you not implement it? Do you want to set the right mindset in Raven and discourage the use of joins?
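For readers unfamiliar with the join strategies mentioned here, a minimal sketch of nested loops, hash and merge joins over lists of dictionaries. It illustrates the general algorithms only; it says nothing about how RavenDB's include feature or SQL Server's indexed views are implemented, and the order/customer shapes are hypothetical (customer ids are assumed unique).

```python
def nested_loop_join(orders, customers):
    # For each order, scan every customer: len(orders) * len(customers) comparisons.
    return [(o["id"], c["name"])
            for o in orders
            for c in customers
            if c["id"] == o["customer_id"]]


def hash_join(orders, customers):
    # Build phase: hash the customers once by their unique id.
    by_id = {c["id"]: c for c in customers}
    # Probe phase: one dictionary lookup per order.
    return [(o["id"], by_id[o["customer_id"]]["name"])
            for o in orders
            if o["customer_id"] in by_id]


def merge_join(orders, customers):
    # Sort both inputs on the join key, then walk them in step.
    left = sorted(orders, key=lambda o: o["customer_id"])
    right = sorted(customers, key=lambda c: c["id"])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i]["customer_id"], right[j]["id"]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append((left[i]["id"], right[j]["name"]))
            i += 1  # customer ids are unique, so keep j for the next order
    return out
```

All three return the same pairs; they differ in how much work they do and in what they need up front (nothing, a hash table, or sorted inputs).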
Frank,
No, not really. That is because the feature is so new, I haven't had the chance to blog about it :-)
There are 3 ways to query RavenDB:
Indexes
Linear query (which is what I blogged about, table scan).
Auto Query - uses the same syntax as usual indexing, doesn't require an index, and is very efficient.
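To illustrate the difference between those query paths in the abstract, here is a hypothetical toy engine: a linear query examines every document, while the indexed path builds a field index on first use and reuses it afterwards. This is an assumption-laden sketch, not a description of how RavenDB implements auto queries.

```python
from collections import defaultdict


class TinyQueryEngine:
    """Hypothetical illustration of indexed vs. linear queries; not RavenDB's implementation."""

    def __init__(self, docs):
        self.docs = docs
        self.indexes = {}  # field -> {value: [doc, ...]}
        self.scanned = 0   # documents touched by linear queries, for comparison

    def define_index(self, field):
        # Explicit index: one pass over the documents, grouped by field value.
        index = defaultdict(list)
        for doc in self.docs:
            index[doc.get(field)].append(doc)
        self.indexes[field] = index

    def linear_query(self, predicate):
        # Like a table scan: every document is examined.
        self.scanned += len(self.docs)
        return [doc for doc in self.docs if predicate(doc)]

    def query(self, field, value):
        # "Auto" path in this toy: build the index on first use, then reuse it.
        if field not in self.indexes:
            self.define_index(field)
        return list(self.indexes[field].get(value, []))
```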
And actually, we DO support set based update/deletes :-)
That is what the map/reduce indexes are for. You can do that there.
As for how this is implemented, we probably need to ask this in the mailing list.
"you cannot combine data from different tables automatically"
I have seen your example ( ayende.com/.../...ting-the-homecontroller-the.aspx) but only one table is involved in the map reduce index. If the order lines were a separate entity the index could not be constructed.
Tobi,
Yes, it could.
It would be somewhat more awkward, but...
// map
from orderOrLine in docs.With("Orders", "OrderLines")
let order = orderOrLine.Is("Orders") ? orderOrLine : null
let line = orderOrLine.Is("OrderLines") ? orderOrLine : null
select new
{
}
// reduce
from result in results
group result by result.Album into g
select new
{
}
Hm, I did not know about the With method. I believe your query does not work, because result.Album can be null, so grouping by it makes no sense. It can be fixed, however, and that is all that counts. In general you can get a joined index with the following steps:
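var map = orders.Select(x => new { Key = x.CustomerId, o = x, c = (Customer)null })
    .Concat(customers.Select(x => new { Key = x.Id, o = (Order)null, c = x }));
var reduce = map.GroupBy(x => x.Key, (key, group) => new { Key = key,
    c = group.Select(g => g.c).FirstOrDefault(c => c != null), o = group.Select(g => g.o).Where(o => o != null) });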
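The same steps as runnable Python instead of LINQ: a minimal sketch assuming orders carry a `customer_id` and customers an `id` (both hypothetical field names). The map tags every document with the join key, and the reduce groups on that key, which avoids grouping on a field that may be null.

```python
from collections import defaultdict


def join_via_map_reduce(orders, customers):
    # Map: tag each document with the join key; the "other side" slot is None.
    mapped = [{"key": o["customer_id"], "order": o, "customer": None} for o in orders]
    mapped += [{"key": c["id"], "order": None, "customer": c} for c in customers]

    # Reduce: group by the join key, never by a field that may be None.
    groups = defaultdict(list)
    for entry in mapped:
        groups[entry["key"]].append(entry)

    joined = {}
    for key, entries in groups.items():
        customer = next((e["customer"] for e in entries if e["customer"] is not None), None)
        joined[key] = {"customer": customer,
                       "orders": [e["order"] for e in entries if e["order"] is not None]}
    return joined
```

Keys with a customer but no orders, or orders whose customer is missing, still produce an entry, so the caller decides whether to treat the result as an inner or outer join.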
You could probably write a helper method that builds such a query using the Expression API. That would restore the convenience factor.