Notes on RavenDB vs. PostgreSQL
A common question I field in customer inquiries is how RavenDB compares to one relational database or another. Recently we got a whole spate of questions about RavenDB vs. PostgreSQL, and I thought it might be easier to post about it here rather than answer the question each time. Some of the differences are general, applying to all or most relational databases, but I’m also going to touch on specific features of PostgreSQL that matter for this comparison.
The aim of this post is to highlight the differences between RavenDB and PostgreSQL, not to be an in-depth comparison of the two.
PostgreSQL is a relational database, storing the data using tables, columns and rows. The tabular model is quite entrenched here, although PostgreSQL has the notion of JSON columns.
RavenDB is a document database, which stores JSON documents natively. These documents are schema-free and allow arbitrarily complex structures.
The first key point that distinguishes these databases is the overall mindset. RavenDB is meant to be a database for OLTP systems (business applications) and has been designed explicitly for this. PostgreSQL tries to serve both OLTP and OLAP scenarios and tends to place a lot more emphasis on the role of the administrator and operations teams. For example, PostgreSQL requires VACUUM, statistics refreshes, manual index creation, etc. RavenDB, on the other hand, is designed to run in a fully automated fashion. There isn’t any action that an administrator needs to take (or schedule) to ensure that RavenDB will run properly.
RavenDB is also capable of configuring itself dynamically, adjusting to its real world load based on feedback from the operational environment. For example, the more queries a particular index serves, the more resources RavenDB will grant it. Another example is how RavenDB processes queries in general. Its query analyzer will run through the incoming queries and figure out the best set of indexes needed to answer them. RavenDB will then go ahead and create these indexes on the fly. Such an action tends to be scary for users coming from relational databases, but RavenDB was designed upfront for exactly these scenarios. It is able to build the new indexes without adding too much load to the server and without taking any locks. Other tasks that are typically handled by the DBA, such as configuring the system, are handled dynamically by RavenDB based on actual operational behavior. RavenDB will also clean up superfluous indexes and reduce the resources available to indexes that aren’t in common use. All of that without a DBA to perform acts of arcane magic.
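To make that concrete, here is a rough sketch with the C# client (the Order/OrderLine classes, server URL and database name are illustrative assumptions, not something from this post). Issuing a dynamic query is all it takes; if no existing index can answer it, RavenDB creates an auto index on the fly:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;

public class OrderLine
{
    public string Product { get; set; }       // reference to a Product document, e.g. "products/42-A"
    public int Quantity { get; set; }
    public decimal PricePerUnit { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public string Company { get; set; }       // reference to a Company document, e.g. "companies/1-A"
    public decimal Freight { get; set; }
    public DateTime OrderedAt { get; set; }
    public List<OrderLine> Lines { get; set; }
}

public static class AutoIndexDemo
{
    public static void Main()
    {
        using var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },  // assumed local server
            Database = "Demo"                          // assumed database name
        };
        store.Initialize();

        using var session = store.OpenSession();

        // A dynamic query - no index was defined for it anywhere.
        // RavenDB's query analyzer picks an existing index that covers
        // Company and Freight, or builds an auto index (something like
        // Auto/Orders/ByCompanyAndFreight) in the background, without locks.
        var expensiveShipping = session.Query<Order>()
            .Where(o => o.Company == "companies/1-A" && o.Freight > 10)
            .ToList();
    }
}
```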
Another major difference between the databases is the notion of schema. PostgreSQL requires you to define your schema upfront and adhere to it. The fact that you can use JSON at times to store data provides an important escape hatch, but while PostgreSQL allows most operations on JSON data (including indexing it), it is unable to collect statistics on such data, leading to slower queries. RavenDB uses a schema-free model: documents are grouped into collections (similar to tables, but without the constraint of sharing the same schema) and have no fixed structure. Two documents in the same collection can have entirely distinct shapes. Typical projects using JSON columns in PostgreSQL tend to pull specific values out of the JSON into regular columns, to allow for better integration with the query engine. Nevertheless, PostgreSQL’s ability to handle both relational and document data earns it a lot of brownie points and enables a lot of sophisticated scenarios for users.
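As a small illustration with the C# client (the Customer class and its fields are assumptions for this sketch, and strongly typed models are only one way to feed RavenDB), two documents stored in the same Customers collection can carry different shapes; the second one simply has nested data the first one doesn’t:

```csharp
using System.Collections.Generic;
using Raven.Client.Documents.Session;

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    // Free-form, nested data that only some customers have.
    public Dictionary<string, object> Preferences { get; set; }
}

public static class SchemaFreeDemo
{
    // Both documents end up in the same "Customers" collection,
    // even though their stored shapes differ.
    public static void StoreTwoShapes(IDocumentSession session)
    {
        session.Store(new Customer { Name = "Acme" });
        session.Store(new Customer
        {
            Name = "Contoso",
            Preferences = new Dictionary<string, object>
            {
                ["Newsletter"] = true,
                ["SupportTier"] = "Gold"
            }
        });
        session.SaveChanges();
    }
}
```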
RavenDB, on the other hand, is a pure JSON database, which understands JSON natively. That means the query language is much nicer for queries that involve JSON and comparable for queries that don’t have a dynamic structure. In addition to being able to query the JSON data, RavenDB also allows you to run aggregations using Map/Reduce indexes. These are similar to materialized views in PostgreSQL, but unlike those, RavenDB updates the indexes automatically and incrementally. That means that you can query large scale aggregations in microseconds, regardless of data size.
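To give a sense of what that looks like, here is a hedged sketch with the C# client, reusing the illustrative Order/OrderLine classes from the earlier sketch (the index and field names are mine, not from this post). The per-company totals are maintained incrementally as orders change, so the query just reads precomputed results:

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

public class Orders_TotalsByCompany : AbstractIndexCreationTask<Order, Orders_TotalsByCompany.Result>
{
    public class Result
    {
        public string Company { get; set; }
        public decimal Total { get; set; }
        public int Count { get; set; }
    }

    public Orders_TotalsByCompany()
    {
        // Map: emit one entry per order.
        Map = orders => from order in orders
                        select new Result
                        {
                            Company = order.Company,
                            Total = order.Lines.Sum(l => l.Quantity * l.PricePerUnit),
                            Count = 1
                        };

        // Reduce: aggregate the entries per company. RavenDB keeps this
        // result up to date incrementally as documents change.
        Reduce = results => from result in results
                            group result by result.Company into g
                            select new Result
                            {
                                Company = g.Key,
                                Total = g.Sum(r => r.Total),
                                Count = g.Sum(r => r.Count)
                            };
    }
}

// Deploy once, then query the precomputed aggregation:
// new Orders_TotalsByCompany().Execute(store);
// var totals = session.Query<Orders_TotalsByCompany.Result, Orders_TotalsByCompany>()
//     .Where(r => r.Company == "companies/1-A")
//     .ToList();
```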
For complex queries that touch on relationships between pieces of data, we have very different behaviors. If the relations inside PostgreSQL are stored as columns and foreign keys, it is going to be efficient to deal with them. However, if the data is dynamic or complex, you’ll want to put it in a JSON column. At that point, the cost of joining relations skyrockets for most data sets. RavenDB, on the other hand, allows you to follow relationships between documents naturally, at indexing or querying time. For more complex relationship work, RavenDB also has graph querying, which allows you to run complex queries on the shape of your data.
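For the indexing-time side of that, here is a rough sketch (same illustrative classes as before, plus assumed Company/Address classes): LoadDocument follows the Company reference while indexing, so the query can filter on a field that actually lives on the related document, with no join at query time.

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

public class Address
{
    public string Country { get; set; }
}

public class Company
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class Orders_ByCompanyCountry : AbstractIndexCreationTask<Order>
{
    public Orders_ByCompanyCountry()
    {
        // LoadDocument follows the relationship at indexing time;
        // the related company's country is indexed alongside the order.
        Map = orders => from order in orders
                        let company = LoadDocument<Company>(order.Company)
                        select new
                        {
                            CompanyCountry = company.Address.Country
                        };
    }
}

// Querying by a field that lives on the related document:
// var ukOrders = session.Advanced
//     .DocumentQuery<Order, Orders_ByCompanyCountry>()
//     .WhereEquals("CompanyCountry", "UK")
//     .ToList();
```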
I mentioned before that RavenDB was designed explicitly for business applications, which means that it has a much better feature set around that use case. Consider the humble Customers page, which needs to show the customer’s details, Recent Orders (and their totals), Recent Support Calls, etc.
When querying PostgreSQL, you’ll need to make multiple queries to fetch this information. That means you’ll have to deal with multiple network roundtrips, which in many cases can be the most expensive part of actually querying the database. RavenDB, on the other hand, has the Lazy feature, which allows you to combine multiple separate queries into a single network roundtrip. This seemingly simple feature can have a massive impact on your overall performance.
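Here is a rough sketch of the Lazy feature with the C# client, reusing the illustrative Order and Company classes from the earlier sketches (a Company document stands in for the customer here, and SupportCall is an assumed class): three logically separate operations are deferred and then executed in a single round trip.

```csharp
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Session;

public class SupportCall
{
    public string Id { get; set; }
    public string Company { get; set; }   // the customer, e.g. "companies/1-A"
    public string Issue { get; set; }
}

public static class CustomerPageDemo
{
    // Builds the data for the "customer page" in one network round trip.
    public static void LoadPage(IDocumentSession session, string companyId)
    {
        var customerLazy = session.Advanced.Lazily.Load<Company>(companyId);

        var recentOrdersLazy = session.Query<Order>()
            .Where(o => o.Company == companyId)
            .OrderByDescending(o => o.OrderedAt)
            .Take(10)
            .Lazily();

        var recentCallsLazy = session.Query<SupportCall>()
            .Where(c => c.Company == companyId)
            .Take(10)
            .Lazily();

        // All pending lazy operations are sent to the server together.
        session.Advanced.Eagerly.ExecuteAllPendingLazyOperations();

        var customer = customerLazy.Value;
        var recentOrders = recentOrdersLazy.Value.ToList();
        var recentCalls = recentCallsLazy.Value.ToList();
    }
}
```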
A related feature is includes. It is very common, when you load one piece of data, to also want related information. With RavenDB, you can indicate that to the database engine, which will send you all the results in one shot. With a relational database, you can use a join (with its impact on the shape of the results, the Cartesian product issue and the possible performance impact) or issue multiple queries. A simple change, but a significant improvement over the alternative.
RavenDB is a distributed database by default, while PostgreSQL is a single node by default. There are features and options (log shipping, logical replication, etc.) that allow PostgreSQL to run as a cluster, but they tend to be non-trivial to set up, configure and maintain. With RavenDB, even if you are running a single node, you are actually running a cluster. And when you have multiple nodes, it is trivial to join them into a cluster, from which point on you can just manage everything as usual. Features such as multi-master replication, the ability to work disconnected and widely distributed clusters are native parts of RavenDB and integrate seamlessly, while they tend to be of the “some assembly required” variety in PostgreSQL.
The two databases are very different from one another and tend to be used for separate purposes. RavenDB is meant to be the application database; it excels at being the backend of OLTP systems and focuses on that to the exclusion of all else. PostgreSQL tends to be more general, suitable for dynamic queries, reports and exploration as well as OLTP scenarios. It may not be a fair comparison, but I have literally built RavenDB specifically to be better than a relational database for the common needs of business applications, and ten years in, I think it still shows significant advantages in that area.
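With the C# client that looks roughly like this (same illustrative classes as before): the related company document comes back in the same response as the order, so the second Load is served from the session without touching the network.

```csharp
using Raven.Client.Documents.Session;

public static class IncludeDemo
{
    public static void LoadOrderWithCompany(IDocumentSession session)
    {
        // One request fetches the order *and* the company document it references.
        var order = session
            .Include<Order>(o => o.Company)
            .Load("orders/1-A");

        // The company is already tracked by the session - no extra round trip here.
        var company = session.Load<Company>(order.Company);
    }
}
```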
Finally, let’s talk about performance. RavenDB was designed based on the failures of the relational model. I spent years as a database performance consultant, going from customer to customer fixing the same underlying issues. When RavenDB was designed, we took that into account. The paper OLTP – Through the Looking Glass, and What We Found There has some really interesting information, including the finding that about 30% of a relational database’s work goes into locking.
RavenDB uses MVCC, just like PostgreSQL. Unlike PostgreSQL, RavenDB doesn’t need to deal with transaction id wraparound, VACUUM costs, etc. Instead, we maintain MVCC not at the row level, but at the page level. There are far fewer locks to manage and far less complexity internally. This means that read transactions are completely lock free and don’t have to do any coordination with write transactions. That has a real impact on performance, and RavenDB routinely manages to achieve benchmark numbers on commodity hardware that are usually reserved for expensive benchmark machines.
One of our developers got a new machine recently and did some light benchmarking. Running in WSL (Ubuntu on Windows), RavenDB was able to exceed 115,000 writes / sec and 275,000 reads / sec.
And let’s be honest, we weren’t really trying hard here, but we still got nice numbers. A lot of that comes from designing the internals around a much simpler architecture and shape, and it shows. The nice thing is that these advantages are cumulative. RavenDB is fast, but you also gain the benefits of a protocol that lets you issue multiple queries in a single roundtrip, the ability to include additional results and dynamic adjustment to the operational environment.
It Just Works.
Comments
Another important difference: I can use PostgreSQL at full performance, for free, in commercial products. RavenDB is restricted to a single core in a cluster, which is... really poor.
@Alex still... apples and oranges
@Alex, I'm not sure how specifically that comment is helpful as regards the points Oren brought up in this post. If you need free, then probably performance, admin, development etc also need to be free...
@Alex, if someone doesn't have the budget for the commercial version of RavenDB, they will also have limitations, perhaps even more serious, budgeting for hosting, admin, development, customer service etc.
@Travis Comparing the performance of SQL with NoSQL is not apples and oranges, but comparing the price of two products is?
@Peter I can get PostgreSQL as SaaS or self-hosted, just like RavenDB. Where is the difference? $749 per core is a really high price tag for support I don't need. And I don't need it when hosting PostgreSQL either.
Don't get me wrong, RavenDB is a wonderful database. But when comparing it with some other database and its performance, it should be noted that with RavenDB you only get that performance for thousands of dollars, or otherwise a really stripped-down configuration that can't compete with PostgreSQL's performance for free.
if $749 per year per core is considered a really high price tag, then we are indeed discussing apples and oranges.
In our firm, they cannot possibly look at any "free" product, only vendor-supported products are considered for both legal and common-sense business reasons; being responsible for the funds of thousands of customers it would be absurdly irresponsible to not have support. Theoretically we could divert or hire resources to support it ourselves... but then "free" is now quite expensive. In a multi-million dollar IT budget (certainly when including wages) $749 per core is not significant, at most merely average. I would hope RavenDB is targeting customers in that tier, not the mom-and-pop shops who would anyway end up tying up support lines due to the lack of dedicated skilled devs.
Oren, on a related note, any opinion on this?
https://info.enterprisedb.com/rs/069-ALB-339/images/PostgreSQL_MongoDB_Benchmark-WhitepaperFinal.pdf
I'm not sure how fair that comparison is, since mongodb wasn't written (originally) for strong ACID/OLTP roles.
I once took a mongodb course for 2-3 days to see what it was about. I felt the main thing it was for was the powerful query and data-shaping capabilities. The queries became very complex, but you would not attempt (or even think of) the possibilities with a relational db.
Alex,
RavenDB is meant to be used in professional settings. In such cases, you aren't going to run without at least some vendor support. If you don't need that, great, and we have a free option that answers that need. Note that there are no artificial limits in the RavenDB Community edition on the number of requests or the size of the database. You can run small to medium services on it with ease. For that matter, given the usage we see for our RavenDB Cloud offering, quite a lot of people are able to run awesome services on specs that are far below what you get in the Community edition.
All of that said, RavenDB needs to pay the bills, and yes, we expect you to pay if your usage is going to be beyond the community limits. We find that our pricing is usually competitive with, or lower than, the alternatives.
And when you compare, look at the total cost of ownership. That is, the database may be free, but your DBA's time is not. And again, for most scenarios, you need commercial support. A couple of minutes of searching showed me that you have to call to get a quote for such support for Postgres. (Meaning: _expensive_.)
Peter,
Thanks for the reference, I wasn't aware of it. It's really interesting to see these numbers. I was interested to see the OLAP queries. If I were doing that in RavenDB, it wouldn't be fair to either database :-) You'll get results in milliseconds.
Note that the sysbench that they run is not using transactions, but still showed a major slowdown for MongoDB.
I'm not entirely sure about their numbers, because each benchmark operation hides multiple database operations, but 2 - 3K ops / second seems very low. I wonder how that is accounted for, because those look like sysbench operations, and the load on the DB should have been much higher.
Classic SQL servers today are more than relational databases. They try to embrace NoSQL. JSON handling is better than a table scan plus reading blobs.
I sincerely hope that RavenDB can still beat the Goliaths in document database territory :-)
After your interesting blog post, I googled and read the following:
https://bitnine.net/blog-postgresql/postgresql-internals-jsonb-type-and-its-indexes/
https://erthalion.info/2017/12/21/advanced-json-benchmarks/
https://heap.io/blog/engineering/when-to-avoid-jsonb-in-a-postgresql-schema
Carsten,
Take into account that the JSON vs. tabular issue is only part of the story.
The distribution model is also very different, as are the kinds of operations that are supported. For example, storing the data in a single location as JSON is often better than doing a JOIN, but when you are running distributed, you need to worry about how you'll manage data consistency. There is also a lot to be said about the ergonomics of things. Compare a non-trivial query involving JSON in Postgres and in RavenDB, for example.
But for the most part, I think you'll find that the core philosophy is different, in many respects.
When you say "When querying PostgreSQL, you’ll need to make multiple queries to fetch this information" - that's not necessarily true. If you work with Dapper, it has a method called QueryMultiple that allows you to fetch multiple result sets in one network roundtrip (e.g. fetch calls, orders and customer details). I can't claim it works 100% with Postgres, but it worked fine with SQL Server last time I used it. So the problem exists, sure, but only for raw ADO.NET code, which I haven't seen used in production in a while.
Gleb,
I am familiar with this feature. It is the baseline for NHibernate's Future feature. This gives you _some_ capabilities, but they aren't really comparable. In SQL Server, you are limited to 2,100 parameters for the overall query. It sounds like a lot, but you'll be surprised how easy it is to reach that limit. The other issue is that you can't easily do chained queries: give me the user, their recent orders and the recent products based on those orders. You'll have to execute the query multiple times to get this, and the ergonomics just aren't there.
Regarding the maximum parameter count - I understand you can hit it, but that would be more of a corner case than an issue you run into on a regular basis. In analytics scenarios it can be a problem though, especially if you're accessing multiple datastores to get the final results. For your second point - can you elaborate? Does Raven allow that because of the aggressive caching on the client? If all your data is in one document and you've already fetched it from the database, you don't need to go to the network for the 2nd, 3rd and further queries?
Gleb,
You run into the parameter limit a lot more often than you would expect. See SharePoint and Lists as a great example.
For RavenDB, you can do that because we have the includes feature. See this discussion of the feature: https://ravendb.net/learn/inside-ravendb-book/reader/4.0/2-zero-to-ravendb#includes
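Roughly, with the C# client and assumed Northwind-style classes (Order, OrderLine, Product), that scenario looks like this - the orders query also brings the referenced product documents back in the same round trip:

```csharp
var recentOrders = session.Query<Order>()
    .Include(o => o.Lines.Select(l => l.Product))   // related products ride along in the response
    .Where(o => o.Company == "companies/1-A")
    .OrderByDescending(o => o.OrderedAt)
    .Take(10)
    .ToList();

foreach (var line in recentOrders.SelectMany(o => o.Lines))
{
    // Served from the session; no additional requests go out here.
    var product = session.Load<Product>(line.Product);
}
```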
I'll look into includes, thanks