Beginning the RavenDB 4.0 book
You might have seen me talking about how close we are to a RavenDB beta release. Today marked a very important step along the route to an actual release. I’ve shifted my focus. Instead of going head down in the code and pushing things forward and doing all the sorts of crazy stuff that you have seen me talking about for the past year and a half, I got started on the Inside RavenDB 4.0 book.
I say started because just the rough table of contents took me almost the entire day to complete. I’m expecting that this will take the majority of my time for the next few months, which means that you’ll get all the dribs and drabs from the raw drafts as they are composed. I’m also using this as a pretty nice way to go over the entire product and see how it all comes together as a cohesive whole, instead of looking at just a single piece every time.
Given that the period of putting bugs in the code is almost over, I feel that I can safely let the rest of the team fish out all the oopsies that I managed to get in and focus on the product, rather than the code. This is the second time that I have made such a shift (and the third time I’m writing a book), and it still feels awkward. On the other hand, there is a great sense of accomplishment when you see how things just click together and all that hard work is finally real in a way that no code review or artificial scenario can replicate.
Here is what I have planned so far for the book. Your comments are welcome as always.
One of the major challenges in writing this book came in considering how to structure it. There are so many concepts that relate to one another that it can be difficult to understand them in isolation. We can't talk about modeling documents before we understand the kind of features that we have available to work with, for example. Considering this, I'm going to introduce concepts in stages.
Part I - The basics of RavenDB
Focus: Developers
This part contains a practical discussion on how to build an application using RavenDB, and we'll skip theory and concepts in favor of getting things done. This is what you'll want new hires to read before starting to work with an application using RavenDB; we'll keep the theory and the background information for the next part.
- Chapter 2 - Zero to RavenDB - focuses on setting you up with a RavenDB instance, introduces the studio and some key concepts, and walks you through the Hello World equivalent of using RavenDB by building a very simple To Do app.
- Chapter 3 - CRUD operations - discusses the basics of connecting from your application to RavenDB and what you need to know to do CRUD properly: entities, documents, attachments, collections and queries.
- Chapter 4 - The Client API - explores more advanced client scenarios, such as lazy requests, patching, bulk inserts, streaming queries and persistent subscriptions. We'll talk a bit about data modeling and working with embedded and related documents.
Part II - Ravens everywhere
Focus: Architects
This part focuses on the theory of building robust and high performance systems using RavenDB. We'll go directly to working with a cluster of RavenDB nodes on commodity hardware, discuss distribution of data and work across the cluster and how to best structure your systems to take advantage of what RavenDB brings to the table.
- Chapter 5 - Clustering Setup - walks through the steps to bring up a cluster of RavenDB nodes and work with a clustered database. This will also discuss the high availability and load balancing features in RavenDB.
- Chapter 6 - Clustering Deep Dive - takes you through the RavenDB clustering behavior, how it works and how both servers and clients work together to give you a seamless distributed experience. We'll also discuss error handling and recovery in a clustered environment.
- Chapter 7 - Integrating with the Outside World - explores using RavenDB alongside additional systems: integrating with legacy systems, working with dedicated reporting databases, ETL processes, long running background tasks and, in general, how to make RavenDB fit better inside your environment.
- Chapter 8 - Cluster Topologies - guides you through setting up several different clustering topologies and their pros and cons. This is intended to serve as a set of blueprints for architects to start from when they begin building a system.
Part III - Indexing
Focus: Developers, Architects
This part discusses how RavenDB indexes data to allow for quick retrieval of information, whether it is a single document or aggregated data spanning years. We'll cover all the different indexing methods in RavenDB and how you should use each of them in your systems to implement the features you want.
- Chapter 9 - Simple Indexes - introduces indexes and their usage in RavenDB. Even though we have performed queries and worked with the data, we haven't actually dealt with indexes directly so far. Now is the time to lift the curtain and see how RavenDB is searching for information and what it means for your applications.
- Chapter 11 - Full Text Search - takes a step beyond just querying the raw data and shows you how you can search your entities using powerful full text queries. We'll talk about the full text search options RavenDB provides, using analyzers to customize this for different usages and languages.
- Chapter 12 - Complex Indexes - goes beyond simple indexes and shows us how we can query over multiple collections at the same time. We will also see how we can piece together data at indexing time from related documents and have RavenDB keep the index consistent for us.
- Chapter 13 - Map/Reduce - gets into data aggregation and how using Map/Reduce indexes allows you to get instant results over very large data sets with very little cost, making reporting-style queries cheap and accessible at any scale. Beyond simple aggregation, Map/Reduce in RavenDB also allows you to reshape the data coming from multiple sources into a single whole, regardless of complexity.
- Chapter 14 - Facet and Dynamic Aggregation - steps beyond the static aggregation provided by Map/Reduce and gives you the ability to run dynamic aggregation queries on your data, or just facet your search results to make it easier for the user to find what they are looking for.
- Chapter 15 - Artificial Documents and Recursive Map/Reduce - guides you through using indexes to generate documents, instead of the other way around, and then using those documents both for normal operations and to support recursive Map/Reduce and even more complex reporting scenarios.
- Chapter 16 - The Query Optimizer - takes you into the RavenDB query optimizer, index management and how RavenDB treats indexes from the inside out. We'll see the kind of machinery that is running behind the scenes to get everything going so when you make a query, the results are there at once.
- Chapter 17 - RavenDB Lucene Usage - goes into (quite gory) details about how RavenDB makes use of Lucene to implement its indexes. This is intended mostly for people who need to know exactly what is going on and how everything fits together. This is how the sausage is made.
- Chapter 18 - Advanced Indexing Techniques - digs into some specific usages of indexes that are a bit... outside the norm. Using spatial queries to find geographical information, generating dynamic suggestions on the fly, returning highlighted results for full text search queries. All the things that you would use once in a blue moon, but when you need them you really need them.
Part IV - Operations
Focus: Operations
This part deals with running and supporting a RavenDB cluster or clusters in production: from how you spin up a new cluster, to decommissioning a downed node, to tracking down performance problems. We'll learn all that you need (and then a bit more) to understand what is going on with RavenDB and how to customize its behavior to fit your own environment.
- Chapter 19 - Deploying to Production - guides you through several production deployment options, including all the gory details about how to spin up a cluster and keep it healthy and happy. We'll discuss deploying to anything from a container swarm to bare metal, the networking requirements and configuration you need, security implications and anything else that the operations teams will need to comfortably support a RavenDB cluster in that hard land called production.
- Chapter 20 - Security - focuses solely on security. How you can control who can access which database, running an encrypted database for highly sensitive information and securing a RavenDB instance that is exposed to the wild world.
- Chapter 21 - High Availability - brings failure to the table, repeatedly. We'll discuss how RavenDB handles failures in production, how to understand, predict and support RavenDB in keeping all of your databases accessible and high performance in the face of various errors and disasters.
- Chapter 22 - Recovering from Disasters - covers what happens after disaster strikes. When machines melt down and go poof, or someone issues the wrong command and the data just went into the incinerator. Yep, this is where we talk about backups and restore and all the various reasons why operations consider them sacred.
- Chapter 23 - Monitoring - covers how to monitor and support a RavenDB cluster in production. We'll see how RavenDB externalizes its own internal state and behavior for the admins to look at and how to make sense of all of this information.
- Chapter 24 - Tracking Performance - gets into why a particular query or a node isn't performing up to spec. We'll discuss how one would track down such an issue and find the root cause responsible for such a behavior, a few common reasons why such things happen and how to avoid or resolve them.
Part V - Implementation Details
Focus: RavenDB Core Team, RavenDB Support Engineers, Developers who want to read the code
This part is the orientation guide that we throw at new hires when we sit them in front of the code. It is full of implementation details and background information that you probably don't need if all you want to know is how to build an awesome system on top of RavenDB.
On the other hand, if you want to go through the code and understand why RavenDB is doing something in a particular way, this part will likely answer all those questions.
- Chapter 25 - The Blittable Format - gets into the details of how RavenDB represents JSON documents internally, how we got to this particular format and how to work with it.
- Chapter 26 - The Voron Storage Engine - breaks down the low-level storage engine we use to put bits on the disk. We'll walk through how it works, the various features it offers and, most importantly, why it ended up this way. A lot of the discussion is going to be driven by performance considerations, extremely low-level and full of operating system and hardware minutiae.
- Chapter 27 - The Transaction Merger - builds on top of Voron and comprises one of the major ways in which RavenDB is able to provide high performance. We'll discuss how it came about, how it is actually used and what it means in terms of actual code using it.
- Chapter 28 - The Rachis Consensus - talks about how RavenDB uses the Raft consensus protocol to connect together different nodes in the cluster, how they interact with each other and the internal details of how it all comes together (and falls apart and recovers again).
- Chapter 29 - Cluster State Machine - brings the discussion one step higher by talking about how RavenDB uses the result of the distributed consensus to actually manage all the nodes in the cluster, and how each node can independently and reliably arrive at the same decision.
- Chapter 30 - Lording over Databases - peeks inside a single node and explores how a database is managed inside that node. More importantly, how we are dealing with multiple databases on the same node and what kind of impact each database can have on its neighbors.
- Chapter 31 - Replication - dives into the details of how RavenDB manages a multi-master distributed database. We'll go over how change vectors ensure conflict detection (and aid in its resolution) and how the data is actually replicated between the different nodes in a database group.
- Chapter 32 - Internal Architecture - gives you the overall view of the internal architecture of RavenDB. How it is built from the inside, and the reasoning why the pieces came together in the way they did. We'll cover both high-level architecture concepts and micro architecture of the common building blocks in the project.
Part VI - Parting
This part summarizes the entire book and provides some insight into what our future vision for RavenDB is.
- Chapter 33 - What comes next - discusses our (rough) plans for the next major version and our basic roadmap for RavenDB.
- Chapter 34 - Are we there yet? Yes! - summarizes the book and lets you go and start actually using all of this awesome information.
Comments
Enjoyed the previous book a lot and believe that the new one will also be fantastic. Should we expect fewer blog posts over the next several months? :)
Will it be published? Early access?
Do you have any information on the book that was being written by Itamar Syn-Hershko, "RavenDB In Action"? I purchased early access to that book a couple years ago and it appears to be making no progress at all. The last update was in early 2015!
eqr, We intend to publish it, and we'll have early access, yes. I will try to keep it on a post a day as usual during the same time.
Tom, You need to talk to Manning about that, I'm afraid.
Oren, I figured that was the case. Thanks for responding at least.
Testing testing testing. How to use it in a number of scenarios (like a simple MVC web app or Azure function or something) and how using an InMemory RavenDb compares with hitting a real one, etc.
ref: RavenTestBase
-PK- :)
Pure Krome, Good point, I'll add that.
bought!
you are focusing a lot on features from RavenDB, which is obviously very nice, since it's such an awesome product! using document databases requires a lot of non-relational-db know-how, like how to model things, real world things! (not the blog post example). would love to see a chapter on the "modelling" topic. sometimes we duplicate data, how can we keep this data in sync? when is a fanout index wrong? when is it ok? what about include? and yes, RavenDB doesn't have an explicit schema, but an implicit one! so we need to do migrations. again, different strategies.
more on migration (but offtopic): i'd love to get something more official than RavenMigrations from github. You guys really take care of all the ops & maintenance, which is very commendable. the migration topic is somehow lagging behind. Also, RavenMigrations is based on Streams, which won't work if you try to migrate more than 300k documents, because the stream keeps a transaction open which explodes. there are some workarounds, but again: would love to see a more official solution (300k is really not a big number in RavenDB terms). maybe in 4.0 there is a solution to stream over a complete collection and do some migration work?
back to the book: i'd also love to see more details for the authentication bundle. the documentation we get right now is very short. this is something i see a lot in our enterprise project where we start to use RavenDB more and more.
Tobias, I'm certainly going to spend at least a chapter on modeling. The CRUD operations chapter is going to be talking a lot about that.
The problem is that you have to provide some information about RavenDB first, then talk about modeling in the context of RavenDB. Maybe I'll push deeper discussion on modeling toward the end of the book.
Migrations are best handled via the PatchByIndex operation; that is the easiest way to handle them.
Are you talking about authorization or authentication? We'll certainly cover the latter, but authorization is a really complex topic, and we are looking into whether we can do something meaningfully different in 4.0 or if we should leave it.
Hello! It would be great to have a Raven book from the author. But for now, I am very interested in "Chapter 15 - Artificial Documents and Recursive Map/Reduce". I couldn't find documentation for v4; I think it is also still being written. I will now download the alpha version and try to figure it out myself, and I would be very glad if you could point me in the right direction.
Merdan, The book is still currently being written :-)
For 3.5, you can look at Scripted Index Results to get similar behavior.