Modeling temporal data with RavenDB
We got an interesting modeling question from a customer: “What is the optimal manner to represent time sensitive information in RavenDB?”
The initial thought was that they would use revisions and they asked about querying those. The issue is that this isn’t really the purpose for revisions, they are great if you want to see what the state of the system at a particular point in time, but not so good if your business logic has meaning over time.
The best scenario for temporal data that I’m aware of is payroll. You have a lot of data that make sense only in the context of the time if was relevant for. For example, consider an employee that is hired at a given salary level, then given raises over time. The data in this case is divided into several layers.
I’m going to use paper documents as the model here, because it makes it much easier to consider the modeling implications than when talking about JSON or class structures.
On the most basic model, we have the Payslip document, which represent the amount (work, deductions, taxes, etc) that was paid to an employee at a particular point in time. This is similar to this:
Once created, such a document will not change. It represent an action that happened in the past and is immutable. From this you can figure out taxes, overall payments, etc.
The Payslip is computed from the Timesheet document, which is similar to this one:
A Timehseet document is updated during the Payroll Period whenever an employee signs in or out. At the end of the Payroll Period, a manager will sign off on the Timesheet and approve it for payment. At this point, all the relevant business rules are run and the final Payslip is generated for each employee. Once the Timesheet is signed off and paid, it is no longer mutable and will not change. This make sense, since it represent something that already happened.
In some cases, you’ll have new information, such as an employee that worked, but didn’t report their hours. They will need to do so in a new Timesheet and a new Payslip will be generated.
Using the real world analogy, the Timesheet document is stored at the head office, and you cannot go and update that once it was submitted.
So far, we haven’t seen anything related to things that change over time. In fact, the fact that we have separate documents for Payslips and Timesheets means that we can ignore a lot of the complexity that you’ll usually have to deal with in temporal databases.
We can’t completely get away from it, however. We need to consider the employee’s Contract, however. Usually when we think about the employment contract we think about something like this:
The contract specify details such as the hourly rate, overtime payment, vacation time, etc. In payroll systems, contracts are actually more complex than that, because you have to take into account that they change.
For example, consider the following scenario:
- 1996 – Hired as mailroom clerk – 4.75$ / hour
- 1998 – Promoted to junior clerk – 5.25$ / hour
- 1999 – Promoted to clerk – 5.40$ / hour
- 2002 – Promoted to senior clerk – 6.20$ / hour
How do you handle something like that, in terms of modeling?
The answer depends quite heavily on how your business logic handles this. One way to handle this is to create revisions. Using the real world logic, we are talking about signing a new contract and expiring the old one. But in reality, that isn’t how things are done. You’ll usually just update the payment terms.
How does this looks like in terms of the data modeling when using RavenDB,however? Well, there are two options. We can represent the data as simple values, like so:
When the data changes, we update those values (which generates revisions for the old data). However, that isn’t usually ideal, because business logic usually want to access the past values. For example, your contract may change mid payroll period, so your hourly rate is different depending if the hours worked on Monday or Thursday.
In this case, you’ll want to represent the values changing in the model directly, like so:
In most cases, this is the best option for modeling data where the temporal aspects of the data needs to be directly exposed to the business logic.
Comments
So what's the best way given that structure to turn the pay data over time of all Sales Representatives currently working for the company?
I.e. I want to see all of the records where the contract Title at the time was "Sales Representative" inclusive if they got a raise without a title change. I don't care about the contract itself or presumably the Employee, just the data so that I can analyze trends on the subset of the document. If I do this with a standard select I'm going to get an Enumerable of an Enumerable and have to flatten it in memory.
James,
If you want to get the pay for a particular role, you can do that using something like:
from Payslips group by Date.Year, Date.Month, Role select sum(Amount), count(), Date.Year, Date.Month, Role
And that would give it to you.
Note that in this model, each Payslip is independent document for a particular payroll period.
Not the pay nor the aggregate. Consider a chart that shows the change in pay rate over time for all sales representatives on average at any given point AND shows the payrate of a given employee over time.
I need a dataset that looks like this:
[ { "HourlyRate": 5.23 },
<more> ]
This would be traditionally be done in Linq with SelectMany.
James,
In RavenDB, each document can modify the output it has directly, so a call like SelectMany will work, in a case like this:
{ "Name": "Joe", "Contracts": [ { "Name": "Nurse", "Rates": [ { "Type": "Regular", "Hourly": 10.0 }, { "Type": "Overnight", "Hourly": 15.0 } ] }, { "Name": "Midwife", "Rates": [ { "Type": "Regular", "Hourly": 12.0 }, { "Type": "Overnight", "Hourly": 17.5 } ] }, } }
You can issue a query like:
session.Query<Employees>().SelectMany(x=>x.Contracts.SelectMany(x=>x.Rates));
In which case you'll get a flat list of all the rates for that particular document.
Note that this happens on a per document basis. There is no way to process the entire result set across documents.
On the other hand, this is trivial to do client side, since you have all the information.
Some time ago I wrote how one could model bitemporality n a graph database. If I remember correctly, RavenDB got some graph features along the way?
Comment preview