Time transitions should be explicit
Let us talk about time for a second, okay? We deal with it in just about every application we write, but we treat it quite dismissively. Let me give an example first. We need to build a notification system based on timed notifications that should be displayed in a web page.
Thinking about it, I came up with the following design:
And this query:
SELECT TOP 3 Id, PublishAt, Title, Content FROM Notifications WHERE PublishAt > GETDATE() ORDER BY PublishAt DESC
That seems to satisfy the requirements, it is simple and it works. Done.
Not quite. This system design suffers from a pretty important problem: the time transitions are implicit. But why is that important?
Because the state transition from waiting-to-be-published to published is a meaningful transition in our domain. As a simple example, I can’t post a notification to Twitter when a notification is published, simply because I have absolutely no idea when that is going to happen. In many real applications, silent state transitions are going to lead to a lot of hacks. Likely something like adding a WasPublished flag that we can check, and then doing some action if we get a notification that wasn’t published yet.
A much better plan is to model things so that time is an explicit state transition. Instead of just checking PublishAt, we will check the IsPublished flag, and we have a background process that compares PublishAt with the current date and explicitly sets the IsPublished flag. That is also where we will place logic relating to the state transition. It also means that we aren’t depending on a side effect (someone viewing the page causing the publication process) to make something important happen in our application.
You might have noticed a theme here: I like making things explicit, because it means that it is easier to handle them.
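A minimal sketch of such a background step, written in Python against an in-memory SQLite database rather than SQL Server (so `LIMIT 3` stands in for `TOP 3`). The names `publish_due`, `visible_notifications`, `to_text` and the `on_published` hook are all hypothetical, and the hook is only a stand-in for whatever should happen at publication time, such as posting to Twitter:

```python
import sqlite3
from datetime import datetime, timezone


def to_text(moment):
    # Store UTC timestamps as fixed-width ISO 8601 text so that
    # string comparison in SQL matches chronological order.
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


def publish_due(conn, now, on_published):
    """Explicitly flip IsPublished for every due row and run the hook once per row."""
    due = conn.execute(
        "SELECT Id, Title FROM Notifications "
        "WHERE IsPublished = 0 AND PublishAt <= ? ORDER BY PublishAt",
        (to_text(now),),
    ).fetchall()
    for notification_id, title in due:
        conn.execute(
            "UPDATE Notifications SET IsPublished = 1 "
            "WHERE Id = ? AND IsPublished = 0",
            (notification_id,),
        )
        # Logic tied to the state transition lives here, not in the page.
        on_published(notification_id, title)
    conn.commit()
    return [row[0] for row in due]


def visible_notifications(conn):
    # The page only looks at the explicit flag, never at the clock.
    return conn.execute(
        "SELECT Id, Title FROM Notifications "
        "WHERE IsPublished = 1 ORDER BY PublishAt DESC LIMIT 3"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Notifications ("
    " Id INTEGER PRIMARY KEY, PublishAt TEXT NOT NULL, Title TEXT, Content TEXT,"
    " IsPublished INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO Notifications (Id, PublishAt, Title, Content) VALUES (?, ?, ?, ?)",
    [
        (1, to_text(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)), "Maintenance", "Downtime tonight"),
        (2, to_text(datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)), "Release", "New version is out"),
    ],
)

announced = []
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first_run = publish_due(conn, now, lambda nid, title: announced.append(title))
second_run = publish_due(conn, now, lambda nid, title: announced.append(title))
```

Running the step twice with the same clock value publishes the first notification once and does nothing the second time, because the flag, not the time comparison, records that the transition already happened.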
Comments
"we will check was IsPublished" doesn't seem like a logical sentence. And I also don't really get the context of the story. Is the website gathering the top 3 of items it should show? Than what does the background process do?
In code I do the same. I always specify the visibility of a member explicitly. It makes reading the code much easier. In ReSharper I disabled the 'redundant qualifier' rule. It makes reading the code back after a year much easier. However, I usually don't work with IsXXX fields in my databases. I often need more states than just true or false.
For example, we have a mail system that sends out personalized mail based on a query that defines the template fields. Let's say the application database needed to execute the mail query is down (someone tripped over a network cable or the database server needs a reboot after installing updates). If I keep the state IsSend false, then I have to try every message at every run. If I give it a more specific state (ApplicationDatabaseUnreachable), I can skip the message until another message from the same application was sent successfully or the (error) state is older than 30 minutes.
These states are defined in enumerations, which also makes the code easier to read. Readable code really brings down the number of bugs.
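One way the retry rule described in this comment might look, as a rough Python sketch. The `MailState` enum, the `Mail` class, `should_attempt` and the `last_success_by_app` lookup are assumptions made for illustration; only the ApplicationDatabaseUnreachable state and the 30 minute window come from the comment:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class MailState(Enum):
    PENDING = "pending"
    SENT = "sent"
    APPLICATION_DATABASE_UNREACHABLE = "application_database_unreachable"


# Error states older than this are retried regardless.
RETRY_AFTER = timedelta(minutes=30)


@dataclass
class Mail:
    application: str
    state: MailState = MailState.PENDING
    state_changed_at: Optional[datetime] = None  # set whenever the state changes


def should_attempt(mail, now, last_success_by_app):
    """Decide whether this run should try to send the mail."""
    if mail.state is MailState.SENT:
        return False
    if mail.state is MailState.PENDING:
        return True
    # Error state: retry once another mail for the same application
    # went out after the failure, or once the error is old enough.
    last_success = last_success_by_app.get(mail.application)
    if last_success is not None and last_success > mail.state_changed_at:
        return True
    return now - mail.state_changed_at > RETRY_AFTER


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent_failure = Mail("crm", MailState.APPLICATION_DATABASE_UNREACHABLE, now - timedelta(minutes=10))
old_failure = Mail("crm", MailState.APPLICATION_DATABASE_UNREACHABLE, now - timedelta(minutes=31))
```

With a plain boolean, both failures above would be retried on every run; with the richer state, only the old one is retried until the application shows signs of life again.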
Errr, just by looking at the sql query I can't see if the system has any problem or not. If the query is used only for displaying some item it is perfectly OK. You can always add some background processing if you need to run some logic exactly at the publish date, but this query doesn't have to be changed at all.
Dennis,
Thanks for catching that, I edited the statement and now it should be clearer.
Rafal,
The problem is the implicit state transitions, not the query.
The query works, but it results in hacks elsewhere.
Ok, I just pointed out that you are talking about problems without showing them, the query you gave as an illustration has no problems at all. For example, if you used RSB and scheduled messages to handle time transitions explicitly, you wouldn't need IsPublished column at all - all information would be in a message.
Rafal,
Read the paragraph after: "But why is that important?", it explains the problem.
This is fine if your server and all your users are in a single timezone...
[)amien
IsPublished is a bit more like 'HasBeenProcessedByTwitterNotificationService' right? what if you had multiple independent services that wanted to process the items?
Would you just keep adding columns to the table (i.e., 'HasBeenProcessedByFacebookNotificationService')? Is there a standard way to do this where the notification service remembers which items have been processed, such as remembering the last processed pk, or the date of the last processed item?
The post makes sense. It took me a couple of reads, but I like it. I think this goes back to the design phase: where to put your application logic and how to invoke it.
Rather than depending on the view to invoke our application logic, it makes sense that we do it ourselves. It is also much easier to scale, which is vital in an enterprise application.
I don't see why you would have multiple services wanting to process the item. I thought the idea is to have one service process the item so all the other services can use it. Once IsPublished is set, all the other services can check that flag. It shouldn't matter which service set the flag in the first place.
One other thing to note is that if you live in a place where they recognize daylight savings time, your publication could trip twice since the time between 1 AM and 2 AM occurs twice in the fall. I've seen more than one person get tripped up by DST by having a nightly process scheduled between 1 and 2 on Sunday mornings.
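The repeated hour can be seen directly with Python's standard `zoneinfo` module. This is only an illustration: the `America/New_York` zone and the date 2023-11-05 are picked as an example of a fall-back night, and the snippet assumes the system has a time zone database installed:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# The same wall-clock reading, 01:30, happens twice on this night.
# fold=0 is the first occurrence (daylight time), fold=1 the second (standard time).
first = datetime(2023, 11, 5, 1, 30, tzinfo=eastern)
second = datetime(2023, 11, 5, 1, 30, fold=1, tzinfo=eastern)

first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)

# Without the zone, the two readings cannot be told apart.
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)
```

The two readings are an hour apart in UTC, so a job keyed only on local wall-clock time has two chances to fire.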
There is another reason for explicitness. The ability to differentiate between published and not published is a feature of your system. Making this implicit means that any maintainer can easily miss that this is actually a feature and might, e.g., inadvertently destroy it. Which, I admit, equates to making things simpler :)
Nice post, Ayende. Essentially, using PublishedAt in queries and reports could also be seen as duplicating business logic. And that logic keeps getting duplicated every time a new query or report gets introduced, thus introducing maintainability problems later on.
But I see a similar problem for things other than time-related stuff. Many times certain actions on an entity are only possible when all kinds of conditions are met on that entity and/or related entities. I see that kind of logic being replicated in queries and reports, introducing the same maintainability problem when the logic changes ever so slightly.
I am more and more leaning towards exposing those computed 'states' (can't think of a better word at the moment) in the database, instead of duplicating the logic. In essence something that CQRS seems to be advocating.
Ayende,
I think you have a typo in the WHERE clause:
WHERE PublishAt > GETDATE()
should probably be:
WHERE PublishAt < GETDATE()
We run into the same problem at work all the time and we usually deal with it in a similar way.
One of the biggest problems with having just a publishAt field is that if the time on the computer changes the action can be performed twice. This will also effectively happen at the point the system changes between daylight savings time and 'normal' time. Obviously this is more of a problem when the time you care about is the time on the user's computer rather than a server time.
Damien,
Presumably you use UTC for that. But that still doesn't help with the implicit state transitions
Stephen,
No, that is not the same.
IsPublished means that the background service ran, and EVERYTHING that was interested in this ran as well.
Frank,
Great point, in 2 days a post about just that topic is going to show up.
Stan,
You see how dangerous this is :-)
I agree with the comments about state transitions.
But as far as talking about time you could take things so much further.
First things first: here we aren't using UTC (which I know was already discussed), but in a lot of time zones a given date/time combination can occur MULTIPLE times.
You may want something published at 2:30 AM when a timezone rolls its clock back. When should it be published? Just checking the time would of course be broken, and that is one of the reasons an explicit state transition is so important.
Don't even get me started on dates versus times.
I agree. When an entity's state depends on something external, like the time, whether or not a 3rd party has responded to a message, or a user action elsewhere in the system, then I'd rather have a background process make an explicit state change than check for those things in the entity itself.
I think the transition is good. But as John said, we could bring it further.