RavenDB and complex tagging
On the RavenDB mailing list, we got a question about tagging. In this case, the application needs:
1. Tags have identity ("set" has a different meaning if I'm talking math, music or sports).
2. I want to know who tagged what and when.
3. I want to do this once, as a service, so I don’t need to have ids in each document I want to tag. In my app, there are many such document types.
Let us see how we can approach this in RavenDB. We are going to do it like so:
Note that because tags have identity, we store only the tag id inside the tagged object, along with the required information about who & when it was tagged.
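As a rough illustration of that shape, here is a plain-Python sketch of a tagged album document. It is not RavenDB client code, and the field names (`TagId`, `By`, `When`), the `tag_document` helper and the sample ids are assumptions made for the example.

```python
from datetime import datetime, timezone


def tag_document(doc, tag_id, user_id, when=None):
    """Record that user_id applied tag_id to doc.

    Only the tag's id is stored in the document, next to who tagged it and when.
    """
    entry = {
        "TagId": tag_id,   # reference to the tag document, not the tag name
        "By": user_id,     # who tagged
        "When": (when or datetime.now(timezone.utc)).isoformat(),  # when it was tagged
    }
    doc.setdefault("Tags", []).append(entry)
    return entry


# Hypothetical album document with two tags applied by two users.
album = {"Id": "albums/1", "Name": "Example album"}
tag_document(album, "tags/1", "users/1", datetime(2011, 1, 1, tzinfo=timezone.utc))
tag_document(album, "tags/2", "users/2", datetime(2011, 1, 2, tzinfo=timezone.utc))
```

Because the helper only touches a `Tags` list, the same function works for any document type that needs tagging.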
Now, let us try to have some fun with this. Let us say that I want to be able to show, given a specific album, all the albums that have any of the same tags as the specified album.
We start by defining the following index:
Note that the naming convention matches what we would expect using the default Linq convention, so we can easily query this index using Linq.
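To make the idea behind such an index concrete, the sketch below builds a lookup from tag id to the albums that carry it. This is an in-memory stand-in in plain Python, not a RavenDB index definition. The function name `build_tag_index` and the document field names are assumptions.

```python
from collections import defaultdict


def build_tag_index(albums):
    """Map every tag id to the set of album ids that reference it."""
    index = defaultdict(set)
    for album in albums:
        for entry in album.get("Tags", []):
            index[entry["TagId"]].add(album["Id"])
    return index


# Hypothetical sample documents.
albums = [
    {"Id": "albums/1", "Tags": [{"TagId": "tags/1"}, {"TagId": "tags/2"}]},
    {"Id": "albums/2", "Tags": [{"TagId": "tags/2"}]},
    {"Id": "albums/3", "Tags": [{"TagId": "tags/3"}]},
]

tag_index = build_tag_index(albums)
```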
And now we want to query it, which, assuming that we are starting with albums/1, will look like:
This translates to “show me all of the albums that share any of the specified tags, except albums/1”.
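The same logic can be spelled out over in-memory documents. The following is only a sketch of what the query means, not the actual RavenDB query. `related_albums` and the field names are hypothetical.

```python
def related_albums(albums, source_id):
    """Return ids of albums sharing at least one tag with source_id, excluding it."""
    by_id = {album["Id"]: album for album in albums}
    wanted = {entry["TagId"] for entry in by_id[source_id].get("Tags", [])}
    result = []
    for album in albums:
        if album["Id"] == source_id:
            continue  # "except albums/1"
        tags = {entry["TagId"] for entry in album.get("Tags", [])}
        if wanted & tags:  # any shared tag is enough
            result.append(album["Id"])
    return result


# Hypothetical sample documents.
albums = [
    {"Id": "albums/1", "Tags": [{"TagId": "tags/1"}, {"TagId": "tags/2"}]},
    {"Id": "albums/2", "Tags": [{"TagId": "tags/2"}]},
    {"Id": "albums/3", "Tags": [{"TagId": "tags/3"}]},
    {"Id": "albums/4", "Tags": []},
]

related = related_albums(albums, "albums/1")
```

The size of the intersection `wanted & tags` is also the number of matching tags, if you want to show or sort by it.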
And this is pretty much it, to be fair. Oh, if you want to show the tag names you’ll have to include the actual tag documents, but there really isn’t anything complex going on.
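Resolving tag names from the included tag documents amounts to a lookup by id. The sketch below models that with a dictionary standing in for the already-loaded tag documents. The `tag_names` helper, the `Name` field and the sample data are assumptions, not RavenDB API.

```python
def tag_names(album, tag_docs):
    """Resolve the tag ids stored on an album to names, using loaded tag documents."""
    names = []
    for entry in album.get("Tags", []):
        tag = tag_docs.get(entry["TagId"])
        if tag is not None:  # skip references to tags that were not loaded
            names.append(tag["Name"])
    return names


# Hypothetical tag documents, keyed by id, as if they were included with the album.
tag_docs = {
    "tags/1": {"Id": "tags/1", "Name": "rock"},
    "tags/2": {"Id": "tags/2", "Name": "live"},
}
album = {"Id": "albums/1", "Tags": [{"TagId": "tags/1"}, {"TagId": "tags/2"}]}

names = tag_names(album, tag_docs)
```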
But what about the 3rd requirement?
Well, it isn’t really meaningful. You can move this Tags collection to the layer supertype, but if you want to be able to do nice tagging with RavenDB, this is probably the easiest way to go.
Comments
I thought you don't like when document child elements have ids...
What if 5000 users add 5 tags to a single doc. Isn't the album doc becoming pretty large (and slow to load/deserialize) with a Tags array of 25000 items then?
@alexidsa - the reason ids are needed is that tag names can be ambiguous. If i add the tag "rock", am i talking about geology or music. Imagine you're a newspaper, with multiple sections (music, outdoors, etc). Searching for "rock" without context can return incorrect results.
The ids in my case point to a Term which has a name, like "rock", but also a Vocabulary ("sports", "entertainment", or whatever) which helps to provide context and disambiguate the tag name.
@henry - definitely a consideration. In my case, i don't anticipate there being more than 5 per document.
But you are correct in that this can drive the decision of how documents are structured. For instance, i have a few document types which can be voted on, and there is a much higher probability that the number of votes could exceed the amount i'd like to keep in a single document. So voting information is split into a separate document.
alexidsa, That is a reference to another document, not an internal id.
But with such design you have to perform a join/distinct operation to display tags for an album. Imho it would be better to modify the structure of the document like:
Tags: [ { Id: 'tags/1', Name: 'Tag 1', Tagged: [{By: 'user/1', When: '2010-01-01'}, ...]}, (... and so on) ]
Rafal, I won't have to do ANY join/distinct operation. Those are relational concepts. I can include the related docs, and that is a cheap operation
That's nice - I thought the include function works only for single doc references.
In this scheme, what is in the Tag document? Does it have a "purpose" here? Why not just store the tag name in the album instead of the id and remove the need for the tag document? I am guessing it has something to do with the first requirement, but I am not getting it.
Chanan, For example, it might have fields like: "Name", "Description", "IsAdult", etc. Users might want to follow a tag, and you want to track how many are doing so. It is an actual entity in the system
I would denormalize the label here as well. A tags label is something that almost never changes and when it does, it is ok to have a batch operation that goes through all the albums and updates the label accordingly.
Is it also possible to return the count of matching tags from the index?
John, You already have the list of tags, just count them.