Lucene as a data repository


The issue of user-driven entity extensibility came up on the Castle users mailing list, and a very interesting discussion has started. The underlying problem is fairly well known: we want to let our end users extend the schema.

The scenarios that I usually think of are about extending the static schema of the application, like adding a CustomerExternalNumber field to the Customer entity, or adding a MyOwnEntity custom entity. This can be solved in a number of ways, starting from meta tables, with a schema that looks like this:

image

I am usually suspicious of such methods, and would generally prefer to go with the option of simply extending the schema at runtime by adding additional tables for the user extensions.
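To make the "extra table" option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `add_extension_table` helper and the `Customer_Ext` naming are assumptions made up for illustration; a real application would also need to validate the user-supplied names before putting them into DDL.

```python
import sqlite3

def add_extension_table(conn, entity, columns):
    """Create <entity>_Ext with one column per user-defined field.

    Hypothetical helper: names are interpolated into the DDL, so they
    must come from a trusted or validated source.
    """
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    conn.execute(
        f'CREATE TABLE "{entity}_Ext" ('
        f'"{entity}Id" INTEGER PRIMARY KEY REFERENCES "{entity}"(Id), {cols})'
    )

conn = sqlite3.connect(":memory:")

# The static schema that ships with the application.
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Customer (Id, Name) VALUES (1, 'Northwind')")

# At runtime, the user asks for a CustomerExternalNumber field.
add_extension_table(conn, "Customer", {"CustomerExternalNumber": "TEXT"})
conn.execute(
    "INSERT INTO Customer_Ext (CustomerId, CustomerExternalNumber) "
    "VALUES (1, 'EXT-42')"
)

# Reading the entity back means joining the static table to its extension.
row = conn.execute(
    "SELECT c.Name, e.CustomerExternalNumber "
    "FROM Customer c LEFT JOIN Customer_Ext e ON e.CustomerId = c.Id"
).fetchone()
```

The static table is never altered; the user's fields live in a side table that can be dropped or rebuilt without touching the application's own schema.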

image

The issue that came up on the list was quite different: the need was to extend each entity instance. Take bug tracking, for instance. We need to let the user add different fields to each bug. Then we need to allow searching on those extra fields, and each user can define their own fields.

Lucene came up as a way to store those extra fields, and then I had a light bulb moment. Lucene, by its nature, is a good place to store semi-structured data. The basic unit of storage in Lucene is the Document, and a document is composed of a set of fields, each of which can be indexed, stored, or both. Hibernate Search (and NHibernate Search) uses this ability to let us store entity information in Lucene, which means that we can retrieve information directly from Lucene, hitting the DB only for the missing information.
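The document model is easier to see in a toy version. The sketch below is not the Lucene API; `Field` and `ToyIndex` are hypothetical names, and the "analyzer" is just lowercasing and splitting on whitespace. It only illustrates the idea that each document carries its own set of fields, and that a field can be indexed (searchable), stored (retrievable), or both.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    value: str
    indexed: bool = True   # searchable through the inverted index
    stored: bool = True    # returned as part of the hit

class ToyIndex:
    """Toy stand-in for a Lucene-style index (illustration only)."""

    def __init__(self):
        self._stored = []                   # doc id -> {field name: value}
        self._postings = defaultdict(set)   # (field name, term) -> doc ids

    def add(self, fields):
        doc_id = len(self._stored)
        # Keep only the stored fields for retrieval.
        self._stored.append({f.name: f.value for f in fields if f.stored})
        # Index each term of every indexed field.
        for f in fields:
            if f.indexed:
                for term in f.value.lower().split():
                    self._postings[(f.name, term)].add(doc_id)
        return doc_id

    def search(self, field_name, term):
        ids = sorted(self._postings.get((field_name, term.lower()), ()))
        return [self._stored[i] for i in ids]

index = ToyIndex()
index.add([
    Field("BugId", "101"),
    Field("Title", "Crash on save"),
    Field("Browser", "Firefox"),                    # custom field on this bug only
])
index.add([
    Field("BugId", "102"),
    Field("Title", "Slow report"),
    Field("CustomerSite", "Leeds", stored=False),   # searchable, not retrievable
])

firefox_bugs = index.search("Browser", "firefox")
leeds_bugs = index.search("CustomerSite", "Leeds")
```

Note that the two bugs do not share a schema: one has a Browser field, the other a CustomerSite field, and neither required a change anywhere else. The second search finds bug 102 through a field that is indexed but not stored, so the hit contains only BugId and Title.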

Extending this idea to also allow extra information in the Lucene store is a fairly natural step, and extremely interesting to me. It means that I can give my users what they want (full extensibility) while keeping things very simple & clean from my point of view. Searching is built in, and easy enough that you can give the users the ability to run direct queries against it. In fact, you can even use NHibernate Search to allow even better scaling of the searching capabilities.

Reporting is also easy enough: you pull the data out and into your entities, and report off of that. If you want to do something more generic, it is very easy to turn the results of a Lucene query into a DataSet, which you can then hand to the reporting engine.
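The generic route amounts to flattening hits that each have their own fields into one table. Here is a rough sketch of that step, with plain dictionaries standing in for query hits and a (columns, rows) pair standing in for the DataSet; `to_table` is a hypothetical helper, not part of any library.

```python
def to_table(docs):
    """Flatten documents with differing fields into (columns, rows).

    Columns are the union of all field names, in first-seen order.
    Fields a document does not have become None in its row.
    """
    columns = []
    for doc in docs:
        for name in doc:
            if name not in columns:
                columns.append(name)
    rows = [[doc.get(name) for name in columns] for doc in docs]
    return columns, rows

# Hypothetical hits, as they might come back from a search.
hits = [
    {"BugId": "101", "Title": "Crash on save", "Browser": "Firefox"},
    {"BugId": "102", "Title": "Slow report", "CustomerSite": "Leeds"},
]

columns, rows = to_table(hits)
```

Because the column list is computed from the hits themselves, a field that one user added yesterday shows up in the report without any change to the reporting code.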

Exciting idea.