What is new in RavenDB 3.0Query diagnostics
	 We talked a lot about the changes we made for indexing, now let us talk about the kind of changes we are talking about from the query side of things. More precisely, this is when we start asking questions about our queries.
We talked a lot about the changes we made for indexing, now let us talk about the kind of changes we are talking about from the query side of things. More precisely, this is when we start asking questions about our queries.
Timing queries. While it is rare that we have slow queries in RavenDB, it does happen, and when it does, we treat it very seriously. However, in the last few cases that we have seen, the actual problem wasn’t with RavenDB, it was with sending the data back to the client when we had a large result set and large number of documents.
In RavenDB 3.0, we have added the ability to get detailed statistics about what is the cost of the query in every stage of the pipeline.
RavenQueryStatistics stats; var users = session.Query<Order>("Orders/Totals") .Statistics(out stats) .Customize(x => x.ShowTimings()) .Where(x=>x.Company == "companies/11" || x.Employee == "employees/2") .ToList(); foreach (var kvp in stats.TimingsInMilliseconds) { Console.WriteLine(kvp.Key + ": " + kvp.Value); } Console.WriteLine("Total: " + stats.DurationMilliseconds);
We can now ask RavenDB to explain us its reasoning when doing so:
- Lucene search: 10
- Loading documents: 2
- Transforming results: 0
- Total: 21
As you can see, the total time for this query is 21 ms, and we have 12 ms accounted for in the actual search time. The rest is network traffic. This can help you diagnose more easily where the problem is, and hence, how to solve it.
Query timeout and cancellation. As I mentioned, we don’t really have long queries in RavenDB very often. But that is actually is something that happens, and we need a way to deal with that. RavenDB now places a timeout on the amount of time a query gets to run (including querying Lucene, loading documents or transforming the results). A query that doesn’t complete in time will be cancelled, and an error will be returned to the user.
You can also view the currently executing queries and kill a long running query (if you have specified a high timeout, for example).
Explaining queries. Sometimes it is easy to understand why RavenDB has decided to give you documents in a certain order. You asked them sorted by date, and you get them sorted by date. But when you are talking about complex queries, that is much harder. RavenDB will sort the results by default based on relevancy, and that can sometimes be a bit puzzling to understand.
Here is how we can do this:
session.Advanced.DocumentQuery<Order>("Orders/Totals") .Statistics(out stats) .WhereEquals("Company", "companies/11") .WhereEquals("Employee", "employees/3") .ExplainScores() .ToList(); var explanation = stats.ScoreExplantaions["orders/759"];
The result of this would be something that looks like this:
0.6807194 = (MATCH) product of: 1.361439 = (MATCH) sum of: 1.361439 = (MATCH) weight(Employee:employees/3 in 469), product of: 0.4744689 = queryWeight(Employee:employees/3), product of: 2.869395 = idf(docFreq=127, maxDocs=830) 0.165355 = queryNorm 2.869395 = (MATCH) fieldWeight(Employee:employees/3 in 469), product of: 1 = tf(termFreq(Employee:employees/3)=1) 2.869395 = idf(docFreq=127, maxDocs=830) 1 = fieldNorm(field=Employee, doc=469) 0.5 = coord(1/2)
And if we were to ask for the explanation for orders/237, we will get:
6.047595 = (MATCH) sum of: 4.686156 = (MATCH) weight(Company:companies/11 in 236), product of: 0.8802723 = queryWeight(Company:companies/11), product of: 5.32353 = idf(docFreq=10, maxDocs=830) 0.165355 = queryNorm 5.32353 = (MATCH) fieldWeight(Company:companies/11 in 236), product of: 1 = tf(termFreq(Company:companies/11)=1) 5.32353 = idf(docFreq=10, maxDocs=830) 1 = fieldNorm(field=Company, doc=236) 1.361439 = (MATCH) weight(Employee:employees/3 in 236), product of: 0.4744689 = queryWeight(Employee:employees/3), product of: 2.869395 = idf(docFreq=127, maxDocs=830) 0.165355 = queryNorm 2.869395 = (MATCH) fieldWeight(Employee:employees/3 in 236), product of: 1 = tf(termFreq(Employee:employees/3)=1) 2.869395 = idf(docFreq=127, maxDocs=830) 1 = fieldNorm(field=Employee, doc=236)
In other words, we can see that orders/237 is ranked much higher than orders/759. That is because is matched both clauses of the query. And a match on Company is a much stronger indication for relevancy because Companies/11 appears only in 10 documents out out 830, while Employees/3 appears in 127 out of 830.
For details about this format, see this presentation, it actually talks about Solr here, but this data comes from Lucene, so it applies to both.
That is it about queries diagnostics, next, we’ll deal with transformers and another important optimization, the staleness reduction system.
More posts in "What is new in RavenDB 3.0" series:
- (24 Sep 2014) Meta discussion
- (23 Sep 2014) Operations–Optimizations
- (22 Sep 2014) Operations–the nitty gritty details
- (22 Sep 2014) Operations–production view
- (19 Sep 2014) Operations–the pretty pictures tour
- (19 Sep 2014) SQL Replication
- (18 Sep 2014) Queries improvements
- (17 Sep 2014) Query diagnostics
- (17 Sep 2014) Indexing enhancements
- (16 Sep 2014) Indexing backend
- (15 Sep 2014) Simplicity
- (15 Sep 2014) JVM Client API
- (12 Sep 2014) Client side
- (11 Sep 2014) The studio
- (11 Sep 2014) RavenFS
- (10 Sep 2014) Voron
 

Comments
when is 3 out? :P
Milke, We are going to go to RC tomorrow :-)
Is there a reason why network isn't included in stats.TimingsInMilliseconds? Feels like it would be more intuitive if all of stats.DurationMilliseconds was accounted for in stats.TimingsInMilliseconds.
Jonas, Because the value is computed server side, so we don't know what the client will see
I like the trick you pulled with
I had to think for a moment how you made an eager API structure work for a lazy unfinished query.
Ok, but couldn't TimingsInMilliseconds be updated at the same time as DurationMilliseconds is calculated (because the later is calculated client side if I understand correctly)?
Yes, but that would require us to do more work, because we also need to handle that for lazy & async stuff. We haven't gotten around to it
What else could the difference between the total and DurationMilliseconds be other than network?
Andrew, The most common reason would be debugging.
Ahah okay. Thanks. Eagerly awaiting the RC. I am going to work on a personal project integrating this with the new ASP.Net Identity in ASP.Net vNext
There's probably a typo in "Explaining queries" example code:
"var explanation = stats.ScoreExplantaions["orders/759"];"
Should probably be ScoreExplantaions. Or is the property name also misspelled? ;)
ScoreExplanations, I meant. Easy to misspell.
Comment preview