ChallengeQuerying relative information with RavenDB
Originally posted at 11/30/2010
Let us say that we have the following document:
{ "Name": "Ayende", "LastScore": 239.2, "MaxScore": 392.6 }
And we want to find all documents whose last score is within 10% from their max score.
If I was using SQL, I could express the query like this:
select * from GameResults where (LastScore + MaxScore/100 )> MaxScore
How would you create such a query using RavenDB?
- RavenDB has the notion of linear queries, which would allow executing the above query, but I don’t want to use that.
- RavenDB’ indexes don’t support any way to compare two fields to one another.
There is a solution, though, and it is quite an elegant one.
More posts in "Challenge" series:
- (03 Feb 2025) Giving file system developer ulcer
- (20 Jan 2025) What does this code do?
- (01 Jul 2024) Efficient snapshotable state
- (13 Oct 2023) Fastest node selection metastable error state–answer
- (12 Oct 2023) Fastest node selection metastable error state
- (19 Sep 2023) Spot the bug
- (04 Jan 2023) what does this code print?
- (14 Dec 2022) What does this code print?
- (01 Jul 2022) Find the stack smash bug… – answer
- (30 Jun 2022) Find the stack smash bug…
- (03 Jun 2022) Spot the data corruption
- (06 May 2022) Spot the optimization–solution
- (05 May 2022) Spot the optimization
- (06 Apr 2022) Why is this code broken?
- (16 Dec 2021) Find the slow down–answer
- (15 Dec 2021) Find the slow down
- (03 Nov 2021) The code review bug that gives me nightmares–The fix
- (02 Nov 2021) The code review bug that gives me nightmares–the issue
- (01 Nov 2021) The code review bug that gives me nightmares
- (16 Jun 2021) Detecting livelihood in a distributed cluster
- (21 Apr 2020) Generate matching shard id–answer
- (20 Apr 2020) Generate matching shard id
- (02 Jan 2020) Spot the bug in the stream
- (28 Sep 2018) The loop that leaks–Answer
- (27 Sep 2018) The loop that leaks
- (03 Apr 2018) The invisible concurrency bug–Answer
- (02 Apr 2018) The invisible concurrency bug
- (31 Jan 2018) Find the bug in the fix–answer
- (30 Jan 2018) Find the bug in the fix
- (19 Jan 2017) What does this code do?
- (26 Jul 2016) The race condition in the TCP stack, answer
- (25 Jul 2016) The race condition in the TCP stack
- (28 Apr 2015) What is the meaning of this change?
- (26 Sep 2013) Spot the bug
- (27 May 2013) The problem of locking down tasks…
- (17 Oct 2011) Minimum number of round trips
- (23 Aug 2011) Recent Comments with Future Posts
- (02 Aug 2011) Modifying execution approaches
- (29 Apr 2011) Stop the leaks
- (23 Dec 2010) This code should never hit production
- (17 Dec 2010) Your own ThreadLocal
- (03 Dec 2010) Querying relative information with RavenDB
- (29 Jun 2010) Find the bug
- (23 Jun 2010) Dynamically dynamic
- (28 Apr 2010) What killed the application?
- (19 Mar 2010) What does this code do?
- (04 Mar 2010) Robust enumeration over external code
- (16 Feb 2010) Premature optimization, and all of that…
- (12 Feb 2010) Efficient querying
- (10 Feb 2010) Find the resource leak
- (21 Oct 2009) Can you spot the bug?
- (18 Oct 2009) Why is this wrong?
- (17 Oct 2009) Write the check in comment
- (15 Sep 2009) NH Prof Exporting Reports
- (02 Sep 2009) The lazy loaded inheritance many to one association OR/M conundrum
- (01 Sep 2009) Why isn’t select broken?
- (06 Aug 2009) Find the bug fixes
- (26 May 2009) Find the bug
- (14 May 2009) multi threaded test failure
- (11 May 2009) The regex that doesn’t match
- (24 Mar 2009) probability based selection
- (13 Mar 2009) C# Rewriting
- (18 Feb 2009) write a self extracting program
- (04 Sep 2008) Don't stop with the first DSL abstraction
- (02 Aug 2008) What is the problem?
- (28 Jul 2008) What does this code do?
- (26 Jul 2008) Find the bug fix
- (05 Jul 2008) Find the deadlock
- (03 Jul 2008) Find the bug
- (02 Jul 2008) What is wrong with this code
- (05 Jun 2008) why did the tests fail?
- (27 May 2008) Striving for better syntax
- (13 Apr 2008) calling generics without the generic type
- (12 Apr 2008) The directory tree
- (24 Mar 2008) Find the version
- (21 Jan 2008) Strongly typing weakly typed code
- (28 Jun 2007) Windsor Null Object Dependency Facility
Comments
I would create an index using that where clause and query the index.
The example sql given would only find those within 1% of the maxscore, I think, not the 10% mentioned in the paragraph. Sorry to nitpick.
Create a map of the percentage difference between the two fields and query based upon that.
Would this work?
select * from GameResults
where (LastScore + MaxScore/100 ) - MaxScore > 0
heh, did you write this after my question at the Vic.NET Usergroup meeting?
Can we create a field with the relation between LastScore an MaxScore in the index MAP function?
Example:
function map(document) {
}
}
So every time I want to write a query I have to add fields or add indexes? Ugh
If the field is pretty constant -- ie, 10% is a hard and fast rule, you could just add and persist a calculated field or boolean to say "this score was within last 10%"
I'm with Oded: create an index on the percentage calculation, and query that. I'm a little hazy on the nuances of indexes in RavenDB, but I'm thinking something like "from g in GameResults where (g.LastScore + g.MaxScore/10) - g.MaxScore > 0 select g". Then you could just enumerate the all items in the index.
Alternatively, if RavenDB supports it efficiently, you could do a more general index like "from g in GameResults select new { Name=g.Name, Metric = (g.LastScore + g.MaxScore/10) - g.MaxScore }", and then just grab items from the index where the value of Metric is greater than zero.
None of these answers can be right. What if the user types the % value, and they could put anything between say 1 and 50? You going to create 50 indexes or create 50 new fields!
Remember that Ayende said it was an ELEGANT solution, so I don't think it's going to involve adding fields or indexes.
Los Guapos, what's wrong with calculating LastScore/MaxScore? You'll be able to query for any result between 0 and 100%. Juan Perez has shown it already.
Maybe we should add a next task to a competition: Please calculate 80th percentile of MaxScore for the whole document set (or find me a person with MaxScore better than 80% of population and worse than the remaining 20%). I wonder what's the simplest way of calculating such statistic...
You write a distributed map/reduce function in C#/linq?
You write a distributed map/reduce function in Erlang? :)
In SQL you would probably write somethling like:
select * from GameResults
where (LastScore - MaxScore*0.90 )> 0;
What happen?
so, is solution going to be posted?..
Torvin, see the blog post Querying relative information with RavenDB.
ah, thanks
Comment preview