Graphs in RavenDBThe query language
Pretty much all our early discussions about graphs in RavenDB focused on how to build the actual graph implementation. How to allow fast traversal, etc. When we started looking at the actual implementation, we realized that we seriously neglected a very important piece of the puzzle, the query interface for the graphs.
This is important for several reasons. First, ergonomics matter, if we end up with a query language that is awkward, it won’t see much use and complicate the users’ lives (and our own). Second, the query language effectively dictate how the user think about the model, so making low level decisions that would have impact on how the user is actually using this feature is probably not a good idea yet. We need to start from the top, what do we give to the user, and then see how we can make that a reality.
The most common use case of graph queries is the friends of friends query. Let’s see how this query is handled in various existing implementation, shall we?
Neo4J, using Cypher:
OrientDB doesn’t seem to have an easy way to do this. The following shows how you can find the 2nd degree friends, but it doesn’t exclude friends of friends who are already your friends. StackOverflow questions on that show scary amount of code, so I’m going to skip them.
Gremlin, which is used in a wide variety of databases:
We looked at other options, but it seems that graph query languages fall into the following broad categories:
- ASCII art to express the relationship between the nodes.
- SQL extensions that express the relationships as nested queries.
- Method calls to express the traversal.
Of the three options, we found the first option, using ASCII Art / Cypher as the easier one to work with. This is true both in terms of writing the query and actually executing it.
Let’s look at how friends of friends query will look like in RavenDB:
Graph queries are composed of two portions:
- With clauses, which determine source point for the graph traversal.
- Match clause (singular) that contain the graph pattern that we need to match on.
In the case, above, we are starting the graph traversal from start, this is defined as a with clause. A query can have multiple with clauses, each defining an alias that can be used in the match clause. The match clause, on the other hand, uses these aliases to decide how to process the query.
You can see that we have two clauses in the above query, and the actual processing is done by pattern matching (to me, it make sense to compare it to regular expressions or Prolog). It would probably be easier to show this with an example. Here is the relationship graphs among a few people:
We’ll set the starting point of the graph as Arava and see how this will be processed in the query.
For the first clause, we’ll have:
- start (Arava) –> f1 (Oscar) –> f2 (Phoebe)
- start (Arava) –> f1 (Oscar) –> f2 (Sunny)
- start (Arava) –> f1 (Sunny) –> f2 (Phoebe)
- start (Arava) –> f1 (Sunny) –> f2 (Oscar)
For the second clause, of the other hand, have:
- start (Arava) –> f2 (Oscar)
- start (Arava) –> f2 (Sunny)
These clauses are joined using and not operator. What this means is that we need to exclude from the first clause anything that matches on the second cluase. Match, in this case, means the same alias and value for any existing alias.
Here is what we need up with:
- start (Arava) –> f1 (Oscar) –> f2 (Phoebe)
start (Arava) –> f1 (Oscar) –> f2 (Sunny)- start (Arava) –> f1 (Sunny) –> f2 (Phoebe)
start (Arava) –> f1 (Sunny) –> f2 (Oscar)
We removed two entries, because they matched the entries from the second clause. The end result being just friends of my friends who aren’t my friends.
The idea with behind the query language is that we want to be high level and allow you to express what you want, and we’ll be in charge of actually making this work properly.
In the next post, I’ll talk a bit more about the query language, what scenarios it enables and how we are going to go about processing queries.
More posts in "Graphs in RavenDB" series:
- (08 Nov 2018) Real world use cases
- (01 Nov 2018) Recursive queries
- (31 Oct 2018) Inconsistency abhorrence
- (29 Oct 2018) Selecting the syntax
- (26 Oct 2018) What’s the role of the middle man?
- (25 Oct 2018) I didn’t mean to build this feature!
- (22 Oct 2018) Query results
- (21 Sep 2018) Graph modeling vs. document modeling
- (20 Sep 2018) Pre-processing the queries
- (19 Sep 2018) The query language
- (18 Sep 2018) The overall design
Comments
I really like this syntax, it makes the queries really clear.
Just wondering though, shouldn't the first clause in your example also pick up:
SQL is like wine - gains taste, elegance and quality just by sitting there and doing nothing
Do you really need all that syntax?
For example, couldn't you change from:
to something simpler, like:
Or, if queries that repeatedly use the same edge kind are common:
Though I understand that basing your language on an existing language has its advantages too.
Is the second naming of f2 required or could it just have been f3? (I know it's a copy from Cypher but just wondering if it needs to be the same) Orient and Gremlin have an explicit dedup/distinct, will this be needed in RavenDB as well or is it implicitly distinct?
Daniel, The query wouldn't go back to a node that it already visited
Svick, I probably could, yes. However, note that this is the simplest query possible, a lot more complexity isn't discussed yet, so you need to account for that as well. And using something that builds on existing stuff means that people in the field can much more easily grok it.
Steve, You need
f2
to be the same, otherwise, we won't know how to filter the already existing friends.I thought that might be the case.
Assuming a model like Twitter where follows is directional, is it still possible with this query language to write a query like find everyone I follow that follows me back?
Daniel, You want something like, find all the users that follow me that I also follow? You can do that using:
match (me:Account)-[:Follows]->(other:Account)-[:Follows]->(me)
Awesome.
The default restriction that
f2 != start
makes perfect sense now.@Rafal, no doubt tongue-in-cheek, but still SQL is still fermenting it seems, nowadays supports graph queries, e.g.:
@peter then SQL-92 is the standard :) and it already has the syntax for joins to do graph traversal. I realize its a language for processing particular model of data, not applicable to every graph database, but still it's built on top of well defined abstractions and syntax reflects that in a clear way. And here we're talking about some ad-hoc ASCII art for expressing some connections between data, but without building the 'algebra' out of it. So pretty soon you will have to add new symbols to the language because it's so simple. For example, how do you express that some node is reachable/unreachable from another node in any number of steps? or there is a cycle?
Rafal, Actually, you pretty much have answers (and syntax) for all of these in languages like Cypher. Providing a range of steps is something like:
(user)-[:FriendOf {*..2}]->(friend)
Graph problems are a very well studied field.
Rafal, Actually, you pretty much have answers (and syntax) for all of these in languages like Cypher. Providing a range of steps is something like:
(user)-[:FriendOf {*..2}]->(friend)
Graph problems are a very well studied field.
Comment preview