Queries++ in RavenDB: I suggest you can do better
Sometimes there is no escaping it, and you must accept user input into your system. That sad state of affairs is problematic, because all users have one common tendency: they are going to enter weird stuff into your system. And when stuff doesn’t work, you’ll need to make it work, even if it makes no sense.
Let us take a simple example with this user, shall we?
Assume that you have a search form for users, and a call comes in to the call center, where the helpdesk staff need to search for that particular user.
Here are the queries that they tried:
- Stefanie
- Stephenie
- Stefanee
- Stepfanie
- Stephany
Now, you may think poorly of the helpdesk staff (likely outsourced and non-native speakers), but the same thing can happen for Yasmin, Jasmyne and Jasmyn (I literally took the first two examples I found on Twitter, to show that this is a real issue).
How do you handle something like this? Well, if you are Google, you do this:
What happens if you aren’t Google? Well, since we are talking about RavenDB here, you are going to run a suggestion query, like so:
This requires us to have defined the “Users/Search” index and marked the Name field as viable for suggestions. This tends to be quite compute intensive at indexing time, but it allows us to make a suggestion query on the field, which will give us this result:
What is going on here? During indexing, RavenDB is going to generate a list of permutations of the data that is being indexed. Then, when you run a suggestion query, we can compare the user’s input to the data that has been indexed and suggest possible alternatives to what the user actually entered. This isn’t a generic selection, it is based on what you actually have in your system.
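To make the idea concrete, here is a minimal sketch of suggesting alternatives from the terms you actually have indexed. It is not RavenDB's implementation: it ranks candidates by plain Levenshtein edit distance at query time, whereas RavenDB does its heavy lifting during indexing. The names `edit_distance`, `suggest`, `indexed_names` and the `max_distance` threshold are all assumptions made for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest(term: str, indexed_terms: list[str],
            max_distance: int = 3, limit: int = 5) -> list[str]:
    """Return indexed terms within max_distance edits of term, closest first."""
    needle = term.lower()
    scored = []
    for candidate in indexed_terms:
        distance = edit_distance(needle, candidate.lower())
        if distance <= max_distance:
            scored.append((distance, candidate))
    scored.sort()
    return [candidate for _, candidate in scored[:limit]]


# Hypothetical contents of the Name field in the index.
indexed_names = ["Stephanie", "Yasmine", "Oren"]

for attempt in ["Stefanie", "Stephenie", "Stefanee", "Stepfanie", "Stephany"]:
    print(attempt, "->", suggest(attempt, indexed_names))
```

Because the candidates come from the indexed data, a name that nobody in your system has will never be suggested, which is the point made above.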
A more serious case is the international scene. When you have a user such as “André Sørina”:
How do you search for them? On my keyboard, I don’t know how to type these marks (diacritics; I had to look that up). If someone tried to tell me these over the phone, I would be completely lost. It’s a good thing that we have a good solution for that:
Which will give us:
And now we can search for that, and find the user very easily.
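Independent of how RavenDB does it, the general technique of making a search insensitive to diacritics can be sketched as follows. This is an assumption-laden illustration rather than the feature itself: `fold_diacritics`, `matches` and the `EXTRA_FOLDS` table are hypothetical names, and the table is deliberately incomplete. Note that “é” decomposes into “e” plus a combining accent, but “ø” has no Unicode decomposition, so it needs an explicit mapping.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# This table is illustrative only, not exhaustive.
EXTRA_FOLDS = str.maketrans({
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "æ": "ae", "Æ": "AE",
    "ß": "ss",
})


def fold_diacritics(text: str) -> str:
    """Strip combining marks, then apply the explicit letter mappings."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.translate(EXTRA_FOLDS)


def matches(query: str, stored_name: str) -> bool:
    """Compare query and stored name after folding both sides the same way."""
    return fold_diacritics(query).lower() == fold_diacritics(stored_name).lower()


print(fold_diacritics("André Sørina"))
print(matches("andre sorina", "André Sørina"))
```

The important design point is that the same folding is applied to both the stored value and the query, so whoever is typing does not need to know where the marks go.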
This is a feature that we have had since 2010, but it got a serious facelift and was made easier to use in RavenDB 4.0.
More posts in "Queries++ in RavenDB" series:
- (18 Dec 2017) Spatial searches
- (15 Dec 2017) I suggest you can do better
- (11 Dec 2017) Gimme more like this
- (07 Dec 2017) Facets of information
Comments
Is it deliberate that you misspelled Yasmine's name?
Name searching gets real fun when you have to account for the different transliterations from, say, Arabic or Russian. Typically, systems would use something like Soundex/Metaphone, but this only works for Western names. The other problem is contractions, e.g. Robert/Bob, Charles/Chuck. For these you can introduce the idea of a "name cycle", which allows you to cluster values into semantic groups.
Damien, No, but that makes for a great point for the post, no? I'll leave it like this.
Paul, Soundex was specifically developed to deal with Polish names immigrating to the states, it held up surprisingly well, considering. What you are talking about is more in the sense of synonyms for the names. Similar to how you'll deal with "doctor", "dr", "doc" as the same thing.
We have been using a custom analyzer to search for Spanish user names without taking into account accents and other special characters. Our analyzer changes diacritic marks (áéí...) into their ASCII equivalents.
The only thing that is a bit cumbersome is to make the analyzer/plugin and ship it to the plugins folder.
Since this is a common problem for a lot of locales, I think it would be great if Raven shipped out of the box with an analyzer that did just that. Maybe a DiacriticInsensitiveStandardAnalyzer or so.
Germán, A PR for that would be great.
Double Metaphone, and even more so Metaphone 3, should be able to handle any of your examples easily.