You know too much
 Just asked a candidate to do a task that requires an O(1) retrieval semantics.  Off the top of his head, he answered that he would us a Trie.  That is an exciting new development in phone interviews, since this data structure doesn’t come up all too often.
Just asked a candidate to do a task that requires an O(1) retrieval semantics.  Off the top of his head, he answered that he would us a Trie.  That is an exciting new development in phone interviews, since this data structure doesn’t come up all too often.
I asked him to explain why he didn’t chose a dictionary as the underlying data structure. His answer was… illuminating.
He explained (in detail, and correctly) how hash maps work, that they rely on probability to ensure their O(1) guarantees and that he didn’t feel that he could select such a data structure for O(1) performance before analyzing the distribution of the data and figuring out if there are too many expected collisions in the data set.
All pretty much text book correct. In fact, if one is writing their own hash map (don’t do that) that is a concern you need to take into account. But in the real world, this is usually a solved problem. There are certain malicious inputs that can get you into trouble and there are certain things you need to take into account if you are writing your own hash function. But for the vast majority of cases and scenarios, you don’t need to worry about this at all.
The candidate is a student in the last stages of his degree and it shows. He gave a wonderful answer, if this was a test. It was just so wonderfully wrong.
 

Comments
Good article, right up until the end. Why? Because you assume that the reader will simply agree. However, I have questions, and in my experience, the answers to these questions are more important than the 'stump the geek' questions, where the candidate needs to hop that they match the interviewers' prejudices.
What analysis would be needed to choose a trie over a dictionary? When is it actually worth doing that analysis? Perhaps more imprtantly (and this is where Ayende is both right and wrong): when should you NOT do the analysis and just pick the standard off the shelf tool? What follow-up should you do? How do you know you made the right choice? When should you check?
THAT's the thinking you should be looking at for in a technical interview. Can the candidate think? Can they accept feedback? Most importantly, can you and your team get work done with them?
Dak
Dak: a hashtable has very good cache performance, whereas it takes a lot of work to get a prefix tree with good cache performance. And hashtables work with more types of data. That means they're generally better as your go-to datastructure.
It's also suggesting that a theoretical problem is common enough to be a practical problem, which should mean that nobody uses a hashmap in the real world. But people do use hashmaps in the real world quite often and generally don't hate their experiences.
There is no Dictionary as a data structure per se. A Dictionary might be implemented with a trie based on certain keys data-type FWIW.
The topic should have been hash vs trie.
cheers, </wqw>
I've been on hiring loops where I've said the candidate has a tendency for over-engineering because they wouldn't just use a simple hash table like they should. And I've also been on other ones where someone said "and they just said hashtable without even asking about the data distribution!"
Our industry is super inconsistent and everyone thinks THEIR specific level of knowledge of computer science fundamentals and software engineering best practices is the perfectly pragmatic one - just deep enough - but not too much.
I think the answer is follow up questions. Yous tart with "What data structure would you use here?"
If they say "hash table", you follow-up with "What are the situations where that may be a poor decision" and see if they understand the theoretical limitations
If they say "trie", you follow up with "What are the trade-offs of doing that in a production environment vs a simpler structure"?
Dak, Getting an answer like that means that the candidate in question may have theoretical knowledge, but has never actually applied that in real world situation. It like means that he never got to write enough code to actually have to deal with something a dictionary, and considering how ubiquitous they are...
The answer on when you shouldn't analyse the default is pretty much always, until you have an actual problem. YAGNI isn't just for writing more code, you know.
wqw, Dictionary (in .NET, at least) is understood to be hash based. We actually used the term hashmap in the phone interview, if that helps
Well, it's generally considered trie is O(log(n)) whereas hashmap O(1). Even outside of library availability trie doesn't quite meet the requirements.
However!
Oren your approach is flawed (lacking other red flags you might have glossed over). You're selecting people attuned in to your views.
Downside is: missing original thinkers, such as people who in C# code might employ memory mapping instead of tried and tested IO classes. That's where I agree with Nik P. Make them speak, probe with 'why's and 'when's.
Oleg, Not selecting hash table because no proper analysis was done on the field is an issue. Because unless you run into this rare scenario where you have collisions, that shouldn't be something you care about. And if you did run into it, I would expect you to talk about it. I probed into that, actually, and the issue was purely theoretical from the candidate perspective.
In the same sense that if I hire a driver, I expect him to be able to get the car from A to B to C without having to solve the traveling salesman for all possible itineraries.
At some point, this screams too loudly: "Never actually used this".
@Neia Actually no, hash-tables UNLESS you implement some very esoteric data structures are probably the worst data structure cache performance wise.
Comment preview