Figure this code?
I have just finished writing this code, and it occurred to me that on its own, it makes very little sense.
private static int RedactedFunctionName(List<int> a, List<int> b) { List<int> n,m; if (a.Count > b.Count) { n = a; m = b; } else { n = b; m = a; } int nSize = n.Count; int mSize = m.Count; double o1 = nSize + mSize; double o2 = nSize * Math.Log(mSize, 2); int result = 0; if (o1 < o2) { int mi = 0, ni = 0; while (mi < mSize && ni < nSize) { if (n[ni] > m[mi]) { mi++; } else if (n[ni] < m[mi]) { ni++; } else { ni++; mi++; result++; } } } else { for (int i = 0; i < mSize; i++) { if (n.BinarySearch(m[i]) >= 0) result++; } } return result; }
Can you figure out what this function does, why is it behaving in this manner, and what preconditions it expects?
Comments
Sets intersection count
You believe a binarysearch is better when n*log(m)<n+m where m is the smallest set.
And you expect both list to be sorted.
Is the *.count cached or do the program use the painters algoritm?
See also:
http://stackoverflow.com/questions/7574311/efficiently-compute-intersection-of-two-sets-in-java
http://stackoverflow.com/questions/7164572/c-sharp-fastest-intersection-of-2-sets-of-sorted-numbers
Preconditions -Lists must be sorted(for binary search to have a meaning) -Than it checks for size of the lists and it will take the most effective in terms of memory consumption The result is: The intersection count.
Carsten, I'm not sure that I understand you. The sets that this method is called with aren't the same all the time
double o2 = mSize * Math.Log(nSize, 2);
It looks like its finding equalites in the set, but I would say it could probably be markedly more efficient by dropping the list structure for arrays. And I think that Count is an enumerating operation.
Count
is a property and is cached. It is cheap (as a property should be).The Linq
Count()
tries to cast the IEnumerable to a few types that have a cached count/length and only enumerates if there is no other obvious way. (In the original Linq implementation Count() always enumerated)I'm with Martijn:
o2 = mSize * Math.Log(nSize,2)
List's count is not Linq Count(). Completely different thing, http://www.dotnetperls.com/list Count, clear. To get the number of elements, access the Count property. This is fast—just avoid the Count extension method. Count, on the List type, is equal to Length on arrays.
This function tries to optimize intersection when m is far smaller than n.
IE: intersecting m:100 n:1000000 will result in only 1993 compares on average due to binary search, while comparing all items will result in 1000100 compares worst case.
Also Martijn is corect.
n.BinarySearch(m[i]) is Log(n,2) average and you are iterating through m.
Therefore o2 should be double o2 = mSize * Math.Log(nSize, 2);
I happily stand corrected.
The fact that you're using one letter variable names is terrible! I wouldn't even hire you if you wrote code like that in an interview... have you not read Clean Code (http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882)
Dear Sagan Marketing,
Well if you would not hire Ayende, I doubt you would be able to spot any kind of talented programmers.
Good job in finding real talent by looking at superficial details instead of essence. I'm guessing you would hire a programmer that writes a O N^2 complexity algorithm for intersection that uses multi letter variables instead of someone who writes an O (Log N) one with one letter variables.
One letter variable names are perfectly acceptable in a function like this.
It's small, self contained, has no obvious side effects, and performs a single operation.
It's basically a low level maths function, and using the idiom of a mathematical function is entirely appropriate.
Slavishly following coding style dictats like "goto considered harmful", "use long variable names always" and projecting those onto interviewees is so closed minded as to be bordering on bigotry and is a sure fire way to shoot yourself in the foot.
Suffice to say that many talented thoughtful programmers probably wouldn't want to work for someone like that.
Working on and maintaining a high level code base is an entirely different thing, where good code hygiene is essential. But It should still be practiced with thought, consideration, and appropriate strictness.
Comment preview