The bug in the random sort
We needed to randomize a list of values, so I quickly wrote the following code:
What do you expect the output to be?
- A –> 2431
- B –> 2571
- C -> 4998
The number vary slightly, but they are surprisingly consistent overall. C is about 50% likely to be at the head of the list, but I wanted the probability to be 33% obviously.
Can you figure out why this is the case? In order to sort, we need to compare the values. And we do that in a random fashion. We start by comparing A and B, and they have 50% change of either one of them being first.
Then we compare the winner (A | B) to C, and there is a 50% chance of C being first. In other words, because we compare C to the top once, it has a 50% chance of being first. But for A and B, they need to pass two 50% chances to get to the top, so they are only there 25% of the time.
When we figured out why this is happening, I immediately thought of the Monty Hall problem.
The right solution is the Fisher-Yates algorithm, and here is how you want to implement it.
Comments
List.Sort also requires stable ordering, if once you have said a>b it must always return that. Linqs OrderBy(x=>random.Next()) will work however. Fisher-Yates is useful if you just want n random elements from a list that is much larger than n elements.
I realize this is quick-and-dirty and broken code, but I still think there are few things worth pointing out about it:
new Random()
will work fine on .Net Core 2.0+, but not on older frameworks (including .Net Framework).List<T>.Sort()
can cause all sorts of problems and does not guarantee a reasonable result.list.OrderBy(_ => new Random().Next())
. (And non-deterministic delegate is okay in this case, becauseEnumerable.OrderBy
caches the keys, whileList<T>.Sort
does not cache the results of thecomparison
.)The reverse for for Fisher-Yates, you picked as an example, is such a weird implementation. Works perfectly fine with an increasing for:
Dennis,
OrderBy
is streaming, so it needs to keep track of all items. It basically create a separate list of indexes and run based on that.see:https://github.com/dotnet/runtime/blob/master/src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs#L28
The stack overflow example you are pointing to is quite badly broken. It used to be correct implementation, but as of the last edit on Nov 18, 2019 it's completely broken. What's especially funny, "Try this code" link contains test case which does a nice job to hide the problem with the code.
In the linked code, after 1 shuffle last element of the array will be either N with probability of (N-1)/N) or 1 (1/N)
list = list.OrderBy(_ => Guid.NewGuid()).ToList()
Comment preview