[erlang-questions] Speedy unsort:shuffle/1,2 ?

Håkan Huss huss01@REDACTED
Thu May 27 16:23:03 CEST 2010


On Thu, May 27, 2010 at 15:07, Johan Montelius <johanmon@REDACTED> wrote:
> On Thu, 27 May 2010 13:12:52 +0200, Håkan Huss <huss01@REDACTED> wrote:
>
>> However, since there are n! possible permutations, this can
>> only happen if n! = 2^m, which is not the case (unless n<=2).
>
> Hmm, but m in the split case is n*lg(n) (assuming the splits are well
> balanced) and 2^(n*lg(n)) is a quite big number. 2^(n*lg(n)) = (2^lg(n))^n =
> n^n  > n!  Or am I totally lost (known to have been)
>
The reasoning doesn't in any way relate to the specifics of the
algorithm, eg., whether splits are done or not. In general, what we
are doing is mapping a sequence of random events to a specific
permutation of the original list. Using coin flips, the random
sequence could be heads, tails, tails, heads, tails,... Each such
sequence will yield a specific permutation of the original list.

For instance, for a three-element list [a,b,c] we could provide the
following mapping (H: heads, T: tails):

HH -> [a,b,c]
HTH -> [a,c,b]
HTT -> [b,a,c]
THH -> [b,c,a]
THT -> [c,a,b]
TT -> [c,b,a]

This can be displayed as a tree with edges that each correspond to an
H or a T, and the leaves are the resulting permutations. Regardless of
the details of the algiorithm, if it uses coin flips it can be
depicted using such a tree.

Since the probability of choosing a specific branch in this tree is
1/2, the probability of obtaining any specific permutation is (1/2)^m
where m is the path length from the root of the tree to the
permutation in question. In the above example, m is 2 for the first
and the last permutation and 3 for the others. Thus, it is not a fair
shuffle.

In order for it to be fair, all path lengths need to be equal, but for
this to be possible the number of leaves must be an exact power of 2.
But since the leaves are the permutations, there are n! of them. This
is never a power of two unless n <= 2. This means that there will be
permutations with different path lengths, and thus there are
permutations which are more likely than others.

The only way to fix this is to make sure that the edges do not have
the same probability of being taken. This is what the merge version I
provided earlier did. But then you get an algorithm which is not based
on coin flipping.

(There is another possibility, of course, and that is to add dummy
leaves to fill out the tree. In the above example, you can replace HH
-> [a,b,c] with HHH -> [a,b,c]  HHT -> invalid and TT -> [c,b,a] with
TTH -> invalid, TTT -> [c,b,a]. If you also add the rule that hitting
an invalid combination causes the algorithm to restart you will get a
fair shuffle. However, applying this to an actual algorithm isn't
necessarily easy. You also lose any deterministic time bounds since
the algorithm may restart indefinitely).

Regards,
/Håkan


More information about the erlang-questions mailing list