[erlang-questions] Speedy unsort:shuffle/1,2 ?

Wed May 26 19:22:55 CEST 2010

I had a method in Javascript that was based on the Fisher-Yates shuffle 
that I tried converting to Erlang. It essentially zips the given list 
with a sequence 1-N, and uses a combination of lists:nth/2 and 
proplists:delete/2 to create the shuffled list. I've not looked into the 
entropy or bias as I've just thrown this together but the simple testing 
seems to be positive.

%% unsort
-module(unsort)

-export([shuffle/1]).

shuffle(L) ->
     Len = length(L),
     PropL = lists:zip(lists:seq(1, Len), L),
     shuffle(PropL, Len, []).

shuffle([], _, L) -> lists:reverse(L);
shuffle([{_,V}], _, L) -> lists:reverse([V | L]);
shuffle([{_,V} | T]=Ps, Len, L) ->
     case swap_idx(Len) of
         1 ->
             shuffle(T, Len-1, [V | L]);
         SwapIdx ->
             {K1, V1} = lists:nth(SwapIdx, Ps),
             shuffle(proplists:delete(K1, Ps), Len-1, [V1 | L])
     end.

swap_idx(Len) ->
     trunc(random:uniform() * Len) + 1.

%% N length list
%% Create X shuffled lists
test(N, X) ->
     crypto:start(),
     L = lists:seq(1,N),
     Md5s = lists:map(fun(_) ->

string:to_upper(lists:flatten([io_lib:format("~2.16.0b", [Digest]) || 
<<Digest>> <= crypto:md5(shuffle:shuffle(L))]))
                      end, lists:seq(1,X)),
     gb_trees:to_list(
       lists:foldl(fun(Hash, Tree) ->
                           case gb_trees:lookup(Hash, Tree) of
                               {value, Count} ->
                                   gb_trees:update(Hash, Count+1, Tree);
                               _ -> gb_trees:insert(Hash, 1, Tree)
                           end
                   end, gb_trees:empty(), Md5s)).

%% Sample output
1> unsort:test(2, 100000).
[{"050D144172D916D0846F839E0412E929",50325},
  {"0CB988D042A7F28DD5FE2B55B3F5AC7A",49675}]
2> unsort:test(3, 100000).
[{"3BAC2889507C76C73913B5CACEB75B6D",16668},
  {"5289DF737DF57326FCDD22597AFB1FAC",16674},
  {"5D582C1341F4783E2FDE0F07D0AF9A06",16581},
  {"9DE228839B2CCC8D08071BCADA88CA6D",16797},
  {"A867F6E8EBD4E3A90C6084C9738B506B",16656},
  {"AE0F84A71C987607257309BF36DFB41A",16624}]
3> unsort:test(4, 100000).
[{"08D6C05A21512A79A1DFEB9D2A8F262F",4217},
  {"0A27E477F545F21001C975A0DA5826CF",4244},
  {"0AA9093A8C91D9BAF3DDB40613BF6052",4092},
  {"0AFB83810CCDA4270CCBB36128D7258B",4152},
  {"0E4717467434E3A8AF70C101789434D4",4202},
  {"21C49141D9D5062E9C8A3FA8591507A7",4081},
  {"26E5799C9C28C4F750DE5289644CCBCB",4190},
  {"3044356A01283AC9BF0E83A69A76DAFA",4303},
  {"32E2F90B64F81FEDB13FD92F554194EF",4069},
  {"428C536DB5E72C45CF09CCD7A229D482",4185},
  {"46DEB0BCE99B407029591EC8858F8D76",4185},
  {"5DA92947151FE75785B296B4EE680265",4109},
  {"6F3FF712FAE51C9E6A34E5649B1DF983",4058},
  {"A9380E383BA79F5FCB90CF0F3A6F61C1",4164},
  {"C73CABEB6558ABA030BBA9CA49DCDD75",3996},
  {"C7E1C6D5AE3E1AC8B876F500F26BF37B",4132},
  {"CBEED65578A453CDA5B7F9899F33F63F",4124},
  {"DC7C46E4A1969FAF25ED7F53A0E7993B",4337},
  {"DE5FE0700986F246278524D88F24DB10",4200},
  {"DF2CEBAAE30811FFA20BA50D8E3EF365",4066},
  {"E182CD2D19E8C3555B7D63454FC2D1BB",4218},
  {"E64FEF4E93468D853B99662B25D37193",4240},
  {"E78F3DE3AAE140076643E5D6F8E4C4B9",4204},
  {"E94E7EE783C0AEE87D19D87F071FA2FD",4232}]
4>

On 05/26/2010 12:09 PM, Björn-Egil Dahlberg wrote:
> On 2010-05-26 17:14, Johan Montelius wrote:
>> On Wed, 26 May 2010 14:41:19 +0200, Björn-Egil Dahlberg
>> <egil@REDACTED> wrote:
>>
>>> My suggestion is included in this mail, but my question is: does anyone
>>> have a faster, more generic and less memory consuming shuffle function?
>>
>> I haven't done any test but I would think the execution time will be
>> very much depending on the number of random:uniform/1 calls that you do.
>> As it is now you make a lot of calls but do not utilize the entropy. A
>> don't think a call to random:uniform(2) is much cheaper than
>> random:uniform(8098809898) nor do I know how well random:uniform is
>> implemented.
>
> You are of course completely right. There is a lot of unnecessary 
> calls to random:uniform/1. One can always redefine the constraints for 
> the fun. Instead of returning a number, return a binary. The fun is 
> there to provide the user with a alternative random-function to the 
> shuffle function, mainly for creating pseudo-random shuffles. 
> shuffle/1 uses random:uniform/1 as default.
>
>> Here is an idea:
>>
>> generate a SHA-512 binary, use the binary bit by bit to divide the list
>> into two lists, if you run out of bits, generate a new pattern. Return
>> the remaining bit pattern and the two lists (this is similar to split in
>> qsort). Recursively apply to the first list and then the second list and
>> if we do it right we can avoid doing the last append (me think).
>
> +1, I like this idea. Easy to hook in crypto:rand_bytes/1 also.
>
> // Björn-Egil
>
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>
>

-- 
James Aimonetti
mobile: 314.809.6307
work: 540.459.2220
email: james.aimonetti@REDACTED
website: http://jamesaimonetti.com