[erlang-questions] Speedy unsort:shuffle/1,2 ?

Wed May 26 20:07:08 CEST 2010

Author: John Haugeland
MIT LICENSE 

svn://crunchyd.com/scutil/erl/src/sc_random.erl

shuffle(List) ->
   WeightedAndShuffled  = lists:map( fun(Item) -> { random:uniform(), Item } end, List ),
   { _, SortedAndDeweighted } = lists:unzip(lists:keysort(1, WeightedAndShuffled)),
   SortedAndDeweighted.


>I had a method in Javascript that was based on the Fisher-Yates shuffle 
>that I tried converting to Erlang. It essentially zips the given list 
>with a sequence 1-N, and uses a combination of lists:nth/2 and 
>proplists:delete/2 to create the shuffled list. I've not looked into the 
>entropy or bias as I've just thrown this together but the simple testing 
>seems to be positive.
>
>%% unsort
>-module(unsort)
>
>-export([shuffle/1]).
>
>shuffle(L) ->
>     Len = length(L),
>     PropL = lists:zip(lists:seq(1, Len), L),
>     shuffle(PropL, Len, []).
>
>shuffle([], _, L) -> lists:reverse(L);
>shuffle([{_,V}], _, L) -> lists:reverse([V | L]);
>shuffle([{_,V} | T]=Ps, Len, L) ->
>     case swap_idx(Len) of
>         1 ->
>             shuffle(T, Len-1, [V | L]);
>         SwapIdx ->
>             {K1, V1} = lists:nth(SwapIdx, Ps),
>             shuffle(proplists:delete(K1, Ps), Len-1, [V1 | L])
>     end.
>
>swap_idx(Len) ->
>     trunc(random:uniform() * Len) + 1.
>
>%% N length list
>%% Create X shuffled lists
>test(N, X) ->
>     crypto:start(),
>     L = lists:seq(1,N),
>     Md5s = lists:map(fun(_) ->
>                              
>string:to_upper(lists:flatten([io_lib:format("~2.16.0b", [Digest]) || 
><<Digest>> <= crypto:md5(shuffle:shuffle(L))]))
>                      end, lists:seq(1,X)),
>     gb_trees:to_list(
>       lists:foldl(fun(Hash, Tree) ->
>                           case gb_trees:lookup(Hash, Tree) of
>                               {value, Count} ->
>                                   gb_trees:update(Hash, Count+1, Tree);
>                               _ -> gb_trees:insert(Hash, 1, Tree)
>                           end
>                   end, gb_trees:empty(), Md5s)).
>
>%% Sample output
>1> unsort:test(2, 100000).
>[{"050D144172D916D0846F839E0412E929",50325},
>  {"0CB988D042A7F28DD5FE2B55B3F5AC7A",49675}]
>2> unsort:test(3, 100000).
>[{"3BAC2889507C76C73913B5CACEB75B6D",16668},
>  {"5289DF737DF57326FCDD22597AFB1FAC",16674},
>  {"5D582C1341F4783E2FDE0F07D0AF9A06",16581},
>  {"9DE228839B2CCC8D08071BCADA88CA6D",16797},
>  {"A867F6E8EBD4E3A90C6084C9738B506B",16656},
>  {"AE0F84A71C987607257309BF36DFB41A",16624}]
>3> unsort:test(4, 100000).
>[{"08D6C05A21512A79A1DFEB9D2A8F262F",4217},
>  {"0A27E477F545F21001C975A0DA5826CF",4244},
>  {"0AA9093A8C91D9BAF3DDB40613BF6052",4092},
>  {"0AFB83810CCDA4270CCBB36128D7258B",4152},
>  {"0E4717467434E3A8AF70C101789434D4",4202},
>  {"21C49141D9D5062E9C8A3FA8591507A7",4081},
>  {"26E5799C9C28C4F750DE5289644CCBCB",4190},
>  {"3044356A01283AC9BF0E83A69A76DAFA",4303},
>  {"32E2F90B64F81FEDB13FD92F554194EF",4069},
>  {"428C536DB5E72C45CF09CCD7A229D482",4185},
>  {"46DEB0BCE99B407029591EC8858F8D76",4185},
>  {"5DA92947151FE75785B296B4EE680265",4109},
>  {"6F3FF712FAE51C9E6A34E5649B1DF983",4058},
>  {"A9380E383BA79F5FCB90CF0F3A6F61C1",4164},
>  {"C73CABEB6558ABA030BBA9CA49DCDD75",3996},
>  {"C7E1C6D5AE3E1AC8B876F500F26BF37B",4132},
>  {"CBEED65578A453CDA5B7F9899F33F63F",4124},
>  {"DC7C46E4A1969FAF25ED7F53A0E7993B",4337},
>  {"DE5FE0700986F246278524D88F24DB10",4200},
>  {"DF2CEBAAE30811FFA20BA50D8E3EF365",4066},
>  {"E182CD2D19E8C3555B7D63454FC2D1BB",4218},
>  {"E64FEF4E93468D853B99662B25D37193",4240},
>  {"E78F3DE3AAE140076643E5D6F8E4C4B9",4204},
>  {"E94E7EE783C0AEE87D19D87F071FA2FD",4232}]
>4>
>
>
>
>On 05/26/2010 12:09 PM, Björn-Egil Dahlberg wrote:
>> On 2010-05-26 17:14, Johan Montelius wrote:
>>> On Wed, 26 May 2010 14:41:19 +0200, Björn-Egil Dahlberg
>>> <egil@REDACTED> wrote:
>>>
>>>> My suggestion is included in this mail, but my question is: does anyone
>>>> have a faster, more generic and less memory consuming shuffle function?
>>>
>>> I haven't done any test but I would think the execution time will be
>>> very much depending on the number of random:uniform/1 calls that you do.
>>> As it is now you make a lot of calls but do not utilize the entropy. A
>>> don't think a call to random:uniform(2) is much cheaper than
>>> random:uniform(8098809898) nor do I know how well random:uniform is
>>> implemented.
>>
>> You are of course completely right. There is a lot of unnecessary 
>> calls to random:uniform/1. One can always redefine the constraints for 
>> the fun. Instead of returning a number, return a binary. The fun is 
>> there to provide the user with a alternative random-function to the 
>> shuffle function, mainly for creating pseudo-random shuffles. 
>> shuffle/1 uses random:uniform/1 as default.
>>
>>> Here is an idea:
>>>
>>> generate a SHA-512 binary, use the binary bit by bit to divide the list
>>> into two lists, if you run out of bits, generate a new pattern. Return
>>> the remaining bit pattern and the two lists (this is similar to split in
>>> qsort). Recursively apply to the first list and then the second list and
>>> if we do it right we can avoid doing the last append (me think).
>>
>> +1, I like this idea. Easy to hook in crypto:rand_bytes/1 also.
>>
>> // Björn-Egil
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>
>>
>
>
>-- 
>James Aimonetti
>mobile: 314.809.6307
>work: 540.459.2220
>email: james.aimonetti@REDACTED
>website: http://jamesaimonetti.com
>
>
>________________________________________________________________
>erlang-questions (at) erlang.org mailing list.
>See http://www.erlang.org/faq.html
>To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>
>