[erlang-questions] Speedy unsort:shuffle/1,2 ?
wde
wde@REDACTED
Wed May 26 20:07:08 CEST 2010
Author: John Haugeland
MIT LICENSE
svn://crunchyd.com/scutil/erl/src/sc_random.erl
shuffle(List) ->
WeightedAndShuffled = lists:map( fun(Item) -> { random:uniform(), Item } end, List ),
{ _, SortedAndDeweighted } = lists:unzip(lists:keysort(1, WeightedAndShuffled)),
SortedAndDeweighted.
>I had a method in Javascript that was based on the Fisher-Yates shuffle
>that I tried converting to Erlang. It essentially zips the given list
>with a sequence 1-N, and uses a combination of lists:nth/2 and
>proplists:delete/2 to create the shuffled list. I've not looked into the
>entropy or bias as I've just thrown this together but the simple testing
>seems to be positive.
>
>%% unsort
>-module(unsort)
>
>-export([shuffle/1]).
>
>shuffle(L) ->
> Len = length(L),
> PropL = lists:zip(lists:seq(1, Len), L),
> shuffle(PropL, Len, []).
>
>shuffle([], _, L) -> lists:reverse(L);
>shuffle([{_,V}], _, L) -> lists:reverse([V | L]);
>shuffle([{_,V} | T]=Ps, Len, L) ->
> case swap_idx(Len) of
> 1 ->
> shuffle(T, Len-1, [V | L]);
> SwapIdx ->
> {K1, V1} = lists:nth(SwapIdx, Ps),
> shuffle(proplists:delete(K1, Ps), Len-1, [V1 | L])
> end.
>
>swap_idx(Len) ->
> trunc(random:uniform() * Len) + 1.
>
>%% N length list
>%% Create X shuffled lists
>test(N, X) ->
> crypto:start(),
> L = lists:seq(1,N),
> Md5s = lists:map(fun(_) ->
>
>string:to_upper(lists:flatten([io_lib:format("~2.16.0b", [Digest]) ||
><<Digest>> <= crypto:md5(shuffle:shuffle(L))]))
> end, lists:seq(1,X)),
> gb_trees:to_list(
> lists:foldl(fun(Hash, Tree) ->
> case gb_trees:lookup(Hash, Tree) of
> {value, Count} ->
> gb_trees:update(Hash, Count+1, Tree);
> _ -> gb_trees:insert(Hash, 1, Tree)
> end
> end, gb_trees:empty(), Md5s)).
>
>%% Sample output
>1> unsort:test(2, 100000).
>[{"050D144172D916D0846F839E0412E929",50325},
> {"0CB988D042A7F28DD5FE2B55B3F5AC7A",49675}]
>2> unsort:test(3, 100000).
>[{"3BAC2889507C76C73913B5CACEB75B6D",16668},
> {"5289DF737DF57326FCDD22597AFB1FAC",16674},
> {"5D582C1341F4783E2FDE0F07D0AF9A06",16581},
> {"9DE228839B2CCC8D08071BCADA88CA6D",16797},
> {"A867F6E8EBD4E3A90C6084C9738B506B",16656},
> {"AE0F84A71C987607257309BF36DFB41A",16624}]
>3> unsort:test(4, 100000).
>[{"08D6C05A21512A79A1DFEB9D2A8F262F",4217},
> {"0A27E477F545F21001C975A0DA5826CF",4244},
> {"0AA9093A8C91D9BAF3DDB40613BF6052",4092},
> {"0AFB83810CCDA4270CCBB36128D7258B",4152},
> {"0E4717467434E3A8AF70C101789434D4",4202},
> {"21C49141D9D5062E9C8A3FA8591507A7",4081},
> {"26E5799C9C28C4F750DE5289644CCBCB",4190},
> {"3044356A01283AC9BF0E83A69A76DAFA",4303},
> {"32E2F90B64F81FEDB13FD92F554194EF",4069},
> {"428C536DB5E72C45CF09CCD7A229D482",4185},
> {"46DEB0BCE99B407029591EC8858F8D76",4185},
> {"5DA92947151FE75785B296B4EE680265",4109},
> {"6F3FF712FAE51C9E6A34E5649B1DF983",4058},
> {"A9380E383BA79F5FCB90CF0F3A6F61C1",4164},
> {"C73CABEB6558ABA030BBA9CA49DCDD75",3996},
> {"C7E1C6D5AE3E1AC8B876F500F26BF37B",4132},
> {"CBEED65578A453CDA5B7F9899F33F63F",4124},
> {"DC7C46E4A1969FAF25ED7F53A0E7993B",4337},
> {"DE5FE0700986F246278524D88F24DB10",4200},
> {"DF2CEBAAE30811FFA20BA50D8E3EF365",4066},
> {"E182CD2D19E8C3555B7D63454FC2D1BB",4218},
> {"E64FEF4E93468D853B99662B25D37193",4240},
> {"E78F3DE3AAE140076643E5D6F8E4C4B9",4204},
> {"E94E7EE783C0AEE87D19D87F071FA2FD",4232}]
>4>
>
>
>
>On 05/26/2010 12:09 PM, Björn-Egil Dahlberg wrote:
>> On 2010-05-26 17:14, Johan Montelius wrote:
>>> On Wed, 26 May 2010 14:41:19 +0200, Björn-Egil Dahlberg
>>> <egil@REDACTED> wrote:
>>>
>>>> My suggestion is included in this mail, but my question is: does anyone
>>>> have a faster, more generic and less memory consuming shuffle function?
>>>
>>> I haven't done any test but I would think the execution time will be
>>> very much depending on the number of random:uniform/1 calls that you do.
>>> As it is now you make a lot of calls but do not utilize the entropy. A
>>> don't think a call to random:uniform(2) is much cheaper than
>>> random:uniform(8098809898) nor do I know how well random:uniform is
>>> implemented.
>>
>> You are of course completely right. There is a lot of unnecessary
>> calls to random:uniform/1. One can always redefine the constraints for
>> the fun. Instead of returning a number, return a binary. The fun is
>> there to provide the user with a alternative random-function to the
>> shuffle function, mainly for creating pseudo-random shuffles.
>> shuffle/1 uses random:uniform/1 as default.
>>
>>> Here is an idea:
>>>
>>> generate a SHA-512 binary, use the binary bit by bit to divide the list
>>> into two lists, if you run out of bits, generate a new pattern. Return
>>> the remaining bit pattern and the two lists (this is similar to split in
>>> qsort). Recursively apply to the first list and then the second list and
>>> if we do it right we can avoid doing the last append (me think).
>>
>> +1, I like this idea. Easy to hook in crypto:rand_bytes/1 also.
>>
>> // Björn-Egil
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>
>>
>
>
>--
>James Aimonetti
>mobile: 314.809.6307
>work: 540.459.2220
>email: james.aimonetti@REDACTED
>website: http://jamesaimonetti.com
>
>
>________________________________________________________________
>erlang-questions (at) erlang.org mailing list.
>See http://www.erlang.org/faq.html
>To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>
>
More information about the erlang-questions
mailing list