[erlang-questions] Speedy unsort:shuffle/1,2 ?
James Aimonetti
james.aimonetti@REDACTED
Wed May 26 19:22:55 CEST 2010
I had a method in Javascript that was based on the Fisher-Yates shuffle
that I tried converting to Erlang. It essentially zips the given list
with a sequence 1-N, and uses a combination of lists:nth/2 and
proplists:delete/2 to create the shuffled list. I've not looked into the
entropy or bias as I've just thrown this together but the simple testing
seems to be positive.
%% unsort
-module(unsort)
-export([shuffle/1]).
shuffle(L) ->
Len = length(L),
PropL = lists:zip(lists:seq(1, Len), L),
shuffle(PropL, Len, []).
shuffle([], _, L) -> lists:reverse(L);
shuffle([{_,V}], _, L) -> lists:reverse([V | L]);
shuffle([{_,V} | T]=Ps, Len, L) ->
case swap_idx(Len) of
1 ->
shuffle(T, Len-1, [V | L]);
SwapIdx ->
{K1, V1} = lists:nth(SwapIdx, Ps),
shuffle(proplists:delete(K1, Ps), Len-1, [V1 | L])
end.
swap_idx(Len) ->
trunc(random:uniform() * Len) + 1.
%% N length list
%% Create X shuffled lists
test(N, X) ->
crypto:start(),
L = lists:seq(1,N),
Md5s = lists:map(fun(_) ->
string:to_upper(lists:flatten([io_lib:format("~2.16.0b", [Digest]) ||
<<Digest>> <= crypto:md5(shuffle:shuffle(L))]))
end, lists:seq(1,X)),
gb_trees:to_list(
lists:foldl(fun(Hash, Tree) ->
case gb_trees:lookup(Hash, Tree) of
{value, Count} ->
gb_trees:update(Hash, Count+1, Tree);
_ -> gb_trees:insert(Hash, 1, Tree)
end
end, gb_trees:empty(), Md5s)).
%% Sample output
1> unsort:test(2, 100000).
[{"050D144172D916D0846F839E0412E929",50325},
{"0CB988D042A7F28DD5FE2B55B3F5AC7A",49675}]
2> unsort:test(3, 100000).
[{"3BAC2889507C76C73913B5CACEB75B6D",16668},
{"5289DF737DF57326FCDD22597AFB1FAC",16674},
{"5D582C1341F4783E2FDE0F07D0AF9A06",16581},
{"9DE228839B2CCC8D08071BCADA88CA6D",16797},
{"A867F6E8EBD4E3A90C6084C9738B506B",16656},
{"AE0F84A71C987607257309BF36DFB41A",16624}]
3> unsort:test(4, 100000).
[{"08D6C05A21512A79A1DFEB9D2A8F262F",4217},
{"0A27E477F545F21001C975A0DA5826CF",4244},
{"0AA9093A8C91D9BAF3DDB40613BF6052",4092},
{"0AFB83810CCDA4270CCBB36128D7258B",4152},
{"0E4717467434E3A8AF70C101789434D4",4202},
{"21C49141D9D5062E9C8A3FA8591507A7",4081},
{"26E5799C9C28C4F750DE5289644CCBCB",4190},
{"3044356A01283AC9BF0E83A69A76DAFA",4303},
{"32E2F90B64F81FEDB13FD92F554194EF",4069},
{"428C536DB5E72C45CF09CCD7A229D482",4185},
{"46DEB0BCE99B407029591EC8858F8D76",4185},
{"5DA92947151FE75785B296B4EE680265",4109},
{"6F3FF712FAE51C9E6A34E5649B1DF983",4058},
{"A9380E383BA79F5FCB90CF0F3A6F61C1",4164},
{"C73CABEB6558ABA030BBA9CA49DCDD75",3996},
{"C7E1C6D5AE3E1AC8B876F500F26BF37B",4132},
{"CBEED65578A453CDA5B7F9899F33F63F",4124},
{"DC7C46E4A1969FAF25ED7F53A0E7993B",4337},
{"DE5FE0700986F246278524D88F24DB10",4200},
{"DF2CEBAAE30811FFA20BA50D8E3EF365",4066},
{"E182CD2D19E8C3555B7D63454FC2D1BB",4218},
{"E64FEF4E93468D853B99662B25D37193",4240},
{"E78F3DE3AAE140076643E5D6F8E4C4B9",4204},
{"E94E7EE783C0AEE87D19D87F071FA2FD",4232}]
4>
On 05/26/2010 12:09 PM, Björn-Egil Dahlberg wrote:
> On 2010-05-26 17:14, Johan Montelius wrote:
>> On Wed, 26 May 2010 14:41:19 +0200, Björn-Egil Dahlberg
>> <egil@REDACTED> wrote:
>>
>>> My suggestion is included in this mail, but my question is: does anyone
>>> have a faster, more generic and less memory consuming shuffle function?
>>
>> I haven't done any test but I would think the execution time will be
>> very much depending on the number of random:uniform/1 calls that you do.
>> As it is now you make a lot of calls but do not utilize the entropy. A
>> don't think a call to random:uniform(2) is much cheaper than
>> random:uniform(8098809898) nor do I know how well random:uniform is
>> implemented.
>
> You are of course completely right. There is a lot of unnecessary
> calls to random:uniform/1. One can always redefine the constraints for
> the fun. Instead of returning a number, return a binary. The fun is
> there to provide the user with a alternative random-function to the
> shuffle function, mainly for creating pseudo-random shuffles.
> shuffle/1 uses random:uniform/1 as default.
>
>> Here is an idea:
>>
>> generate a SHA-512 binary, use the binary bit by bit to divide the list
>> into two lists, if you run out of bits, generate a new pattern. Return
>> the remaining bit pattern and the two lists (this is similar to split in
>> qsort). Recursively apply to the first list and then the second list and
>> if we do it right we can avoid doing the last append (me think).
>
> +1, I like this idea. Easy to hook in crypto:rand_bytes/1 also.
>
> // Björn-Egil
>
> ________________________________________________________________
> erlang-questions (at) erlang.org mailing list.
> See http://www.erlang.org/faq.html
> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>
>
--
James Aimonetti
mobile: 314.809.6307
work: 540.459.2220
email: james.aimonetti@REDACTED
website: http://jamesaimonetti.com
More information about the erlang-questions
mailing list