[erlang-questions] [ANN] Erlang UUID

Sat Mar 3 03:53:17 CET 2012

On 03/02/2012 04:32 PM, Michael Truog wrote:
> On 03/01/2012 10:48 PM, Kenji Rikitake wrote:
>> FYI:
>> A crude implementation in pure Erlang is available at
>> https://github.com/jj1bdx/sfmt-erlang/blob/master/src/random_wh06.erl
>> For the 2006 Wichmann-Hill RNG which Richard mentioned below.
>>
>> ++> Richard O'Keefe <ok@REDACTED> [2012-03-02 15:59:58 +1300]:
>> [snip] 
>>
>>> Wichmann & Hill have since published a four-generator algorithm, see
>>>     Wichmann and Hill have a new generator with a longer period.
>>>     [2] B. A. Wichmann, I. D. Hill,
>>>     Generating good pseudo-random numbers,
>>>     Computational Statistics & Data Analysis, 51, 1614-1622 (2006).
>> BTW
>>
>>> In fact generating full-precision random doubles is something RNG writers often neglect.
>> Indeed true. Providing enough bits for the mantissa digits is required.
>>
>>> As for generating random integers in the range [0,N) where N is a bignum, I would love
>>> to be pointed to a paper on the subject.
>> Me too - so far all I see is for fixed-length integers.
>>
>> Kenji Rikitake
> It seems like we could use the Wichmann and Hill algorithm, in a way that utilizes big-integers.  I noticed that the current usage of floating point should limit the uniform distribution that would otherwise be available, so the big-integers implementation should be superior.  There just would be a small change required to the code, summarized below:
>
>     % original usage of floating point
>     %R = (C1 * 0.0000000004656613022697297188506231646486) +
>     %    (C2 * 0.0000000004656613100759859932486569933169) +
>     %    (C3 * 0.0000000004656613360968421314794009471615) +
>     %    (C4 * 0.0000000004656614011489951998100056779817),
>
>     % transformation into big-integers without common factors
>     %R = ((C1 * 4656613022697297188506231646486) +
>     %     (C2 * 4656613100759859932486569933169) +
>     %     (C3 * 4656613360968421314794009471615) +
>     %     (C4 * 4656614011489951998100056779817)) / 1.0e40,
>
>     % usage as big-integers
>     I = ((C1 * 100974236070208004586355635975308882395008510460084959275529243190275451744887731102642177895) +
>          (C2 * 100974234377495411077165331191698869941147322702626405281612091220218253091371910599190635130) +
>          (C3 * 100974228735120099379864315246347981874512822184904954644794292108235670371042659321702493478) +
>          (C4 * 100974214629181820136611775382944281323036229785128918455846504909124232305107255133165006410)) rem 470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970,
>
> The numbers do get a bit big and ugly, but it is just transforming R into a fraction, which represents a single number within the large distribution provided by the algorithm (i.e., a single I within [0..470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970)).  This is basically the same as the (R - trunc(R)), just that it avoids the division.  So, if I is uniformly distributed with this large value of N, it should also be uniformly distributed within a subset of the range.  So, all we should need to do is take the remainder based on the N we supply to the module, so:
>
> uniform(N) when is_integer(N), N >= 1, N < 470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970 ->
>     (uniform() rem N) + 1.
>
> If you use this implementation, then it would provide 408 bits of randomness, uniformly distributed.  I know you mentioned the period of this algorithm as "~ 2^120 (~ 10^36)" in a previous post, so it doesn't compare to a Mersenne Twister solution.  However, this provides a decent way of avoiding a port_driver (crypto) or NIF call, which would have poorer performance (like crypto:rand_uniform/2), and it provides normal fault-tolerance (since it is all Erlang code).
>
> It would be nice if small N, i.e., less than 1000000 could used erlang:now/0 instead, since it is quicker than even random:uniform/1, in my testing, and it appears to be more uniform.  However, I assume it would become less uniform with usage on timers, so that is probably a bad case to add to this type of a random module.
>
> I added the modified module to my speed test, just to look at its performance here:
> https://github.com/okeuday/erlbench/blob/master/src/random_wh06.erl
>
> The speed test gives:
> TEST pseudo_randomness
> N == 10000 (10 runs)
> crypto:rand_uniform/ get:    68999.4 µs ( 26.0)
>         erlang:now/0 get:     2658.6 µs (  1.0)
>     random:uniform/1 get:     7011.5 µs (  2.6)
> random:uniform_wh06/ get:    21636.6 µs (  8.1)
>
> That is with the random N == 10, performed 10000 times, during 10 runs that are averaged with Erlang R14B04 (non-HiPE).
I am sorry, I confused that algorithm with the one within the random module.  The random module algorithm required more work to use as a fraction (it was late on a Friday, is my only excuse), however, this one is simpler to treat as a fraction.  The algorithm changes as shown below:

    % original usage of floating point
    %R = (C1 * 0.0000000004656613022697297188506231646486) +
    %    (C2 * 0.0000000004656613100759859932486569933169) +
    %    (C3 * 0.0000000004656613360968421314794009471615) +
    %    (C4 * 0.0000000004656614011489951998100056779817),

    % transformation into big-integers without common factors
    %R = ((C1 * 4656613022697297188506231646486) +
    %     (C2 * 4656613100759859932486569933169) +
    %     (C3 * 4656613360968421314794009471615) +
    %     (C4 * 4656614011489951998100056779817)) / 1.0e40,

    % usage as big-integers
    I = ((C1 * 4656613022697297188506231646486) +
         (C2 * 4656613100759859932486569933169) +
         (C3 * 4656613360968421314794009471615) +
         (C4 * 4656614011489951998100056779817)) rem 10000000000000000000000000000000000000000,

This should make more sense.  However, this means that this algorithm actually provides a maximum of 133 random bits:
1> (math:log(10000000000000000000000000000000000000000) / math:log(2)) + 1.
133.8771237954945

The same explanation/idea is at work though, so that isn't wrong.  It could be applied to the random module in the same way to avoid using floating point.  The speed test is pretty much the same, since only the constants changed:
 TEST pseudo_randomness
N == 10000 (10 runs)
crypto:rand_uniform/ get:    67394.1 µs ( 25.5)
        erlang:now/0 get:     2642.8 µs (  1.0)
    random:uniform/1 get:     7006.6 µs (  2.7)
random:uniform_wh06/ get:    17741.2 µs (  6.7)

However, please tell me if you think I have a mistake in there still.  Updated the link:
https://github.com/okeuday/erlbench/blob/master/src/random_wh06.erl

Thanks,
Michael