[erlang-questions] [ANN] Erlang UUID

Sat Mar 3 01:32:36 CET 2012

On 03/01/2012 10:48 PM, Kenji Rikitake wrote:
> FYI:
> A crude implementation in pure Erlang is available at
> https://github.com/jj1bdx/sfmt-erlang/blob/master/src/random_wh06.erl
> For the 2006 Wichmann-Hill RNG which Richard mentioned below.
>
> ++> Richard O'Keefe <ok@REDACTED> [2012-03-02 15:59:58 +1300]:
> [snip] 
>
>> Wichmann & Hill have since published a four-generator algorithm, see
>>     Wichmann and Hill have a new generator with a longer period.
>>     [2] B. A. Wichmann, I. D. Hill,
>>     Generating good pseudo-random numbers,
>>     Computational Statistics & Data Analysis, 51, 1614-1622 (2006).
> BTW
>
>> In fact generating full-precision random doubles is something RNG writers often neglect.
> Indeed true. Providing enough bits for the mantissa digits is required.
>
>> As for generating random integers in the range [0,N) where N is a bignum, I would love
>> to be pointed to a paper on the subject.
> Me too - so far all I see is for fixed-length integers.
>
> Kenji Rikitake
It seems like we could use the Wichmann and Hill algorithm, in a way that utilizes big-integers.  I noticed that the current usage of floating point should limit the uniform distribution that would otherwise be available, so the big-integers implementation should be superior.  There just would be a small change required to the code, summarized below:

    % original usage of floating point
    %R = (C1 * 0.0000000004656613022697297188506231646486) +
    %    (C2 * 0.0000000004656613100759859932486569933169) +
    %    (C3 * 0.0000000004656613360968421314794009471615) +
    %    (C4 * 0.0000000004656614011489951998100056779817),

    % transformation into big-integers without common factors
    %R = ((C1 * 4656613022697297188506231646486) +
    %     (C2 * 4656613100759859932486569933169) +
    %     (C3 * 4656613360968421314794009471615) +
    %     (C4 * 4656614011489951998100056779817)) / 1.0e40,

    % usage as big-integers
    I = ((C1 * 100974236070208004586355635975308882395008510460084959275529243190275451744887731102642177895) +
         (C2 * 100974234377495411077165331191698869941147322702626405281612091220218253091371910599190635130) +
         (C3 * 100974228735120099379864315246347981874512822184904954644794292108235670371042659321702493478) +
         (C4 * 100974214629181820136611775382944281323036229785128918455846504909124232305107255133165006410)) rem 470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970,

The numbers do get a bit big and ugly, but it is just transforming R into a fraction, which represents a single number within the large distribution provided by the algorithm (i.e., a single I within [0..470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970)).  This is basically the same as the (R - trunc(R)), just that it avoids the division.  So, if I is uniformly distributed with this large value of N, it should also be uniformly distributed within a subset of the range.  So, all we should need to do is take the remainder based on the N we supply to the module, so:

uniform(N) when is_integer(N), N >= 1, N < 470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970 ->
    (uniform() rem N) + 1.

If you use this implementation, then it would provide 408 bits of randomness, uniformly distributed.  I know you mentioned the period of this algorithm as "~ 2^120 (~ 10^36)" in a previous post, so it doesn't compare to a Mersenne Twister solution.  However, this provides a decent way of avoiding a port_driver (crypto) or NIF call, which would have poorer performance (like crypto:rand_uniform/2), and it provides normal fault-tolerance (since it is all Erlang code).

It would be nice if small N, i.e., less than 1000000 could used erlang:now/0 instead, since it is quicker than even random:uniform/1, in my testing, and it appears to be more uniform.  However, I assume it would become less uniform with usage on timers, so that is probably a bad case to add to this type of a random module.

I added the modified module to my speed test, just to look at its performance here:
https://github.com/okeuday/erlbench/blob/master/src/random_wh06.erl

The speed test gives:
TEST pseudo_randomness
N == 10000 (10 runs)
crypto:rand_uniform/ get:    68999.4 µs ( 26.0)
        erlang:now/0 get:     2658.6 µs (  1.0)
    random:uniform/1 get:     7011.5 µs (  2.6)
random:uniform_wh06/ get:    21636.6 µs (  8.1)

That is with the random N == 10, performed 10000 times, during 10 runs that are averaged with Erlang R14B04 (non-HiPE).

Tell me if you think my modifications to random_wh06 are erroneous in some way.

Thanks,
Michael