[erlang-questions] [ANN] Erlang UUID
Sat Mar 3 01:32:36 CET 2012
On 03/01/2012 10:48 PM, Kenji Rikitake wrote:
> A crude implementation in pure Erlang is available at
> For the 2006 Wichmann-Hill RNG which Richard mentioned below.
> ++> Richard O'Keefe <> [2012-03-02 15:59:58 +1300]:
>> Wichmann & Hill have since published a four-generator algorithm, see
>> Wichmann and Hill have a new generator with a longer period.
>>  B. A. Wichmann, I. D. Hill,
>> Generating good pseudo-random numbers,
>> Computational Statistics & Data Analysis, 51, 1614-1622 (2006).
>> In fact generating full-precision random doubles is something RNG writers often neglect.
> Indeed true. Providing enough bits for the mantissa digits is required.
>> As for generating random integers in the range [0,N) where N is a bignum, I would love
>> to be pointed to a paper on the subject.
> Me too - so far all I see is for fixed-length integers.
> Kenji Rikitake
It seems like we could use the Wichmann and Hill algorithm, in a way that utilizes big-integers. I noticed that the current usage of floating point should limit the uniform distribution that would otherwise be available, so the big-integers implementation should be superior. There just would be a small change required to the code, summarized below:
% original usage of floating point
%R = (C1 * 0.0000000004656613022697297188506231646486) +
% (C2 * 0.0000000004656613100759859932486569933169) +
% (C3 * 0.0000000004656613360968421314794009471615) +
% (C4 * 0.0000000004656614011489951998100056779817),
% transformation into big-integers without common factors
%R = ((C1 * 4656613022697297188506231646486) +
% (C2 * 4656613100759859932486569933169) +
% (C3 * 4656613360968421314794009471615) +
% (C4 * 4656614011489951998100056779817)) / 1.0e40,
% usage as big-integers
I = ((C1 * 100974236070208004586355635975308882395008510460084959275529243190275451744887731102642177895) +
(C2 * 100974234377495411077165331191698869941147322702626405281612091220218253091371910599190635130) +
(C3 * 100974228735120099379864315246347981874512822184904954644794292108235670371042659321702493478) +
(C4 * 100974214629181820136611775382944281323036229785128918455846504909124232305107255133165006410)) rem 470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970,
The numbers do get a bit big and ugly, but it is just transforming R into a fraction, which represents a single number within the large distribution provided by the algorithm (i.e., a single I within [0..470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970)). This is basically the same as the (R - trunc(R)), just that it avoids the division. So, if I is uniformly distributed with this large value of N, it should also be uniformly distributed within a subset of the range. So, all we should need to do is take the remainder based on the N we supply to the module, so:
uniform(N) when is_integer(N), N >= 1, N < 470197942641441751328779943957359348820645151704843242145900685403753713168019478536992537786552036455690641240694763626970 ->
(uniform() rem N) + 1.
If you use this implementation, then it would provide 408 bits of randomness, uniformly distributed. I know you mentioned the period of this algorithm as "~ 2^120 (~ 10^36)" in a previous post, so it doesn't compare to a Mersenne Twister solution. However, this provides a decent way of avoiding a port_driver (crypto) or NIF call, which would have poorer performance (like crypto:rand_uniform/2), and it provides normal fault-tolerance (since it is all Erlang code).
It would be nice if small N, i.e., less than 1000000 could used erlang:now/0 instead, since it is quicker than even random:uniform/1, in my testing, and it appears to be more uniform. However, I assume it would become less uniform with usage on timers, so that is probably a bad case to add to this type of a random module.
I added the modified module to my speed test, just to look at its performance here:
The speed test gives:
N == 10000 (10 runs)
crypto:rand_uniform/ get: 68999.4 µs ( 26.0)
erlang:now/0 get: 2658.6 µs ( 1.0)
random:uniform/1 get: 7011.5 µs ( 2.6)
random:uniform_wh06/ get: 21636.6 µs ( 8.1)
That is with the random N == 10, performed 10000 times, during 10 runs that are averaged with Erlang R14B04 (non-HiPE).
Tell me if you think my modifications to random_wh06 are erroneous in some way.
More information about the erlang-questions