[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20

Mon Sep 4 11:49:05 CEST 2017

On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:
> 
> 
> On 1/09/17 8:49 PM, Raimo Niskanen wrote:
> >> By the way, given that a common way to make random floats is to
> >> generate a bitvector, consider
> >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> >> You will notice that the spacing between the values is *almost*
> >> uniform, but not at the end.
> >
> > That sounds interesting but I do not understand.  Is that Elixir code?
> 
> Nope, Smalltalk.  I wanted to use rational arithmetic.  In fact I did
> not need to.  Here it is in Haskell:
>  > [(x * 256) `div` 15 | x <- [0..15]]
> [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
> 
> Let's push that a bit further.  Let's generate all possible 10-bit
> integers and map them to the range [0..63].  We find again that
> the gap sizes are not all the same.  They can't be.  If you
> consider all vectors of N bits and map them to the range
> [0..2**M] they *cannot* be uniformly distributed no matter what
> method you use because (2**M+1) does not divide 2**N.  You can
> fix this by rejecting some of the bit vectors, but that would
> be asking everyone to pay extra for a result they don't have any
> particular need for.
> 

I see, but do not quite understand what you are getting at.

The current left-closed float generator starts with 58 random bits and
puts 53 of these into the mantissa in an IEEE 754 double binary float,
so that would not be it.

I guess it is a generator for the closed interval [0.0,1.0] or the open
(0.0,1.0) you talk about.  If so:

This one-liner generates over [0.0,1.0]:
    (rand:uniform((1 bsl 53) + 1) -1) * math:pow(2, -53)
and it uses an integer range R = (2^53 + 1), which is not dividable by 2.

The implementation for that range will generate a 58 bit number and then
check if the number is 0 =< X < R*31 and if so return the number mod R
(31 repetitions of the range is all that fits completely in 58 bits).

If the generated number is R*31 =< X that is in the top truncated interval
then we start over with a new number.

The implementation may in theory retry forever before finding a good
number, but the odds for retry is about 1/32 for each round so the
accumulated time is about 32/31 times one round.  And only once in a million
it will take 4 attempts or more.

I discussed a different implementation with Prof. Vigna that is to always
generate one word too much and then use mod R on that which would bury the
skew in one word of random bits hence the difference in probability between
generated numbers should be about (2^58 - 1)/2^58, which would take quite
some effort to measure with statistical significance.  But he considered
that as a bad idea since it would get half the speed for most ranges.

So this is an already solved problem, as I see it.

We *can* write efficient and good generators for open, closed and
half-closed intervals, if we want.

So far I have only seen the need for a (0.0,1.0] generator, which can be
implemented with:
    1.0 - rand:uniform()
but in some applications such as 1.0/X and math:log(X) I think that the form
N * 2^-53 might be less than optimal, so I have a new suggestion:

    https://github.com/erlang/otp/compare/OTP-20.0...RaimoNiskanen:raimo/stdlib/rand-uniformNZ

This variant never returns exactly 0.0 and have better precision for low
values.  Comments?  Especially about the name.

And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB