[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20
Mon Sep 4 11:49:05 CEST 2017
On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:
> On 1/09/17 8:49 PM, Raimo Niskanen wrote:
> >> By the way, given that a common way to make random floats is to
> >> generate a bitvector, consider
> >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> >> You will notice that the spacing between the values is *almost*
> >> uniform, but not at the end.
> > That sounds interesting but I do not understand. Is that Elixir code?
> Nope, Smalltalk. I wanted to use rational arithmetic. In fact I did
> not need to. Here it is in Haskell:
> > [(x * 256) `div` 15 | x <- [0..15]]
> Let's push that a bit further. Let's generate all possible 10-bit
> integers and map them to the range [0..63]. We find again that
> the gap sizes are not all the same. They can't be. If you
> consider all vectors of N bits and map them to the range
> [0..2**M] they *cannot* be uniformly distributed no matter what
> method you use because (2**M+1) does not divide 2**N. You can
> fix this by rejecting some of the bit vectors, but that would
> be asking everyone to pay extra for a result they don't have any
> particular need for.
I see, but do not quite understand what you are getting at.
The current left-closed float generator starts with 58 random bits and
puts 53 of these into the mantissa in an IEEE 754 double binary float,
so that would not be it.
I guess it is a generator for the closed interval [0.0,1.0] or the open
(0.0,1.0) you talk about. If so:
This one-liner generates over [0.0,1.0]:
(rand:uniform((1 bsl 53) + 1) -1) * math:pow(2, -53)
and it uses an integer range R = (2^53 + 1), which is not dividable by 2.
The implementation for that range will generate a 58 bit number and then
check if the number is 0 =< X < R*31 and if so return the number mod R
(31 repetitions of the range is all that fits completely in 58 bits).
If the generated number is R*31 =< X that is in the top truncated interval
then we start over with a new number.
The implementation may in theory retry forever before finding a good
number, but the odds for retry is about 1/32 for each round so the
accumulated time is about 32/31 times one round. And only once in a million
it will take 4 attempts or more.
I discussed a different implementation with Prof. Vigna that is to always
generate one word too much and then use mod R on that which would bury the
skew in one word of random bits hence the difference in probability between
generated numbers should be about (2^58 - 1)/2^58, which would take quite
some effort to measure with statistical significance. But he considered
that as a bad idea since it would get half the speed for most ranges.
So this is an already solved problem, as I see it.
We *can* write efficient and good generators for open, closed and
half-closed intervals, if we want.
So far I have only seen the need for a (0.0,1.0] generator, which can be
1.0 - rand:uniform()
but in some applications such as 1.0/X and math:log(X) I think that the form
N * 2^-53 might be less than optimal, so I have a new suggestion:
This variant never returns exactly 0.0 and have better precision for low
values. Comments? Especially about the name.
And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
More information about the erlang-questions