[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20
Wed Sep 6 14:30:17 CEST 2017
Raimo and all:
I'm late catching up on this thread.
I think the new function name should be
because I've rarely seen somethingUPPERCASE-style
names used for Erlang functions.
(I might be wrong.)
I'm concerned that code written for the pre-20 (i.e., <= OTP 19.3.x) rand:uniform/1
might crash or misbehave when run on the OTP 20 implementation.
At the very least, how to write code equivalent to the pre-20 behavior should be documented.
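As a starting point for such documentation, here is a minimal sketch.
It assumes the concern is the float generator, and that the difference
that matters is that OTP 20 can return exactly 0.0 while older code was
written expecting a strictly positive result. The module and function
names are hypothetical.

```erlang
-module(pre20_compat_sketch).
-export([uniform_open/0]).

%% Hypothetical wrapper: draw from rand:uniform/0 and redraw whenever
%% the result is not strictly positive, so callers that compute
%% math:log(X) or 1.0/X never see 0.0.
%% Assumption: rand:uniform/0 returns a float X with 0.0 =< X < 1.0.
uniform_open() ->
    case rand:uniform() of
        X when X > 0.0 -> X;
        _ -> uniform_open()
    end.
```

A redraw should be extremely rare, so the extra cost is negligible
in practice.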
I have no firm opinion on what the default range behavior should be,
or on whether the bounds should be inclusive or exclusive.
In fact, the behavior differs between languages and implementations.
Some examples I've found follow:
Python random.uniform(a, b): [a, b] (when a <= b)
C++ std::uniform_real_distribution<> gen(a, b): [a, b)
Ruby 2.4.1 Random class: rand(max): [0.0, max)
"When max is a Float, rand returns a random floating point number
between 0.0 and max, including 0.0 and excluding max."
R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)
Note: "runif will not generate either of the extreme values
unless max = min or max-min is small compared to min,
and in particular not for the default arguments."
MySQL 5.7 RAND(): [0.0, 1.0)
PostgreSQL 10 random(): [0.0, 1.0)
MS SQL Server: (0.0, 1.0)
"Returns a pseudo-random float value from 0 through 1, exclusive."
dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)"
(dSFMT: SFMT PRNG implementation for double-precision floating point only)
It took me an hour to investigate this.
Lesson learned: don't take the range definitions for granted.
On Mon, Sep 4, 2017 at 6:49 PM, Raimo Niskanen wrote:
> On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:
> > On 1/09/17 8:49 PM, Raimo Niskanen wrote:
> > >> By the way, given that a common way to make random floats is to
> > >> generate a bitvector, consider
> > >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> > >> You will notice that the spacing between the values is *almost*
> > >> uniform, but not at the end.
> > >
> > > That sounds interesting but I do not understand. Is that Elixir code?
> > Nope, Smalltalk. I wanted to use rational arithmetic. In fact I did
> > not need to. Here it is in Haskell:
> > > [(x * 256) `div` 15 | x <- [0..15]]
> > [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
> > Let's push that a bit further. Let's generate all possible 10-bit
> > integers and map them to the range [0..63]. We find again that
> > the gap sizes are not all the same. They can't be. If you
> > consider all vectors of N bits and map them to the range
> > [0..2**M] they *cannot* be uniformly distributed no matter what
> > method you use because (2**M+1) does not divide 2**N. You can
> > fix this by rejecting some of the bit vectors, but that would
> > be asking everyone to pay extra for a result they don't have any
> > particular need for.
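For readers following along in Erlang, the same observation can be
reproduced with a minimal sketch like the one below. The module and
function names are made up for illustration; only lists and maps from
the standard library are used.

```erlang
-module(gap_sketch).
-export([scaled/2, gaps/1, counts/1]).

%% Map every integer in 0..N onto 0..Max by truncating division,
%% as in the Haskell expression quoted above.
scaled(N, Max) ->
    [(X * Max) div N || X <- lists:seq(0, N)].

%% Differences between neighbouring elements of a list.
gaps([A, B | T]) -> [B - A | gaps([B | T])];
gaps(_) -> [].

%% How many inputs land on each output value.
counts(List) ->
    lists:foldl(
      fun(V, Acc) -> maps:update_with(V, fun(C) -> C + 1 end, 1, Acc) end,
      #{}, List).
```

gap_sketch:scaled(15, 256) gives the same list as the Haskell output
above, and gap_sketch:gaps/1 on that list yields fourteen gaps of 17
followed by one gap of 18. For the 10-bit case,
gap_sketch:counts(gap_sketch:scaled(1023, 63)) shows that output 63 is
hit by a single input while output 0 is hit by 17 inputs.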
> I see, but do not quite understand what you are getting at.
> The current left-closed float generator starts with 58 random bits and
> puts 53 of these into the mantissa in an IEEE 754 double binary float,
> so that would not be it.
> I guess it is a generator for the closed interval [0.0,1.0] or the open
> (0.0,1.0) you talk about. If so:
> This one-liner generates over [0.0,1.0]:
> (rand:uniform((1 bsl 53) + 1) -1) * math:pow(2, -53)
> and it uses an integer range R = (2^53 + 1), which is not divisible by 2.
> The implementation for that range will generate a 58 bit number and then
> check if the number is 0 =< X < R*31 and if so return the number mod R
> (31 repetitions of the range is all that fits completely in 58 bits).
> If the generated number is R*31 =< X that is in the top truncated interval
> then we start over with a new number.
> The implementation may in theory retry forever before finding a good
> number, but the odds of a retry are about 1/32 for each round, so the
> accumulated time is about 32/31 times one round. And only once in a
> it will take 4 attempts or more.
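The closed-interval one-liner and the retry scheme described above can
be sketched as follows. This is only an illustration, not the rand
module's actual code: the module name, the function names, and the use
of rand:uniform(1 bsl 58) - 1 as a stand-in for 58 raw generator bits
are all assumptions.

```erlang
-module(closed_uniform_sketch).
-export([closed_uniform/0, uniform_range/1]).

-define(BITS, 58).

%% Draw an integer in 0..R-1 by rejection: accept X only if it falls
%% inside the largest whole number of repetitions of R that fits in
%% 58 bits, otherwise start over with a new number.
uniform_range(R) when is_integer(R), R > 0, R =< (1 bsl ?BITS) ->
    Limit = ((1 bsl ?BITS) div R) * R,  % R*31 when R = 2^53 + 1
    X = rand:uniform(1 bsl ?BITS) - 1,  % assumed stand-in for 58 raw bits
    case X < Limit of
        true -> X rem R;
        false -> uniform_range(R)
    end.

%% Float over the closed interval [0.0, 1.0] in steps of 2^-53.
closed_uniform() ->
    R = (1 bsl 53) + 1,
    uniform_range(R) * math:pow(2, -53).
```

With R = 2^53 + 1, (1 bsl 58) div R is 31, so the rejected top slice is
roughly 1/32 of the 58-bit space, in line with the estimate above.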
> I discussed a different implementation with Prof. Vigna that is to always
> generate one word too much and then use mod R on that which would bury the
> skew in one word of random bits hence the difference in probability between
> generated numbers should be about (2^58 - 1)/2^58, which would take quite
> some effort to measure with statistical significance. But he considered
> that as a bad idea since it would get half the speed for most ranges.
> So this is an already solved problem, as I see it.
> We *can* write efficient and good generators for open, closed and
> half-closed intervals, if we want.
> So far I have only seen the need for a (0.0,1.0] generator, which can be
> implemented with:
> 1.0 - rand:uniform()
> but in some applications such as 1.0/X and math:log(X) I think that the
> N * 2^-53 might be less than optimal, so I have a new suggestion:
> This variant never returns exactly 0.0 and has better precision for low
> values. Comments? Especially about the name.
> And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].
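For reference, the simple (0.0,1.0] form mentioned above can be wrapped
up as in the sketch below, together with the math:log use case. The
module and function names are hypothetical, and the sketch assumes
rand:uniform/0 returns a float X with 0.0 =< X < 1.0. It does not
implement the higher-precision variant, only the 1.0 - rand:uniform()
form.

```erlang
-module(half_open_sketch).
-export([uniform_nonzero/0, neg_log_uniform/0]).

%% Mirror [0.0, 1.0) into (0.0, 1.0]: the result is never 0.0,
%% but it can be exactly 1.0.
uniform_nonzero() ->
    1.0 - rand:uniform().

%% math:log/1 raises badarith on 0.0, so feed it the nonzero variant.
%% The result is a non-negative float.
neg_log_uniform() ->
    -math:log(uniform_nonzero()).
```

Both functions keep the N * 2^-53 granularity of the underlying
generator, which is the precision limitation near zero discussed above.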
> / Raimo Niskanen, Erlang/OTP, Ericsson AB