[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20

Wed Sep 6 14:30:17 CEST 2017

Raimo and all:

I got late to follow the thread.

I think the new function name should be
rand:uniform_non_zero/{1,2}
because I've rarely seen somethingUPPERCASE
names in Erlang functions.
(I might be wrong.)

On ranges:

I'm concerned about how OTP pre-20 (i.e., <= OTP 19.3.x) rand:uniform/1 code
might crash or cause bugs when running on the OTP 20 implementation.
At least how to write the OTP pre-20-equivalent code should be documented.

I have no firm idea on what should be the default behavior on ranges
and whether the borders should be inclusive/exclusive to the limit values.
In fact, the behavior differs between languages and implementations.
Some examples I've found follow:

JavaScript math.random(): [0.0, 1.0)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random

Python random.uniform(a, b): [a, b] (when a <= b)
https://stackoverflow.com/questions/6088077/how-to-get-a-random-number-between-a-float-range

C++ std::uniform_real_distribution<> gen(a, b): [a, b)
http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution

Ruby 2.4.1 Random class: rand(max): [0.0, max)
https://ruby-doc.org/core-2.4.1/Random.html
"When max is a Float, rand returns a random floating point number
 between 0.0 and max, including 0.0 and excluding max."

R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
Note: "runif will not generate either of the extreme values
       unless max = min or max-min is small compared to min,
       and in particular not for the default arguments."

MySQL 5.7 RAND(): [0.0, 1.0)
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand

PostgreSQL 10 random(): [0.0, 1.0)
https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table

MS SQL Server: (0.0, 1.0)
https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql
"Returns a pseudo-random float value from 0 through 1, exclusive."

dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)"
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT
(dSFMT: SFMT PRNG implementation for double-precision floating point only)

It took me an hour to investigate this.
Lesson learned: don't take the range definitions for granted.

Regards,
Kenji Rikitake

On Mon, Sep 4, 2017 at 6:49 PM, Raimo Niskanen <
raimo+erlang-questions@REDACTED> wrote:

> On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:
> >
> >
> > On 1/09/17 8:49 PM, Raimo Niskanen wrote:
> > >> By the way, given that a common way to make random floats is to
> > >> generate a bitvector, consider
> > >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> > >> You will notice that the spacing between the values is *almost*
> > >> uniform, but not at the end.
> > >
> > > That sounds interesting but I do not understand.  Is that Elixir code?
> >
> > Nope, Smalltalk.  I wanted to use rational arithmetic.  In fact I did
> > not need to.  Here it is in Haskell:
> >  > [(x * 256) `div` 15 | x <- [0..15]]
> > [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
> >
> > Let's push that a bit further.  Let's generate all possible 10-bit
> > integers and map them to the range [0..63].  We find again that
> > the gap sizes are not all the same.  They can't be.  If you
> > consider all vectors of N bits and map them to the range
> > [0..2**M] they *cannot* be uniformly distributed no matter what
> > method you use because (2**M+1) does not divide 2**N.  You can
> > fix this by rejecting some of the bit vectors, but that would
> > be asking everyone to pay extra for a result they don't have any
> > particular need for.
> >
>
> I see, but do not quite understand what you are getting at.
>
> The current left-closed float generator starts with 58 random bits and
> puts 53 of these into the mantissa in an IEEE 754 double binary float,
> so that would not be it.
>
> I guess it is a generator for the closed interval [0.0,1.0] or the open
> (0.0,1.0) you talk about.  If so:
>
> This one-liner generates over [0.0,1.0]:
>     (rand:uniform((1 bsl 53) + 1) -1) * math:pow(2, -53)
> and it uses an integer range R = (2^53 + 1), which is not dividable by 2.
>
> The implementation for that range will generate a 58 bit number and then
> check if the number is 0 =< X < R*31 and if so return the number mod R
> (31 repetitions of the range is all that fits completely in 58 bits).
>
> If the generated number is R*31 =< X that is in the top truncated interval
> then we start over with a new number.
>
> The implementation may in theory retry forever before finding a good
> number, but the odds for retry is about 1/32 for each round so the
> accumulated time is about 32/31 times one round.  And only once in a
> million
> it will take 4 attempts or more.
>
> I discussed a different implementation with Prof. Vigna that is to always
> generate one word too much and then use mod R on that which would bury the
> skew in one word of random bits hence the difference in probability between
> generated numbers should be about (2^58 - 1)/2^58, which would take quite
> some effort to measure with statistical significance.  But he considered
> that as a bad idea since it would get half the speed for most ranges.
>
> So this is an already solved problem, as I see it.
>
> We *can* write efficient and good generators for open, closed and
> half-closed intervals, if we want.
>
> So far I have only seen the need for a (0.0,1.0] generator, which can be
> implemented with:
>     1.0 - rand:uniform()
> but in some applications such as 1.0/X and math:log(X) I think that the
> form
> N * 2^-53 might be less than optimal, so I have a new suggestion:
>
>     https://github.com/erlang/otp/compare/OTP-20.0...
> RaimoNiskanen:raimo/stdlib/rand-uniformNZ
>
> This variant never returns exactly 0.0 and have better precision for low
> values.  Comments?  Especially about the name.
>
> And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20170906/23452387/attachment.htm>