[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20

Wed Sep 6 15:33:33 CEST 2017

On Wed, Sep 06, 2017 at 09:30:17PM +0900, Kenji Rikitake wrote:
> Raimo and all:
> 
> I got late to follow the thread.
> 
> I think the new function name should be
> rand:uniform_non_zero/{1,2}
> because I've rarely seen somethingUPPERCASE
> names in Erlang functions.
> (I might be wrong.)

At least it would be the first (only) in the rand module, so you surely
have a point.

Another possible name would be rand:uniform_right_closed/{1,2}, or
rand:uniform_left_open/{1,2} but the mathematical reference might be
a bit obscure.

> 
> On ranges:
> 
> I'm concerned about how OTP pre-20 (i.e., <= OTP 19.3.x) rand:uniform/1 code
> might crash or cause bugs when running on the OTP 20 implementation.
> At least how to write the OTP pre-20-equivalent code should be documented.

If you mean pre-20 code using rand:uniform/1 that later runs on 20.0 or
later, then I am not aware of any drastic changes, at least not crashes.

Code calling rand:seed/{1,2} will select the explicit algorithm from pre-20
and behave exactly as before.  Code not calling rand:seed/{1,2} will as
before get a random number sequence it has never seen before.

The same applies for code using rand:uniform/0 - it could pre-20 get
exactly 0.0 and the same on 20.0 and later.

You can, of course, only use the explicit algorithms that existed pre-20
when executing pre-20, and the same algorithms still exists but are not
documented on OTP 20.0 and should behave exactly the same.

So since there should be nothing more to writing pre-20 code than to read
the documentation for the release in question, and that such code should
work just as before on 20.0 and later, I do not think there should be any
need for documenting that.  This should be exactly as expected.

> 
> I have no firm idea on what should be the default behavior on ranges
> and whether the borders should be inclusive/exclusive to the limit values.
> In fact, the behavior differs between languages and implementations.
> Some examples I've found follow:
> 
> JavaScript math.random(): [0.0, 1.0)
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
> 
> Python random.uniform(a, b): [a, b] (when a <= b)
> https://stackoverflow.com/questions/6088077/how-to-get-a-random-number-between-a-float-range

There is a note that the upper bound may depending on rounding not be
returned.  Really nasty.

> 
> C++ std::uniform_real_distribution<> gen(a, b): [a, b)
> http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution
> 
> Ruby 2.4.1 Random class: rand(max): [0.0, max)
> https://ruby-doc.org/core-2.4.1/Random.html
> "When max is a Float, rand returns a random floating point number
>  between 0.0 and max, including 0.0 and excluding max."
> 
> R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)
> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html
> Note: "runif will not generate either of the extreme values
>        unless max = min or max-min is small compared to min,
>        and in particular not for the default arguments."

That is a really sloppy definition!

> 
> MySQL 5.7 RAND(): [0.0, 1.0)
> https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand
> 
> PostgreSQL 10 random(): [0.0, 1.0)
> https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table
> 
> MS SQL Server: (0.0, 1.0)
> https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql
> "Returns a pseudo-random float value from 0 through 1, exclusive."
> 
> dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)"
> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT
> (dSFMT: SFMT PRNG implementation for double-precision floating point only)
> 
> It took me an hour to investigate this.
> Lesson learned: don't take the range definitions for granted.

You serched better than me and did find some that excludes 0.0.
Good to know!

It seems [0.0, 1.0) dominates but does not rule.

Best regards
/ Raimo

> 
> Regards,
> Kenji Rikitake
> 
> On Mon, Sep 4, 2017 at 6:49 PM, Raimo Niskanen <
> raimo+erlang-questions@REDACTED> wrote:
> 
> > On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:
> > >
> > >
> > > On 1/09/17 8:49 PM, Raimo Niskanen wrote:
> > > >> By the way, given that a common way to make random floats is to
> > > >> generate a bitvector, consider
> > > >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> > > >> You will notice that the spacing between the values is *almost*
> > > >> uniform, but not at the end.
> > > >
> > > > That sounds interesting but I do not understand.  Is that Elixir code?
> > >
> > > Nope, Smalltalk.  I wanted to use rational arithmetic.  In fact I did
> > > not need to.  Here it is in Haskell:
> > >  > [(x * 256) `div` 15 | x <- [0..15]]
> > > [0,17,34,51,68,85,102,119,136,153,170,187,204,221,238,256]
> > >
> > > Let's push that a bit further.  Let's generate all possible 10-bit
> > > integers and map them to the range [0..63].  We find again that
> > > the gap sizes are not all the same.  They can't be.  If you
> > > consider all vectors of N bits and map them to the range
> > > [0..2**M] they *cannot* be uniformly distributed no matter what
> > > method you use because (2**M+1) does not divide 2**N.  You can
> > > fix this by rejecting some of the bit vectors, but that would
> > > be asking everyone to pay extra for a result they don't have any
> > > particular need for.
> > >
> >
> > I see, but do not quite understand what you are getting at.
> >
> > The current left-closed float generator starts with 58 random bits and
> > puts 53 of these into the mantissa in an IEEE 754 double binary float,
> > so that would not be it.
> >
> > I guess it is a generator for the closed interval [0.0,1.0] or the open
> > (0.0,1.0) you talk about.  If so:
> >
> > This one-liner generates over [0.0,1.0]:
> >     (rand:uniform((1 bsl 53) + 1) -1) * math:pow(2, -53)
> > and it uses an integer range R = (2^53 + 1), which is not dividable by 2.
> >
> > The implementation for that range will generate a 58 bit number and then
> > check if the number is 0 =< X < R*31 and if so return the number mod R
> > (31 repetitions of the range is all that fits completely in 58 bits).
> >
> > If the generated number is R*31 =< X that is in the top truncated interval
> > then we start over with a new number.
> >
> > The implementation may in theory retry forever before finding a good
> > number, but the odds for retry is about 1/32 for each round so the
> > accumulated time is about 32/31 times one round.  And only once in a
> > million
> > it will take 4 attempts or more.
> >
> > I discussed a different implementation with Prof. Vigna that is to always
> > generate one word too much and then use mod R on that which would bury the
> > skew in one word of random bits hence the difference in probability between
> > generated numbers should be about (2^58 - 1)/2^58, which would take quite
> > some effort to measure with statistical significance.  But he considered
> > that as a bad idea since it would get half the speed for most ranges.
> >
> > So this is an already solved problem, as I see it.
> >
> > We *can* write efficient and good generators for open, closed and
> > half-closed intervals, if we want.
> >
> > So far I have only seen the need for a (0.0,1.0] generator, which can be
> > implemented with:
> >     1.0 - rand:uniform()
> > but in some applications such as 1.0/X and math:log(X) I think that the
> > form
> > N * 2^-53 might be less than optimal, so I have a new suggestion:
> >
> >     https://github.com/erlang/otp/compare/OTP-20.0...
> > RaimoNiskanen:raimo/stdlib/rand-uniformNZ
> >
> > This variant never returns exactly 0.0 and have better precision for low
> > values.  Comments?  Especially about the name.
> >
> > And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].
> >
> > --
> >
> > / Raimo Niskanen, Erlang/OTP, Ericsson AB
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
> >

> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB