[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20

Raimo Niskanen raimo+erlang-questions@REDACTED
Fri Sep 1 10:41:15 CEST 2017


On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
> On 08/31/2017 07:57 PM, Richard A. O'Keefe wrote:
> >
> > On 1/09/17 6:35 AM, Michael Truog wrote:
> >> As I argued in the original pull request for these recent 20.0 random
> >> number changes, a uniform distribution is much more intuitive if it is
> >> inclusive: [0,1]
> >
> > Intuitive?  For integers, I'll grant that.  For reals?  Not so much.
> > I certainly cannot grant "MUCH more intuitive".
> > I've had occasion to do (random next * 2 - 1) arcTanh, which of
> > course breaks down if *either* 0 or 1 is returned by (random next).
> 
> A uniform distribution should be uniformly distributed.  I understand that the woes of floating-point prevent a perfect uniform distribution, but we could at least try to pay attention to the limits involved, and if we did, that would make the idea much more intuitive.

If I try to be philosophical: when picking a random number from the real
range 0.0 to 1.0, the probability of getting exactly 0.0 (or exactly 1.0)
is vanishingly small.  Therefore the open range (0.0,1.0) is the more
natural one.

> 
> My belief is that the [0,1) distribution is the most common because it is the easiest to implement given the IEEE floating point standard format.  However, I would also like to be proven wrong, to have more faith in the current situation.

I think that is very possible.

We cannot forget the fact that digital floating point numbers will always
be some kind of integer values in disguise.
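As a minimal sketch of what "integers in disguise" means (the function name
is mine, and the 53 bits are an assumption matching the double precision
mantissa), a float in [0.0,1.0) is just a scaled integer:

    %% Sketch: a uniform float in [0.0, 1.0) is an integer N in
    %% [0, 2^53 - 1] scaled by 2^-53.  rand:uniform/1 supplies the integer.
    uniform_53() ->
        N = rand:uniform(1 bsl 53) - 1,   % integer in [0, 2^53 - 1]
        N / float(1 bsl 53).              % float in [0.0, 1.0)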

:
> 
> I have some examples that can make this desire a bit clearer:
> 
> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149
> 
>      % use Box-Muller transformation to generate Gaussian noise
>      % (G. E. P. Box and Mervin E. Muller,
>      %  A Note on the Generation of Random Normal Deviates,
>      %  The Annals of Mathematical Statistics (1958),
>      %  Vol. 29, No. 2 pp. 610–611)
>      X1 = random(),
>      X2 = PI2 * random(),
>      K = StdDev * math:sqrt(-2.0 * math:log(X1)),

math:log(X1) raises a badarith error if X1 =:= 0.0.  You need a generator
for X1 that does not return 0.0, just as RO'K says.

>      Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
>      Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),

If random() for X2 is in [0.0,1.0] then both 0.0 and 1.0 produce the same
value after math:cos(X2) or math:sin(X2), which I am convinced will bias
the result, since that particular value gets twice the probability of all
other values.  I think you should use a generator for X2 that can return
at most one of the endpoints.

Actually, it seems a generator for (0.0,1.0] would be more appropriate
here...
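A minimal sketch of what I mean, assuming rand:uniform/0 returns a float
in [0.0,1.0) as in OTP 20 (the helper names are mine, not part of any
library):

    %% Sketch: Box-Muller with endpoint-safe generators.
    uniform_left_open() ->                    % (0.0, 1.0], safe for math:log/1
        1.0 - rand:uniform().

    box_muller(Mean, StdDev) ->
        X1 = uniform_left_open(),             % never 0.0, so log(X1) is defined
        X2 = 2 * math:pi() * rand:uniform(),  % angle in [0, 2*pi), no doubled endpoint
        K = StdDev * math:sqrt(-2.0 * math:log(X1)),
        {Mean + K * math:cos(X2), Mean + K * math:sin(X2)}.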

>      sleep(Result2),
> 
> 
> https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L204-L210
> 
>      X = random(),
>      if
>          X =< Percent ->
>              erlang:exit(monkey_chaos);
>          true ->
>              ok
>      end,

In this kind of code, I think that (thinking in integers, since we are
really talking about integers in disguise) half-open intervals are more correct.

The interval [0.0,1.0] contains, say, N+1 numbers and the interval [0.0,2.0]
contains 2*N+1 numbers, so subtracting the first interval from the second
leaves the interval (1.0,2.0], which has only N numbers.  So you get a bias
because you include both endpoints.

In this case I believe more in a generator that gives [0.0,1.0) and the
test X < Percent, since that is what I would have written using integers to
avoid off-by-one errors.
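A minimal sketch of that (assuming rand:uniform/0 returns [0.0,1.0) and
Percent is a probability in [0.0,1.0]; the function name is mine):

    %% Sketch: with X uniform on [0.0, 1.0), the event X < Percent has
    %% probability Percent (up to float granularity), with no endpoint
    %% counted twice.
    maybe_exit(Percent) ->
        X = rand:uniform(),               % [0.0, 1.0)
        if
            X < Percent -> erlang:exit(monkey_chaos);
            true        -> ok
        end.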

> 
> with:
> random() ->
>      quickrand:strong_float().
> 
> These are code segments used for the CloudI service configuration options monkey_latency and monkey_chaos, so that normally distributed latency values and random service deaths can occur, respectively (their more common names are Latency Monkey and Chaos Monkey, with the words switched to make the concepts easier to find and associate).  For the Box-Muller transformation, it really does want a definite range [0,1], and it helps make the monkey_chaos service death easier to understand at a glance.

Please explain why the Box-Muller transformation needs a definite range
[0.0,1.0].

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
