[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20

Fri Sep 1 11:36:37 CEST 2017

On Fri, Sep 01, 2017 at 10:41:15AM +0200, Raimo Niskanen wrote:
> On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:
> > On 08/31/2017 07:57 PM, Richard A. O'Keefe wrote:
> > >
> > > On 1/09/17 6:35 AM, Michael Truog wrote:
> > >> As I argued in the original pull request for these recent 20.0 random
> > >> number changes, a uniform distribution is much more intuitive if it is
> > >> inclusive: [0,1]
> > >
> > > Intuitive?  For integers, I'll grant that.  For reals?  Not so much.
> > > I certainly cannot grant "MUCH more intuitive".
> > > I've had occasion to do (random next * 2 - 1) arcTanh, which of
> > > course breaks down if *either* 0 or 1 is returned by (random next).
> > 
> > A uniform distribution should be uniformly distributed.  I understand the woes of floating-point prevent perfect uniform distribution, but we could at least try to pay attention to the limits involved, and if we did, that would make the idea much more intuitive.
> 
> If I try to be philosophical, picking a random number in the range
> 0.0 to 1.0 of real numbers, the probability of getting a number exactly 0.0
> (or exactly 1.0) is infinitely low.  Therefore the range (0.0,1.0) is more
> natural.

   Mathematically, there's a distinction. What you've just described
is that for a continuous random variable over the interval [0.0, 1.0],
the outcomes 0.0 and 1.0 happen *almost never* (which is a very
specific technical term, meaning "with probability zero"), and that
values in the open interval (0.0, 1.0) occur *almost surely*.

   Being discrete, the computer implementation based on floating point
numbers ensures that the probability of getting 0.0 or 1.0 in that
case is measurably non-zero, whereas in the ideal version over the
reals, above, it is exactly zero. In that distinction lie most of the
problems that people are talking about here, I think.

> > My belief is that the [0,1) distribution is the most common
> > because it is the easiest to implement given the IEEE floating
> > point standard format.  However, I would also like to be proven
> > wrong, to have more faith in the current situation.

> I think that is very possible.

   From my relatively limited practical experience, either I've wanted
[0, 1) or I don't care. Example:

   Red = int(random() * 256)

where I don't want the value 256, because it's out of range for my
8-bit graphics mode, but I do want the probability of 255 to be the
same as every other value. So I want [0, 1) as my range.
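
   As a minimal sketch of that scaling in Erlang: this assumes
rand:uniform/0 follows the OTP 20 contract of 0.0 =< X < 1.0. The
module and function names are made up for illustration.

```erlang
-module(red_sketch).
-export([red/0, red_direct/0]).

%% Scale a float from the half-open range [0.0, 1.0) to an integer 0..255.
%% Assumption: rand:uniform/0 never returns exactly 1.0 (OTP 20 contract),
%% so the product stays strictly below 256 and trunc/1 never yields 256.
red() ->
    trunc(rand:uniform() * 256).

%% The same range without going through a float:
%% rand:uniform(256) returns an integer in 1..256.
red_direct() ->
    rand:uniform(256) - 1.
```

   If the generator could return exactly 1.0, red/0 would occasionally
produce 256, which is the out-of-range value I want to avoid.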

   Alternatively:

   P = random(),
   if
      P =< 0.3 -> ...;
      P =< 0.7 -> ...;
      P > 0.7 -> ...
   end

where, in general, I don't care if I could get 0.0 or 1.0 or not,
because the differences are immeasurably small for all practical
purposes.

   It seems clear to me that _several_ functions are needed for
different cases, with fully-closed, fully-open and half-open
intervals. IMO, the fully-closed and half-open are probably the most
useful (and, modulo any floating-point issues which I'm not qualified
to talk about, [0,1) can be turned into (0,1] with
1-random_halfopen()).
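
   A rough sketch of that flip, again assuming rand:uniform/0 returns
0.0 =< X < 1.0 as documented for OTP 20. The names are hypothetical,
and I haven't checked the floating-point corner cases beyond noting
that the subtraction cannot produce 0.0 when X < 1.0.

```erlang
-module(flip_sketch).
-export([half_open_low/0, half_open_high/0]).

%% [0.0, 1.0): relies on the assumed OTP 20 contract of rand:uniform/0.
half_open_low() ->
    rand:uniform().

%% (0.0, 1.0]: reflect the half-open value, so 0.0 maps to 1.0
%% and values just below 1.0 map to values just above 0.0.
half_open_high() ->
    1.0 - rand:uniform().
```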

   With a fully-closed interval, it should be possible to write
helpers for generating the other three by simply calling
random_closed() again if you get an undesirable end-point. You can't
easily extend the range of the half-open or open intervals to give you
the closed one. So I'd say at minimum, there should be a function
giving the closed interval.
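
   The retry idea could look something like the following. To be
clear, closed/0 here is a hypothetical stand-in and not something the
rand module provides: it divides an integer in 0..2^53 by 2^53 so that
both end-points can occur. Whether that is a good way to build a closed
generator is a separate question; it is only there so the helpers have
something to call.

```erlang
-module(interval_sketch).
-export([closed/0, half_open/0, open/0]).

%% Hypothetical closed generator for [0.0, 1.0]: an integer in 0..2^53
%% divided by 2^53. rand:uniform(N + 1) returns an integer in 1..N+1.
closed() ->
    N = 1 bsl 53,
    (rand:uniform(N + 1) - 1) / N.

%% [0.0, 1.0): draw again whenever the closed generator returns 1.0.
half_open() ->
    case closed() of
        X when X < 1.0 -> X;
        _ -> half_open()
    end.

%% (0.0, 1.0): draw again on either end-point.
open() ->
    case closed() of
        X when X > 0.0, X < 1.0 -> X;
        _ -> open()
    end.
```

   The end-points are rare enough that the retry should almost never
trigger, so the cost is one comparison or two per call.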

   Whether the "test and retry" approach is the best implementation
or not is a matter for discussion, as is the question of whether all
or some of these functions should be in the standard lib, or whether
they are expected to be hacked together on an as-needed basis.

   Hugo.

-- 
Hugo Mills             | Anyone who says their system is completely secure
hugo@REDACTED carfax.org.uk | understands neither systems nor security.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                        Bruce Schneier