[erlang-questions] On OTP rand module difference between OTP 19 and OTP 20
Fri Sep 1 07:29:34 CEST 2017
On 08/31/2017 07:57 PM, Richard A. O'Keefe wrote:
> On 1/09/17 6:35 AM, Michael Truog wrote:
>> As I argued in the original pull request for these recent 20.0 random
>> number changes, a uniform distribution is much more intuitive if it is
>> inclusive: [0,1]
> Intuitive? For integers, I'll grant that. For reals? Not so much.
> I certainly cannot grant "MUCH more intuitive".
> I've had occasion to do (random next * 2 - 1) arcTanh, which of
> course breaks down if *either* 0 or 1 is returned by (random next).
A uniform distribution should be uniformly distributed. I understand that the woes of floating-point prevent a perfectly uniform distribution, but we could at least pay attention to the limits involved; doing so would make the idea much more intuitive.
My belief is that the [0,1) distribution is the most common because it is the easiest to implement given the IEEE floating-point format. However, I would also like to be proven wrong, so that I can have more faith in the current situation.
>> For example, if you are dealing with probabilities, it is simpler to
>> think in percentages from 0.00 to 1.00
> Actually, when I'm dealing with probabilities, I never think
> about them as percentages. Now the interesting thing here
> is this. Suppose you want to get a true [false] outcome
> with probability p [1-p]. Then random next < p does the
> job perfectly, but ONLY if 1 is excluded.
I see this as much simpler when it is possible to write random =< p. It does not matter much in this context, only when things get more complex.
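
To make the two comparisons concrete, here is a minimal Erlang sketch. The module and function names are hypothetical, and it assumes the OTP 20 behaviour of rand:uniform/0, documented as 0.0 =< X < 1.0.

```erlang
-module(bernoulli_sketch).
-export([strict/2, inclusive/2, trial/1]).

%% Event occurs when the draw X is strictly below P.
strict(X, P) -> X < P.

%% Event occurs when the draw X is at or below P.
inclusive(X, P) -> X =< P.

%% One random trial using the strict comparison.
%% Assumes rand:uniform/0 returns a float in [0.0, 1.0).
trial(P) -> strict(rand:uniform(), P).
```

The difference only shows at the edges: strict(0.0, 0.0) is false while inclusive(0.0, 0.0) is true, so with =< an event of probability 0.0 can still fire whenever the generator is able to return exactly 0.0. The strict form pairs naturally with [0,1) and the inclusive form with (0,1].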
> The trick of generating a random integer from 1 to N by
> doing (in C): (int)(random() * N) + 1 can of course give
> you N+1 if random() can return 1.0, and this is a thing I very
> often do. (Yes, if 0.0 is excluded, the probability of getting
> 1 is very slightly skewed, but it's _very_ slightly.)
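
The same scaling trick can be sketched in Erlang to show where N+1 comes from. The names below are hypothetical; the draw is passed in as an argument so that the edge cases can be tried by hand.

```erlang
-module(scale_sketch).
-export([to_int/2, random_int/1]).

%% Map a float X onto the integers 1..N.
%% This is only correct when X is in [0.0, 1.0).
to_int(X, N) when is_float(X), is_integer(N), N >= 1 ->
    trunc(X * N) + 1.

%% Random integer in 1..N, assuming rand:uniform/0 never returns 1.0.
random_int(N) ->
    to_int(rand:uniform(), N).
```

With N = 6, to_int(0.0, 6) gives 1 and to_int(0.999, 6) gives 6, but to_int(1.0, 6) gives 7. In Erlang itself, rand:uniform/1 already returns an integer in 1..N, so the trick is mostly of interest when porting code.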
>> An example from the python documentation is at
>> https://docs.python.org/3/library/random.html#random.uniform though they
>> have ambiguity about the highest value due to a rounding problem they have.
> Oh, the bit where they say "The end-point value b may or may not be
> included in the range." Worst of both worlds. You cannot rely on it
> being included and you cannot rely on it being excluded.
> Let's face it, the usual expectation is that a uniform random number
> generator will return a value in the half-open range [0,1).
> I have uses for (0.0, 1.0).
> Michael Truog has uses for [0.0,1.0], although I wasn't able to tell
> from a quick scan of his code what they are.
I have some examples that can make this desire a bit clearer:

    % use Box-Muller transformation to generate Gaussian noise
    % (G. E. P. Box and Mervin E. Muller,
    %  A Note on the Generation of Random Normal Deviates,
    %  The Annals of Mathematical Statistics (1958),
    %  Vol. 29, No. 2 pp. 610–611)
    X1 = random(),
    X2 = PI2 * random(),
    K = StdDev * math:sqrt(-2.0 * math:log(X1)),
    Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),
    Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),

    X = random(),
    X =< Percent ->

These are code segments used for the CloudI service configuration options monkey_latency and monkey_chaos, so that normally distributed latency values and random service deaths can occur, respectively. (The more common names are Latency Monkey and Chaos Monkey, but the words are switched to make the concepts easier to find and associate.) The Box-Muller transformation really does want a definite range [0,1], and an inclusive range makes the monkey_chaos service death check easier to understand at a glance.
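
For readers who want to try the transformation on its own, the following is a minimal standalone sketch, not the CloudI code. The module and function names are hypothetical, and it assumes rand:uniform/0 returns a float in [0.0, 1.0) as on OTP 20. Because math:log(0.0) raises badarith in Erlang, the sketch flips the first draw to 1.0 - X so that the logarithm only ever sees values in (0.0, 1.0].

```erlang
-module(box_muller_sketch).
-export([normal_pair/2]).

%% Return two independent samples from a normal distribution
%% with the given Mean and StdDev (Box-Muller transformation).
normal_pair(Mean, StdDev) ->
    %% 1.0 - X is in (0.0, 1.0] when X is in [0.0, 1.0),
    %% so math:log/1 is never called with 0.0.
    X1 = 1.0 - rand:uniform(),
    %% Angle in [0, 2*pi).
    X2 = 2.0 * math:pi() * rand:uniform(),
    K = StdDev * math:sqrt(-2.0 * math:log(X1)),
    {Mean + K * math:cos(X2), Mean + K * math:sin(X2)}.
```

Unlike the fragments above, this sketch returns the raw floats and does not round or clamp the results to a minimum of 1.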
> I could personally live with a warning in the documentation that says
> that the random number generator could return 0.0, and here's a little
> loop you might use to avoid that, and another suggestion in the code
> about how to get the result Michael Truog wants.
> I just want it to be obvious that it's dangerous to assume that the
> result will not be 0.
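
The "little loop" mentioned above could look something like this in Erlang. It is a sketch under the assumption that rand:uniform/0 returns a float in [0.0, 1.0); the module and function names are hypothetical.

```erlang
-module(nonzero_sketch).
-export([uniform_nonzero/0]).

%% Draw from rand:uniform/0 until the value is greater than 0.0,
%% which yields a float in the open range (0.0, 1.0).
uniform_nonzero() ->
    case rand:uniform() of
        X when X > 0.0 -> X;
        _ -> uniform_nonzero()   % got 0.0, draw again
    end.
```

A retry is expected to be extremely rare, so the loop costs essentially nothing in practice.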
> By the way, given that a common way to make random floats is to
> generate a bitvector, consider
> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].
> You will notice that the spacing between the values is *almost*
> uniform, but not at the end.
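
The Smalltalk expression above translates fairly directly to Erlang. The sketch below (hypothetical module and function names) computes the same sixteen values and the gaps between neighbours.

```erlang
-module(spacing_sketch).
-export([values/0, gaps/0]).

%% Scale the sixteen fractions 0/15 .. 15/15 to 0..256 and truncate.
values() ->
    [trunc((I / 15) * 256) || I <- lists:seq(0, 15)].

%% Differences between consecutive values.
gaps() ->
    Vs = values(),
    lists:zipwith(fun(A, B) -> B - A end, lists:droplast(Vs), tl(Vs)).
```

Here values() runs 0, 17, 34, and so on up to 238 and then 256, so gaps() is fourteen 17s followed by a single 18: the spacing is uniform except at the closed upper end.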
I agree, but I still think the use of the word uniform here is better suited to the extremes. We know it is IEEE floating-point, so we know it is inexact.