<div dir="ltr">Raimo and all:<br><br>I got late to follow the thread.<br><br>I think the new function name should be<br>rand:uniform_non_zero/{1,2}<br>because I've rarely seen somethingUPPERCASE<br>names in Erlang functions.<br>(I might be wrong.)<br><br>On ranges:<br><br>I'm concerned about how OTP pre-20 (i.e., <= OTP 19.3.x) rand:uniform/1 code<br>might crash or cause bugs when running on the OTP 20 implementation.<br>At least how to write the OTP pre-20-equivalent code should be documented.<br><br>I have no firm idea on what should be the default behavior on ranges<br>and whether the borders should be inclusive/exclusive to the limit values.<br>In fact, the behavior differs between languages and implementations.<br>Some examples I've found follow:<br><br>JavaScript math.random(): [0.0, 1.0)<br><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random">https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random</a><br><br>Python random.uniform(a, b): [a, b] (when a <= b)<br><a href="https://stackoverflow.com/questions/6088077/how-to-get-a-random-number-between-a-float-range">https://stackoverflow.com/questions/6088077/how-to-get-a-random-number-between-a-float-range</a><br><br>C++ std::uniform_real_distribution<> gen(a, b): [a, b)<br><a href="http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution">http://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution</a><br><br>Ruby 2.4.1 Random class: rand(max): [0.0, max)<br><a href="https://ruby-doc.org/core-2.4.1/Random.html">https://ruby-doc.org/core-2.4.1/Random.html</a><br>"When max is a Float, rand returns a random floating point number<div> between 0.0 and max, including 0.0 and excluding max."<br><br>R runif(min=0.0, max=1.0): [0.0, 1.0] (See Note)<br><a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html">https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Uniform.html</a><br>Note: "runif will not generate either of the extreme values</div><div> unless max = min or max-min is small compared to min,</div><div> and in particular not for the default arguments."<br><div><br>MySQL 5.7 RAND(): [0.0, 1.0)<br><a href="https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand">https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand</a><br><br>PostgreSQL 10 random(): [0.0, 1.0)<br><a href="https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table">https://www.postgresql.org/docs/10/static/functions-math.html#functions-math-random-table</a><br><br>MS SQL Server: (0.0, 1.0)<br><a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql">https://docs.microsoft.com/en-us/sql/t-sql/functions/rand-transact-sql</a><br>"Returns a pseudo-random float value from 0 through 1, exclusive."<div><br></div><div>dSFMT: "[1, 2), [0, 1), (0, 1] and (0, 1)" <br><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT">http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT</a><br>(dSFMT: SFMT PRNG implementation for double-precision floating point only)<br><br>It took me an hour to investigate this.</div><div>Lesson learned: don't take the range definitions for granted.<br><br>Regards,<br>Kenji Rikitake</div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 4, 2017 at 6:49 PM, Raimo Niskanen <span dir="ltr"><<a href="mailto:raimo+erlang-questions@erix.ericsson.se" target="_blank">raimo+erlang-questions@erix.ericsson.se</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Mon, Sep 04, 2017 at 12:37:50PM +1200, Richard A. O'Keefe wrote:<br>
><br>
><br>
> On 1/09/17 8:49 PM, Raimo Niskanen wrote:<br>
> >> By the way, given that a common way to make random floats is to<br>
> >> generate a bitvector, consider<br>
> >> (0 to: 15) collect: [:each | ((each / 15) * 256) truncated].<br>
> >> You will notice that the spacing between the values is *almost*<br>
> >> uniform, but not at the end.<br>
> ><br>
> > That sounds interesting but I do not understand. Is that Elixir code?<br>
><br>
> Nope, Smalltalk. I wanted to use rational arithmetic. In fact I did<br>
> not need to. Here it is in Haskell:<br>
> > [(x * 256) `div` 15 | x <- [0..15]]<br>
> [0,17,34,51,68,85,102,119,136,<wbr>153,170,187,204,221,238,256]<br>
><br>
> Let's push that a bit further. Let's generate all possible 10-bit<br>
> integers and map them to the range [0..63]. We find again that<br>
> the gap sizes are not all the same. They can't be. If you<br>
> consider all vectors of N bits and map them to the range<br>
> [0..2**M] they *cannot* be uniformly distributed no matter what<br>
> method you use because (2**M+1) does not divide 2**N. You can<br>
> fix this by rejecting some of the bit vectors, but that would<br>
> be asking everyone to pay extra for a result they don't have any<br>
> particular need for.<br>
><br>
<br>
I see, but do not quite understand what you are getting at.<br>
<br>
The current left-closed float generator starts with 58 random bits and<br>
puts 53 of these into the mantissa in an IEEE 754 double binary float,<br>
so that would not be it.<br>
<br>
I guess it is a generator for the closed interval [0.0,1.0] or the open<br>
(0.0,1.0) you talk about. If so:<br>
<br>
This one-liner generates over [0.0,1.0]:<br>
(rand:uniform((1 bsl 53) + 1) -1) * math:pow(2, -53)<br>
and it uses an integer range R = (2^53 + 1), which is not dividable by 2.<br>
<br>
The implementation for that range will generate a 58 bit number and then<br>
check if the number is 0 =< X < R*31 and if so return the number mod R<br>
(31 repetitions of the range is all that fits completely in 58 bits).<br>
<br>
If the generated number is R*31 =< X that is in the top truncated interval<br>
then we start over with a new number.<br>
<br>
The implementation may in theory retry forever before finding a good<br>
number, but the odds for retry is about 1/32 for each round so the<br>
accumulated time is about 32/31 times one round. And only once in a million<br>
it will take 4 attempts or more.<br>
<br>
I discussed a different implementation with Prof. Vigna that is to always<br>
generate one word too much and then use mod R on that which would bury the<br>
skew in one word of random bits hence the difference in probability between<br>
generated numbers should be about (2^58 - 1)/2^58, which would take quite<br>
some effort to measure with statistical significance. But he considered<br>
that as a bad idea since it would get half the speed for most ranges.<br>
<br>
So this is an already solved problem, as I see it.<br>
<br>
We *can* write efficient and good generators for open, closed and<br>
half-closed intervals, if we want.<br>
<br>
So far I have only seen the need for a (0.0,1.0] generator, which can be<br>
implemented with:<br>
1.0 - rand:uniform()<br>
but in some applications such as 1.0/X and math:log(X) I think that the form<br>
N * 2^-53 might be less than optimal, so I have a new suggestion:<br>
<br>
<a href="https://github.com/erlang/otp/compare/OTP-20.0...RaimoNiskanen:raimo/stdlib/rand-uniformNZ" rel="noreferrer" target="_blank">https://github.com/erlang/otp/<wbr>compare/OTP-20.0...<wbr>RaimoNiskanen:raimo/stdlib/<wbr>rand-uniformNZ</a><br>
<br>
This variant never returns exactly 0.0 and have better precision for low<br>
values. Comments? Especially about the name.<br>
<br>
And so far I have not seen any actual need for (0.0,1.0) nor [0.0,1.0].<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
<br>
/ Raimo Niskanen, Erlang/OTP, Ericsson AB<br>
______________________________<wbr>_________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" rel="noreferrer" target="_blank">http://erlang.org/mailman/<wbr>listinfo/erlang-questions</a><br>
</font></span></blockquote></div><br></div>