<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 09/01/2017 05:13 AM, Raimo Niskanen

      wrote:<br>

    </div>

    <blockquote cite="mid:20170901121346.GB11300@erix.ericsson.se"

      type="cite">

      <pre wrap="">On Fri, Sep 01, 2017 at 04:00:59AM -0700, Michael Truog wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">On 09/01/2017 01:54 AM, Raimo Niskanen wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">On Thu, Aug 31, 2017 at 10:29:34PM -0700, Michael Truog wrote:

:

</pre>

          <blockquote type="cite">

            <pre wrap="">I have some examples that can make this desire a bit clearer:

<a class="moz-txt-link-freetext" href="https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149">https://github.com/CloudI/cloudi_core/blob/a1c10a02245f0f4284d701a2ee5f07aad17f6e51/src/cloudi_core_i_runtime_testing.erl#L139-L149</a>

      % use Box-Muller transformation to generate Gaussian noise

      % (G. E. P. Box and Mervin E. Muller,

      %  A Note on the Generation of Random Normal Deviates,

      %  The Annals of Mathematical Statistics (1958),

      %  Vol. 29, No. 2 pp. 610–611)

      X1 = random(),

      X2 = PI2 * random(),

      K = StdDev * math:sqrt(-2.0 * math:log(X1)),

      Result1 = erlang:max(erlang:round(Mean + K * math:cos(X2)), 1),

      Result2 = erlang:max(erlang:round(Mean + K * math:sin(X2)), 1),

      sleep(Result2),

</pre>

          </blockquote>

          <pre wrap="">Why not use rand:normal/3?

It uses the Ziggurat Method and is supposed to be much faster and

numerically more stable than the basic Box-Muller method.

</pre>

        </blockquote>

        <pre wrap="">The Box-Muller is simpler and producing 2 results instead of 1 .  I believe I looked at the source code for rand:normal/3 and expected the Box-Muller to be faster only because it creates 2 results, though I should check that.  I will have to investigate it more.

</pre>

      </blockquote>

      <pre wrap="">

Simpler - yes.

The basic benchmark in rand_SUITE indicates that rand:normal() is only

about 50% slower than rand:uniform(1 bsl 58) (internal word size),

which I think is a very good number.

The Box-Muller transform method needs 4 calls to the 'math' module for

non-trivial floating point functions, i.e. log(), sqrt(), cos() and sin(),

which is why I think it "must" be slower.

But I have also not measured... :-/

Looking forward to hearing your results!

</pre>

    </blockquote>

    I have some interesting results.<br>

    <br>

    These results use <a class="moz-txt-link-freetext" href="https://github.com/okeuday/erlbench">https://github.com/okeuday/erlbench</a>, which includes

    a copy of the source code at <a class="moz-txt-link-freetext" href="https://github.com/okeuday/quickrand">https://github.com/okeuday/quickrand</a>:<tt><br>

      <br>

      TEST pseudo_randomness<br>

      N == 10000 (10 runs)<br>

               18_bxor_abs get:     1612.7 us (  1.3)<br>

      18_erlang:system_tim get:     1254.1 us (  1.0)<br>

              18_monotonic get:     1372.5 us (  1.1)<br>

       18_os:system_time/1 get:     1221.7 us (  1.0)<br>

      19_os:perf_counter/1 get:     3752.2 us (  3.1)<br>

            20_rand:normal get:     6832.0 us (  5.6)<br>

             20_rand_exrop get:     3949.3 us (  3.2)<br>

          20_rand_exs1024s get:    12073.3 us (  9.9)<br>

              20_rand_exsp get:     3390.4 us (  2.8)<br>

            os:timestamp/0 get:     1392.3 us (  1.1)<br>

      os_time:perf_counter get:     4109.4 us (  3.4)<br>

      quickrand_c:floatR/0 get:     5776.0 us (  4.7)<br>

      quickrand_c:floatR/1 get:     5704.3 us (  4.7)<br>

         quickrand_c:uni/1 get:     4015.2 us (  3.3)<br>

         quickrand_c:uni/2 get:     3960.7 us (  3.2)<br>

      quickrand_c_normal/2 get:     9329.5 us (  7.6)<br>

      quickrand_c_normal/3 get:     8917.7 us (  7.3)<br>

      random_wh06_int:unif get:    10777.5 us (  8.8)<br>

      random_wh82:uniform/ get:     4750.0 us (  3.9)<br>

      random_wh82_int:unif get:     4866.4 us (  4.0)<br>

      <br>

    </tt>The function names that are relevant for a normal distribution

    are:<br>

    <tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;20_rand:normal -> rand:normal/0 (when using rand:seed(exsp, _))<br>
      &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;20_rand_exsp -> rand:uniform/1 (when using rand:seed(exsp, _))<br>
      quickrand_c:floatR/0 -> quickrand_cache:floatR/0<br>
      quickrand_c:floatR/1 -> quickrand_cache:floatR/1<br>
      quickrand_c_normal/2 -> quickrand_cache_normal:box_muller/2<br>
      quickrand_c_normal/3 -> quickrand_cache_normal:box_muller/3<br>
      <br>
    </tt>The rand module exsp algorithm was used here because it is the

    fastest pseudo-random number generator in the rand module.<tt><br>

      <br>

    </tt>A rough estimate of the latency of each normal distribution
    method, after subtracting the latency of its random number source,
    is:<br>

    <tt>rand:normal/0<br>
      &nbsp;&nbsp;3441.6 us = 6832.0 us - (rand:uniform/1 3390.4 us)<br>
      quickrand_cache_normal:box_muller/2<br>
      &nbsp;&nbsp;3553.5 us = 9329.5 us - (quickrand_cache:floatR/0 5776.0 us)<br>
      quickrand_cache_normal:box_muller/3<br>
      &nbsp;&nbsp;3213.4 us = 8917.7 us - (quickrand_cache:floatR/1 5704.3 us)<br>
      <br>
    </tt>So, this helps to

    show that the latency of the two methods is very similar once the
    cost of random number generation is subtracted.  The numbers need
    some explanation, though:  the quickrand_cache module provides the
    random numbers here, and it caches data from
    <tt>crypto:strong_rand_bytes/1</tt>, with a default cache size of
    64KB.  The difference between
    <tt>quickrand_cache_normal:box_muller/2</tt> and
    <tt>quickrand_cache_normal:box_muller/3</tt> is that the first
    uses the process dictionary while the second uses a state variable.
    The large cache avoids the latency of individual calls to
    <tt>crypto:strong_rand_bytes/1</tt> at the cost of extra memory
    consumption, and it makes random number generation similar in
    speed to the pseudo-random number generation in the rand module.<br>
    <br>
    In CloudI, I instead use <tt>quickrand_normal:box_muller/2</tt>,
    which does not use cached data, to keep memory use minimal.  That
    use-case does not need to avoid the latency of
    <tt>crypto:strong_rand_bytes/1</tt> because its purpose is to add
    latency for testing (at
    <a class="moz-txt-link-freetext" href="https://github.com/CloudI/cloudi_core/blob/299df02e6d22103415c8ba14379e90ca8c3d3b82/src/cloudi_core_i_runtime_testing.erl#L138">https://github.com/CloudI/cloudi_core/blob/299df02e6d22103415c8ba14379e90ca8c3d3b82/src/cloudi_core_i_runtime_testing.erl#L138</a>),
    and a cryptographic random source keeps the functionality widely
    applicable.  However, the same function calls occur in the
    quickrand Box-Muller transformation source code, so the overhead
    is the same.<br>
    <br>
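    For anyone who wants to repeat a comparison like this without
    erlbench, a minimal sketch using only the standard library might
    look like the following.  It is not the erlbench harness: the
    module name <tt>normal_bench_sketch</tt> and its functions are
    hypothetical, and it draws from the rand module rather than from
    quickrand_cache.  It times N calls each of
    <tt>rand:uniform/0</tt>, <tt>rand:normal/0</tt> and a basic
    Box-Muller transform (two uniform samples per call), so the cost
    of the random number source can be subtracted in the same way:<br>
    <pre wrap="">
```erlang
-module(normal_bench_sketch).
-export([run/1, box_muller/2]).

%% Basic Box-Muller transform (illustrative, not the quickrand code):
%% two uniform samples in, two normally distributed samples out.
box_muller(Mean, StdDev) ->
    X1 = 1.0 - rand:uniform(),               % in (0.0, 1.0], so log/1 is safe
    X2 = 2.0 * math:pi() * rand:uniform(),
    K = StdDev * math:sqrt(-2.0 * math:log(X1)),
    {Mean + K * math:cos(X2), Mean + K * math:sin(X2)}.

%% Time N calls of each approach; every value is in microseconds.
run(N) ->
    _ = rand:seed(exsp),
    {Uniform, ok} =
        timer:tc(fun() -> loop(N, fun() -> rand:uniform() end) end),
    {Normal, ok} =
        timer:tc(fun() -> loop(N, fun() -> rand:normal() end) end),
    {BoxMuller, ok} =
        timer:tc(fun() -> loop(N, fun() -> box_muller(0.0, 1.0) end) end),
    #{uniform => Uniform, normal => Normal, box_muller => BoxMuller}.

%% Call F N times, discarding the results.
loop(0, _F) -> ok;
loop(N, F) -> _ = F(), loop(N - 1, F).
```
</pre>
    For example, <tt>normal_bench_sketch:run(10000)</tt> returns a map
    of timings.  A single run like this is noisy, so it would need to
    be repeated several times (the results above used 10 runs) before
    drawing any conclusions.<br>
    <br>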

    I used Erlang/OTP 20.0 (without HiPE) on the hardware below:<br>

    <code>Core i7 2670QM 2.2GHz 1 cpu, 4 cores/cpu, 2 hts/core<br>

      L2:4×256KB L3:6MB RAM:8GB:DDR3-1333MHz<br>

      Sandy Bridge-HE-4 (Socket G2)<br>

      <br>

      Best Regards,<br>

      Michael<br>

    </code>

  </body>

</html>