Faster bounded random integers

Raimo Niskanen raimo+erlang-questions@REDACTED
Thu Sep 16 19:42:00 CEST 2021

```I benchmarked a naïve implementation of Lemire's algorithm,
as I understand it:

exsss2_uniform(Range, AlgHandler, R0) ->
{V, R} = exsss_next(R0), % generates 58 bits
W = (V bsr 29) * Range,
if
exsss2_uniform(Range, AlgHandler, R);
true ->
{(W bsr 29) + 1, {AlgHandler, R}}
end.

To avoid bignum operations it only works for Range < ?BIT(29)
and uses the discard-and-reiterate approach to avoid bias.

I ran rand_SUITE:measure/1 5 times and for range 10000 the
algorithm above performed an execution time of 87, 97, 83, 94, 103 %
of the current range implementation.  So 93 % on average with quite
a spread.  That is only a 7 % improvement, on average.

I also tried to entirely remove the bias avoidance and then
got the numbers: 91, 107, 92, 91, 80 - averages to 92 %, so bias
avoidance does not seem to be a big performance problem.

Maybe it is incorrect, can be improved, performs better
for smaller ranges, the benchmark is bad, etc...

But so far that does not look very promising.

Cheers
/ Raimo

On Thu, Sep 16, 2021 at 10:15:28AM +0200, Raimo Niskanen wrote:
> It is an interesting approach, but the question is how it would work
> or be made to work for Erlang...
>
> The range algorithm in rand.erl seems to be what Lemire's paper calls
> "The Java Approach", I agree.
>
> Lemire's "Avoiding Division" approach avoids integer division and instead
> uses multiplication of the random number and the range and bit shifts the
> result, assuming multiplication and bit shift are significantly faster
> operations than integer division.
>
> The default PRNG:s in the 'rand' module generate 58 bit integers.  This
> size is carefully selected to avoid creating bignums since they are much
> slower to operate on.  You can add 3 58 bit integers to get a 59 bit,
> and it is still not a bignum.  A 60 bit integer is a bignum.  All this
> assuming a 64 bit VM with the current term tag scheme.
>
> So, multiplying a generated integer of 58 bit with any range larger than 3
> would create a bignum, and a subsequent bit shift left of 58 bits would
> have to operate on a bignum.  I reckon that would be a significant
> performance penalty.
>
> To remove the bias, the fastest I can think of right now would be to
> mask out the low 58 bits, compare if it is a biased number, and if so
> reject and reiterate.  Or generate more bits and do a new bignum
> multiplication and shift.
>
> For ranges larger than 58 bits you are in bignum land to start with,
> so Lemire's approach suggests that we could just generate one extra
> 58-bit random word, multiply, shift and say the bias is buried.
> This might be simpler and faster than the current approach.
> An interesting idea.
>
> I think the "Avoiding Division" approach might be efficient when
> the range is less than 26 bit because then we could use only 26
> bits from the random number and keep the multiplication and shifts
> in smallnum integers.  This could be interesting to try out.
>
> Those are my first impressions and guesses.  They are not the truth.
> The truth has to be measured...
>
> Also, if we should come up with a new way of generating integers in
> a range, since they would produce new sequences of integers for
> the same seed; we have to decide to create new algorithm names
> - duplicating all existing.  Either that or introduce new
> API functions besides rand:uniform/1 and rand:uniform_s/2.
>
> So the performance gain would have to merit such a change...
>
> Cheers
> / Raimo Niskanen
>
>
>
> On Tue, Sep 14, 2021 at 10:40:29AM +0200, Thomas Depierre wrote:
> > In theory, yes. You would just need to generate more bits of randomness to
> > have enough entropy.
> >
> > There is a catch though. If you begin to need so much entropy that
> > generating that is far slower than divisions, then you should look at
> > another approach. The Lemire paper offers a reference to other approaches
> > more suited to these through less well known and mostly forgotten
> > algorithms from the 70s.
> > This particular paper is long (60 pages) and is not super applicable to my
> > needs, so I have not taken the time to dive in details yet.
> >
> > In particular, you would need to have generated enough random bits to be
> > slower than the big int divisions you need to do. That is probably not
> > happening under any realistic integer range we would deal with on the BEAM
> > imho.
> >
> > Thomas Depierre
> > Schedule a meeting with me <https://calendly.com/thomas-depierre/30min>
> >
> >
> > On Tue, 14 Sept 2021 at 02:29, Richard O'Keefe <raoknz@REDACTED> wrote:
> >
> > > I am grateful for these references.  Not having had time to read them yet,
> > > I wonder if this applies equally well to large integers?
> > >
> > > On Tue, 14 Sept 2021 at 12:28, Richard O'Keefe <raoknz@REDACTED> wrote:
> > > >
> > > > On Tue, 14 Sept 2021 at 03:51, Thomas Depierre
> > > > <depierre.thomas@REDACTED> wrote:
> > > > >
> > > > > Generating an unbiased random integer in a range 0..N where N is not a
> > > power of two requires rejection sampling.
> > > > >
> > > > > `rand` uses the "Java" algorithm for this as far as i can tell (could
> > > not find a better name).
> > > https://github.com/erlang/otp/blob/master/lib/stdlib/src/rand.erl#L150
> > > > >
> > > > > Daniel Lemire presents a good analysis of the current field in
> > > https://arxiv.org/abs/1805.10941 and then propose a "near divisionless"
> > > algorithm that should provide speedup. This could already provide
> > > interesting results.
> > > > >
> > > > > A few days ago, the Swift team came up with an algorithm that:
> > > > > - never divides
> > > > > - avoids rejection sampling entirely
> > > > > - achieves a theoretically optimal bound on the amount of randomness
> > > consumed to generate a sample
> > > > > - delivers actual performance improvements for most real cases
> > > > >
> > > > > https://github.com/apple/swift/pull/39143
> > > > >
> > > > > Am I understanding this right ? As far as i can tell, our expert is
> > > Raimo Niskanen.
> > > > >
> > > > > The EEF would be interested in funding me trying to explore the
> > > potential gains and possibly implement it for `rand`. Would this be of
> > > interest ? I expect the maintenance to be relatively low, this is not a lot
> > > of SLOC
> > > > >
> > > > > Thomas Depierre
> > > > > Schedule a meeting with me
> > >
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
```