Faster bounded random integers

Thu Sep 16 10:15:28 CEST 2021

It is an interesting approach, but the question is how it would work
or be made to work for Erlang...

The range algorithm in rand.erl seems to be what Lemire's paper calls
"The Java Approach", I agree.

Lemire's "Avoiding Division" approach avoids integer division and instead
uses multiplication of the random number and the range and bit shifts the
result, assuming multiplication and bit shift are significantly faster
operations than integer division.

The default PRNG:s in the 'rand' module generate 58 bit integers.  This
size is carefully selected to avoid creating bignums since they are much
slower to operate on.  You can add 3 58 bit integers to get a 59 bit,
and it is still not a bignum.  A 60 bit integer is a bignum.  All this
assuming a 64 bit VM with the current term tag scheme.

So, multiplying a generated integer of 58 bit with any range larger than 3
would create a bignum, and a subsequent bit shift left of 58 bits would
have to operate on a bignum.  I reckon that would be a significant
performance penalty.

To remove the bias, the fastest I can think of right now would be to
mask out the low 58 bits, compare if it is a biased number, and if so
reject and reiterate.  Or generate more bits and do a new bignum
multiplication and shift.

For ranges larger than 58 bits you are in bignum land to start with,
so Lemire's approach suggests that we could just generate one extra
58-bit random word, multiply, shift and say the bias is buried.
This might be simpler and faster than the current approach.
An interesting idea.

I think the "Avoiding Division" approach might be efficient when
the range is less than 26 bit because then we could use only 26
bits from the random number and keep the multiplication and shifts
in smallnum integers.  This could be interesting to try out.

Those are my first impressions and guesses.  They are not the truth.
The truth has to be measured...

Also, if we should come up with a new way of generating integers in
a range, since they would produce new sequences of integers for
the same seed; we have to decide to create new algorithm names
- duplicating all existing.  Either that or introduce new
API functions besides rand:uniform/1 and rand:uniform_s/2.

So the performance gain would have to merit such a change...

Cheers
/ Raimo Niskanen

On Tue, Sep 14, 2021 at 10:40:29AM +0200, Thomas Depierre wrote:
> In theory, yes. You would just need to generate more bits of randomness to
> have enough entropy.
> 
> There is a catch though. If you begin to need so much entropy that
> generating that is far slower than divisions, then you should look at
> another approach. The Lemire paper offers a reference to other approaches
> more suited to these through less well known and mostly forgotten
> algorithms from the 70s.
> This particular paper is long (60 pages) and is not super applicable to my
> needs, so I have not taken the time to dive in details yet.
> 
> In particular, you would need to have generated enough random bits to be
> slower than the big int divisions you need to do. That is probably not
> happening under any realistic integer range we would deal with on the BEAM
> imho.
> 
> Thomas Depierre
> Schedule a meeting with me <https://calendly.com/thomas-depierre/30min>
> 
> 
> On Tue, 14 Sept 2021 at 02:29, Richard O'Keefe <raoknz@REDACTED> wrote:
> 
> > I am grateful for these references.  Not having had time to read them yet,
> > I wonder if this applies equally well to large integers?
> >
> > On Tue, 14 Sept 2021 at 12:28, Richard O'Keefe <raoknz@REDACTED> wrote:
> > >
> > > On Tue, 14 Sept 2021 at 03:51, Thomas Depierre
> > > <depierre.thomas@REDACTED> wrote:
> > > >
> > > > Generating an unbiased random integer in a range 0..N where N is not a
> > power of two requires rejection sampling.
> > > >
> > > > `rand` uses the "Java" algorithm for this as far as i can tell (could
> > not find a better name).
> > https://github.com/erlang/otp/blob/master/lib/stdlib/src/rand.erl#L150
> > > >
> > > > Daniel Lemire presents a good analysis of the current field in
> > https://arxiv.org/abs/1805.10941 and then propose a "near divisionless"
> > algorithm that should provide speedup. This could already provide
> > interesting results.
> > > >
> > > > A few days ago, the Swift team came up with an algorithm that:
> > > > - never divides
> > > > - avoids rejection sampling entirely
> > > > - achieves a theoretically optimal bound on the amount of randomness
> > consumed to generate a sample
> > > > - delivers actual performance improvements for most real cases
> > > >
> > > > https://github.com/apple/swift/pull/39143
> > > >
> > > > Am I understanding this right ? As far as i can tell, our expert is
> > Raimo Niskanen.
> > > >
> > > > The EEF would be interested in funding me trying to explore the
> > potential gains and possibly implement it for `rand`. Would this be of
> > interest ? I expect the maintenance to be relatively low, this is not a lot
> > of SLOC
> > > >
> > > > Thomas Depierre
> > > > Schedule a meeting with me
> >

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB