[erlang-questions] RNG algorithm choice for uniform natural numbers

Sat Aug 4 13:02:14 CEST 2018

Raimo Niskanen writes:

>> • Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/the-xorshift1024-random-number-generator-fails-bigcrush/ .
>
> It is a known fact that the lowest bits of Xors... generators with * and +
> scramblers are of lower quality than the rest.  If he had used the high 32
> bits I predict that it would have passed.
>
> The author of Xoroshiro, etc, thinks that this is a small problem.

What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested). Therefore if the test failure won’t cause a noticeable bias, then it indeed shouldn’t be a problem.

-type score() :: ({Votes :: non_neg_integer(), Id :: any()}).

-spec apportion(list(score()), pos_integer(), pos_integer()) -> list({Id :: any(), Seats :: non_neg_integer()}).
apportion(Scores, TotalSeats, VotesSum) ->
    apportion(Scores, TotalSeats, VotesSum, []).

apportion([], 0, 0, Acc) ->
    Acc;
apportion([{Votes, Id}|ScoresTail], TotalSeats, VotesSum, Acc) ->
    WholeSeats = Votes * TotalSeats div VotesSum,
    DrawnSeats =
        case rand:uniform(VotesSum) =< Votes * TotalSeats rem VotesSum of
            true ->
                1;
            false ->
                0
        end,
    Seats = WholeSeats + DrawnSeats,
    apportion(ScoresTail, TotalSeats - Seats, VotesSum - Votes, [{Id, Seats}|Acc]).

> Any 64-bit word algorithm runs about 2..3 (or more) times slower
> than a 58-bit output word one, on 64-bit Erlang.  That's why.

Sounds plausible. Though such differences likely won’t matter in my case, it doesn’t rule out choosing a 58-bit algorithm. The inter-platform compatibility issue is hypothetical for now.

> I have never thought about implementing the new 64-bit word size generators
> since nobody has asked for inter-platform compatibility before, and you still
> have to be careful with bit compatibility.  It is only when you request
> the generator word size range that you get the raw numbers.  And for
> floating point if you are lucky that both platforms use the same bits
> (low/high/middle).

If I understand correctly, you basically mean that different RNG libraries can convert the same entropy bits to numbers differently.