[erlang-questions] RNG algorithm choice for uniform natural numbers
Krzysztof Jurewicz
krzysztof.jurewicz@REDACTED
Sat Aug 4 13:02:14 CEST 2018
Raimo Niskanen writes:
>> • Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/the-xorshift1024-random-number-generator-fails-bigcrush/ .
>
> It is a known fact that the lowest bits of Xors... generators with * and +
> scramblers are of lower quality than the rest. If he had used the high 32
> bits I predict that it would have passed.
>
> The author of Xoroshiro, etc, thinks that this is a small problem.
What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested). Therefore if the test failure won’t cause a noticeable bias, then it indeed shouldn’t be a problem.
-type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
-spec apportion(list(score()), pos_integer(), pos_integer()) -> list({Id :: any(), Seats :: non_neg_integer()}).
apportion(Scores, TotalSeats, VotesSum) ->
apportion(Scores, TotalSeats, VotesSum, []).
apportion([], 0, 0, Acc) ->
Acc;
apportion([{Votes, Id}|ScoresTail], TotalSeats, VotesSum, Acc) ->
WholeSeats = Votes * TotalSeats div VotesSum,
DrawnSeats =
case rand:uniform(VotesSum) =< Votes * TotalSeats rem VotesSum of
true ->
1;
false ->
0
end,
Seats = WholeSeats + DrawnSeats,
apportion(ScoresTail, TotalSeats - Seats, VotesSum - Votes, [{Id, Seats}|Acc]).
> Any 64-bit word algorithm runs about 2..3 (or more) times slower
> than a 58-bit output word one, on 64-bit Erlang. That's why.
Sounds plausible. Though such differences likely won’t matter in my case, it doesn’t rule out choosing a 58-bit algorithm. The inter-platform compatibility issue is hypothetical for now.
> I have never thought about implementing the new 64-bit word size generators
> since nobody has asked for inter-platform compatibility before, and you still
> have to be careful with bit compatibility. It is only when you request
> the generator word size range that you get the raw numbers. And for
> floating point if you are lucky that both platforms use the same bits
> (low/high/middle).
If I understand correctly, you basically mean that different RNG libraries can convert the same entropy bits to numbers differently.
More information about the erlang-questions
mailing list