[erlang-questions] All possible internal states of Erlang/OTP random module are practically computable

Tue Dec 23 15:56:46 CET 2014

While people might already know these things, I'll make a writeup anyway:

Random processes are, to a certain extent, a misnomer. When writing
computer software, we want different kinds of randomness in different
situations. The hard part is that a computer is supposed to be
deterministic, and thus, the random process we are building is also, by
extension, deterministic. So when we say "random" we mean
"indistinguishable from random" It is fairly easy to devise a truly random
process:

Take a lottery machine. Put a large set of balls into it, half of which has
a 0 on them, and half of which has a 1. When we want to make up a random
sequence, crank the machine read the bit that comes out and put the bit
back into the machine before doing the next draw. Other methods that have
been used throughout time are dice throws, pictures of lava lamps,
background radiation and so forth. This yields a truly random sequence. In
practice, these methods are not feasible for the vast majority of
computers, sadly.

A sequence generated by random processes like above will have certain
properties. In order to make sure it is random, people have devised a large
set of tests which such a random sequence passes. The bulk of these are
written in Knuth, The Art of Computer Programming Vol 2., but more has
happened in the field since then. For a process to "look random", it must
pass all of these tests, as we otherwise can claim the generated sequence
to be less random than a (truly) random process.

So we want a process which passes all the known tests, looks random, but
can be implemented on computers. This is where the world of PRNGs
(Pseudo-Random Number Generators). AS183 is one such PRNG and it is used in
Erlangs 'random' module. It works like every other computational process,
it is deterministic. In order to make it look random, it stores a hidden
state, in the process dictionary, and evolves that state whenever you draw
random values from it. But being a deterministic process, sooner or later,
the state will become one of the earlier states and then the sequence will
repeat itself. Kenji has showed the period is roughly 2 ^ 43 so after that
many draws, the AS183 PRNG repeats itself. There are much better PRNGs out
there, with far better periods, and a modern implementation of randomness
would probably pick one of these. On a modern computer we would expect far
better periods than something which can be computed in 9 hours on modern
hardware.

The other problem/feature of a PRNG is the state. If you can get hold of
the state, you can generate the exact same sequence of random values. In
certain settings, this is highly useful. As an example, a benchmark would
like to keep the random input "the same" for every run. If your Common Test
run fails for a certain random perturbation of your test cases, then you
want to repeat that exact order while figuring out what is wrong. If you
are using a tool to artificially control the schedulers in your system in
order to uncover security bugs, you may want to be able to reproduce a
given "random" schedule once you find a problem.

On the contrary, however, if the random input was used in cryptographic
systems, the repeatability would be devastating! If I know the internal
PRNG state, I can predict the next number in the sequence. Many
crypto-systems fail spectacularly if the random value is known or
predicted. Worse, it might be that I can predict the internal state from
the output. Or even worse yet, that I can predict the state from partial
output. This makes a PRNG based on a static state dangerous. Even basing
the PRNG off of the ubiquituous 'os:timestamp()' won't help at all. A
malicious party who knows this, can abuse the knowledge of time in order to
narrow down what state the PRNG could have started in.

This leads us to the next type of PRNG, the CSPRNG (Cryptographically
Safe/Secure PRNG). These, while having the properties of PRNGs in general
guards against both of the before-mentioned problems: sequence
predictability and internal state. We require, in addition to passing the
normal PRNG tests that:

* The CSPRNG passes the next-bit test. That is, given k bits of output from
the CSPRNG, our prediction of bit k+1 is no better than 50%.
* The CSPRNG passes the "state compromise extension". Even if we end up
determining the internal state of the CSPRNG, fully or partially, we can't
predict the preceeding bits in the output.

In order to achieve these two goals, we usually accept a slower generator
function.

In order to achieve their goal, CSPRNGs usually harvest entropy[0] from
different entropy sources. A modern laptop can measure the nanosecond time
between keystrokes for instance. Or the arrival time of network packets.
And so on. In addition, we can store the current random state on disk
between reboots to add even more entropy to the system. This state is then
feed into the CSPRNG in order to produce its data. While the CSPRNG is
running, it periodically re-cooks itself by feeding in more entropy.
Usually, CSPRNGs use well known cryptographic primitives internally to make
it impossible to predict what it is going to do, and usually, they are
implemented in the operating system kernel for easy entropy-access.

CSPRNG selection is somewhat crucial. It is widely regarded that the
DUAL_EC_DRBG CSPRNG was backdoored, by means of "magic constants" which it
uses internally. There is a hidden of number and if known, then the
properities of a CSPRNG is broken for the algorithm. The current fact is
that someone knew the hidden numbers and that the "magic constants" were
not selected at random in order to weaken security all over the world.

There is also a source of randomness in modern CPUs (VIA PadLock, Intel
RDRAND on Ivy Bridge and Haswell). This is an entropy source you can use.
Due to worries about backdoors, it is currently fed into a CSPRNG as an
additional source rather than being used as the sole source (Intel has a
alleged CSPRNG in hardware you can call, but most operating systems shy
away from it on grounds it is not verifiable).

So where does this makes us stand?

1. The 'random' module is not a CSPRNG. It can *never* be used for
cryptographically secure randomness.

2. Even when used in its current form, the 'random' module is a weak
generator with a short period. Replacing it should definitely be a priority
because there are far better PRNGs out there which are faster and have
better periods. For a modern system, 2^43 is far too low, while not being
in RANDU territory[3]. Not having a PRNG with a nice period in Erlang is
somewhat of a weakness. While it doesn't apply to CSPRNG data, you don't
want your Non-CSPRNG processes to have randomness this predictable.

3. If you need a CSPRNG, you should use 'crypto:strong_rand_bytes/1'. It
turns out that 'crypto:rand_bytes/1' is predictable in some situations and
can't be used as a CSPRNG at all. The 'strong_rand_bytes/1' function can
return 'low_entropy' which is outright wrong and preposterous on modern
machines. It never will, if the underlying random primitive is correctly
implemented. The whole idea of "running out of entropy" is false.

4. An alternative CSPRNG is 'enacl:randombytes/1' which is using
/dev/urandom on most UNIX systems and 'arc4random' on OpenBSD. This will be
a CSPRNG on at least Linux, FreeBSD and OpenBSD. On windows it uses
RtlGenRandom, of which I know little. And I have no clue as to what Illumos
is doing with its /dev/urandom[2]

[0] A good way to understand entropy is to make it equivalent with
uncertainty. A source with high entropy has high uncertainty if you were to
guess what happens. In order to add uncertainty to a process, you harvest
entropy.
[1] 'rand_bytes/1' is a call to RAND_pseudo_bytes and 'strong_rand_bytes/1'
is a call to 'RAND_bytes', see
https://www.openssl.org/docs/crypto/RAND_bytes.html
[2] Note: On most modern systems, /dev/urandom is the same device as
/dev/random. It has the same properties w.r.t security. eNaCl exploits this
fact to its fullest.
[3] RANDU: http://en.wikipedia.org/wiki/RANDU
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20141223/4c12384c/attachment.htm>