[erlang-questions] All possible internal states of Erlang/OTP random module are practically computable

Fri Dec 26 12:30:04 CET 2014

Another minor nitpick with the nice explanation:

> But being a deterministic process, sooner or later, the state will become
one of the earlier states and then the sequence will repeat itself.

Having a *limited state space* (a bounded number of bits), the process will
sooner or later revisit one of the earlier states.
Being a *deterministic process*, once a state has been revisited, the
sequence will repeat itself, yielding the same numbers it did from the
first time it was in that state.

Just to get things right - I'm confident that you know this but that it
slipped in the explanation. (It's a lot of words anyhow.)

/Erik

2014-12-23 15:56 GMT+01:00 Jesper Louis Andersen <
jesper.louis.andersen@REDACTED>:

> While people might already know these things, I'll make a writeup anyway:
>
> Random processes are, to a certain extent, a misnomer. When writing
> computer software, we want different kinds of randomness in different
> situations. The hard part is that a computer is supposed to be
> deterministic, and thus, the random process we are building is also, by
> extension, deterministic. So when we say "random" we mean
> "indistinguishable from random" It is fairly easy to devise a truly random
> process:
>
> Take a lottery machine. Put a large set of balls into it, half of which
> has a 0 on them, and half of which has a 1. When we want to make up a
> random sequence, crank the machine read the bit that comes out and put the
> bit back into the machine before doing the next draw. Other methods that
> have been used throughout time are dice throws, pictures of lava lamps,
> background radiation and so forth. This yields a truly random sequence. In
> practice, these methods are not feasible for the vast majority of
> computers, sadly.
>
> A sequence generated by random processes like above will have certain
> properties. In order to make sure it is random, people have devised a large
> set of tests which such a random sequence passes. The bulk of these are
> written in Knuth, The Art of Computer Programming Vol 2., but more has
> happened in the field since then. For a process to "look random", it must
> pass all of these tests, as we otherwise can claim the generated sequence
> to be less random than a (truly) random process.
>
> So we want a process which passes all the known tests, looks random, but
> can be implemented on computers. This is where the world of PRNGs
> (Pseudo-Random Number Generators). AS183 is one such PRNG and it is used in
> Erlangs 'random' module. It works like every other computational process,
> it is deterministic. In order to make it look random, it stores a hidden
> state, in the process dictionary, and evolves that state whenever you draw
> random values from it. But being a deterministic process, sooner or later,
> the state will become one of the earlier states and then the sequence will
> repeat itself. Kenji has showed the period is roughly 2 ^ 43 so after that
> many draws, the AS183 PRNG repeats itself. There are much better PRNGs out
> there, with far better periods, and a modern implementation of randomness
> would probably pick one of these. On a modern computer we would expect far
> better periods than something which can be computed in 9 hours on modern
> hardware.
>
> The other problem/feature of a PRNG is the state. If you can get hold of
> the state, you can generate the exact same sequence of random values. In
> certain settings, this is highly useful. As an example, a benchmark would
> like to keep the random input "the same" for every run. If your Common Test
> run fails for a certain random perturbation of your test cases, then you
> want to repeat that exact order while figuring out what is wrong. If you
> are using a tool to artificially control the schedulers in your system in
> order to uncover security bugs, you may want to be able to reproduce a
> given "random" schedule once you find a problem.
>
> On the contrary, however, if the random input was used in cryptographic
> systems, the repeatability would be devastating! If I know the internal
> PRNG state, I can predict the next number in the sequence. Many
> crypto-systems fail spectacularly if the random value is known or
> predicted. Worse, it might be that I can predict the internal state from
> the output. Or even worse yet, that I can predict the state from partial
> output. This makes a PRNG based on a static state dangerous. Even basing
> the PRNG off of the ubiquituous 'os:timestamp()' won't help at all. A
> malicious party who knows this, can abuse the knowledge of time in order to
> narrow down what state the PRNG could have started in.
>
> This leads us to the next type of PRNG, the CSPRNG (Cryptographically
> Safe/Secure PRNG). These, while having the properties of PRNGs in general
> guards against both of the before-mentioned problems: sequence
> predictability and internal state. We require, in addition to passing the
> normal PRNG tests that:
>
> * The CSPRNG passes the next-bit test. That is, given k bits of output
> from the CSPRNG, our prediction of bit k+1 is no better than 50%.
> * The CSPRNG passes the "state compromise extension". Even if we end up
> determining the internal state of the CSPRNG, fully or partially, we can't
> predict the preceeding bits in the output.
>
> In order to achieve these two goals, we usually accept a slower generator
> function.
>
> In order to achieve their goal, CSPRNGs usually harvest entropy[0] from
> different entropy sources. A modern laptop can measure the nanosecond time
> between keystrokes for instance. Or the arrival time of network packets.
> And so on. In addition, we can store the current random state on disk
> between reboots to add even more entropy to the system. This state is then
> feed into the CSPRNG in order to produce its data. While the CSPRNG is
> running, it periodically re-cooks itself by feeding in more entropy.
> Usually, CSPRNGs use well known cryptographic primitives internally to make
> it impossible to predict what it is going to do, and usually, they are
> implemented in the operating system kernel for easy entropy-access.
>
> CSPRNG selection is somewhat crucial. It is widely regarded that the
> DUAL_EC_DRBG CSPRNG was backdoored, by means of "magic constants" which it
> uses internally. There is a hidden of number and if known, then the
> properities of a CSPRNG is broken for the algorithm. The current fact is
> that someone knew the hidden numbers and that the "magic constants" were
> not selected at random in order to weaken security all over the world.
>
> There is also a source of randomness in modern CPUs (VIA PadLock, Intel
> RDRAND on Ivy Bridge and Haswell). This is an entropy source you can use.
> Due to worries about backdoors, it is currently fed into a CSPRNG as an
> additional source rather than being used as the sole source (Intel has a
> alleged CSPRNG in hardware you can call, but most operating systems shy
> away from it on grounds it is not verifiable).
>
> So where does this makes us stand?
>
> 1. The 'random' module is not a CSPRNG. It can *never* be used for
> cryptographically secure randomness.
>
> 2. Even when used in its current form, the 'random' module is a weak
> generator with a short period. Replacing it should definitely be a priority
> because there are far better PRNGs out there which are faster and have
> better periods. For a modern system, 2^43 is far too low, while not being
> in RANDU territory[3]. Not having a PRNG with a nice period in Erlang is
> somewhat of a weakness. While it doesn't apply to CSPRNG data, you don't
> want your Non-CSPRNG processes to have randomness this predictable.
>
> 3. If you need a CSPRNG, you should use 'crypto:strong_rand_bytes/1'. It
> turns out that 'crypto:rand_bytes/1' is predictable in some situations and
> can't be used as a CSPRNG at all. The 'strong_rand_bytes/1' function can
> return 'low_entropy' which is outright wrong and preposterous on modern
> machines. It never will, if the underlying random primitive is correctly
> implemented. The whole idea of "running out of entropy" is false.
>
> 4. An alternative CSPRNG is 'enacl:randombytes/1' which is using
> /dev/urandom on most UNIX systems and 'arc4random' on OpenBSD. This will be
> a CSPRNG on at least Linux, FreeBSD and OpenBSD. On windows it uses
> RtlGenRandom, of which I know little. And I have no clue as to what Illumos
> is doing with its /dev/urandom[2]
> 
> [0] A good way to understand entropy is to make it equivalent with
> uncertainty. A source with high entropy has high uncertainty if you were to
> guess what happens. In order to add uncertainty to a process, you harvest
> entropy.
> [1] 'rand_bytes/1' is a call to RAND_pseudo_bytes and
> 'strong_rand_bytes/1' is a call to 'RAND_bytes', see
> https://www.openssl.org/docs/crypto/RAND_bytes.html
> [2] Note: On most modern systems, /dev/urandom is the same device as
> /dev/random. It has the same properties w.r.t security. eNaCl exploits this
> fact to its fullest.
> [3] RANDU: http://en.wikipedia.org/wiki/RANDU
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20141226/48ce291f/attachment.htm>