<div dir="ltr"><div><span style="font-size:13px">Another minor nitpick with the nice explanation:</span></div><span style="font-size:13px"><div><span style="font-size:13px"><br></span></div>> But being a deterministic process, sooner or later, the state will become one of the earlier states and then the sequence will repeat itself. </span><br><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">Having a *limited state space* (a bounded number of bits), the process will sooner or later revisit one of the earlier states.</span></div><div><span style="font-size:13px">Being a *deterministic process*, once a state has been revisited, the sequence will repeat itself, yielding the same numbers it did from the first time it was in that state.</span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">Just to get things right - I'm confident that you know this but that it slipped in the explanation. (It's a lot of words anyhow.)</span></div><div><span style="font-size:13px"><br></span></div><div><span style="font-size:13px">/Erik</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2014-12-23 15:56 GMT+01:00 Jesper Louis Andersen <span dir="ltr"><<a href="mailto:jesper.louis.andersen@gmail.com" target="_blank">jesper.louis.andersen@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">While people might already know these things, I'll make a writeup anyway:<div><br></div><div>Random processes are, to a certain extent, a misnomer. When writing computer software, we want different kinds of randomness in different situations. The hard part is that a computer is supposed to be deterministic, and thus, the random process we are building is also, by extension, deterministic. So when we say "random" we mean "indistinguishable from random" It is fairly easy to devise a truly random process:</div><div><br></div><div>Take a lottery machine. Put a large set of balls into it, half of which has a 0 on them, and half of which has a 1. When we want to make up a random sequence, crank the machine read the bit that comes out and put the bit back into the machine before doing the next draw. Other methods that have been used throughout time are dice throws, pictures of lava lamps, background radiation and so forth. This yields a truly random sequence. In practice, these methods are not feasible for the vast majority of computers, sadly.</div><div><br></div><div>A sequence generated by random processes like above will have certain properties. In order to make sure it is random, people have devised a large set of tests which such a random sequence passes. The bulk of these are written in Knuth, The Art of Computer Programming Vol 2., but more has happened in the field since then. For a process to "look random", it must pass all of these tests, as we otherwise can claim the generated sequence to be less random than a (truly) random process.</div><div><br></div><div>So we want a process which passes all the known tests, looks random, but can be implemented on computers. This is where the world of PRNGs (Pseudo-Random Number Generators). AS183 is one such PRNG and it is used in Erlangs 'random' module. It works like every other computational process, it is deterministic. In order to make it look random, it stores a hidden state, in the process dictionary, and evolves that state whenever you draw random values from it. But being a deterministic process, sooner or later, the state will become one of the earlier states and then the sequence will repeat itself. Kenji has showed the period is roughly 2 ^ 43 so after that many draws, the AS183 PRNG repeats itself. There are much better PRNGs out there, with far better periods, and a modern implementation of randomness would probably pick one of these. On a modern computer we would expect far better periods than something which can be computed in 9 hours on modern hardware.</div><div><br></div><div>The other problem/feature of a PRNG is the state. If you can get hold of the state, you can generate the exact same sequence of random values. In certain settings, this is highly useful. As an example, a benchmark would like to keep the random input "the same" for every run. If your Common Test run fails for a certain random perturbation of your test cases, then you want to repeat that exact order while figuring out what is wrong. If you are using a tool to artificially control the schedulers in your system in order to uncover security bugs, you may want to be able to reproduce a given "random" schedule once you find a problem.</div><div><br></div><div>On the contrary, however, if the random input was used in cryptographic systems, the repeatability would be devastating! If I know the internal PRNG state, I can predict the next number in the sequence. Many crypto-systems fail spectacularly if the random value is known or predicted. Worse, it might be that I can predict the internal state from the output. Or even worse yet, that I can predict the state from partial output. This makes a PRNG based on a static state dangerous. Even basing the PRNG off of the ubiquituous 'os:timestamp()' won't help at all. A malicious party who knows this, can abuse the knowledge of time in order to narrow down what state the PRNG could have started in.</div><div><br></div><div>This leads us to the next type of PRNG, the CSPRNG (Cryptographically Safe/Secure PRNG). These, while having the properties of PRNGs in general guards against both of the before-mentioned problems: sequence predictability and internal state. We require, in addition to passing the normal PRNG tests that:</div><div><br></div><div>* The CSPRNG passes the next-bit test. That is, given k bits of output from the CSPRNG, our prediction of bit k+1 is no better than 50%.</div><div>* The CSPRNG passes the "state compromise extension". Even if we end up determining the internal state of the CSPRNG, fully or partially, we can't predict the preceeding bits in the output.</div><div><br></div><div>In order to achieve these two goals, we usually accept a slower generator function.</div><div><br></div><div>In order to achieve their goal, CSPRNGs usually harvest entropy[0] from different entropy sources. A modern laptop can measure the nanosecond time between keystrokes for instance. Or the arrival time of network packets. And so on. In addition, we can store the current random state on disk between reboots to add even more entropy to the system. This state is then feed into the CSPRNG in order to produce its data. While the CSPRNG is running, it periodically re-cooks itself by feeding in more entropy. Usually, CSPRNGs use well known cryptographic primitives internally to make it impossible to predict what it is going to do, and usually, they are implemented in the operating system kernel for easy entropy-access.</div><div><br></div><div>CSPRNG selection is somewhat crucial. It is widely regarded that the DUAL_EC_DRBG CSPRNG was backdoored, by means of "magic constants" which it uses internally. There is a hidden of number and if known, then the properities of a CSPRNG is broken for the algorithm. The current fact is that someone knew the hidden numbers and that the "magic constants" were not selected at random in order to weaken security all over the world.</div><div><br></div><div>There is also a source of randomness in modern CPUs (VIA PadLock, Intel RDRAND on Ivy Bridge and Haswell). This is an entropy source you can use. Due to worries about backdoors, it is currently fed into a CSPRNG as an additional source rather than being used as the sole source (Intel has a alleged CSPRNG in hardware you can call, but most operating systems shy away from it on grounds it is not verifiable).</div><div><br></div><div>So where does this makes us stand?</div><div><br></div><div>1. The 'random' module is not a CSPRNG. It can *never* be used for cryptographically secure randomness.</div><div><br></div><div>2. Even when used in its current form, the 'random' module is a weak generator with a short period. Replacing it should definitely be a priority because there are far better PRNGs out there which are faster and have better periods. For a modern system, 2^43 is far too low, while not being in RANDU territory[3]. Not having a PRNG with a nice period in Erlang is somewhat of a weakness. While it doesn't apply to CSPRNG data, you don't want your Non-CSPRNG processes to have randomness this predictable.</div><div><br></div><div>3. If you need a CSPRNG, you should use 'crypto:strong_rand_bytes/1'. It turns out that 'crypto:rand_bytes/1' is predictable in some situations and can't be used as a CSPRNG at all. The 'strong_rand_bytes/1' function can return 'low_entropy' which is outright wrong and preposterous on modern machines. It never will, if the underlying random primitive is correctly implemented. The whole idea of "running out of entropy" is false.</div><div><br></div><div>4. An alternative CSPRNG is 'enacl:randombytes/1' which is using /dev/urandom on most UNIX systems and 'arc4random' on OpenBSD. This will be a CSPRNG on at least Linux, FreeBSD and OpenBSD. On windows it uses RtlGenRandom, of which I know little. And I have no clue as to what Illumos is doing with its /dev/urandom[2]</div><div>[0] A good way to understand entropy is to make it equivalent with uncertainty. A source with high entropy has high uncertainty if you were to guess what happens. In order to add uncertainty to a process, you harvest entropy.</div><div>[1] 'rand_bytes/1' is a call to RAND_pseudo_bytes and 'strong_rand_bytes/1' is a call to 'RAND_bytes', see <a href="https://www.openssl.org/docs/crypto/RAND_bytes.html" target="_blank">https://www.openssl.org/docs/crypto/RAND_bytes.html</a></div><div>[2] Note: On most modern systems, /dev/urandom is the same device as /dev/random. It has the same properties w.r.t security. eNaCl exploits this fact to its fullest.</div><div>[3] RANDU: <a href="http://en.wikipedia.org/wiki/RANDU" target="_blank">http://en.wikipedia.org/wiki/RANDU</a></div><div><br></div></div>
<br>_______________________________________________<br>
erlang-questions mailing list<br>
<a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a><br>
<br></blockquote></div><br></div>