Random Number
Generators

The Art of Mathematical Computing

by Benjamin (Bill) Planche
feat. Philipp Jovanovic
Slides at github.com/Aldream/presentations

Structure

  1. Randomness - defs + pseudo-randomness
  2. RNGs & PRNGs - defs + implementation
  3. Test Tools - suites + implementation

Randomness

Lack of pattern, predictability or determinism in events.

However.

Is it really random?

Or are we simply ignorant of the underlying pattern?

Random Sequences

Information Theory

Basic definition
Sequence of independent random variables
Formal definition
???

Definition by Von Mises

Based on the Law of Large Numbers

An infinite sequence can be considered random if:

  • It has the Frequency Stability Property
    • $S = (s_1, s_2, ...)$ over $\mathbb{A}$, an alphabet of $m$ symbols
    • $\forall a \in \mathbb{A}, \lim_{n \to \infty} \frac{|\{i \leq n \mid s_i = a\}|}{n} = \frac{1}{m}$
  • Any sub-sequence selected by a proper method must not be biased either. Example (sketched below):
    • $S = (1,0,1,0,1,0,1,0)$ is not biased
    • $f(X) \to (X_i \mid \forall i \leq |X|, i \equiv 0 \pmod 2)$
    • $\implies f(S) \to (1,1,1,1)$ is biased
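A toy sketch in Python of the selection-rule example above: the alternating sequence looks frequency-stable overall, but selecting every second symbol exposes the bias. The helper name and values are purely illustrative.

from collections import Counter

S = [1, 0, 1, 0, 1, 0, 1, 0]            # looks balanced: each symbol has frequency 1/2

def frequencies(seq):
    """Empirical frequency of each symbol in the sequence."""
    counts = Counter(seq)
    return {symbol: count / len(seq) for symbol, count in counts.items()}

print(frequencies(S))                   # {1: 0.5, 0: 0.5} -> frequency-stable
sub = S[::2]                            # selection rule: keep symbols at even (0-indexed) positions
print(frequencies(sub))                 # {1: 1.0}         -> clearly biased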

However.

  • How to mathematize this proper method of selection?
  • Yields an empty set (shown by Jean Ville in 1939)

Definition by Martin-Löf

A Random Sequence has no "exceptional and effectively verifiable" property
  • No exceptional property verifiable by a recursive algorithm
  • Frequency / measure-theoretic
  • Quite satisfactory

Definition by Levin/Chaitin

Kolmogorov Complexity

  • Important measure for Information Theory
Length of the shortest program able to generate the sequence.

Resulting Definition

A finite string is random
if it requires a program at least as long as itself to be computed

$\exists c \geq 0$ such that $\forall n, H(S_0^n) \geq n - c$
with $H$ the Kolmogorov complexity

  • "Incomprehensible informational content"
  • Complexity / Compressibility approach
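Kolmogorov complexity itself is uncomputable, but off-the-shelf compression gives a rough intuition for the incompressibility idea. The Python sketch below uses zlib only as a crude proxy for $H$, not as an actual measure:

import os
import zlib

patterned = b"01" * 500        # highly regular 1000-byte string -> a short description exists
randomish = os.urandom(1000)   # 1000 bytes from the OS entropy source

print(len(zlib.compress(patterned)))   # much smaller than 1000
print(len(zlib.compress(randomish)))   # close to (or slightly above) 1000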

Statistical Randomness

A sequence is statistically random if it has no recognizable patterns.
  • Less strict than previous definitions
  • Doesn't imply objective unpredictability

... leaves room for the concept of ...

Pseudo-Randomness

and Pseudo-Random Sequence

Exhibits statistical randomness...

... though generated by a deterministic causal method.

Random Number
Generators

Definition

Device which can produce a sequence of random numbers, i.e. a sequence without deterministic properties or patterns.

Categories

Generators based on physical phenomena

Traditional Methods

Dice tossing, coin flipping, bird trajectories, ...

  • Often only random in appearance
  • Cheating is possible by knowing the rules / initial state

Quantum Phenomena

Nuclear decay, Behavior of photons hitting a semi-transparent mirror, ...

  • Gold-standard solutions
  • Generally too costly for widespread use

Noisy Phenomena

Thermal signal from transistor, radio noise, Analog-to-digital conversion noise, ...

  • Easier to detect
  • Offer good results

OS Implementations

  • Unix Systems
    • $/dev/urandom$ & $/dev/random$
    • Device files probing analog sources (mouse, keyboard, disk accesses, etc.)
  • Windows Systems
    • $CryptGenRandom$
    • Gathers entropy from system state (CPU counters, env. vars, thread IDs, etc.)

  • Based on the unpredictable IO + behavior of the users
  • Harvest entropy, Output random bytes

In both cases, entropy decreases during inactivity

... Shortages.
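A minimal Python sketch of tapping these OS sources: os.urandom draws from the platform CSPRNG (e.g. /dev/urandom on Unix), and the low-level read below is the Unix-only equivalent.

import os

print(os.urandom(16).hex())    # 16 bytes from the OS entropy source

# Equivalent low-level read on a Unix system:
with open("/dev/urandom", "rb") as source:
    print(source.read(16).hex())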

Pseudo-Random Number Generators

Randomness through Determinism...??

WTF?!

Pseudo-Random Sequences

  • Clever implementations $\rightarrow$ Long-enough period
  • Determinism $\rightarrow$ totally defined by init config
    • State
    • Seed
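Determinism in action: a minimal Python sketch showing that the same seed reproduces the same stream (the standard library's random module, a Mersenne Twister PRNG, used purely as an illustration):

import random

a = random.Random(42)
b = random.Random(42)
print([a.randint(0, 9) for _ in range(5)])   # some fixed sequence ...
print([b.randint(0, 9) for _ in range(5)])   # ... reproduced identically: same seed, same stream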

LFSR

Linear Feedback Shift Register

  • Sequential shift register
  • New bit = linear function of previous state
  • Combinational logic
  • $\mathfrak{F}$ mapping in vector space of binary $n$-tuples
  • $f$ feedback function = boolean operation of $n$ variables

$$\mathfrak{F}:\mathbb{F}_2^n \to \mathbb{F}_2^n$$ $$\mathfrak{F}:(x_1, x_2, ..., x_n)\mapsto (x_2, x_3, ..., x_n, f(x_1, x_2, ..., x_n))$$

  • $f$ $\equiv$ poly mod 2 in finite field arithmetic
    • Feedback polynomial
  • Taps = bits of the register used in the linear operation
  • Conditions for a maximal-length LFSR (period $2^n-1$):
    • Having an even number of taps
    • Using a relatively-prime set of taps
Example

$f(x) = x^{16} + x^{14} + x^{13} + x^{11} + 1$
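A minimal single-step sketch in Python for this polynomial (Fibonacci form; the helper name and seed are arbitrary, and the full generator implementation is given in the annex):

def lfsr16_step(state):
    """One step of the 16-bit LFSR with feedback x^16 + x^14 + x^13 + x^11 + 1."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1   # taps at bits 16, 14, 13, 11
    return (state >> 1) | (bit << 15)

state = 0xACE1            # any non-zero 16-bit seed
for _ in range(3):
    state = lfsr16_step(state)
    print(f"{state:016b}")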

Let's play


NLFSR

Non-Linear Feedback Shift Register

Same theory as for LFSRs

Only one difference

The feedback function $f$ is non-linear

ex: $f(x) = x^4 + x^1 \cdot x^2 + 1$

  • Makes NLFSRs harder to predict than LFSRs
  • Makes it harder to ensure a maximal period of $2^n-1$.
Let's play
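A minimal single-step sketch in Python for the example function $f(x) = x^4 + x^1 \cdot x^2 + 1$ above; the bit numbering (x^1 = least-significant bit, x^4 = most-significant bit) and the seed are assumptions made for illustration, and the full generator implementation is in the annex:

def nlfsr4_step(state):
    """One step of a 4-bit NLFSR with feedback f(x) = x^4 + x^1*x^2 + 1."""
    x1 = state & 1                 # assumed: x^1 is the least-significant bit
    x2 = (state >> 1) & 1
    x4 = (state >> 3) & 1          # assumed: x^4 is the most-significant bit
    bit = x4 ^ (x1 & x2) ^ 1       # non-linear feedback: XOR of x^4, the product x^1*x^2, and the constant 1
    return (state >> 1) | (bit << 3)

state = 0b1011
for _ in range(3):
    state = nlfsr4_step(state)
    print(f"{state:04b}")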


Applications and Uses

Applications in every area
where unpredictable behavior is desirable/required

cryptographic systems, gambling applications, statistical sampling, simulation, ...

Various applications $ \rightarrow $ Various requirements

  • Crypto-secure RNGs for security applications
  • Output uniqueness for shuffling methods
  • ...

  • RNGs $ \rightarrow $ ~ safer but less abundant
  • PRNGs $ \rightarrow $ ~ weaker but lighter

Testing Randomness

About the Difficulty of Testing Randomness

Reasons

  • Def. depending on the field $ \rightarrow $ Which one to test?
  • Large number of possibilities $ \rightarrow $ Impossible to fully cover

Solutions

  • Statistical tests or complexity evaluations
  • Battery of tests to identify statistical bias
  • Checking hypothesis of perfect behavior

Limitations

  • Impossible to fully cover $ \rightarrow $ no universal battery of tests
  • Good RNGs $ \approx $ pass complicated or numerous tests
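As a toy illustration of what a single statistical test looks like, here is a monobit frequency check in Python (not part of any named suite; the 1.96 bound is the usual 5% two-sided normal threshold):

import math

def monobit_test(bits):
    """z-statistic for the proportion of ones; |z| > 1.96 rejects randomness at the 5% level."""
    n = len(bits)
    ones = sum(bits)
    return (2 * ones - n) / math.sqrt(n)   # normal approximation of the centered sum

print(monobit_test([1, 0] * 500))             # ~0.0  -> passes this (very weak) test
print(monobit_test([1] * 900 + [0] * 100))    # ~25.3 -> fails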

Common Tests

  • DIEHARD Tests
  • TestU01 Suite

Berlekamp-Massey Algorithm

Definition

  • Breaks linearly recurrent sequences over $\mathbb{F}_q$
  • Finds the minimal degree $L$ and annihilating polynomial $F(x)$ of the sequence $S$

Algorithm

$S$ sequence, $F(x)$ polynomial, $ \beta_i^j $ discrepancy
  • Make a first guess for $F(x)$
  • At each iteration $l$:
    • Generate $S_l'$ of $l$ elements, using the reverse of $F(x)$
    • Compare $S$ and $S_l'$: $ \beta_0^l(S, S_l') $
    • We know $S_l'$ is correct up to the $(l-1)^{th}$ symbol
    • If the $l^{th}$ symbol is not correct, i.e. $ \beta_0^l(S, S_l') = (0,0,0,...,1) $:
      • At the last iteration $m$ when this happened, we had $ \beta_0^m(S, S_m') = (0,0,0,...,1) $
      • So $ \beta_0^l(S, S_l') + \beta_{l-m}^l(S, S_m') = (0,0,0,...,0) \to$ correction to apply
      • $F(x) \gets F(x) + x^mF_l(x)$
Let's play



Conclusion

Overview of a large topic

  • Various characteristics / Various Uses
  • Choose wisely!
  • Don't implement your own RNG!
    • ... especially for crypto!
    • ... but if you try, test test test!

References

Presentation based on a personal survey: https://github.com/Aldream/random-number-generator
  1. Downey, R.: Some recent progress in algorithmic randomness. In Mathematical Foundations of Computer Science 2004. Springer Berlin Heidelberg (2004)
  2. Wikipedia: Random Number Generation (2014)
  3. Aumasson, J.P.: Crypto for Developers - Part 2, Randomness. AppSec Forum Switzerland 2013 (2013)
  4. Raymond, S., Andrew, S., Patrick, C., Jason, M.: Linear Feedback Shift Register (2001)
  5. Joux, A.: Algorithmic cryptanalysis. CRC Press (2009)
  6. Szmidt, J.: The Search and Construction of Nonlinear Feedback Shift Registers. Military Communication Institute, Zegrze, Poland (2013)
  7. Wikipedia: Linear Feedback Shift Register (2014)
  8. Dubrova, E.: A List of Maximum Period NLFSRs. IACR Cryptology ePrint Archive 2012 (2012) 166
  9. Ritter, T.: Randomness Tests: A Literature Survey (2007)
  10. L'Ecuyer, P., Simard, R.: TestU01 - A Software Library in ANSI C for Empirical Testing of Random Number Generators (2002)
  11. Marsaglia, G.: The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness (2005)
  12. Soto, J.: Statistical Testing of Random Number Generators. Proceedings of the 22nd National Information Systems Security Conference NIST, 1999 (1999)
  13. Berlekamp, E.R.: Nonbinary BCH decoding. University of North Carolina. Department of Statistics (1967)
  14. Massey, J.L.: Shift-register synthesis and BCH decoding. Information Theory. IEEE Transactions 15(1) (1969) 122-127
  15. Feng, G.-L., T.K.: A generalization of the Berlekamp-Massey algorithm for multisequence shift-register. Information Theory, IEEE Transactions 37(5) (2012) 1274-1287
  16. Rodrigez, S.: Implementation of a decoding algorithm for codes from algebraic curves in the programming language Sage. diploma thesis, Faculty of San Diego State University (2013)

Thanks for your attention!

Questions?

@b_aldream | git:Aldream | aldream.net

Annex

Randomness defined by Schnorr

A random sequence must not be predictable.
No effective strategy should lead to an infinite gain if we bet on its symbols.
  • Predictability approach
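One common way to make this precise (a sketch; the exact meaning of "effective" varies between Schnorr's and related definitions):
  • A betting strategy (martingale) is a function $d : \{0,1\}^* \to \mathbb{R}_{\geq 0}$ with $d(w) = \frac{d(w0) + d(w1)}{2}$
  • $S$ is random if no effective $d$ achieves $\limsup_{n \to \infty} d(S_0^n) = \infty$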

LFSR - Implementation

Python
def createLFSRgenerator(taps, seed):
    """ Returns a LFSR generator, defined by the given sequence of taps and initial value.
        @param taps (Tuple[int]): Sequence of taps defining the register.
            ex: (1, 0, 0, 1) -> f(x) = x^4 + x^3 + 1
        @param seed (int):        Initial value given to the register
        @return                   LFSR Generator """
    def lfsrGen():
        """ @yield Pseudo-Random value from the defined LFSR """
        deg = len(taps)        # Degree of the feedback polynomial
        period = 2 ** deg - 1  # Max period of the LFSR
        value = seed           # Initial value
        it = 0
        while it < period:
            # Computing the new value of the most-significant bit:
            bit = 0
            for j in range(deg):  # XOR of the register bits selected by the taps
                if taps[j]:
                    bit ^= value >> j
            bit &= 1  # Keep only the least-significant bit of the XOR
            # Final value in register: pop the least-significant bit and append the new most-significant one:
            value = (value >> 1) | (bit << (deg - 1))
            it += 1
            yield value
    return lfsrGen
Javascript
function createLFSRGenerator(taps, seed) {
/** Returns a LFSR generator, defined by the given sequence of taps and initial value.
        @param taps (Tuple[int]): Sequence of taps defining the register.
			ex: (1, 0, 0, 1) -> f(x) = x^4 + x^3 + 1
        @param seed (int):        Initial value given to the register
        @return                   LFSR Generator */
    return function *lfsrGen() { /** @yield Pseudo-Random value from the defined LFSR */
        var deg = taps.length,              // Degree of the feedback polynomial
            period = Math.pow(2, deg) - 1,   // Max period of the LFSR
            value = seed;                     // Initial value
        for (var it = 0; it < period; it++) { // Computing new value of most-significant bit:
            var bit = 0;
            for (var j = 0; j < deg; j++) { // XOR of the register bits selected by the taps
                if (taps[j])
                    bit ^= value >> j;
            }
            bit &= 1; // Keep only the least-significant bit of the XOR
            // Final value in register by popping less-sign bit and appending the new most-sign one:
            yield (value = (value >> 1) | (bit << (deg - 1)));
        }
    }
}
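A quick usage sketch of the Python version above (the values are arbitrary; with these taps the feedback polynomial x^4 + x^3 + 1 is primitive, so the register should cycle through all 15 non-zero states):

gen = createLFSRgenerator((1, 0, 0, 1), 0b1001)   # f(x) = x^4 + x^3 + 1, seed = 1001b
states = list(gen())                              # one full period of register states
print(len(states), states[:5])                    # 15 values, all distinct for a maximal LFSR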

NLFSR - Implementation

Python
def createNLFSRgenerator(taps, seed):
    """ Returns a NLFSR generator, defined by the given combination of taps and initial value.
        @param taps (Tuple[Array[int]]): Sequence of combination of taps defining the non-linear register.
            ex: ([0,0],[],[2],[1,2]) -> f(x) = x^4*x^4 + x^2  + x^1*x^2 + 1 (poor choice)
        @param seed (int):              Initial value given to the register
        @return                         NLFSR Generator """
    def nlfsrGen():
        """ @yield Pseudo-Random value generated by a pre-defined NLFSR """
        deg = len(taps)        # Degree of the feedback polynomial
        period = 2 ** deg - 1  # Max period of the NLFSR (read Warning above)
        value = seed           # Initial value
        it = 0
        while it < period:
            # Computing the new value of the most-significant bit:
            bit = 0
            for tap in taps:
                # Computing the binary multiplication x^K_0 * x^K_1 * ... * x^K_n with [K_0, K_1, ..., K_n] the current taps array
                if len(tap):
                    element = 1
                    for k in tap:
                        if not (value >> k & 1):
                            element = 0  # Binary multiplication of terms returns 1 iff none of the terms is null.
                            break        # So if we encounter a null bit, the product is 0, else 1.
                else:
                    element = 0
                bit ^= element  # Binary addition (XOR) of the multiplication results
            bit &= 1
            # Final value in register: pop the least-significant bit and append the new most-significant one:
            value = (value >> 1) | (bit << (deg - 1))
            it += 1
            yield value
    return nlfsrGen
Javascript
function createNLFSRgenerator(taps, seed) {
    /** Returns a NLFSR generator, defined by the given combination of taps and initial value.
        @param taps (Tuple[Array[int]]): Sequence of combination of taps defining the non-linear register.
            ex: ([0,0],[],[2],[1,2]) -> f(x) = x^4*x^4 + x^2  + x^1*x^2 + 1 (poor choice)
        @param seed (int):              Initial value given to the register
        @return                         NLFSR Generator */
    return function *nlfsrGen() { /** @yield Pseudo-Random value generated by a pre-defined NLFSR */
        var deg = taps.length,              // Degree of the feedback polynomial
            period = Math.pow(2,deg) - 1, // Max Period of the NLFSR (read Warning above)
            value = seed,                 // Initial value
            it = 0
        while (it < period) {             // Computing the new value of the most-significant bit:
            var bit = 0;
            for (var j = 0; j < taps.length; j++) {
                var element = 1;
                if (taps[j].length) { // Computing the binary multiplication x^K_0 * x^K_1 * ... * x^K_n with [K_0, K_1, ..., K_n] the j-th taps array
                    for (var k = 0; k < taps[j].length; k++) {
                        if (!(value >> taps[j][k] & 1)) {
                            element = 0;  // Binary multiplication of terms returns 1 iff none of the terms is null.
                            break;        // So if we encounter a null bit, the product is 0, else 1.
                         }
                    }
                } else { element = 0; }
                bit ^= element;          // Binary addition of the multiplication results:
            }
            bit &= 1;
            // Getting the final value in the register by popping the less-significant bit and appending the new most-significant one:
            it += 1;
            yield (value = (value >> 1) | (bit << (deg-1)));
        }
    }
}

Test Suites

DIEHARD Tests

  • Developed by George Marsaglia, in 1995
  • 15 tests run over a large file containing the sequence
birthday spacings, overlapping permutations, ranks of 31x31 and 32x32 matrices, ranks of 6x8 matrices, monkey tests, count the 1's, parking lot, minimum distance, random spheres, squeeze, overlapping sums, runs, and craps

TestU01 Suite

  • Software library, initiated in 1985
  • Collection of utilities in ANSI C
  • Classical stat tests + others from literature + original ones
  • Tools to implement specific stat tests.

Berlekamp-Massey Algorithm - Alternate Explanation

  • At each iteration $l$:
    • Evaluate the discrepancy
    • If null:
      • $F(x)$ and $L$ still correct
      • Go to next iteration
    • Else:
      • $F(x)$ should be concordantly adjusted
      • Shift & Scale syndromes added since last update
    • If $l > 2L$:
      • Update $L$ to keep track of progression

Berlekamp-Massey - Implementation

Python
def BerlekampMasseyAlgorithm(sequence):
    """ Applies the Berlekamp-Massey Algorithm to the given sequence of bits;
        Returns the smallest annihilating polynomial F, ie. the smallest inverse
        feedback polynomial corresponding to the generating LFSR.( F(sequence) = 0 )
        @param sequence (Array[int] or Tuple[int]): Sequence of bits to analyze
        @returns Array defining the computed inverse feedback polynomial
            ex: [1, 0, 0, 1, 1] represents the inverse polynomial x^4 + x^3 + 1,
                and thus the feedback polynomial x^4 + x + 1 (taps = (1, 0, 0, 1)) """
    
    def discrepancy(sequence, poly, i, L):
        """ Returns the discrepancy.
            @param sequence (Array[int] or Tuple[int]): Sequence of bits to analyze
            @param poly (Array[int]):                   Current version of the inverse polynomial
            @param i (int):                             Current position in the sequence
            @param L (int):                             Current number of assumed errors
            @return                                     Binary value of the discrepancy """
        return sum([sequence[i-j]&poly[j] for j in range(0,L+1)])%2 # = s[i]*p[0] + s[i-1]*p[1] + ... + s[i-L]*p[L]
    
    def addPoly(poly1, poly2, length):
        """ Computes the addition of two F2 polynomials.
            @param poly1 (Array[int]): Array representing the 1st polynomial
            @param poly2 (Array[int]): Array representing the 2nd polynomial
            @param length (int):       Length to be covered by the addition (trusting user to avoid testing)
            @returns                   Resulting Binary Array  """
        return [poly1[j]^poly2[j] for j in range(0, length)]

    # Initializing: 
    N = len(sequence)
    F, f = [0]*N, [0]*N # Polynomials, with F being the one returned at the end (inverse feedback polynomial)
    F[0] = f[0] = 1
    L = 0               # Current number of assumed errors
    delta = 1           # Number of iterations since last update of L
    for l in range(N):  # Computing F and L:
        beta = discrepancy(sequence, F, l, L)
        if beta != 0:   # Adjusting F for this term:
            g = F.copy()
            F = addPoly(F, [0]*delta + f, N)
            if 2 * L <= l:    # L is no longer sufficient: we must update it (and thus re-initialize delta), and also f:
                L = l + 1 - L # number of available syndromes used to calculate discrepancies
                delta = 1
                f = g # f gets the previous value of F
            else: delta += 1
        else: delta += 1
    return F[:L+1] # output the polynomial
Javascript
function BerlekampMasseyAlgorithm(sequence) {
    /** Applies the Berlekamp-Massey Algorithm to the given sequence of bits;
        Returns the smallest annihilating polynomial F, ie. the smallest inverse
        feedback polynomial corresponding to the generating LFSR.( F(sequence) = 0 )
        @param sequence (Array[int] or Tuple[int]): Sequence of bits to analyze
        @returns Array defining the computed inverse feedback polynomial
            ex: [1, 0, 0, 1, 1] represents the inverse polynomial x^4 + x^3 + 1,
                and thus the feedback polynomial x^4 + x + 1 (taps = (1, 0, 0, 1)) */
    
    function discrepancy(sequence, poly, i, L) {
        /** Returns the discrepancy.
            @param sequence (Array[int] or Tuple[int]): Sequence of bits to analyze
            @param poly (Array[int]):                   Current version of the inverse polynomial
            @param i (int):                             Current position in the sequence
            @param L (int):                             Current number of assumed errors
            @return                                     Binary value of the discrepancy */
		var disc = 0;
		for (var j = 0; j < L+1; j++) disc += (sequence[i-j] & poly[j]); // disc = s[i]*p[0] + s[i-1]*p[1] + ... + s[i-L]*p[L]
		return disc%2;
	}
    
	function addPoly(poly1, poly2, length) {
        /** Computes the addition of two F2 polynomials.
            @param poly1 (Array[int]): Array representing the 1st polynomial
            @param poly2 (Array[int]): Array representing the 2nd polynomial
            @param length (int):       Length to be covered by the addition (trusting user to avoid testing)
            @returns                   Resulting Binary Array  */
		var poly = [];
		for (var j = 0; j < length; j++) poly.push(poly1[j] ^ poly2[j]);
		return poly;
	}

    // Initializing: 
    var N = sequence.length;
    var F = [], f = [] 		// Polynomials, with F being the one returned at the end (inverse feedback polynomial)
	for (var i = 0; i < N; i++) { F.push(0); f.push(0); }
    F[0] = f[0] = 1
    var L = 0               // Current number of assumed errors
    var delta = 1           // Number of iterations since last update of L
    for (var l = 0; l < N; l++) { // Computing F and L:
        var beta = discrepancy(sequence, F, l, L);
        if (beta != 0) { // Adjusting F for this term:
            var g = F.slice(0);
			var fShifted = f.slice(0); for (var k = 0; k < delta; k++) { fShifted.unshift(0); }
            F = addPoly(F, fShifted, N);
            if (2 * L <= l) {
                L = l + 1 - L; // number of available syndromes used to calculate discrepancies
                delta = 1;
                f = g; // f gets the previous value of F
            } else delta += 1;
		} else delta += 1;
	}
	for (var k = L+1; k < N; k++) { F.pop(); }
    return F; // output the polynomial
}
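A quick sanity check in Python, assuming the createLFSRgenerator and BerlekampMasseyAlgorithm functions from this annex are both in scope; the recovered polynomial should have degree at most 4, since the bit stream satisfies a linear recurrence of order 4:

gen = createLFSRgenerator((1, 0, 0, 1), 0b1001)   # LFSR with feedback f(x) = x^4 + x^3 + 1
bits = [value & 1 for value in gen()]             # output stream = least-significant bit of each state
F = BerlekampMasseyAlgorithm(bits)
print(F, len(F) - 1)                              # annihilating polynomial and its degree (<= 4)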