The document examines whether the tests in the Ent randomness test battery are independent. A genetic algorithm was used to both increase and decrease the randomness of generated numbers, reducing the bias that a single PRNG could introduce. Ent was run on the generated numbers and its statistics were stored. The analysis found correlated statistics: entropy and compression are almost the same metric, and chi-square and excess are highly correlated. Five metrics provide almost all the relevant information, so using more than five statistics may overestimate the test results. Future work aims to focus on the uncorrelated metrics, search for non-linear relationships, and extend the study to the NIST test battery.
Evolutionary generation and degeneration of randomness to assess the independence of the Ent test battery
1. Evolutionary generation and degeneration of randomness to assess the independence of the Ent test battery
Julio Hernandez-Castro (1), David F. Barrero (2)
(1) School of Computing, University of Kent, UK
(2) Computer Engineering Department, Universidad de Alcalá, Spain
CEC 2017
San Sebastián, Spain
June 5-8, 2017
2. Outline
1 Introduction
Motivation
The Ent test battery
Research question
Research methodology
2 Evolutionary generation and degeneration of randomness
Genetic Algorithm design
Fitness function
Evolutionary random numbers
3 Building the dataset
4 Ent battery independence analysis
Effect of the length
Correlation matrix
5 Conclusions and future work
3. Introduction
Evolutionary generation and degeneration of randomness
Building the dataset
Ent battery independence analysis
Conclusions
Motivation
The Ent test battery
Research question
Research methodology
Introduction
Motivation
Randomness is a critical topic in Security
... however it is difficult to generate in a computer
... even with dedicated hardware, open topic!
Pseudorandom number generators (PRNG)
Cryptosystems are vulnerable to weak PRNGs
Need to assess the quality of PRNGs
Different aspects of randomness ⇒ Need for different tests
Ideally, tests in a test battery should be independent
Otherwise we would be assessing the same property more than once
With dependent tests, the battery would yield overly optimistic results
Two relevant test batteries: NIST and Ent
CEC 2017, San Sebastián, Spain Independence of the Ent test battery 3 / 18
4. Introduction
The Ent test battery (I)
Ent is an open source randomness test battery implementation
Compute a statistic and derive a p-value for a null hypothesis
Ent contains five tests/statistics
Entropy: entropy as defined in Information Theory
Chi-square (χ²): compares expected versus observed frequencies
Arithmetic mean: arithmetic mean of the symbols
Monte-Carlo Pi: Monte-Carlo simulation to estimate π, measured as the error of the estimation (pierror)
Serial correlation: correlation between two consecutive symbols
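The five statistics listed above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not Ent's implementation: the function name `ent_like_statistics` is hypothetical, the Monte-Carlo estimate uses one byte per coordinate (Ent's actual coordinate encoding may differ), and the serial correlation is a plain lag-1 Pearson coefficient.

```python
import math
from collections import Counter


def ent_like_statistics(data: bytes) -> dict:
    """Approximate Ent-style statistics for a byte string (illustrative sketch)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two bytes")
    counts = Counter(data)

    # Shannon entropy in bits per byte; 8.0 is the maximum.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Chi-square statistic of the byte histogram against a uniform distribution.
    expected = n / 256
    chi_square = sum((counts.get(v, 0) - expected) ** 2 / expected
                     for v in range(256))

    # Arithmetic mean of the byte values; 127.5 for uniform data.
    mean = sum(data) / n

    # Monte-Carlo pi: treat consecutive byte pairs as (x, y) points and count
    # those inside the quarter circle of radius 255 (assumed encoding).
    pairs = n // 2
    inside = sum(1 for i in range(pairs)
                 if data[2 * i] ** 2 + data[2 * i + 1] ** 2 <= 255 ** 2)
    pi_estimate = 4 * inside / pairs
    pi_error = 100 * abs(pi_estimate - math.pi) / math.pi  # percent

    # Lag-1 Pearson correlation between each byte and its successor.
    xs, ys = data[:-1], data[1:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x > 0 and var_y > 0:
        serial = cov / math.sqrt(var_x * var_y)
    else:
        serial = float("nan")  # undefined for constant data

    return {
        "entropy": entropy,
        "chi_square": chi_square,
        "mean": mean,
        "pi_estimate": pi_estimate,
        "pi_error": pi_error,
        "serial_correlation": serial,
    }
```

For example, `ent_like_statistics(bytes(range(256)))` describes a perfectly flat histogram whose bytes are nonetheless fully predictable from their predecessors, which is the kind of case that motivates having more than one test.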
5. Introduction
The Ent test battery (II)
Ent output example
Entropy = 4.385614 bits per byte.
Optimum compression would reduce the size
of this 20180 byte file by 45 percent.
Chi square distribution for 20180 samples is 1266758.61, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 56.5619 (127.5 = random).
Monte Carlo value for Pi is 3.611061552 (error 14.94 percent).
Serial correlation coefficient is 0.568671 (totally uncorrelated = 0.0).
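Storing the statistics requires turning a report like the one above into numbers. The following is a hedged sketch of one way to do that with regular expressions; `parse_ent_output` and the pattern table are hypothetical helpers, the patterns are derived only from the sample shown here, and reading "excess" as the percentage in the chi-square line is an assumption.

```python
import re

SAMPLE = """Entropy = 4.385614 bits per byte.
Optimum compression would reduce the size
of this 20180 byte file by 45 percent.
Chi square distribution for 20180 samples is 1266758.61, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 56.5619 (127.5 = random).
Monte Carlo value for Pi is 3.611061552 (error 14.94 percent).
Serial correlation coefficient is 0.568671 (totally uncorrelated = 0.0).
"""

# One pattern per statistic; each captures a single number.
PATTERNS = {
    "entropy": r"Entropy = ([\d.]+) bits per byte",
    "compression": r"byte file by ([\d.]+) percent",
    "chisquare": r"samples is ([\d.]+),",
    "excess": r"exceed this value (?:less than )?([\d.]+) percent",
    "amean": r"data bytes is ([\d.]+)",
    "pierror": r"\(error ([\d.]+) percent\)",
    "correlation": r"Serial correlation coefficient is (-?[\d.]+)",
}


def parse_ent_output(text: str) -> dict:
    """Extract the seven statistics from an Ent-style text report."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"statistic not found: {name}")
        stats[name] = float(match.group(1))
    return stats


stats = parse_ent_output(SAMPLE)
```

Each parsed dictionary can then be written as one row of the dataset, one row per generated number.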
6. Introduction
Research question
Are the Ent tests independent?
An analytical approach is hard ⇒ Experimental approach
Generate random numbers, run the tests and apply statistical tools to find dependencies
The (scarce) literature on this topic uses p-values
Good for comparing tests and relating them to test results
But p-values are non-linear transformations that lose potentially valuable information
We focus on the statistics instead
7. Introduction
Research methodology
Step I: Generate pseudorandom numbers
Problem: Potential bias by the PRNG
Solution: Use a GA
Step II: Run the tests
Store Ent statistics
Step III: Study the independence
Classical correlation analysis
[Figure: workflow diagram. Generate numbers (GA) ⇒ Run Ent (store statistics) ⇒ Statistical analysis (correlation study)]
8. Evolutionary generation and degeneration of randomness
Genetic algorithm design
Potential biases by PRNG
Too weak (or strong) PRNG
Solution: “Randomize” random numbers
New problems
Any randomization could induce new biases
Solution: GA with different fitnesses
Population: 100
Codification: Binary, fixed size
Length: 2^i for i = 7, ..., 17
Initial population: Random, T-M PRNG
Crossover: One-point
Mutation: Bit flip, p_m = 0.005
Selection: Tournament, size two
Elitism: 1
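A GA with the parameters above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `evolve` and its arguments are hypothetical, Python's `random` module stands in for the PRNG used for the initial population, and carrying the worst individual forward as the "elite" in minimization runs is an assumption.

```python
import random


def evolve(fitness, length, generations, pop_size=100, p_mut=0.005,
           maximize=True, seed=None):
    """Simple generational GA over fixed-size bit strings."""
    rng = random.Random(seed)
    pick = max if maximize else min   # best or worst, depending on the run
    score = lambda pair: pair[0]

    # Random initial population of bit lists.
    population = [[rng.getrandbits(1) for _ in range(length)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Tournament of size two.
            return pick(rng.sample(scored, 2), key=score)[1]

        # Elitism of one individual.
        next_population = [list(pick(scored, key=score)[1])]
        while len(next_population) < pop_size:
            mother, father = tournament(), tournament()
            cut = rng.randrange(1, length)           # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return population
```

Here `fitness` is any function of a bit list; in the setting of these slides it would pack the bits into bytes and return a fitness derived from one Ent statistic. Calling `evolve(sum, 128, 20, seed=1)` runs the same loop on a toy fitness that counts ones.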
9. Evolutionary generation and degeneration of randomness
Fitness function (I)
The objective of the GA is to increase the diversity of randomness in the generated numbers
No theoretical guidance on how to add or remove randomness
Lack of a general randomness theory
Idea: Use Ent output as fitness
Seven Ent statistics: entropy, compression, χ², excess, arithmetic mean, pierror, serial correlation
Statistics usually found in the literature
GA run for both maximization and minimization
10. Evolutionary generation and degeneration of randomness
Fitness function (II)
All fitnesses were formulated as maximization
Tournament selects the best or the worst individual
Ent statistic      Fitness value
Entropy            $f_{entropy} = Entropy$
Compression        $f_{compression} = 100 - Compression$
Chi-squared        $f_{chisquared} = \frac{1}{1 + Chisquared}$
Excess             $f_{excess} = Excess$
Arithmetic mean    $f_{amean} = \frac{1}{1 + |127.5 - amean|}$
Pi error           $f_{pierror} = \frac{1}{1 + pierror}$
Correlation        $f_{correlation} = \frac{1}{1 + |correlation|}$
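The fitness table translates directly into code. The sketch below transcribes the seven formulas as plain Python functions; the dictionary name `FITNESS` and its keys are hypothetical labels chosen for illustration.

```python
# Each entry maps an Ent statistic to the fitness value from the table.
# All fitnesses are written so that larger means "more random".
FITNESS = {
    "entropy": lambda entropy: entropy,
    "compression": lambda compression: 100 - compression,
    "chisquared": lambda chisquared: 1 / (1 + chisquared),
    "excess": lambda excess: excess,
    "amean": lambda amean: 1 / (1 + abs(127.5 - amean)),
    "pierror": lambda pierror: 1 / (1 + pierror),
    "correlation": lambda correlation: 1 / (1 + abs(correlation)),
}
```

With the sample Ent output shown earlier in the deck, `FITNESS["compression"](45)` gives 55, while an ideal arithmetic mean of 127.5 gives the maximum `amean` fitness of 1.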
12. Building the dataset
GA populations were stored
Direct dump of chromosomes to disk
1,559,880 numbers generated in minimization
1,559,880 numbers generated in maximization
Ent statistics of each number were computed and stored
Seven statistics per number
Analysis in two phases
1 Effect of the sequence length
2 Correlation analysis
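The correlation phase can be illustrated with a small Pearson correlation matrix over stored statistics. This is a sketch only: the values below are synthetic and not taken from the dataset, `pearson` and `correlation_matrix` are hypothetical helpers, and the compression column is derived from entropy as 100 · (8 − entropy) / 8, an assumption that agrees with the sample Ent output (4.385614 bits per byte, 45 percent).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict with the correlation of every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Synthetic values for illustration only.
entropy = [7.2, 7.6, 7.9, 6.5, 7.99]
columns = {
    "entropy": entropy,
    "compression": [100 * (8 - e) / 8 for e in entropy],
    "amean": [126.0, 128.5, 127.4, 120.3, 127.6],
}
matrix = correlation_matrix(columns)
```

Under that assumption compression is a linear function of entropy, so their coefficient in `matrix` is −1 up to rounding. Pearson's coefficient only captures linear dependence, so a low value does not rule out a non-linear relationship.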
16. Conclusions and future work (I)
General conclusions
PRNGs can be strengthened with a naïve GA
Ent provides five tests, seven metrics analyzed
Entropy and compression are almost the same metric
χ2 and excess highly correlated
Chain of correlations
Five metrics provide almost all the relevant information
Using more than five statistics may overestimate the test results
17. Conclusions and future work (II)
Future work
Focus on the five uncorrelated metrics
Search for non-linear relationships with symbolic regression
Extend the study to the NIST test battery
18. Thanks for your attention!
¡Gracias!
Eskerrik asko!
Code and datasets can be downloaded from
http://atc1.aut.uah.es/~david/cec2017
David F. Barrero
david@aut.uah.es