Evolutionary generation and degeneration of randomness to assess the independence of the Ent test battery
Julio Hernandez-Castro (1), David F. Barrero (2)
(1) School of Computing, University of Kent, UK
(2) Computer Engineering Department, Universidad de Alcalá, Spain
CEC 2017
San Sebastián, Spain
June 5-8, 2017
Outline
1 Introduction
Motivation
The Ent test battery
Research question
Research methodology
2 Evolutionary generation and degeneration of randomness
Genetic Algorithm design
Fitness function
Evolutionary random numbers
3 Building the dataset
4 Ent battery independence analysis
Effect of the length
Correlation matrix
5 Conclusions and future work
Introduction
Motivation
Randomness is a critical topic in security
... however, it is difficult to generate on a computer
... even with dedicated hardware; still an open problem
Pseudorandom number generators (PRNGs)
Cryptosystems are vulnerable to weak PRNGs
The quality of PRNGs therefore needs to be assessed
Different aspects of randomness ⇒ different tests are needed
Ideally, the tests in a test battery should be independent
Otherwise several tests would be assessing the same property
With dependent tests, the battery would yield optimistic results
Two relevant test batteries: NIST and Ent
CEC 2017, San Sebastián, Spain Independence of the Ent test battery 3 / 18
The Ent test battery (I)
Ent is an open source randomness test battery implementation
Compute a statistic and derive a p-value for a null hypothesis
Ent contains five tests/statistics
Entropy: entropy as defined in information theory
Chi-square (χ²): compares expected versus observed frequencies
Arithmetic mean: arithmetic mean of the symbols
Monte-Carlo Pi: Monte-Carlo simulation to estimate π, measured as the error of the estimation (pierror)
Serial correlation: correlation between two consecutive symbols
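The five statistics listed above can be sketched in a few lines of Python. This is a simplified illustration, not a reimplementation of Ent: the function names are hypothetical, the serial correlation here is non-circular, and the compression formula is an assumption that happens to agree with the sample output in this deck (an entropy of 4.385614 bits per byte gives about 45 percent). The Monte-Carlo step reads six bytes per point as two 24-bit coordinates, as described in Ent's documentation.

```python
import math
from collections import Counter


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def compression(data: bytes) -> float:
    """Assumed optimum compression in percent, derived from the entropy."""
    return 100 * (8 - entropy(data)) / 8


def chi_square(data: bytes) -> float:
    """Chi-square statistic against a uniform distribution over 256 values."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))


def arithmetic_mean(data: bytes) -> float:
    """Mean byte value; 127.5 is expected for uniform random bytes."""
    return sum(data) / len(data)


def pi_error(data: bytes) -> float:
    """Percentage error of a Monte-Carlo estimate of pi (needs >= 6 bytes)."""
    radius = 2 ** 24 - 1
    inside = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        total += 1
        inside += x * x + y * y <= radius * radius  # point inside the circle
    estimate = 4 * inside / total
    return 100 * abs(estimate - math.pi) / math.pi


def serial_correlation(data: bytes) -> float:
    """Pearson correlation between each byte and its successor.

    Undefined (division by zero) for constant input.
    """
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Each function takes a `bytes` object, so a sequence dumped to disk can be read with `open(path, "rb").read()` and passed in directly.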
The Ent test battery (II)
Ent output example
Entropy = 4.385614 bits per byte.
Optimum compression would reduce the size
of this 20180 byte file by 45 percent.
Chi square distribution for 20180 samples is 1266758.61, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 56.5619 (127.5 = random).
Monte Carlo value for Pi is 3.611061552 (error 14.94 percent).
Serial correlation coefficient is 0.568671 (totally uncorrelated = 0.0).
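The seven statistics analyzed later in the talk (entropy, compression, χ², excess, mean, pierror and correlation) all appear in this report. As a rough sketch of how they could be pulled out of the text, the snippet below applies one regular expression per statistic. The dictionary keys, the patterns and the function name `parse_ent_report` are illustrative assumptions, written against the sample above rather than against every report Ent can print.

```python
import re

# The sample report from this slide, reproduced verbatim.
SAMPLE = """\
Entropy = 4.385614 bits per byte.
Optimum compression would reduce the size
of this 20180 byte file by 45 percent.
Chi square distribution for 20180 samples is 1266758.61, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 56.5619 (127.5 = random).
Monte Carlo value for Pi is 3.611061552 (error 14.94 percent).
Serial correlation coefficient is 0.568671 (totally uncorrelated = 0.0).
"""

# One pattern per statistic; names are hypothetical labels for this sketch.
PATTERNS = {
    "entropy": r"Entropy = ([-\d.]+) bits",
    "compression": r"file by (\d+) percent",
    "chisquared": r"samples is ([-\d.]+),",
    "excess": r"exceed this value (?:less than )?([-\d.]+) percent",
    "mean": r"data bytes is ([-\d.]+) \(",
    "pierror": r"\(error ([-\d.]+) percent\)",
    "correlation": r"coefficient is ([-\d.]+) \(",
}


def parse_ent_report(text: str) -> dict:
    """Extract the seven statistics from a textual Ent report."""
    flat = " ".join(text.split())  # undo the line wrapping of the report
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, flat)
        if match:
            stats[name] = float(match.group(1))
    return stats


stats = parse_ent_report(SAMPLE)
```

Note that a report stating "less than 0.01 percent" is recorded here as 0.01, which loses the "less than" qualifier; a real pipeline would need to decide how to handle that case.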
Research question
Are the Ent tests independent?
An analytical approach is hard ⇒ experimental approach
Generate random numbers, run the tests and apply statistical tools to find dependencies
The (scarce) literature on this topic uses p-values
Good for comparing tests and relating them to test results
But p-values are non-linear transformations that lose potentially valuable information
We focus on the statistics themselves
Research methodology
Step I: Generate pseudorandom numbers
Problem: potential bias introduced by the PRNG
Solution: use a GA
Step II: Run the tests
Store the Ent statistics
Step III: Study the independence
Classical correlation analysis
[Diagram: Generate numbers (GA) → Run Ent (store statistics) → Statistical analysis (correlation study)]
Evolutionary generation and degeneration of randomness
Genetic algorithm design
Potential biases by PRNG
Too weak (or strong) PRNG
Solution: “Randomize” random numbers
New problems
Any randomization could induce new biases
Solution: GA with different fitnesses
Population: 100
Codification: binary, fixed size
Length: 2^i for i = 7, ..., 17
Initial population: random, T-M PRNG
Crossover: one-point
Mutation: bit flip, p_m = 0.005
Selection: tournament of size two
Elitism: 1
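A minimal sketch of a GA with the parameters listed above (population 100, binary chromosomes, one-point crossover, bit-flip mutation with p_m = 0.005, tournament of size two, one elite). Several details are assumptions made for illustration: the function name `evolve`, the default of 50 generations (read off the axis of the fitness plots), the use of Python's `random` module for the initial population, and the choice to keep the worst individual as the elite when minimizing. The fitness function is supplied by the caller.

```python
import random


def evolve(fitness, length, generations=50, pop_size=100, p_mut=0.005,
           maximize=True, seed=None):
    """Evolve fixed-length bit strings; return the final population and
    the best (or worst, when minimizing) fitness seen in each generation."""
    rng = random.Random(seed)
    better = max if maximize else min  # tournament picks best or worst
    pop = [[rng.getrandbits(1) for _ in range(length)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        elite_fit, elite = better(scored, key=lambda s: s[0])
        history.append(elite_fit)

        def tournament():
            # Tournament of size two over distinct individuals.
            a, b = rng.sample(scored, 2)
            return better(a, b, key=lambda s: s[0])[1]

        next_pop = [elite[:]]  # elitism: one individual copied unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return pop, history


def ones_ratio(bits):
    """Toy fitness for demonstration only: fraction of bits set to one."""
    return sum(bits) / len(bits)


population, best_per_generation = evolve(ones_ratio, length=128,
                                         generations=10, seed=0)
```

Passing `maximize=False` reuses the same fitness and simply inverts the tournament, which mirrors the idea of running the GA for both generation and degeneration of randomness.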
Fitness function (I)
The objective of the GA is to enhance randomness diversity
No theoretical clues about how to add or remove randomness
Lack of a general theory of randomness
Idea: use the Ent output as fitness
Seven Ent statistics: entropy, compression, χ², excess, arithmetic mean, pierror, serial correlation
Statistics usually found in the literature
GA run for both maximization and minimization
Fitness function (II)
All fitnesses were formulated as maximization
Tournament selects the best or the worst
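Ent statistic    Fitness value
Entropy          $f_{\text{entropy}} = \text{Entropy}$
Compression      $f_{\text{compression}} = 100 - \text{Compression}$
Chisquared       $f_{\text{chisquared}} = \frac{1}{1 + \text{Chisquared}}$
Excess           $f_{\text{excess}} = \text{Excess}$
Average mean     $f_{\text{amean}} = \frac{1}{1 + |127.5 - \text{amean}|}$
Pierror          $f_{\text{pierror}} = \frac{1}{1 + \text{pierror}}$
Correlation      $f_{\text{correlation}} = \frac{1}{1 + |\text{correlation}|}$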
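The fitness table translates almost directly into code. The sketch below assumes the seven statistics are available in a dictionary; the key names and the `FITNESS` mapping are hypothetical, and the example values are the ones from the Ent output shown earlier in the deck.

```python
# One fitness per Ent statistic, each taking a dict of statistics.
FITNESS = {
    "entropy": lambda s: s["entropy"],
    "compression": lambda s: 100 - s["compression"],
    "chisquared": lambda s: 1 / (1 + s["chisquared"]),
    "excess": lambda s: s["excess"],
    "mean": lambda s: 1 / (1 + abs(127.5 - s["mean"])),
    "pierror": lambda s: 1 / (1 + s["pierror"]),
    "correlation": lambda s: 1 / (1 + abs(s["correlation"])),
}

# Statistics taken from the Ent output example in this deck.
example = {
    "entropy": 4.385614,
    "compression": 45.0,
    "chisquared": 1266758.61,
    "excess": 0.01,
    "mean": 56.5619,
    "pierror": 14.94,
    "correlation": 0.568671,
}

scores = {name: f(example) for name, f in FITNESS.items()}
```

The reciprocal forms are bounded by 1, a value reached when the statistic hits its ideal target (χ² of 0, mean of 127.5, no π error, no serial correlation).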
Evolutionary generation and degeneration of randomness
Evolutionary random numbers
[Figure: mean fitness per generation (0 to 50) for the maximization and minimization runs; one panel per statistic (chisquared, compression, correlation, entropy, excess, mean, pierror), one series per length (128, 256, 512, 1024, 8192, 16384, 32768)]
Building the dataset
GA populations were stored
Direct dump of chromosomes to disk
1,559,880 numbers generated in minimization
1,559,880 numbers generated in maximization
Ent statistics of each number were computed and stored
Seven statistics per number
Analysis in two phases
1. Effect of the sequence length
2. Correlation analysis
Ent battery independence analysis
Effect of the sequence length
[Figure: scatterplot matrix of the seven statistics (entropy, compression, chisquared, mean, pierror, excess, correlation), grouped by sequence length (128, 256, 512, 1024, 8192)]
Correlation analysis
Correlation matrix with l = 8,192 bits (maximization / minimization)

          Ent.   Compression     χ²              Mean            Pierror         Excess          Correlation
Ent.      1.0    -0.80 / -0.81   -0.98 / -0.98   0.11 / -0.12    0.06 / 0.07     0.90 / 0.83     -0.09 / -0.08
Comp.     -      1.0             0.78 / 0.80     -0.11 / 0.09    -0.05 / -0.07   -0.62 / -0.52   0.09 / 0.06
χ²        -      -               1.0             -0.11 / 0.11    -0.06 / -0.07   -0.92 / -0.84   0.09 / 0.08
Mean      -      -               -               1.0             -0.00 / 0.32    0.09 / -0.10    -0.01 / -0.02
Pierror   -      -               -               -               1.0             0.05 / 0.04     0.03 / -0.02
Excess    -      -               -               -               -               1.0             -0.09 / -0.08
Corr.     -      -               -               -               -               -               1.0
No evident difference between maximization and minimization
Several statistics clearly correlated: entropy, compression, χ² and excess
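The classical correlation analysis behind such a matrix can be sketched with a plain Pearson coefficient computed for every pair of statistics. The function names, the toy columns and the 0.7 threshold below are illustrative assumptions; they are not values or code from the study.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Toy columns for demonstration only; real input would hold one value
# per generated sequence for each Ent statistic.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, 1.0, -1.0],
}
flagged = correlated_pairs(toy)
```

Pearson correlation only captures linear dependence, so a pair that is not flagged here could still be related in a non-linear way.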
Correlation analysis (II): Scatterplot with length 8,192 bits
[Figure: scatterplot matrix of entropy, compression, chisquared, mean, pierror, excess and correlation]
Conclusions and future work (I)
General conclusions
PRNGs can be strengthened with a naïve GA
Ent provides five tests; seven metrics were analyzed
Entropy and compression are almost the same metric
χ² and excess are highly correlated
Chain of correlations
Five metrics provide almost all the relevant information
Using more than five statistics may overestimate the test results
Conclusions and future work (II)
Future work
Focus on the five uncorrelated metrics
Search for non-linear relationships with symbolic regression
Extend the study to the NIST test battery
Thanks for your attention!
¡Gracias!
Eskerrik asko!
Code and datasets can be downloaded from
http://atc1.aut.uah.es/~david/cec2017
David F. Barrero
david@aut.uah.es
