Silico-paleontology with graph databases
Rooting through the relics of digital evolution
Nic McPhee & David Donatucci (w/ Thomas Helmuth)
Division of Science and Mathematics
University of Minnesota, Morris
Morris, Minnesota, USA
May 2015
Genetic Programming Theory and Practice
University of Michigan
Ann Arbor, MI
McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 1 / 26
The Big Picture
Genetic programming clearly works.
But we rarely know why or how.
Databases allow examination of the internal interactions of a run.
Graph databases better suited for this than relational databases.
Silico-paleontology can help us understand and improve our tools.
Outline
1 What do we know? (And how do we talk about it?)
2 Using a graph database
3 Let’s go exploring!
4 Conclusions
Outline
1 What do we know? (And how do we talk about it?)
We throw so much away
Summary results are highly lossy
Plots are better (but can still obscure details)
Can we zoom in to individual runs?
2 Using a graph database
3 Let’s go exploring!
4 Conclusions
We keep/see/share so little
EC research has the potential to generate
huge amounts of data.
What do we normally do with that data?
We normally throw it away – &
paleontologists weep!
https://www.flickr.com/photos/blmoregon/14566767645/
https://www.flickr.com/photos/nicmcphee/1323950471
Oooh – a table of results!
Successes by treatment (L = lexicase, T = tournament, I = implicit fitness sharing)
Problem    L    T    I
RSWN      55   13   17
SYL       22    1    2
SLB       75   19   10
NTZ       57   15    7
These show successes on 4 problems
for 3 different treatments
L seems to be winning
But why?!?!?
What’s actually happening in all those
matings and crossovers and mutations
that makes the difference?
Let’s draw pretty pictures
[Figure: error.diversity (0.00–1.00) vs. generation (0–300), one panel per treatment: lexicase (top) and tourney (bottom)]
So much more data!
Diversity over time across all the runs.
L’s diversity (top) is consistently higher than T (bottom).
That might be important (and supports some hypotheses).
Still, this mushes all the runs together.
And that likely obscures interesting things.
Zooming in
[Figure: error.diversity (axis ticks 0.2–0.8) vs. generation (0–75) for a single run]
Focusing on one successful L run now.
Three big diversity changes:
First 15 generations have a sharp drop then steep rise
Around generation 40 a sharp drop and rise
Sharp drop at end just before a solution is found
What’s happening at those
sections of the run?
We want to be able to dig
through a run and see what
happened.
Outline
1 What do we know? (And how do we talk about it?)
2 Using a graph database
Goals
Neo4j
Cypher
3 Let’s go exploring!
4 Conclusions
Goals
We want to store and analyze all the
individuals and their relationships.
Ancestry relationships are naturally
modeled with a graph
So graph databases seem a natural tool
for the relationship part.
www.hokstad.com/family-tree-using-graphviz-and-ruby
[Figure: panels from a GP lineage visualization, including "(a) Distribution of fitness values" and "(d) Genealogy of the best individual"]
[Burlacu et al., 2013]
Neo4j graph database
Part of the new-ish NoSQL movement
Neo4j’s initial release was 2007
Started to take off in 2010
Represent individuals as nodes
Represent parent-child relationships as
edges
Easy to represent complex relationships
Easy to search for relationships
Efficient recursive queries, esp.
compared to traditional databases
http://neo4j.com
Cypher query language
Neo4j uses the Cypher query language.
Fundamental elements of Cypher
queries:
START
MATCH
WHERE
RETURN
Uses "ASCII art" to describe
relationships:
(p)-->(c)
(p)-[r:PARENT_OF]->(c)
Can model (complex) paths
Find Nic’s parents:
(Nic)<-[:PARENT_OF]-(p)
Find all Nic’s grandparents:
(Nic)<-[:PARENT_OF*2]-(gp)
Find all Nic’s ancestors at most 5 steps back:
(Nic)<-[:PARENT_OF*1..5]-(a)
Find all Nic’s siblings:
(Nic)<-[:PARENT_OF]-()-[:PARENT_OF]->(s)
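To make the four path patterns concrete, here is a minimal Python sketch that answers the same questions on a small in-memory graph. The family data and the helper names (`ancestors_at`, `ancestors_within`, `siblings`) are hypothetical and only for illustration; the sketch mimics what the Cypher patterns express, not how Neo4j evaluates them.

```python
from collections import defaultdict

# Hypothetical toy family; each pair is a (parent, child) PARENT_OF edge.
PARENT_OF = [
    ("Ann", "Nic"), ("Bob", "Nic"),
    ("Ann", "Sue"), ("Bob", "Sue"),
    ("Cal", "Ann"), ("Dee", "Ann"),
]

parents_of = defaultdict(set)
children_of = defaultdict(set)
for parent, child in PARENT_OF:
    parents_of[child].add(parent)
    children_of[parent].add(child)


def ancestors_at(person, steps):
    """Individuals exactly `steps` PARENT_OF hops above `person`
    (like (Nic)<-[:PARENT_OF*2]-(gp) for steps=2)."""
    frontier = {person}
    for _ in range(steps):
        frontier = {p for node in frontier for p in parents_of.get(node, ())}
    return frontier


def ancestors_within(person, max_steps):
    """Individuals between 1 and `max_steps` hops above `person`
    (like (Nic)<-[:PARENT_OF*1..5]-(a) for max_steps=5)."""
    found = set()
    for steps in range(1, max_steps + 1):
        found |= ancestors_at(person, steps)
    return found


def siblings(person):
    """Other children of any of `person`'s parents."""
    shared = {c for p in parents_of.get(person, ()) for c in children_of[p]}
    return shared - {person}
```

In this toy data, `ancestors_at("Nic", 1)` gives Nic's parents, `ancestors_at("Nic", 2)` the grandparents, and `siblings("Nic")` everyone who shares a parent with Nic.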
Outline
1 What do we know? (And how do we talk about it?)
2 Using a graph database
3 Let’s go exploring!
Setup
Comparing the end-games
4 Conclusions
What are we exploring?
Tom Helmuth provided a lot of data:
A number of program synthesis problems taken from intro
computing texts
Three different selection mechanisms: Lexicase, tournament, and
implicit fitness sharing (IFS)
All using Clojush implementation of Lee Spector’s PushGP system
https://github.com/lspector/Clojush
Population size 1,000; ≤ 300 generations
See [Helmuth and Spector, 2015] for more.
We used the batch-import tool and custom scripts to import the data into Neo4j.
https://github.com/jexp/batch-import
Only just the beginning
We have data from hundreds of runs
Currently a very “by hand” process
Definitely learned valuable things about:
The behavior of lexicase
Role of alternation (a type of crossover) in PushGP
Impact of test cases on evolutionary dynamics
We’ll look at results from two runs:
Both successful on replace-space-with-newline problem
One using lexicase (sol’n found in 88 gens)
One using tournament selection (sol’n found in 151 gens)
How did we construct a winner?
How is a winner constructed at the end of a run?
This query finds all ancestors of a winner (zero total_error) going
back at most 8 steps:
MATCH (w) WHERE w.total_error = 0
MATCH (p)-->(c)-[*0..7]->(w)
RETURN DISTINCT id(p), id(c);
8 steps is fairly arbitrary; returns a small enough set to visualize.
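For readers without a Neo4j instance at hand, the same bounded ancestry search can be sketched in plain Python as a backward breadth-first walk. This is an illustrative analogue under stated assumptions, not the authors' tooling: the function name `ancestry_edges`, the edge-list input format, and the toy chain are all hypothetical, and the sketch assumes the ancestry graph is acyclic (parents always precede children).

```python
from collections import defaultdict


def ancestry_edges(edges, winners, max_steps=8):
    """Return the distinct (parent, child) edges that lie on a path of at
    most `max_steps` parent-child hops ending at some individual in `winners`."""
    parents_of = defaultdict(set)
    for parent, child in edges:
        parents_of[child].add(parent)

    found = set()
    frontier = set(winners)  # individuals 0 hops from a winner
    for _ in range(max_steps):
        next_frontier = set()
        for child in frontier:
            for parent in parents_of.get(child, ()):
                found.add((parent, child))
                next_frontier.add(parent)
        frontier = next_frontier
    return found


# Hypothetical chain g0 -> g1 -> ... -> g10, where g10 is the only winner.
chain = [(f"g{i}", f"g{i + 1}") for i in range(10)]
recent = ancestry_edges(chain, winners={"g10"})
```

With the default of 8 steps, `recent` holds the eight edges from `("g9", "g10")` back to `("g2", "g3")`; the two oldest edges fall outside the window, just as the `*0..7` bound limits the Cypher query.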
Comparing the end-games
Ancestries of the winner(s) look very different
Tournament selection (below):
Single winner w/ high
branching factor
Lexicase (right): 45 winners w/
much lower branching factor
[Figure: two ancestry graphs. Tournament: generations 142–150. Lexicase: generations 79–87, with labeled individuals 80:220, 82:447, 83:047, 83:124, 83:619, 84:319, 85:086, 86:261, 87:719, 87:941, 87:947, and "42 Other Winners"]
Lexicase selection
A number of observations:
45(!) “winning” individuals
Individual “86:261” is (a)
parent of all 45
Individual “86:261” is a
parent of 934 (of 1,000)
individuals in next
generation
Seriously?!? 934 offspring?!?
Turns out to be an extreme case of a common phenomenon with lexicase
Nodes marked with diamonds all had at least 100 offspring
Shaded diamonds also have at least 5 offspring that are winners or ancestors of winners
What’s the total error (fitness) of
“86:261”?
4,034(!)
Bottom quartile!
But had 934 offspring!
Failed to return on 4 cases
(error 1,000 each)
Got 2 other answers wrong
(error 17 each)
Terrible total error, but
perfect on 194 of 200 tests
Great for lexicase!
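Why a terrible total error can still be "great for lexicase" is easier to see in code. Below is a minimal sketch of lexicase selection as it is commonly described (shuffle the test cases, then repeatedly keep only the candidates with the best error on the next case); it is not taken from Clojush, and details there may differ. The `specialist` vector follows the slide's description of "86:261" (the position of the failing cases is arbitrary), while the `generalist` rival is a hypothetical individual invented for contrast.

```python
import random


def lexicase_select(population, errors, rng=random):
    """Pick one parent by lexicase selection.

    `errors[i][j]` is the error of `population[i]` on test case j.
    """
    candidates = list(range(len(population)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random case ordering for every selection
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    # Any remaining ties are broken at random.
    return population[rng.choice(candidates)]


# 4 cases with penalty 1,000, 2 cases with error 17, 194 perfect cases.
specialist = [1000] * 4 + [17] * 2 + [0] * 194
# Hypothetical rival: never perfect, but with a much lower total error.
generalist = [5] * 200

total_specialist = sum(specialist)  # 4 * 1000 + 2 * 17
total_generalist = sum(generalist)  # 200 * 5
```

Against this rival alone, the specialist is selected whenever the first shuffled case is one of its 194 perfect cases, that is with probability 194/200 = 0.97, even though its total error is about four times worse. Tournament selection on total error would pick the generalist instead.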
What’s the total error (fitness) of
“85:086”?
100,000!
Rank 971 out of 1,000
But had 180 offspring
Got all the “print” cases
Failed to return value for all
100 “return” cases (error
1,000 each)
Terrible total error, but
perfect on 100 of 200 tests
Fine for lexicase
High proportion of mutations:
Roughly half the offspring
in this graph created via
mutation
Probably why there’s less
branching
Tournament selection
[Figure: tournament-selection ancestry graph, generations 142–150]
Much broader: 42 ancestors of a winner for tournament 9 gens
back; 14 for lexicase
About two-thirds created via crossover, so more branching than
lexicase
Number of ancestors of “winners” over time
Gens from winner   Lexicase   Tournament
 1                     4           2
 2                     6           4
 3                     7           6
 4                     6          10
 5                     7          13
 6                     9          20
 7                    10          30
 8                    14          33
 9                    14          42
10                    22          63
...                  ...         ...
18                    58         297
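A table like this can be produced by counting the distinct ancestors found at each distance from the winners. The sketch below assumes the table reports per-generation counts rather than cumulative ones (the lexicase column dips from 7 to 6, which a cumulative count could not do). The function name, the edge-list format, and the toy pedigree are hypothetical.

```python
from collections import defaultdict


def ancestors_by_distance(edges, winners, max_steps):
    """Map each distance k in 1..max_steps to the number of distinct
    individuals exactly k parent-child hops above some winner."""
    parents_of = defaultdict(set)
    for parent, child in edges:
        parents_of[child].add(parent)

    counts = {}
    frontier = set(winners)
    for steps in range(1, max_steps + 1):
        frontier = {p for child in frontier for p in parents_of.get(child, ())}
        counts[steps] = len(frontier)
    return counts


# Hypothetical pedigree: winner w has parents a and b; a and b share parent d.
pedigree = [("a", "w"), ("b", "w"), ("c", "a"), ("d", "a"), ("d", "b"), ("e", "b")]
profile = ancestors_by_distance(pedigree, winners={"w"}, max_steps=3)
```

Because the frontier is a set, a shared ancestor such as `d` is counted once per generation, so heavy reuse of a few parents shows up as a slowly growing column.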
12 most fecund individuals
Lexicase Tournament
934 24
657 23
594 23
590 21
433 20
326 20
297 19
294 19
285 19
283 18
279 18
271 18
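Fecundity rankings like these reduce to counting outgoing parent-child edges per individual. Here is a minimal sketch, assuming the run is available as a list of (parent, child) pairs; the function name and the sample edges are hypothetical. Duplicate edges are collapsed first so that a child is counted once per parent.

```python
from collections import Counter


def most_fecund(edges, top_n=12):
    """Return the `top_n` (individual, offspring_count) pairs, where each
    distinct child is counted once per parent."""
    offspring = Counter(parent for parent, _child in set(edges))
    return offspring.most_common(top_n)


# Hypothetical edges; the repeated ("a", "y") pair is counted only once.
sample_edges = [("a", "x"), ("a", "y"), ("a", "y"), ("b", "z")]
ranking = most_fecund(sample_edges, top_n=2)
```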
Conclusions
Still early days, but we can definitely see some useful things:
Differences in ways selection mechanisms work
Support for hypotheses (e.g., Tom’s paper)
Evidence for importance of crossover in PushGP
Impact of test cases on evolutionary dynamics
Future Work
Automate more of the work
Examine more runs/problems/etc.
Explore how to include this “on-line”
Thanks!
Thank you for your time and attention!
Thanks to M. Kirbie Dramdahl (University of Minnesota, Morris), and to
Lee Spector’s Computational Intelligence group (Hampshire College)
for ideas and feedback.
Contacts:
mcphee@morris.umn.edu
donat056@morris.umn.edu
thelmuth@cs.umass.edu
Questions?
References
Burlacu, B., Affenzeller, M., Kommenda, M., Winkler, S., and Kronberger, G. (2013). Visualization of genetic lineages and inheritance information in genetic programming. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’13 Companion, pages 1351–1358, New York, NY, USA. ACM.
Helmuth, T. and Spector, L. (2015). General program synthesis benchmark suite. In Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation, GECCO ’15, New York, NY, USA. ACM.
McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 26 / 26

Weitere ähnliche Inhalte

Ähnlich wie Silica-Paleontology with graph databases: Rooting through the relics of digital evolution

334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx
334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx
334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx
gilbertkpeters11344
 
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptxQUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
RheaannCaparas1
 
Individual functional atlasing of the human brain with multitask fMRI data: l...
Individual functional atlasing of the human brain with multitask fMRI data: l...Individual functional atlasing of the human brain with multitask fMRI data: l...
Individual functional atlasing of the human brain with multitask fMRI data: l...
Ana Luísa Pinho
 
M2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparencyM2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparency
BoPeng76
 

Ähnlich wie Silica-Paleontology with graph databases: Rooting through the relics of digital evolution (20)

GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018
 
GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Data Visualization - A Brief Overview
Data Visualization - A Brief OverviewData Visualization - A Brief Overview
Data Visualization - A Brief Overview
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
 
Revisiting The UK EU Membership Referendum (Brexit) Poll Tracker
Revisiting The UK EU Membership Referendum (Brexit) Poll TrackerRevisiting The UK EU Membership Referendum (Brexit) Poll Tracker
Revisiting The UK EU Membership Referendum (Brexit) Poll Tracker
 
Pedersen acl2011-business-meeting
Pedersen acl2011-business-meetingPedersen acl2011-business-meeting
Pedersen acl2011-business-meeting
 
Practical Machine Learning at Work
Practical Machine Learning at WorkPractical Machine Learning at Work
Practical Machine Learning at Work
 
Introduction to Data Visualization
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data Visualization
 
334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx
334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx
334-335.pdf336-337.pdf338-339.pdf340-341.pdfPrin.docx
 
Introduction to spss
Introduction to spssIntroduction to spss
Introduction to spss
 
High-Dimensional Data Visualization, Geometry, and Stock Market Crashes
High-Dimensional Data Visualization, Geometry, and Stock Market CrashesHigh-Dimensional Data Visualization, Geometry, and Stock Market Crashes
High-Dimensional Data Visualization, Geometry, and Stock Market Crashes
 
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptxQUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
QUARTER-3.-LESSON-2-in-RESEARCH-II.pptx
 
#Dgo2019 Conference workshop A3 - viza
#Dgo2019 Conference workshop A3 - viza#Dgo2019 Conference workshop A3 - viza
#Dgo2019 Conference workshop A3 - viza
 
Individual functional atlasing of the human brain with multitask fMRI data: l...
Individual functional atlasing of the human brain with multitask fMRI data: l...Individual functional atlasing of the human brain with multitask fMRI data: l...
Individual functional atlasing of the human brain with multitask fMRI data: l...
 
Visualizing Genetic Programming Ancestries
Visualizing Genetic Programming AncestriesVisualizing Genetic Programming Ancestries
Visualizing Genetic Programming Ancestries
 
Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)Tweet sentiment analysis (Data mining)
Tweet sentiment analysis (Data mining)
 
M2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparencyM2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparency
 
Beginner's Guide to Getting Public Data into the Classroom
Beginner's Guide to Getting Public Data into the ClassroomBeginner's Guide to Getting Public Data into the Classroom
Beginner's Guide to Getting Public Data into the Classroom
 
Wayne gray presentation
Wayne gray presentationWayne gray presentation
Wayne gray presentation
 

Kürzlich hochgeladen

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 

Kürzlich hochgeladen (20)

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 

Silica-Paleontology with graph databases: Rooting through the relics of digital evolution

  • 1. Silico-paleontology with graph databases Rooting through the relics of digital evolution Nic McPhee & David Donatucci (w/ Thomas Helmuth) Division of Science and Mathematics University of Minnesota, Morris Morris, Minnesota, USA May 2015 Genetic Programming Theory and Practice University of Michigan Ann Arbor, MI McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 1 / 26
  • 2. Overview The Big Picture The Big Picture Genetic programming clearly works. But we rarely know why or how. Databases allow examination of the internal interactions of a run. Graph databases better suited for this than relational databases. Silico-paleontology can help us understand and improve our tools. McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 2 / 26
  • 3. Overview Outline Outline 1 What do we know? (And how do we talk about it?) 2 Using a graph database 3 Let’s go exploring! 4 Conclusions McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 3 / 26
  • 4. What do we know? (And how do we talk about it?) Outline 1 What do we know? (And how do we talk about it?) We throw so much away Summary results are highly lossy Plots are better (but can still obscure details) Can we zoom in to individual runs? 2 Using a graph database 3 Let’s go exploring! 4 Conclusions McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 4 / 26
  • 5. What do we know? (And how do we talk about it?) We throw so much away We keep/see/share so little EC research has the potential to generate huge amounts of data. What do we normally do with that data? We normally throw it away – & paleontologists weep! https://www.flickr.com/photos/blmoregon/14566767645/ https://www.flickr.com/photos/ nicmcphee/1323950471 McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 5 / 26
  • 6. What do we know? (And how do we talk about it?) Summary results are highly lossy Oooh – a table of results! Treatment Problem L T I RSWN 55 13 17 SYL 22 1 2 SLB 75 19 10 NTZ 57 15 7 These show successes on 4 problems for 3 different treatments L seems to be winning McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 6 / 26
  • 7. What do we know? (And how do we talk about it?) Summary results are highly lossy Oooh – a table of results! Treatment Problem L T I RSWN 55 13 17 SYL 22 1 2 SLB 75 19 10 NTZ 57 15 7 But why?!?!? What’s actually happening in all those matings and crossovers and mutations that makes the difference? McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 6 / 26
  • 8. What do we know? (And how do we talk about it?) Plots are better (but can still obscure details) Let’s draw pretty pictures 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 lexicasetourney 0 100 200 300 generation error.diversity So much more data! Diversity over time across all the runs. L’s diversity (top) is consis- tently higher than T (bot- tom). That might be important (and supports some hy- potheses). McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 7 / 26
  • 9. What do we know? (And how do we talk about it?) Plots are better (but can still obscure details) Let’s draw pretty pictures 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 lexicasetourney 0 100 200 300 generation error.diversity Still, this mushes all the runs together. And that likely obscures in- teresting things. McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 7 / 26
  • 10. What do we know? (And how do we talk about it?) Can we zoom in to individual runs? Zooming in 0.2 0.4 0.6 0.8 0 25 50 75 generation error.diversity Focusing on one successful L run now. Three big diversity changes: First 15 generations have a sharp drop then steep rise Around generation 40 a sharp drop and rise Sharp drop at end just before a solution is found McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 8 / 26
  • 11. What do we know? (And how do we talk about it?) Can we zoom in to individual runs? Zooming in 0.2 0.4 0.6 0.8 0 25 50 75 generation error.diversity What’s happening at those sections of the run? We want to be able to dig through a run and see what happened. McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 8 / 26
  • 12. Using a graph DB Outline 1 What do we know? (And how do we talk about it?) 2 Using a graph database Goals Neo4j Cypher 3 Let’s go exploring! 4 Conclusions McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 9 / 26
  • 13. Using a graph DB Goals Goals We want to store and analyze all the individuals and their relationships. Ancestry relationships are naturally modeled with a graph So graph databases seem a natural tool for the relationship part. www.hokstad.com/family-tree-using-graphviz-and-ruby (a) Distribution of fitness values (b) Ge (d) Genealogy of the best individual (e) Ro Fitness value (Pearson’s R2 ) 0.0 [Burlacu et al., 2013] McPhee & Donatucci (UMN Morris) Graph database analysis of GP dynamics May 2015, GPTP, Ann Arbor MI 10 / 26
  • 14. Using a graph DB: Neo4j graph database. Part of the new-ish NoSQL movement: Neo4j’s initial release was in 2007, and it started to take off in 2010. Individuals are represented as nodes and parent-child relationships as edges. It is easy to represent complex relationships and easy to search for relationships, with efficient recursive queries, especially compared to traditional databases. http://neo4j.com
  • 15. Using a graph DB: Cypher query language. Neo4j uses the Cypher query language. Fundamental elements of Cypher queries: START, MATCH, WHERE, RETURN. Cypher uses "ASCII art" to describe relationships:
      (p)-->(c)                  any relationship from p to c
      (p)-[r:PARENT_OF]->(c)     only PARENT_OF relationships, each bound to r
  • 16. Using a graph DB: Cypher can model (complex) paths.
      Find Nic’s parents:                                (Nic)<-[:PARENT_OF]-(p)
      Find all Nic’s grandparents:                       (Nic)<-[:PARENT_OF*2]-(gp)
      Find every ancestor at most 5 steps from Nic:      (Nic)<-[:PARENT_OF*1..5]-(a)
      Find all Nic’s siblings:                           (Nic)<-[:PARENT_OF]-()-[:PARENT_OF]->(s)
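The four path patterns above can be mimicked on a small in-memory graph. This is only a sketch in plain Python, not Cypher and not the authors' tooling: the family data and the helper names `ancestors` and `siblings` are made up for illustration.

```python
# Hypothetical toy family: each (parent, child) pair plays the role of a PARENT_OF edge.
PARENT_OF = [
    ("Ann", "Nic"), ("Bob", "Nic"),
    ("Ann", "Sue"), ("Bob", "Sue"),
    ("Cy", "Ann"), ("Di", "Ann"),
    ("Ed", "Bob"),
]

parents_of = {}
children_of = {}
for parent, child in PARENT_OF:
    parents_of.setdefault(child, set()).add(parent)
    children_of.setdefault(parent, set()).add(child)


def ancestors(person, min_steps, max_steps):
    """People reached by following between min_steps and max_steps edges backwards."""
    found = set()
    frontier = {person}
    for step in range(1, max_steps + 1):
        frontier = {p for node in frontier for p in parents_of.get(node, ())}
        if step >= min_steps:
            found |= frontier
    return found


def siblings(person):
    """Other children of any of this person's parents (half-siblings included)."""
    kids = {c for p in parents_of.get(person, ()) for c in children_of[p]}
    return kids - {person}


print(sorted(ancestors("Nic", 1, 1)))  # parents, like [:PARENT_OF]
print(sorted(ancestors("Nic", 2, 2)))  # grandparents, like [:PARENT_OF*2]
print(sorted(ancestors("Nic", 1, 5)))  # up to 5 steps back, like [:PARENT_OF*1..5]
print(sorted(siblings("Nic")))         # up to a parent, then down to another child
```

Each helper needs an explicit loop over the graph, whereas the Cypher patterns state the shape of the path directly.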
  • 17. Let’s go exploring! Outline: 1. What do we know? (And how do we talk about it?) 2. Using a graph database. 3. Let’s go exploring! (Setup; Comparing the end-games). 4. Conclusions.
  • 18. Let’s go exploring! Setup: What are we exploring? Tom Helmuth provided a lot of data: a number of program synthesis problems taken from intro computing texts, run with three different selection mechanisms (lexicase, tournament, and implicit fitness sharing (IFS)), all using the Clojush implementation of Lee Spector’s PushGP system (https://github.com/lspector/Clojush), with population size 1,000 and ≤ 300 generations. See [Helmuth and Spector, 2015] for more. We used the batch-import tool (https://github.com/jexp/batch-import) and custom scripts to import the data into Neo4j.
  • 19. Let’s go exploring! Setup: Only just the beginning. We have data from hundreds of runs, and the analysis is currently a very “by hand” process. We definitely learned valuable things about the behavior of lexicase, the role of alternation (a type of crossover) in PushGP, and the impact of test cases on evolutionary dynamics. We’ll look at results from two runs, both successful on the replace-space-with-newline problem: one using lexicase (solution found in 88 generations) and one using tournament selection (solution found in 151 generations).
  • 20. Let’s go exploring! Comparing the end-games: How is a winner constructed at the end of a run? This query finds all ancestors of a winner (zero total_error) going back at most 8 steps:
      MATCH (w) WHERE w.total_error = 0
      MATCH (p)-->(c)-[*0..7]->(w)
      RETURN DISTINCT id(p), id(c);
    The choice of 8 steps is fairly arbitrary; it returns a small enough set to visualize.
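To make the query's semantics concrete, here is a rough in-memory analogue in Python. It is a sketch under stated assumptions, not the authors' code: the function name `ancestry_edges`, the chain of individuals `g0` to `g10`, and their error values are all hypothetical. Like the query, it returns (parent, child) pairs where the child is 0 to 7 steps from a winner, so each path is at most 8 edges long.

```python
def ancestry_edges(parent_edges, winners, max_steps=8):
    """Return the (parent, child) edges on paths of at most max_steps edges
    that end at one of the winners."""
    parents_of = {}
    for parent, child in parent_edges:
        parents_of.setdefault(child, set()).add(parent)

    edges = set()
    frontier = set(winners)  # children 0 steps from a winner
    for _ in range(max_steps):
        next_frontier = set()
        for child in frontier:
            for parent in parents_of.get(child, ()):
                edges.add((parent, child))
                next_frontier.add(parent)
        frontier = next_frontier  # children one step further back
    return edges


# Hypothetical run: a single lineage g0 -> g1 -> ... -> g10, where only g10 has zero error.
parent_edges = [(f"g{i}", f"g{i + 1}") for i in range(10)]
total_error = {f"g{i}": 10 - i for i in range(11)}
winners = [name for name, err in total_error.items() if err == 0]

result = ancestry_edges(parent_edges, winners)
print(sorted(result))
```

In this toy lineage the edge from `g1` to `g2` is left out because `g2` is 8 steps from the winner, one more than the `*0..7` bound allows for the child.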
  • 21. Let’s go exploring! Comparing the end-games. The ancestry of the winner(s) looks very different in the two runs. Tournament selection: a single winner with a high branching factor. Lexicase: 45 winners with a much lower branching factor. [Figures: ancestry graph for tournament selection, generations 142 to 150; ancestry graph for lexicase, generations 79 to 87, with labelled individuals 80:220, 82:447, 83:047, 83:124, 83:619, 84:319, 85:086, 86:261, 87:719, 87:941, and 87:947, plus “42 Other Winners”.]
  • 22. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] A number of observations: there are 45(!) “winning” individuals; individual “86:261” is a parent of all 45; individual “86:261” is a parent of 934 (of 1,000) individuals in the next generation.
  • 23. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] Seriously?!? 934 offspring?!? This turns out to be an extreme case of a common phenomenon with lexicase. Nodes marked with diamonds all had at least 100 offspring. Shaded diamonds also have at least 5 offspring that are winners or ancestors of winners.
  • 24. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] What’s the total error (fitness) of “86:261”?
  • 25. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] What’s the total error (fitness) of “86:261”? 4,034(!) That is in the bottom quartile, but it had 934 offspring! It failed to return on 4 cases (error 1,000 each) and got 2 other answers wrong (error 17 each): 4 × 1,000 + 2 × 17 = 4,034. Terrible total error, but perfect on 194 of 200 tests. Great for lexicase!
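Why would such an individual thrive under lexicase? The sketch below follows the usual textbook description of lexicase selection: shuffle the test cases, then repeatedly keep only the candidates with the best error on the next case. It is not Clojush's implementation. The `specialist` error vector has the shape reported for “86:261”; the `generalist` rival and its error values are hypothetical, chosen to have a better total error.

```python
import random


def lexicase_select(population, rng):
    """Pick one parent by lexicase selection.

    population maps an individual's name to its list of per-test-case errors.
    """
    candidates = list(population)
    n_cases = len(next(iter(population.values())))
    case_order = list(range(n_cases))
    rng.shuffle(case_order)  # a fresh random case ordering for every selection
    for case in case_order:
        best = min(population[name][case] for name in candidates)
        candidates = [n for n in candidates if population[n][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


# 194 perfect cases, 4 failures to return (1,000 each), 2 wrong answers (17 each).
specialist = [0] * 194 + [1000] * 4 + [17] * 2
# Hypothetical rival: mediocre on every case, but with a lower total error.
generalist = [10] * 200

population = {"specialist": specialist, "generalist": generalist}
rng = random.Random(0)
picks = [lexicase_select(population, rng) for _ in range(1000)]

print(sum(specialist), sum(generalist))  # total errors
print(picks.count("specialist"), "of 1000 selections went to the specialist")
```

In this two-individual toy the specialist wins whenever the first shuffled case is one of its 194 perfect ones, even though a comparison on total error alone would always favour the rival.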
  • 26. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] What’s the total error (fitness) of “85:086”?
  • 27. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] What’s the total error (fitness) of “85:086”? 100,000! That is rank 971 out of 1,000, but it had 180 offspring. It got all the “print” cases, but failed to return a value for all 100 “return” cases (error 1,000 each): 100 × 1,000 = 100,000. Terrible total error, but perfect on 100 of 200 tests. Fine for lexicase.
  • 28. Let’s go exploring! Comparing the end-games: Lexicase selection. [Figure: lexicase ancestry graph, generations 79 to 87.] High proportion of mutations: roughly half the offspring in this graph were created via mutation, which is probably why there’s less branching.
  • 29. Let’s go exploring! Comparing the end-games: Tournament selection. [Figure: tournament ancestry graph, generations 142 to 150.] Much broader: 9 generations back, a winner has 42 ancestors for tournament versus 14 for lexicase. About two-thirds were created via crossover, so there is more branching than with lexicase.
  • 30. Let’s go exploring! Comparing the end-games: Number of ancestors of “winners” over time.
      Gens from winner   Lexicase   Tournament
      1                  4          2
      2                  6          4
      3                  7          6
      4                  6          10
      5                  7          13
      6                  9          20
      7                  10         30
      8                  14         33
      9                  14         42
      10                 22         63
      ...                ...        ...
      18                 58         297
  • 31. Let’s go exploring! Comparing the end-games: 12 most fecund individuals (number of offspring).
      Lexicase   Tournament
      934        24
      657        23
      594        23
      590        21
      433        20
      326        20
      297        19
      294        19
      285        19
      283        18
      279        18
      271        18
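A ranking like this can be derived from the parent-child edges alone. The following is a minimal Python sketch of one way to do it, not the authors' method: the function name `most_fecund` and the edge list are hypothetical. Children are collected into a set per parent, so a child is counted once even if the same parent appears twice for it.

```python
from collections import Counter


def most_fecund(parent_edges, k=12):
    """Return the k individuals with the most distinct offspring as (name, count) pairs."""
    offspring = {}
    for parent, child in parent_edges:
        offspring.setdefault(parent, set()).add(child)
    counts = Counter({parent: len(kids) for parent, kids in offspring.items()})
    return counts.most_common(k)


# Hypothetical edges: "a" has three children, "b" has two, "c" has one.
edges = [
    ("a", "x"), ("a", "y"), ("a", "z"),
    ("b", "x"), ("b", "w"),
    ("c", "w"),
]
top = most_fecund(edges, k=2)
print(top)
```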
  • 32. Conclusions. Outline: 1. What do we know? (And how do we talk about it?) 2. Using a graph database. 3. Let’s go exploring! 4. Conclusions.
  • 33. Conclusions. Still early days, but we can definitely see some useful things: differences in the ways selection mechanisms work; support for hypotheses (e.g., Tom’s paper); evidence for the importance of crossover in PushGP; the impact of test cases on evolutionary dynamics. Future work: automate more of the work; examine more runs, problems, etc.; explore how to include this “on-line”.
  • 34. Conclusions: Thanks! Thank you for your time and attention! Thanks to M. Kirbie Dramdahl (University of Minnesota, Morris) and to Lee Spector’s Computational Intelligence group (Hampshire College) for ideas and feedback. Contacts: mcphee@morris.umn.edu, donat056@morris.umn.edu, thelmuth@cs.umass.edu. Questions?
  • 35. References
      Burlacu, B., Affenzeller, M., Kommenda, M., Winkler, S., and Kronberger, G. (2013). Visualization of genetic lineages and inheritance information in genetic programming. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’13 Companion, pages 1351–1358, New York, NY, USA. ACM.
      Helmuth, T. and Spector, L. (2015). General program synthesis benchmark suite. In Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation, GECCO ’15, New York, NY, USA. ACM.