Graphs are Feeding the World

Tim Williamson (@TimWilliate)
Data Scientist, Monsanto
Our Growing Planet Faces Difficult Challenges
Sources: http://esa.un.org/unpd/wpp/; UN FAO Food Balance Sheet, “World Health Organization
Global and regional food consumption patterns and trends”; The World Bank, Food and Agriculture
Organization of the United Nations (FAO-STAT), Monsanto Internal Calculations; @TimWilliate #MonDataScience
Rising Population: Growing enough for a growing world
Global population: 4.4B (1980), 7.1B (today), 9.6B+ (2050)
Limited Farmland: Farmers will need to produce enough food with fewer resources to support our world population
Acres per person: 1 (1961), <1/3 (2050)
Changing Economies and Diets: A growing global middle class is choosing animal protein (meat, eggs, and dairy) as a larger part of their diet
Dietary percentage of protein: 9% (1965), 14% (2030)
Changing Climate: Farmers are impacted by climate change in many ways:
• Water availability issues
• Increasingly unpredictable weather
• Insect range expansion
• Weed pressure changes
• Crop disease increases
• Planting zone shifts
Improved Genetic Gain is One of Several Tools Humanity has to Address These Challenges
Sources: http://www.ers.usda.gov/data-products/feed-grains-database/feed-grains-yearbook-tables.aspx
• 8 commodity crops and 18 vegetable crop families, sold in 160 countries
[Chart: Average US Corn Yield, 1866-2014; x-axis: Year (1865 to 2015); y-axis: Yield (Bushels/Acre, 0 to 180)]
10,000 Years
Genetic Gain is Created Through Breeding Cycles
[Diagram: breeding-cycle pipeline. Two parents are crossed (X) and all progeny of the two parents enter; progeny pass through screening and field trials, with lab data (genotypes) and field data (phenotypes) collected along the way; select the best, discard the rest; the best one leaves to become a future parent.]
• 1000's of crosses/year
• Dozens of progeny/cross
• 5-10 locations/progeny
• $3-5 million/year
Every Breeding Cycle Extends a Tree of Genetic Ancestry
[Diagram: ancestry trees linking parents A and B to progeny C; one node is labeled "A single parent".]
Forcing Genetic Ancestry Data into Rows and Columns
• In our relational store, genetic ancestry data was spread across a hierarchy of ~11 tables representing a total of ~895 million rows
• Every read became an unpleasant exercise in CONNECT BY PRIOR
[Schema: table "Plant" (plant id, attributes…); table "Plant:Plant Relationship" (plant id, parent plant id, parental role)]
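To make the "rows and columns" read pattern concrete, here is a minimal sketch of an ancestor lookup over a parent-link table. It is not the production query: it uses SQLite's `WITH RECURSIVE` as a stand-in for Oracle's `CONNECT BY PRIOR`, and the table name `plant_relationship`, its column names, and the demo rows are assumptions modeled on the schema sketched on this slide.

```python
import sqlite3

# Hypothetical demo rows: (plant_id, parent_plant_id, parental_role).
DEMO_EDGES = [
    (1, 2, "female"),
    (2, 3, "female"),
    (2, 4, "male"),
    (3, 5, "female"),
    (4, 6, "female"),
]


def build_demo_db():
    """Create an in-memory table shaped like the Plant:Plant Relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plant_relationship ("
        " plant_id INTEGER, parent_plant_id INTEGER, parental_role TEXT)"
    )
    conn.executemany("INSERT INTO plant_relationship VALUES (?, ?, ?)", DEMO_EDGES)
    return conn


# Recursive common table expression: start from the direct parents of :start,
# then repeatedly join back to the relationship table to climb one level.
ANCESTOR_QUERY = """
WITH RECURSIVE lineage(plant_id, depth) AS (
    SELECT parent_plant_id, 1
    FROM plant_relationship
    WHERE plant_id = :start
    UNION
    SELECT r.parent_plant_id, l.depth + 1
    FROM plant_relationship AS r
    JOIN lineage AS l ON r.plant_id = l.plant_id
)
SELECT plant_id, MIN(depth) AS min_depth
FROM lineage
GROUP BY plant_id
ORDER BY min_depth, plant_id
"""


def ancestors(conn, start_id):
    """Return (ancestor_id, depth) pairs for every ancestor of start_id."""
    return conn.execute(ANCESTOR_QUERY, {"start": start_id}).fetchall()


if __name__ == "__main__":
    print(ancestors(build_demo_db(), 1))
```

Each extra level of depth adds another self-join over the relationship table, which is one plausible reading of why deep reads were unpleasant in the relational store.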
Given a Starting Population, Return All Ancestors
[Chart: Response Time (s, 0 to 30) vs. Depth (1 to 15); series: SQL on Oracle Exadata]
Genetic Ancestry is a Naturally Occurring Graph
• ~700 million nodes
• ~1.2 billion relationships
• ~1.7 billion properties
[Diagram: graph model with node labels :Plant, :Plant Inventory, :Planting, :Selection and relationship types :PARENT, :PLANTED, :SELECTED, :HARVESTED, :INVENTORY]
Given a Starting Population, Return All Ancestors
[Chart: Response Time (s, 0 to 30) vs. Depth (1 to 15); series: SQL on Oracle Exadata, Traversal Framework on Neo4j; annotation: ~90x Difference]
Retrieving Genetic Ancestry in a 'RESTful' Style
[Diagram: example ancestry graph with :PARENT relationships. 1 → 2 {parental_role: female}; 2 → 3 {parental_role: female}; 2 → 4 {parental_role: male}; 3 → 5 {parental_role: female}; 4 → 6 {parental_role: female}]
RESTful Resource: /population/1/ancestors
{"nodes": [
  {"id": 1},
  {"id": 2},
  {"id": 3},
  {"id": 4},
  {"id": 5},
  {"id": 6}
],
"relationships": [
  {"from": 1, "to": 2, "relation": "PARENT"},
  {"from": 2, "to": 3, "relation": "PARENT"},
  {"from": 2, "to": 4, "relation": "PARENT"},
  {"from": 3, "to": 5, "relation": "PARENT"},
  {"from": 4, "to": 6, "relation": "PARENT"}
]}
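As an illustration of what sits behind the /population/1/ancestors resource, the sketch below walks an in-memory parent map breadth-first and assembles a response in the same shape as the JSON on this slide. The `PARENTS` dictionary and the function name `ancestors_resource` are hypothetical; the real service runs on the Neo4j traversal framework, which is not shown here.

```python
from collections import deque

# Hypothetical in-memory stand-in for the graph: child id -> list of parent ids.
PARENTS = {1: [2], 2: [3, 4], 3: [5], 4: [6], 5: [], 6: []}


def ancestors_resource(start, parents=PARENTS):
    """Breadth-first walk over PARENT links, returning nodes and relationships."""
    nodes, relationships = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        nodes.append({"id": node})
        for parent in parents.get(node, []):
            relationships.append({"from": node, "to": parent, "relation": "PARENT"})
            if parent not in seen:
                # Guard against visiting a shared ancestor twice.
                seen.add(parent)
                queue.append(parent)
    return {"nodes": nodes, "relationships": relationships}


if __name__ == "__main__":
    import json

    print(json.dumps(ancestors_resource(1), indent=2))
```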
Building a Grammar for Ancestral Milestones
RESTful Resource: /population/1/binary-cross
{
  "male": {"id": 4},
  "female": {"id": 3}
}
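One way to read the binary-cross milestone is "the nearest point in the lineage where a male and a female parent were crossed". The sketch below works under that assumption, which the slides do not spell out: it climbs single-parent links until it reaches a node with both parental roles. The `PARENTS` map, its role assignments, and the function name are hypothetical.

```python
# Hypothetical parent map: child id -> list of (parent id, parental_role).
PARENTS = {
    1: [(2, "female")],
    2: [(3, "female"), (4, "male")],
    3: [(5, "female")],
    4: [(6, "female")],
}


def binary_cross(start, parents=PARENTS):
    """Return the first male/female parent pair found when climbing from start."""
    node = start
    visited = set()
    while node not in visited:
        visited.add(node)
        links = parents.get(node, [])
        roles = {role: parent for parent, role in links}
        if "male" in roles and "female" in roles:
            return {"male": {"id": roles["male"]}, "female": {"id": roles["female"]}}
        if len(links) != 1:
            # No parents recorded, or an ambiguous shape: no milestone to report.
            return None
        node = links[0][0]
    return None


if __name__ == "__main__":
    print(binary_cross(1))
```

Naming milestones like this lets a caller ask for a breeding concept directly instead of re-deriving it from raw parent links on every request.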
Pruning Genetic Ancestry Trees 'On the Fly'
RESTful Resource: /population/1/ancestors?until-first=binary-cross
{"nodes": [
  {"id": 1},
  {"id": 2},
  {"id": 3},
  {"id": 4}
],
"relationships": [
  {"from": 1, "to": 2, "relation": "PARENT"},
  {"from": 2, "to": 3, "relation": "PARENT"},
  {"from": 2, "to": 4, "relation": "PARENT"}
]}
Ancestry-as-a-Service is Released September 2014
[Diagram: data scientists and application developers consume the REST API (Ancestry-as-a-Service)]
• >30 elements of RESTful grammar
• ~120 applications and data scientists
• >600 million REST requests
• 10x performance boost
• 1 month analysis now takes 3 hours
Real-Time Reads Require Real-Time Data
• Ingestion volume is ~10 million writes/day (not a write-heavy flow)
• https://github.com/MonsantoCo/goldengate-kafka-adapter
[Diagram: Field + Lab Applications → change messages → REST API → REST API (Ancestry-as-a-Service)]
Example change message:
{
  "table": "foo",
  "type": "INSERT",
  "columns": [
    {
      "name": "bar",
      "before": "fizz",
      "after": "buzz"
    }
  ]
}
REST API (Ancestry-as-a-Service) write operations:
POST /population
PUT /population/1234
PUT /population/parents
DELETE /population
We've Got Ancestry Figured Out… What's Next?
[Diagram: Genotype, Phenotype, Environment, Ancestry]
Layering Genotype Data Over Ancestry Trees
Genotype nodes act as simple pointers to remote systems which store the raw data
[Diagram: graph model with node labels :Plant, :Plant Inventory, :Planting, :Selection, :Genotype and relationship types :PARENT, :PLANTED, :SELECTED, :HARVESTED, :INVENTORY, :HAS_GENOTYPE]
Retrieving Ancestry Trees Annotated with Genotypes
RESTful Resource: /population/1/ancestors?until=genotyped-ancestor&props=genotypes
{"nodes": [
  {"id": 1, "genotypes": [{"id": 123}]},
  {"id": 2},
  {"id": 3},
  {"id": 4, "genotypes": [{"id": 456}]},
  {"id": 5, "genotypes": [{"id": 789}]}
],
"relationships": [
  {"from": 1, "to": 2, "relation": "PARENT"},
  {"from": 2, "to": 3, "relation": "PARENT"},
  {"from": 3, "to": 4, "relation": "PARENT"},
  {"from": 3, "to": 5, "relation": "PARENT"}
]}
[Diagram: ancestry chain 1 → 2 → 3 → {4, 5} with attached :Genotype nodes; marker counts shown: 300, 60,000, 60,000]
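The query parameters above suggest a traversal that stops climbing once it reaches an ancestor that already has a genotype, and that decorates each node with its genotype pointers. The sketch below shows that idea under stated assumptions: the `PARENTS` and `GENOTYPES` maps are hypothetical, nodes 6 and 7 are invented purely to show that pruning happens, and treating the starting population as always expandable is a guess based on the sample response.

```python
from collections import deque

# Hypothetical data. Nodes 6 and 7 are invented to demonstrate pruning.
PARENTS = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [7]}
GENOTYPES = {1: [123], 4: [456], 5: [789]}


def describe(node, genotypes):
    """Build one node entry, adding genotype pointers only when they exist."""
    entry = {"id": node}
    if node in genotypes:
        entry["genotypes"] = [{"id": g} for g in genotypes[node]]
    return entry


def ancestors_until_genotyped(start, parents=PARENTS, genotypes=GENOTYPES):
    """Breadth-first walk that does not climb past a genotyped ancestor."""
    nodes, relationships = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        nodes.append(describe(node, genotypes))
        if node != start and node in genotypes:
            continue  # genotyped ancestor reached: prune everything above it
        for parent in parents.get(node, []):
            relationships.append({"from": node, "to": parent, "relation": "PARENT"})
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return {"nodes": nodes, "relationships": relationships}


if __name__ == "__main__":
    import json

    print(json.dumps(ancestors_until_genotyped(1), indent=2))
```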
Estimate the Genotype of Every Seed Produced
[Diagram labels: Genotypes; Field + Lab Applications; REST API; REST API (Ancestry-as-a-Service); Genotype Estimation Engine; Genotype Annotated Ancestry Trees; Required Genotype DataSets; Estimated Genotypes; New Estimated Genotypes Messages]
Let's Revisit the Flow of a Breeding Cycle
[Diagram: breeding-cycle pipeline revisited. Two parents are crossed (X) and all progeny of the two parents enter; lab data (genotypes) is used to estimate hi-res genotypes; genome-wide selection replaces most field data (phenotypes) collection; select the best, discard the rest; the best one leaves to become a future parent. The width of the pipeline increases to accommodate more crosses.]
• 1000's of crosses/year
• Dozens of progeny/cross
• 1 genotype/progeny
• < $1 million/year
A Glimpse Inside Our Active 'Graphy' Work
Sources: http://biodiversitylibrary.org/page/27066167#page/125/mode/1up
Constructing Coancestry Matrices
[Diagram: pedigree tree with A at the root; B and C are the parents of A; D and E are the parents of B; F and G are the parents of C]
Coancestry(A):
     A     B     C     D     E     F     G
A    1     0.5   0.5   0.25  0.25  0.25  0.25
B          1     0     0.5   0.5   0     0
C                1     0     0     0.5   0.5
D                      1     0     0     0
E                            1     0     0
F                                  1     0
G                                        1
• Consider a reduced ancestor tree only between crosses
• A progeny inherits 50% of its genetics from each parent
• Key input for a large class of predictive genetic analysis algorithms
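The matrix above can be reproduced from the pedigree with a short recursion built on the 50%-per-parent rule. This is a sketch rather than the team's algorithm: the `PARENTS` map mirrors the seven-member tree on this slide, the function names are hypothetical, and the diagonal is fixed at 1 to match the slide, which ignores inbreeding.

```python
# Pedigree from the slide: individual -> (parent, parent); founders have none.
PARENTS = {
    "A": ("B", "C"),
    "B": ("D", "E"),
    "C": ("F", "G"),
    "D": (), "E": (), "F": (), "G": (),
}


def generation(name, parents=PARENTS):
    """Founders are generation 0; a progeny is one more than its oldest parent line."""
    own = parents[name]
    return 0 if not own else 1 + max(generation(p, parents) for p in own)


def coancestry(x, y, parents=PARENTS):
    """Shared fraction between x and y, assuming 50% inherited from each parent."""
    if x == y:
        return 1.0  # slide convention; inbreeding is not modeled here
    # Always expand the younger individual so we never expand an ancestor
    # of the other one.
    if generation(x, parents) < generation(y, parents):
        x, y = y, x
    # Two distinct founders have no recorded parents, so the sum is 0.
    return 0.5 * sum(coancestry(p, y, parents) for p in parents[x])


def coancestry_matrix(names, parents=PARENTS):
    """Full symmetric matrix in the order given by names."""
    return [[coancestry(a, b, parents) for b in names] for a in names]


if __name__ == "__main__":
    for name, row in zip("ABCDEFG", coancestry_matrix(list("ABCDEFG"))):
        print(name, row)
```

The recursion recomputes generations on every call, which is fine for a seven-member tree; a pedigree of realistic size would want memoization or a tabular pass ordered from founders to progeny.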
Thank You All
@TimWilliate
http://engineering.monsanto.com/
Special thanks to my teammates
• Jason Clark
• Marshall Marietta
