Genomics Crash Course for Data Engineers
Allen Day, PhD
1.
© 2014 MapR
Technologies 1
2.
Biomedical & Advertising Tech Overarching Themes*
• Eugenics & Determinism
• Free will vs. Determinism
• Media Tech & Privacy
*Obligatory movie references… shout-out to my hometown LA
3.
Biomedical Research
Goal: Therapeutics => Diagnostics => Prognostics
• Therapeutics => traditional medicine
• Diagnostics => personalized medicine
– NextGen public health
– Requires hi-res mechanical knowledge
– Reverse engineer how genetic variation leads to (un)desired traits
• Prognostics => GATTACA (dys/eu)topia
– Managed populations / NextGen eugenics
4.
Star Wars III: Revenge of the Sith
5.
Star Wars V: The Empire Strikes Back
6.
Genetic Basis of Facial Features
self-reported values of {sex, ancestry}
+ observer scores {race, sex}
+ 3D facial scan
+ genome scan
______________________________
Allelic model of 20 genes that determine facial characteristics
Claes, et al. 2014. Modeling 3D Facial Shape from DNA
7.
Genetic Basis of Facial Features
Claes, et al. 2014. Modeling 3D Facial Shape from DNA
8.
So Get Ready…
www.theness.com
9.
Genomics Crash Course for Data Engineers
10.
Me, Us
• Allen Day, Principal Data Scientist, MapR
– 5yr Hadoop Dev, R project contributor
– PhD, Human Genetics, UCLA Medicine
• MapR
– Distributes open source components for Hadoop
– Adds major technology for performance, HA, industry standard APIs
• See Also
– “allenday” most places (twitter, github, etc.)
– @mapR
11.
Clinical Sequencing Business Process Workflow
Diagram labels: Physician; Patient; Clinic; blood/saliva; Clinical Lab; Analytics; extract
12.
One Bad MTHFR
MTHFR C677T
Methylfolate helps make neurotransmitters in your brain. When methylfolate levels are low, so are your neurotransmitters. Low production of neurotransmitters may cause conditions of addictive behavior, depression, anxiety, ADHD, mania, irritability, insomnia, learning disorders and others.
Everyone should get tested. Why? Because 1 in 2 people are affected and if one knows they have a MTHFR polymorphism, they know they have to be very proactive in taking care of themselves.
http://thyroid.about.com/od/MTHFR-Gene-Mutations-and-Polymorphisms/fl/The-Link-Between-MTHFR-Gene-Mutations-and-Disease-Including-Thyroid-Health.htm
16.
Clinical Sequencing Business Process Workflow
Diagram labels: Physician; Patient; Clinic; blood/saliva; Clinical Lab; Analytics; extract
17.
Clinical Genomics, Information Systems Perspective
Diagram labels: Compressed Structured Base4 Data; Uncompressed Unstructured Base2 Data; extract; Base4=>Base2 Converter [[ DE-STRUCTURES ]]; “BI” Reporting and Visualization tools; Physician; Patient; Analyst; Stakeholder
18.
Clinical Genomics, Information Systems Perspective
Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics
19.
Sequencing “Even Moore’s Law”
Stein. 2010. The case for cloud computing in genome informatics
20.
The Evolving Genomics Workload
Sboner, et al. 2011. The real cost of sequencing: higher than you think!
<= 1º analytics “current high ROI use cases”
<= 2º analytics “next-gen high ROI use cases”
21.
Clinical Genomics, Information Systems Perspective
Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics; 1º analytics; 2º analytics
Not much in this presentation, see also: http://slidesha.re/1sC2BOX
22.
Sequence Analysis, Quick Partial Details
(column alignment of the rows was lost in the transcript)
[…] G A C T A G A    fragment1
A C A G T T T A C A    fragment2
A G A T A - - A G A    fragment3
A A C A G C T T A C A […]    fragment4
C T A T A G A T A A    fragment5
[…] G A T T A C A G A T T A C A G A T T A C A […]    referenceDNA
[…] G A C T A C A G A T A A C A G A T T A C A […]    patient__DNA
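The slide's idea (stack short fragments against a reference and read off where the patient differs) can be sketched in a few lines. This is only an illustration: the function name `call_variants`, the read start offsets, and the toy reads are hypothetical, since the transcript does not preserve the slide's actual alignment, and a simple majority vote stands in for a real genotype caller.

```python
from collections import Counter, defaultdict

def call_variants(reference, aligned_reads):
    """Return {position: (reference_base, observed_base)} where the majority
    of reads covering a position disagrees with the reference.

    aligned_reads is an iterable of (start, sequence) pairs; the start
    offsets are assumed to come from an upstream aligner. '-' is skipped.
    """
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            if base != "-":
                pileup[start + offset][base] += 1

    variants = {}
    for pos in sorted(pileup):
        observed = pileup[pos].most_common(1)[0][0]
        if observed != reference[pos]:
            variants[pos] = (reference[pos], observed)
    return variants

# Toy data (hypothetical): reference "GATTACA" repeated twice, reads drawn
# from a patient sequence that differs at positions 2 and 10.
REFERENCE = "GATTACAGATTACA"
READS = [(0, "GACTAC"), (2, "CTACAGA"), (7, "GATAACA"), (9, "TAACA")]

print(call_variants(REFERENCE, READS))
```

Every position in the toy data is covered by agreeing reads, so the vote is never tied; real callers weigh base quality and coverage rather than counting alone.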
23.
What is the (Probable) Color of Each Column?
24.
Which Columns are (probably) Not White?
Strategy 1: examine foreach column, foreach row
O(rows*cols) + O(1 col) memory
25.
Which Columns are (probably) Not White?
Strategy 2: examine foreach row. keep running tallies
O(rows) + O(rows*cols) memory
26.
Which Columns are (probably) Not White?
Strategy 3: rotate matrix. examine foreach column
O(rows log rows) + O(cols) + O(1 col) memory
27.
Comparison of Strategies
Strategy 1: O(rows*cols) + O(1 col) memory
• Low mem req
• Random access pattern, many ops
Strategy 2: O(rows) + O(rows*cols) memory
• High mem req
• Sequential access pattern
Strategy 3: O(rows log rows) + O(cols) + O(1 col) memory
• Low mem req
• Sequential access pattern
• Requires Sort
28.
Comparison of Strategies
Strategy 1: O(rows*cols) + O(1 col) memory
• Low mem req
• Random access pattern, many ops
Strategy 2: O(rows) + O(rows*cols) memory
• High mem req
• Sequential access pattern
Strategy 3: O(rows log rows) ÷ shards + O(cols) ÷ shards + O(1 col) memory
• Low mem req
• Sequential access pattern
• Requires Sort
As # of rows & columns increases, Strategy 3 becomes more attractive
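The three strategies can be compared side by side on a toy matrix. This is a rough sketch, not the presenter's code: the matrix, the color names, and the function names are made up, `None` marks a cell with no observation, and "probably not white" is taken to mean that the most common color in the column is not white.

```python
from collections import Counter
from itertools import groupby

def _not_white(tally):
    # A column counts as "not white" when its most common color is not white.
    return bool(tally) and tally.most_common(1)[0][0] != "white"

def strategy1(matrix):
    """For each column, visit every row (column-major access into row storage)."""
    hits = []
    for col in range(len(matrix[0])):
        tally = Counter(row[col] for row in matrix if row[col] is not None)
        if _not_white(tally):
            hits.append(col)
    return hits

def strategy2(matrix):
    """One sequential pass over rows, keeping a running tally for every column."""
    tallies = [Counter() for _ in matrix[0]]
    for row in matrix:
        for col, color in enumerate(row):
            if color is not None:
                tallies[col][color] += 1
    return [col for col, tally in enumerate(tallies) if _not_white(tally)]

def strategy3(matrix):
    """Rotate: emit (column, color) records, sort by column, then stream
    through one column at a time, holding only that column's tally."""
    records = sorted(
        (col, color)
        for row in matrix
        for col, color in enumerate(row)
        if color is not None
    )
    hits = []
    for col, group in groupby(records, key=lambda rec: rec[0]):
        if _not_white(Counter(color for _, color in group)):
            hits.append(col)
    return hits

MATRIX = [
    ["white", "red",   "white", None],
    ["white", "red",   "white", "white"],
    [None,    "white", "white", "blue"],
    ["white", "red",   None,    "blue"],
]

print(strategy1(MATRIX), strategy2(MATRIX), strategy3(MATRIX))
```

All three return the same columns; what changes is the access pattern and how much state is live at once, which is the trade-off the comparison slide describes. The sort in `strategy3` is the step that can be split across shards.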
29.
1º Sequence Analysis (ETL), MapReduce style
Pipeline: .fastq => short read alignment => .bam => genotype calling => .vcf
Stage labels: MAP; MAP; REDUCE, rotate matrix 90º
Cost labels: (O(mn)) / 1; (O(mn) + O(n log n)) / s
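One way to read the MapReduce layout on this slide is simulated below in a single process. It is a sketch under stated assumptions, not Hadoop code: `map_read`, `shuffle`, `reduce_position`, and `run` are hypothetical names, the reads are assumed to arrive already aligned as (start, sequence) pairs, and a majority vote stands in for genotype calling.

```python
from collections import Counter, defaultdict

def map_read(start, seq):
    """MAP: turn one aligned read into (position, base) records."""
    for offset, base in enumerate(seq):
        yield start + offset, base

def shuffle(pairs, n_shards):
    """Shuffle: route each record to a shard by key and group by position.
    This is the 'rotate matrix 90 degrees' step: rows (reads) become columns."""
    shards = [defaultdict(list) for _ in range(n_shards)]
    for pos, base in pairs:
        shards[pos % n_shards][pos].append(base)
    return shards

def reduce_position(pos, bases):
    """REDUCE: call the most frequent base at one position."""
    return pos, Counter(bases).most_common(1)[0][0]

def run(reads, n_shards=2):
    pairs = (kv for start, seq in reads for kv in map_read(start, seq))
    calls = {}
    for shard in shuffle(pairs, n_shards):  # shards are independent of each other
        for pos in sorted(shard):
            key, base = reduce_position(pos, shard[pos])
            calls[key] = base
    return calls

READS = [(0, "GAC"), (1, "ACT"), (2, "CTA")]  # toy, hypothetical reads
print(run(READS))
```

Because each shard holds a disjoint set of positions, the reduce work divides by the number of shards, which is what the "/ s" in the slide's cost label appears to express.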
30.
Crossbow (MapReduce Strategy, implemented)
Langmead, et al. 2009. Searching for SNPs with cloud computing
31.
Ion Flux (MapReduce Strategy, implemented for Enterprise)
• Sequencing workflow in MapReduce (Hadoop, Cascading, Amazon Elastic M/R)
• Integrated with Ion Torrent as a plugin to stream sequence to the cloud
• Emphasis on scalability and latency
– assay->clinical report turnaround in < 24h
• Compare to fast-follower stack ILMN MiSeq+BaseSpace
http://aws.amazon.com/solutions/case-studies/ion-flux/
http://ionflux.com
32.
Non-Genomics Digression, 1 of 2
Data Warehouse ETL Offload
33.
The Problem
• Major telecom vendor
• Key step in billing pipeline handled by data warehouse (EDW)
• EDW at maximum capacity
• Multiple rounds of software optimization already done
• Revenue limiting (= career limiting) bottleneck
34.
Three Options
1. No more revenue growth
2. Increase EDW size
– Expensive
– Known to not scale well
3. Find a more scalable solution
35.
Original Flow – ELTL
Diagram labels: ETL; CDR billing records; Billing reports; Data Warehouse; Customer bills
36.
Simplified Analysis – EDW Strategy
• 70% of EDW consumed by ELTL processing
– Caused by 10% of code (CDR transformations)
• 200% EDW capacity adds capital cost of ~X
• Indirect costs non-trivial (floor space, power)
• 150% performance increase (poor division of labor)
37.
With ETL Offload
Diagram labels: ETL; CDR billing records; Billing reports; Data Warehouse; Customer billing
38.
Simplified Analysis – MapR Strategy
• Hardware + MapR cost ~1/20X
• ETL replacement development costs ~1/20X
• 300% performance increase
39.
Price Performance
• EDW strategy
– 1.5x performance
– Cost is X
• MapR Strategy
– 3x performance
– Cost is 1/10X
• 20x cost/performance advantage for MapR strategy
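The 20x figure follows directly from the numbers on the two "Simplified Analysis" slides. The short check below makes the arithmetic explicit; it treats the EDW cost X as one unit, and the function and variable names are illustrative only.

```python
from fractions import Fraction

def price_performance(speedup, cost):
    """Performance gained per unit of cost."""
    return speedup / cost

X = Fraction(1)  # EDW expansion cost, used as the unit of cost

# EDW strategy: 1.5x performance at cost X
edw = price_performance(Fraction(3, 2), X)

# MapR strategy: hardware + MapR (~X/20) plus ETL redevelopment (~X/20)
mapr_cost = X / 20 + X / 20
mapr = price_performance(Fraction(3), mapr_cost)

advantage = mapr / edw
print(mapr_cost, edw, mapr, advantage)
```

The two ~1/20X items sum to the 1/10X cost quoted on the slide, and 3 ÷ (1/10) = 30 against 1.5 ÷ 1 = 1.5 gives the 20x ratio.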
40.
Platform Advantages
• Standard Hadoop ecosystem components allow efficient CDR parsing and ETL
• MapR platform provides high availability, disaster recovery
• MapR NFS interface allows direct load of transformed data
41.
Non-Genomics Digression, 2 of 2
42.
<Recommendation System. Redacted>
43.
Hybrid Use-Cases
44.
MapR Data Platform Advantage, Telecommunications
Diagram labels: CO-OCCURRENCE (MAHOUT); SOLR INDEXING; ETL; BILLING REPORTS; WEB TIER; DATA WAREHOUSE; CDR BILLING RECORDS; CUSTOMER BILLING; USER HISTORY; QUERY / CONTEXT; RECOMMENDATIONS; COMPLETE HISTORY (all users); ITEM META-DATA; INDEX SHARDS
45.
MapR Data Platform Advantage, Clinical Genomics
Diagram labels: Epidemiological, Actuarial Analyses; Denormalization for Search, Viz, Research; ETL; Clinical Reporting; WEB TIER; Clinical Reporting Systems; CLINICAL TREATMENT OF PATIENTS; RESEARCHERS; National Pop. Database; INDEX SHARDS; Prognostic Capability
46.
Bonus Round: 2º Analytics
47.
Clinical Genomics, Information Systems Perspective
Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics; 2º analytics
Not much in this presentation, see also: http://slidesha.re/1sC2BOX
48.
Matrices A (U*Q) and B (U*V)
Query Term = Clicked Term
Diagram labels: Users; Query Terms; Users; Clicked Videos
49.
Relate Q to V
Diagram labels: Users; Query Terms
51.
Relate Q to V: it’s a Cross-Recommender
Diagram labels: Query Terms; Videos
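A common way to relate Q to V from the two user-indexed matrices is the cross-occurrence product AᵀB, which counts how many users both issued a query term and clicked a video. The slides do not spell out the computation, so treat this as an assumed reading rather than the presenter's method; the tiny matrices and helper names are invented, and a production recommender would typically apply a significance test to these counts instead of using them raw.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(left, right):
    right_t = transpose(right)
    return [
        [sum(a * b for a, b in zip(row, col)) for col in right_t]
        for row in left
    ]

# A: users x query terms (1 = the user issued that query term)
A = [
    [1, 0],
    [1, 1],
    [0, 1],
]

# B: users x videos (1 = the user clicked that video)
B = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Q x V cross-occurrence: entry [q][v] counts users who did both.
cross = matmul(transpose(A), B)
print(cross)
```

Each row of `cross` can then be ranked to suggest videos for a query term, which is the sense in which the product acts as a cross-recommender.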
52.
Diagram labels: Users; Query Terms
53.
If they were unlabeled, would you know which is which?
Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
NPR. 2011. The Search For Analysts To Make Sense Of 'Big Data'
http://www.npr.org/2011/11/30/142893065
54.
If they were unlabeled, would you know which is which?
Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
• Identify network structures
• Label them
• Observe stimulus=>response space mapping
• Purposefully target
• PROFIT ! ! ! !