Genomics Crash Course for Data Engineers
Allen Day, PhD
1.
© 2014 MapR Technologies
2.
Biomedical & Advertising Tech: Overarching Themes*
• Eugenics & Determinism
• Free will vs. Determinism
• Media Tech & Privacy
*Obligatory movie references… shout-out to my hometown LA
3.
Biomedical Research Goal: Therapeutics => Diagnostics => Prognostics
• Therapeutics => traditional medicine
• Diagnostics => personalized medicine
  – NextGen public health
  – Requires hi-res mechanical knowledge
  – Reverse engineer how genetic variation leads to (un)desired traits
• Prognostics => GATTACA (dys/eu)topia
  – Managed populations / NextGen eugenics
4.
Star Wars III: Revenge of the Sith
5.
Star Wars V: The Empire Strikes Back
6.
Genetic Basis of Facial Features
  self-reported values of {sex, ancestry}
+ observer scores {race, sex}
+ 3D facial scan
+ genome scan
______________________________
Allelic model of 20 genes that determine facial characteristics
Claes, et al. 2014. Modeling 3D Facial Shape from DNA
7.
Genetic Basis of Facial Features
Claes, et al. 2014. Modeling 3D Facial Shape from DNA
8.
So Get Ready…
www.theness.com
9.
Genomics Crash Course for Data Engineers
10.
Me, Us
• Allen Day, Principal Data Scientist, MapR
  – 5yr Hadoop Dev, R project contributor
  – PhD, Human Genetics, UCLA Medicine
• MapR
  – Distributes open source components for Hadoop
  – Adds major technology for performance, HA, industry standard APIs
• See Also
  – “allenday” most places (twitter, github, etc.)
  – @mapR
11.
Clinical Sequencing Business Process Workflow
[Diagram labels: Physician; Patient; Clinic; blood/saliva; Clinical Lab; Analytics; extract]
12.
One Bad MTHFR
MTHFR C677T
Methylfolate helps make neurotransmitters in your brain. When methylfolate levels are low, so are your neurotransmitters. Low production of neurotransmitters may cause conditions of addictive behavior, depression, anxiety, ADHD, mania, irritability, insomnia, learning disorders and others.
Everyone should get tested. Why? Because 1 in 2 people are affected and if one knows they have a MTHFR polymorphism, they know they have to be very proactive in taking care of themselves.
http://thyroid.about.com/od/MTHFR-Gene-Mutations-and-Polymorphisms/fl/The-Link-Between-MTHFR-Gene-Mutations-and-Disease-Including-Thyroid-Health.htm
16.
Clinical Sequencing Business Process Workflow
[Diagram labels: Physician; Patient; Clinic; blood/saliva; Clinical Lab; Analytics; extract]
17.
Clinical Genomics, Information Systems Perspective
[Diagram labels: Compressed Structured Base4 Data; Uncompressed Unstructured Base2 Data; extract; Base4=>Base2 Converter [[ DE-STRUCTURES ]]; “BI” Reporting and Visualization tools; Physician; Patient; Analyst; Stakeholder]
18.
Clinical Genomics, Information Systems Perspective
[Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics]
19.
Sequencing “Even Moore’s Law”
Stein. 2010. The case for cloud computing in genome informatics
20.
The Evolving Genomics Workload
• 1º analytics: “current high ROI use cases”
• 2º analytics: “next-gen high ROI use cases”
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
21.
Clinical Genomics, Information Systems Perspective
[Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics; 1º analytics; 2º analytics]
Not much in this presentation, see also: http://slidesha.re/1sC2BOX
22.
Sequence Analysis, Quick Partial Details
fragment1     […] G A C T A G A
fragment2     A C A G T T T A C A
fragment3     A G A T A - - A G A
fragment4     A A C A G C T T A C A […]
fragment5     C T A T A G A T A A
referenceDNA  […] G A T T A C A G A T T A C A G A T T A C A […]
patient__DNA  […] G A C T A C A G A T A A C A G A T T A C A […]
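To make the comparison on this slide concrete, here is a minimal sketch that lists where the patient sequence differs from the reference. The two strings are copied from the slide with the spaces and […] markers removed; the function name find_mismatches is hypothetical, and the code assumes the sequences are already aligned and of equal length, which skips the alignment work a real pipeline has to do.

```python
# Sequences copied from the slide, with spaces and the […] markers removed.
reference = "GATTACAGATTACAGATTACA"
patient = "GACTACAGATAACAGATTACA"


def find_mismatches(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) per difference.

    Assumes both strings are already aligned and equally long, which is a
    simplification: real pipelines must first align the fragments.
    """
    if len(ref) != len(sample):
        raise ValueError("sequences must have equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, sample)) if r != s]


variants = find_mismatches(reference, patient)
print(variants)  # [(2, 'T', 'C'), (10, 'T', 'A')]
```

The two reported positions are the candidate variants that the genotype calling step would then have to confirm from the overlapping fragments.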
23.
What is the (Probable) Color of Each Column?
24.
Which Columns are (probably) Not White?
Strategy 1: examine foreach column, foreach row
O(rows*cols) + O(1 col) memory
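Strategy 1 might look like the sketch below. The toy matrix, the "W" marker for white cells, and the 0.5 threshold for "probably not white" are all assumptions made for illustration; the slide only shows a picture.

```python
# Hypothetical toy matrix: rows are reads, columns are positions.
# "W" means white; any other letter is a color. Values are made up.
MATRIX = [
    ["W", "R", "W", "W"],
    ["W", "R", "W", "B"],
    ["W", "R", "W", "W"],
    ["W", "W", "W", "W"],
]


def non_white_columns_by_column(matrix, threshold=0.5):
    """Strategy 1: outer loop over columns, inner loop over rows.

    Only one column's tally is held at a time, but consecutive cell reads
    come from different rows, so the access pattern is not sequential.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    result = []
    for col in range(n_cols):
        non_white = 0
        for row in range(n_rows):
            if matrix[row][col] != "W":
                non_white += 1
        # The threshold is an assumed definition of "probably not white".
        if non_white / n_rows > threshold:
            result.append(col)
    return result


print(non_white_columns_by_column(MATRIX))  # [1]
```

Every cell is visited once, and the inner loop jumps from row to row. On row-oriented storage that is the random access pattern this strategy is criticized for.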
25.
Which Columns are (probably) Not White?
Strategy 2: examine foreach row. keep running tallies
O(rows) + O(rows*cols) memory
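A sketch of Strategy 2 under the same assumed toy data ("W" for white, letters for colors). It reads rows in order and keeps every observation for every column until the end, which is one way the memory cost can grow toward rows*cols; if plain counters were enough, the tallies would be much smaller. The function name is hypothetical.

```python
from collections import Counter

# Hypothetical toy matrix: rows are reads, columns are positions.
MATRIX = [
    ["W", "R", "W", "W"],
    ["W", "R", "W", "B"],
    ["W", "R", "W", "W"],
    ["W", "W", "W", "W"],
]


def probable_colors_by_row(rows):
    """Strategy 2: one sequential pass over rows, tallies for all columns."""
    observations = []  # one list per column, all held in memory at once
    for row in rows:
        if not observations:
            observations = [[] for _ in row]
        for col, cell in enumerate(row):
            observations[col].append(cell)
    # The most common value in a column is taken as its probable color.
    return [Counter(cells).most_common(1)[0][0] for cells in observations]


colors = probable_colors_by_row(iter(MATRIX))
not_white = [col for col, color in enumerate(colors) if color != "W"]
print(colors, not_white)  # ['W', 'R', 'W', 'W'] [1]
```

Passing an iterator rather than the list emphasizes that the rows are consumed once, in order.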
26.
Which Columns are (probably) Not White?
Strategy 3: rotate matrix. examine foreach column
O(rows log rows) + O(cols) + O(1 col) memory
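Strategy 3 can be sketched as a sort followed by a single scan. Here the "rotation" is modeled by flattening the assumed toy matrix into (column, value) records and sorting them by column. The record layout and function name are illustrative choices, not something the slide specifies.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical toy matrix: rows are reads, columns are positions.
MATRIX = [
    ["W", "R", "W", "W"],
    ["W", "R", "W", "B"],
    ["W", "R", "W", "W"],
    ["W", "W", "W", "W"],
]


def probable_colors_by_sort(cells):
    """Strategy 3: sort (column, value) records by column, then scan once."""
    ordered = sorted(cells, key=itemgetter(0))  # the "rotate matrix" step
    result = {}
    for col, group in groupby(ordered, key=itemgetter(0)):
        counts = Counter(value for _, value in group)  # one column at a time
        result[col] = counts.most_common(1)[0][0]
    return result


cells = [(col, cell) for row in MATRIX for col, cell in enumerate(row)]
probable = probable_colors_by_sort(cells)
not_white = [col for col, color in probable.items() if color != "W"]
print(probable, not_white)  # {0: 'W', 1: 'R', 2: 'W', 3: 'W'} [1]
```

After the sort, each column's values sit next to each other, so the scan is sequential and only needs one column's tally at a time.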
27.
Comparison of Strategies
Strategy 1: O(rows*cols) + O(1 col) memory
• Low mem req
• Random access pattern, many ops
Strategy 2: O(rows) + O(rows*cols) memory
• High mem req
• Sequential access pattern
Strategy 3: O(rows log rows) + O(cols) + O(1 col) memory
• Low mem req
• Sequential access pattern
• Requires Sort
28.
Comparison of Strategies
Strategy 1: O(rows*cols) + O(1 col) memory
• Low mem req
• Random access pattern, many ops
Strategy 2: O(rows) + O(rows*cols) memory
• High mem req
• Sequential access pattern
Strategy 3: O(rows log rows) ÷ shards + O(cols) ÷ shards + O(1 col) memory
• Low mem req
• Sequential access pattern
• Requires Sort
As # of rows & columns increases, Strategy 3 becomes more attractive
29.
1º Sequence Analysis (ETL), MapReduce style
.fastq => short read alignment => .bam => genotype calling => .vcf
MAP, MAP, REDUCE (rotate matrix 90º)
• (O(mn)) / 1
• (O(mn) + O(n log n)) / s
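The map, sort, and reduce shape of this slide can be imitated in a few lines of plain Python. This is a toy sketch, not a Hadoop job: the reads are hypothetical and assumed to be already aligned, the shuffle function stands in for the framework's sort, and the "genotype call" is just a majority vote.

```python
from collections import Counter, defaultdict


def map_read(start, bases):
    """MAP: emit one (position, base) pair per base of an aligned read."""
    for offset, base in enumerate(bases):
        yield start + offset, base


def shuffle(pairs):
    """Group values by key, standing in for the framework's sort/shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_position(position, bases):
    """REDUCE: call the most frequently observed base at one position."""
    return position, Counter(bases).most_common(1)[0][0]


# Hypothetical aligned reads as (start position, bases).
reads = [(0, "GACT"), (1, "ACTA"), (2, "CTAC"), (2, "TTAC")]
pairs = (pair for start, bases in reads for pair in map_read(start, bases))
calls = [reduce_position(pos, bases) for pos, bases in shuffle(pairs)]
consensus = "".join(base for _, base in calls)
print(consensus)  # GACTAC
```

The shuffle is the step that rotates the data from per-read order into per-position order, which is why each reducer only ever needs one position's observations.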
30.
Crossbow (MapReduce Strategy, implemented)
Langmead, et al. 2009. Searching for SNPs with cloud computing
31.
Ion Flux (MapReduce Strategy, implemented for Enterprise)
• Sequencing workflow in MapReduce (Hadoop, Cascading, Amazon Elastic M/R)
• Integrated with Ion Torrent as a plugin to stream sequence to the cloud
• Emphasis on scalability and latency
  – assay->clinical report turnaround in < 24h
• Compare to fast-follower stack ILMN MiSeq+BaseSpace
http://aws.amazon.com/solutions/case-studies/ion-flux/
http://ionflux.com
32.
Non-Genomics Digression, 1 of 2
Data Warehouse ETL Offload
33.
The Problem
• Major telecom vendor
• Key step in billing pipeline handled by data warehouse (EDW)
• EDW at maximum capacity
• Multiple rounds of software optimization already done
• Revenue limiting (= career limiting) bottleneck
34.
Three Options
1. No more revenue growth
2. Increase EDW size
   – Expensive
   – Known to not scale well
3. Find a more scalable solution
35.
Original Flow – ELTL
[Diagram labels: ETL; CDR billing records; Billing reports; Data Warehouse; Customer bills]
36.
Simplified Analysis – EDW Strategy
• 70% of EDW consumed by ELTL processing
  – Caused by 10% of code (CDR transformations)
• Growing the EDW to 200% capacity adds a capital cost of ~X
• Indirect costs non-trivial (floor space, power)
• 1.5x performance, i.e. 150% of current (poor division of labor)
37.
With ETL Offload
[Diagram labels: ETL; CDR billing records; Billing reports; Data Warehouse; Customer billing]
38.
Simplified Analysis – MapR Strategy
• Hardware + MapR cost ~1/20X
• ETL replacement development costs ~1/20X
• 3x performance, i.e. 300% of current
39.
Price Performance
• EDW strategy
  – 1.5x performance
  – Cost is X
• MapR Strategy
  – 3x performance
  – Cost is 1/10X
• 20x cost/performance advantage for MapR strategy
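The 20x figure follows from the numbers on the two "Simplified Analysis" slides. The short check below works in units of X and treats the approximate ~1/20X costs as exact, which is a simplification; the variable names are chosen for this example.

```python
from fractions import Fraction

# Figures from the slides, with cost expressed in units of X.
edw_performance = Fraction(3, 2)  # 1.5x
edw_cost = Fraction(1)            # X
mapr_performance = Fraction(3)    # 3x
# Hardware + MapR (~1/20X) plus ETL replacement development (~1/20X).
mapr_cost = Fraction(1, 20) + Fraction(1, 20)

edw_ratio = edw_performance / edw_cost     # performance per unit cost
mapr_ratio = mapr_performance / mapr_cost
advantage = mapr_ratio / edw_ratio
print(mapr_cost, advantage)  # 1/10 20
```

In words: 3x performance at 1/10 of the cost gives 30 units of performance per X, against 1.5 for the EDW strategy, a ratio of 20.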
40.
Platform Advantages
• Standard Hadoop ecosystem components allow efficient CDR parsing and ETL
• MapR platform provides high availability, disaster recovery
• MapR NFS interface allows direct load of transformed data
41.
Non-Genomics Digression, 2 of 2
42.
<Recommendation System. Redacted>
43.
Hybrid Use-Cases
44.
MapR Data Platform Advantage, Telecommunications
[Diagram labels: CO-OCCURRENCE (MAHOUT); SOLR INDEXING; ETL; BILLING REPORTS; WEB TIER; DATA WAREHOUSE; CDR BILLING RECORDS; CUSTOMER BILLING; USER HISTORY; QUERY / CONTEXT; RECOMMENDATIONS; COMPLETE HISTORY (all users); ITEM META-DATA; INDEX SHARDS]
45.
MapR Data Platform Advantage, Clinical Genomics
[Diagram labels: Epidemiological, Actuarial Analyses; Denormalization for Search, Viz, Research; ETL; Clinical Reporting; WEB TIER; Clinical Reporting Systems; CLINICAL TREATMENT OF PATIENTS; RESEARCHERS; National Pop. Database; INDEX SHARDS; Prognostic Capability]
46.
Bonus Round: 2º Analytics
47.
Clinical Genomics, Information Systems Perspective
[Diagram labels: Physician; Patient; Analyst; Stakeholder; ETL; Reporting and Viz; Data Store; Analytics; 2º analytics]
Not much in this presentation, see also: http://slidesha.re/1sC2BOX
48.
Matrices A (U*Q) and B (U*V)
Query Term = Clicked Term
[Diagram labels: Users, Query Terms (matrix A); Users, Clicked Videos (matrix B)]
49.
Relate Q to V
[Diagram labels: Users; Query Terms]
51.
Relate Q to V: it’s a Cross-Recommender
[Diagram labels: Query Terms; Videos]
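One common way to relate Q to V, given a users-by-queries matrix A and a users-by-videos matrix B, is the product of A transposed with B, whose entries count users who both issued a query term and clicked a video. The sketch below assumes that reading; the slides show only pictures, the matrices are made up, and any significance weighting a production recommender would apply is left out.

```python
# Hypothetical binary matrices; each row is one user.
A = [  # users x query terms
    [1, 0],
    [1, 1],
    [0, 1],
]
B = [  # users x clicked videos
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]


def cross_cooccurrence(a, b):
    """Compute transpose(a) * b.

    Entry [q][v] counts the users who both issued query term q and
    clicked video v.
    """
    n_q, n_v = len(a[0]), len(b[0])
    result = [[0] * n_v for _ in range(n_q)]
    for row_a, row_b in zip(a, b):  # one user at a time
        for q, has_query in enumerate(row_a):
            if has_query:
                for v, clicked in enumerate(row_b):
                    result[q][v] += has_query * clicked
    return result


q_to_v = cross_cooccurrence(A, B)
print(q_to_v)  # [[2, 1, 0], [1, 1, 1]]
```

Each row of the result can be read as "videos associated with this query term", which is what lets a query drive video recommendations.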
52.
[Diagram labels: Users; Query Terms]
53.
If they were unlabeled, would you know which is which?
Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
NPR. 2011. The Search For Analysts To Make Sense Of 'Big Data'
http://www.npr.org/2011/11/30/142893065
54.
If they were unlabeled, would you know which is which?
Friend. 2010. The Need for Precompetitive Integrative Bionetwork Disease Model Building
• Identify network structures
• Label them
• Observe stimulus=>response space mapping
• Purposefully target
• PROFIT ! ! ! !