#Code2Cure: Engineering Genomics
Twitter: @mirkiani
A field guide for software engineers on their journey to the world of genomics.
Amirhossein Kiani
Sr. Lead Software Engineer
Email: amir@bina.com
Image courtesy of http://circos.ca
DISCLAIMER: The views expressed in this talk are mine alone and not
those of my employer.
Bina products are for Research Use Only. Not for use in diagnostic
procedures.
Also, I’m a computer scientist by training, trying to help those with a
similar background learn about the field of genomics. The scientific
concepts in this talk have therefore been heavily simplified.
 https://www.youtube.com/watch?v=G1ZLyGW8rKY
www.bina.com
Why Genomics?
$3,000,000,000
13 years
 http://en.wikipedia.org/wiki/Human_Genome_Project
Past Present
$1000
24 hours
Future
Why Genomics?
Some things we could do with genomics:
• Carrier Screening
• Prenatal Screening
• Newborn Screening
• Inherited Disease
• Infectious Disease
• Cancer Diagnostics
• Microbiome
• Personalized Medicine
But I have no genomics background!
It’s ok. 
My personal story…
Now
Then
What is a cell? What is DNA?
 http://en.wikipedia.org/wiki/Cell_%28biology%29
 http://en.wikipedia.org/wiki/DNA
Image courtesy of Pinterest
Image courtesy of Tumblr
Crash Course on Genomics
The field that studies the structure and function of genomes.
 http://en.wikipedia.org/wiki/Genomics
 http://en.wikipedia.org/wiki/RNA
 http://en.wikipedia.org/wiki/Protein
DNA → RNA → Protein → You!
How do we figure out what’s in DNA?
Like everything else, we turn the analog signal into a digital one, and then
analyze it.
 http://en.wikipedia.org/wiki/DNA_sequencing
 http://en.wikipedia.org/wiki/FASTQ_format
Sequencer (Illumina, Ion Torrent, Genia, …) → Primary Analysis → FASTQ Format
Image courtesy of PersonalGenomes.org
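To make the FASTQ format concrete, here is a minimal Python sketch of a reader. It assumes the common four-line record layout and Phred+33 quality encoding; the names FastqRecord and parse_fastq and the sample read are invented for illustration and do not come from the talk.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (assumed layout)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Toy input: one read of four bases with made-up quality characters.
example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(io.StringIO(example)))
```

Each quality score Q stands for an estimated error probability of 10^(-Q/10), so a score of 40 means roughly one wrong base call in 10,000.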
RAW Data to Variants (Secondary Analysis)
Step 1. Alignment
 http://en.wikipedia.org/wiki/DNA_sequencing
 http://en.wikipedia.org/wiki/FASTQ_format
Image courtesy of Wall Woodworks
Image courtesy of Wallpaper Up
From “Raw” DNA to “Variants” (Secondary Analysis)
Step 1. Short-Read Sequence Alignment
 http://en.wikipedia.org/wiki/Reference_genome
 http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism
 http://en.wikipedia.org/wiki/Indel
 http://en.wikipedia.org/wiki/Structural_variation
AACACACCCAAGGGGGAAACTTTGGTCCACCCAAGGGGGAAACCCAAGGGGGAAACTTTG
Reference Genome (~3B bases)
ACTTTGGTCCACCCAAGG
AAGGGGGACACCCAAGGACACCC__GGGGGAAACT
GGACACCCAAGGGGGAA
ACCCAAGGGGGACACCC
ACCC__GGGGGAAACTTTG
AACACACCC__GGGGGAA
Coverage
Deletion Single Nucleotide Polymorphism
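The Coverage and Single Nucleotide Polymorphism labels on this slide can be sketched in a few lines of Python. This is a toy under strong assumptions: each read is already placed at a known 0-based start and contains no gaps, and the reference and reads below are invented rather than taken from the slide.

```python
from typing import List, Tuple

Alignment = Tuple[int, str]  # (0-based start on the reference, read bases)


def coverage(ref_length: int, alignments: List[Alignment]) -> List[int]:
    """Count how many reads overlap each reference position."""
    depth = [0] * ref_length
    for start, read in alignments:
        for position in range(start, start + len(read)):
            depth[position] += 1
    return depth


def mismatches(reference: str, alignments: List[Alignment]) -> List[Tuple[int, str, str]]:
    """List (position, reference base, read base) wherever a read disagrees."""
    found = []
    for start, read in alignments:
        for offset, base in enumerate(read):
            ref_base = reference[start + offset]
            if ref_base != base:
                found.append((start + offset, ref_base, base))
    return found


# Toy data: the second read carries a T where the reference has an A.
reference = "ACGTACGTAC"
alignments = [(0, "ACGTA"), (2, "GTTCG"), (4, "ACGTAC")]
depth = coverage(len(reference), alignments)
candidates = mismatches(reference, alignments)
```

A single disagreeing read is only a candidate. Variant callers weigh many reads and their base qualities before reporting a variant.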
From “Raw” DNA to “Variants” (Secondary Analysis)
• Burrows-Wheeler Aligner (BWA)
• Uses the Burrows-Wheeler transform (also used in bzip2)
• Uses the Smith-Waterman algorithm
• Written in C
• Uses ~4GB of memory for the human genome
 http://bio-bwa.sourceforge.net
 http://bioinformatics.oxfordjournals.org/content/25/14/1754.full.pdf+html
Example (index the reference once, then align):
$ bwa index ref.fa
$ bwa mem ref.fa read1.fq read2.fq > aln-pe.sam
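The two algorithms named on this slide can be illustrated with a short Python sketch. It is a teaching toy, not how BWA is implemented: the transform is built by sorting all rotations (real aligners use a compressed index instead), and the scoring values for the local alignment are arbitrary assumptions.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations, with '$' as end marker."""
    if "$" in text:
        raise ValueError("input must not contain the '$' sentinel")
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


transformed = bwt("banana")
local_score = smith_waterman_score("GGACGTCC", "ACGT")
```

The transform groups similar characters together, which helps both compression and fast substring search. The dynamic programme then scores how well a read matches a candidate region.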
From “Raw” DNA to “Variants” (Secondary Analysis)
FASTQ → Alignment → SAM
SAM → Convert to Binary, BGZF compression (samtools) → BAM File
BAM File → BAM File Index
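A SAM file is tab-separated text with eleven mandatory columns per read, and a BAM file is a compressed binary encoding of the same records. The Python sketch below reads one SAM line. The record shown is invented, and SamRecord, parse_sam_line and the helper functions are hypothetical names, not part of samtools.

```python
import re
from typing import List, NamedTuple, Tuple


class SamRecord(NamedTuple):
    qname: str                    # read name
    flag: int                     # bitwise flags
    rname: str                    # reference sequence name
    pos: int                      # 1-based leftmost mapping position
    mapq: int                     # mapping quality
    cigar: List[Tuple[int, str]]  # (length, operation) pairs
    seq: str                      # read bases


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> SamRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    cigar = [(int(length), op) for length, op in CIGAR_OP.findall(fields[5])]
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), cigar, fields[9])


def is_unmapped(record: SamRecord) -> bool:
    return bool(record.flag & 0x4)   # flag bit 0x4: read is unmapped


def is_reverse_strand(record: SamRecord) -> bool:
    return bool(record.flag & 0x10)  # flag bit 0x10: reverse-complemented


def reference_span(record: SamRecord) -> int:
    """Number of reference bases covered (operations that consume the reference)."""
    return sum(length for length, op in record.cigar if op in "MDN=X")


# Toy record: 8 read bases with a 2-base deletion relative to the reference.
line = "read1\t16\tchr1\t100\t60\t4M2D4M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(line)
```

In practice a library that reads BAM directly would replace this hand-written parser. The point here is only that the format is simple enough to inspect by eye.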
 http://samtools.github.io/hts-specs/SAMv1.pdf
 http://samtools.github.io
From “Raw” DNA to “Variants” (Secondary Analysis)
BAM File
BAM File Index
 http://www.broadinstitute.org/igv
 https://github.com/ekg/freebayes
 http://arxiv.org/abs/1207.3907
 https://www.broadinstitute.org/gatk
Visualize
Variant Calling
$ freebayes -f ref.fa aln.bam >var.vcf
Example
Integrative Genomics Viewer (IGV)
From “Raw” DNA to “Variants” (Secondary Analysis)
… and here are your variants (VCF file)! 
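A VCF file is also plain tab-separated text: header lines start with #, and each data line gives a position, the reference allele and one or more alternate alleles. The Python sketch below reads the first five columns and labels each variant by allele length. The two records are invented, and the sketch assumes plain base strings (symbolic alleles such as <DEL> are not handled).

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str
    kind: str


def classify(ref: str, alt: str) -> str:
    """Label a variant by comparing allele lengths (simplified)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "MNP"  # same length, more than one base


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # several ALT alleles are comma-separated
            yield Variant(chrom, int(pos), ref, alt, classify(ref, alt))


# Toy VCF text with one SNP and one deletion.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t20\t.\tCAA\tC\t40\tPASS\t.\n"
)
variants = list(parse_vcf(vcf_text.splitlines()))
```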
 http://samtools.github.io/hts-specs/VCFv4.2.pdf
What do we do with variant calls then?
Zooming in on the Central Dogma of Molecular Biology:
• The genetic code is redundant: several codons can encode the same amino acid.
• But a mutation can still change the protein that is encoded.
Image courtesy of Wikipedia
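The redundancy point can be checked directly against the standard genetic code. The Python sketch below builds the 64-codon table and labels a single-codon change. The function name and category labels are illustrative assumptions, and DNA codons (T rather than U) are used for convenience.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Compare the amino acids encoded before and after a mutation."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # redundancy: the protein is unchanged
    if alt_aa == "*":
        return "stop gained"     # the protein is cut short
    return "non-synonymous"      # a different amino acid is encoded


silent = classify_codon_change("GAA", "GAG")    # both encode glutamate (E)
changed = classify_codon_change("GAG", "GTG")   # glutamate (E) to valine (V)
truncated = classify_codon_change("TAC", "TAA") # tyrosine (Y) to stop
```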
What do we do with variant calls then?
Annotation & Interpretation
• Functional Annotation: figure out if the mutation is dangerous (use SnpEff)
• Synonymous
• Non-Synonymous
• Frame-shift
• …
• Put in the context of existing findings
• dbSNP
• ClinVar
• COSMIC
• ESP
• 1000 Genomes
• …
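Putting a variant in the context of existing findings is, at its simplest, a keyed lookup. The Python sketch below joins called variants against an in-memory table keyed by chromosome, position, reference and alternate allele. The table contents are placeholders, not real entries from dbSNP, ClinVar or any other database, and annotate is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def annotate(variants: List[VariantKey],
             known: Dict[VariantKey, str]) -> List[Tuple[VariantKey, Optional[str]]]:
    """Pair each variant with its known-database entry, or None if unseen."""
    return [(variant, known.get(variant)) for variant in variants]


# Placeholder "database" with one made-up entry.
toy_known = {("chr1", 10, "A", "G"): "toy-entry-1"}
calls = [("chr1", 10, "A", "G"), ("chr1", 20, "CAA", "C")]

annotated = annotate(calls, toy_known)
novel = [variant for variant, hit in annotated if hit is None]
```

Real annotation sources are far larger than memory-resident dictionaries, so production systems index them on disk or in a distributed store, but the join key stays the same.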
 http://snpeff.sourceforge.net
 http://www.ncbi.nlm.nih.gov/SNP
CASE STUDY:
Statistics
Data Analytics
Bioinformatics
Genomics
Big Data Technologies
Compute and Data Science
Bringing three disciplines together
Case Study: Bina GMS
Sequencing → 2º Analysis → 3º Analysis → Interpretation
Meaningful Results
& Clinical Relevance
20+ databases with 140+ annotations:
HGMD // PGMD // ClinVar
COSMIC // dbNSFP // TRANSFAC
1000 Genomes and more.
Tools & Workflows for:
WGS // WES // RNAseq
Somatic Mutations
Multi sample
Gene Panels
Bina Products are for Research Use Only
Bina RAVE Architecture (1)
Secure REST Interface
Portal Server(s)
Portal Backend
(Distributed)
• Workflow Definition
• Templates
• QC/Monitoring
• System Management/Updates
Task Dependency
Graphs
Distributed
Workflow
Orchestration
Secure Push
Interface
Workflow Generation
Interactive UI // Command Line SDK
Executor
Dynamic
Scheduling
Local Storage
Execution Engine
Executor Nodes / VMs
Network Storage – Input/Output Data
Static
Scheduling
Workflows
Tools
Commands
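The idea behind task dependency graphs and dynamic scheduling can be sketched with Python's standard graphlib module (Python 3.9 or later). This is not Bina's implementation: the task names and the per-chromosome split are hypothetical, and the sketch only computes which tasks could run together once their dependencies have finished.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks it depends on (hypothetical workflow).
workflow: Dict[str, Set[str]] = {
    "align": {"fastq"},
    "sort": {"align"},
    "index": {"sort"},
    "call_chr1": {"index"},
    "call_chr2": {"index"},
    "merge_vcf": {"call_chr1", "call_chr2"},
}


def run_in_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Return batches of tasks; tasks within a batch have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)                 # pretend the batch finished
    return batches


batches = run_in_batches(workflow)
```

An executor would hand each batch to worker nodes and call done as individual tasks complete, so the two per-chromosome calls here could run in parallel on different machines.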
Bina RAVE Architecture (2)
Workflows (DNA, RNA ..)
Tools (BWA, GATK, SVs)
Services
(Logging, Storage, Caching,
Streaming)
Commands
(Samtools, GATK, URL,..)
Genome-aware – Workflow Generation
Distributed Coordination
Task Graph
JSON Request
(UI/CMD/SDK)
Nodes / VMs
Executor
Dynamic
scheduling
Graph
Triggers
Updates
Genome aware – Distributed Execution Framework
Syncing all
Nodes
Dependency
Graph
Task Status
Network storage – Input/output data
Local storage
• Dependency Aware Execution
• Locality Aware Execution (Caching)
• Streaming Through “Engines”
• In-Memory Computation
Output
(VCF,SV)
Input
(BAM, FASTQ)
Static
Scheduling
Bina AAiM Architecture
Annotation and Indexing Engine
Input
VCF
UI/CMD
Clinical
Annotations
Genomic
Context
Prediction
Func. Impact
Population
Frequency
Distributed Execution
Framework
Annotation
(Join static DBs)
Indexing &
Functional Filters
MapReduce Jobs
Analytics Engine
NoSQL
Data Store
Indices
Metadata
Store
Tumor/Normal
Pedigree
Queries, Filters, Variant Sets, Reports
Bina
Secondary
Cohort Study
Proband
What next?
 http://www.genomicsengland.co.uk
 http://www.personalgenomes.org
• Apply this process to different domains and applications
• Come up with ways of ranking variants
• Keep learning from data
• Sequence everyone!
• Genomics England 100,000 Genomes Project
• Personal Genome Project
• Decrease cost
• Increase accuracy
• Make the technology faster and more usable!
Map of sequencers around the globe: http://omicsmaps.com
Challenges in Genomics
• Accuracy
• Gold standard? Which tool is best? There are so many!
• NIST, DREAM Challenge
• Need to speak the same language… interoperability
• Global Alliance
• API, format, metadata, …
• Regulations
• HIPAA, CLIA: security, accuracy, anonymity and encryption
• Scalability
• Storage
• Need terabytes
• Each genome could be up to 1 TB
• Computation
• We still pretty much have no idea what most of DNA is doing…
• Can’t run on a single machine; need to scale to many nodes
• Need to leverage cloud technologies
• Provenance and auditability
• Importance of usability
• Different personas
• Errors are very expensive (life and death)
• Better visualization → faster discovery → faster cure
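A back-of-the-envelope calculation shows why storage appears on this list. The Python sketch below uses the ~3B bases from the reference genome slide. The 30x coverage and the two bytes per sequenced base (one base character plus one quality character, ignoring read names) are assumptions made for this estimate, not figures from the talk.

```python
GENOME_BASES = 3_000_000_000  # "~3B bases" from the reference genome slide
COVERAGE = 30                 # assumption: each base is sequenced about 30 times
BYTES_PER_BASE = 2            # assumption: base character + quality character


def raw_read_bytes(genome_bases: int = GENOME_BASES,
                   coverage: int = COVERAGE,
                   bytes_per_base: int = BYTES_PER_BASE) -> int:
    """Rough size of uncompressed read data for one genome."""
    return genome_bases * coverage * bytes_per_base


estimate_gb = raw_read_bytes() / 1e9
print(f"~{estimate_gb:.0f} GB of uncompressed read data per genome")
```

Under these assumptions the raw reads alone come to roughly 180 GB before compression, and aligned files, intermediate outputs or higher coverage add to that.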
Why should software engineers move to genomics?
Because genomics needs you, and you need genomics.
Work on something that matters! (#Code2Cure)
Things that SWEs do very well:
• Automation
• Elegant solutions for complex problems
• Enabling less tech-savvy users by making the technology robust and accessible
• Scale
• Optimization
• Building production-grade platforms
• Tested
• Robust
• Secure
THESE ARE ALL NEEDED IN GENOMICS YESTERDAY!
Image courtesy of http://silvsoul.blogspot.com
Open projects/resources to check out/contribute to
Projects/Conferences
• Galaxy -- http://galaxyproject.org
• Arvados -- https://arvados.org
• Open Bio Conference -- http://www.open-bio.org
• BioVis -- http://www.biovis.net
• Biopython -- http://biopython.org
• Global Alliance for Genomics and Health -- http://ga4gh.org
• Rosalind Project -- http://rosalind.info
Blogs/Websites
• http://bcb.io
• http://nextgenseek.com/
• http://ngs-expert.com/
• http://seqanswers.com/
• http://core-genomics.blogspot.com
• http://www.genomesunzipped.org
• http://genomeweb.com
Thank you.
And I hope you consider moving to genomics! 
 http://info.bina.com/code2cure-community
Twitter: @mirkiani
Amirhossein Kiani
Sr. Lead Software Engineer
Email: amir@bina.com

#Code2Cure: A field guide for software engineers on their journey to the world of genomics.

  • 1. #Code2Cure: Engineering Genomics : @mirkiani A field guide for software engineers on their journey to the world of genomics. Amirhossein Kiani Sr. Lead Software Engineer : amir@bina.com Image courtesy of http://circos.ca DISCLAIMER: The views expressed in this talk are mine alone and not those of my employer. Bina products are for use Research Use Only. Not for use in diagnostic procedures. Also, I’m a Computer Scientist by training and trying to help those with similar background to learn about the field of genomics. Therefore there has been a high degree of simplification done in explaining the scientific concepts in this talk.
  • 3. www.bina.com Why Genomics? $3,000,000,000 13 years  http://en.wikipedia.org/wiki/Human_Genome_Project Past Present $1000 24 hours Future 3
  • 4. www.bina.com Why Genomics? Some things we could do with genomics: • Carrier Screening • Prenatal Screening • Newborn Screening • Inherited Disease • Infectious Disease • Cancer Diagnostics • Microbiome • Personalized Medicine 4
  • 5. But I have no genomics background! It’s ok.  5
  • 7. www.bina.com What is cell, what is DNA?  http://en.wikipedia.org/wiki/Cell_%28biology%29  http://en.wikipedia.org/wiki/DNA 7 Image courtesy of Pinterest Image courtesy of Tumblr
  • 8. www.bina.com Crash Course on Genomics The field of studying the structure of genomes.  http://en.wikipedia.org/wiki/Genomics  http://en.wikipedia.org/wiki/RNA  http://en.wikipedia.org/wiki/Protein DNA RNA Protein You! 8
  • 9. www.bina.com How do we figure out what’s in DNA? Like everything else, we turn the analog signal to digital, and then analyze it.  http://en.wikipedia.org/wiki/DNA_sequencing  http://en.wikipedia.org/wiki/FASTQ_format Illumina, Ion Torrent, Genia, … Primary Analysis FASTQ Format 9 Image courtesy of PersonalGenomes.org
  • 10. www.bina.com RAW Data to Variants (Secondary Analysis) Step 1. Alignment  http://en.wikipedia.org/wiki/DNA_sequencing  http://en.wikipedia.org/wiki/FASTQ_format 10 Image courtesy of Wall Woodworks Image courtesy of Wallpaper Up
  • 11. www.bina.com From “Raw” DNA to “Variants” (Secondary Analysis) Step 1. Short-Read Sequence Alignment  http://en.wikipedia.org/wiki/Reference_genome  http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism  http://en.wikipedia.org/wiki/Indel  http://en.wikipedia.org/wiki/Structural_variation AACACACCCAAGGGGGAAACTTTGGTCCACCCAAGGGGGAAACCCAAGGGGGAAACTTTG Reference Genome (~3B bases) ACTTTGGTCCACCCAAGG AAGGGGGACACCCAAGGACACCC__GGGGGAAACT GGACACCCAAGGGGGAA ACCCAAGGGGGACACCC ACCC__GGGGGAAACTTTG AACACACCC__GGGGGAA Coverage Deletion Single Nucleotide Polymorphism 11
  • 12. www.bina.com From “Raw” DNA to “Variants” (Secondary Analysis) • Burrows-Wheeler Aligner (BWA) • Uses Burrows-Wheeler transform (also used in bzip) • Uses Smith-Waterman algorithm • Written in C++ • Uses ~4GB memory for human genome  http://bio-bwa.sourceforge.net  http://bioinformatics.oxfordjournals.org/content/25/14/1754.full.pdf+html $ bwa mem ref.fa read1.fq read2.fq > aln-pe.sam Example 12
  • 13. www.bina.com From “Raw” DNA to “Variants” (Secondary Analysis) Alignment FASTQ SAM Convert to Binary BZIP (samtools) BAM File BAM File Index  http://samtools.github.io/hts-specs/SAMv1.pdf  http://samtools.github.io 13
  • 14. www.bina.com From “Raw” DNA to “Variants” (Secondary Analysis) BAM File BAM File Index  http://www.broadinstitute.org/igv  https://github.com/ekg/freebayes  http://arxiv.org/abs/1207.3907  https://www.broadinstitute.org/gatk Visualize Variant Calling $ freebayes -f ref.fa aln.bam >var.vcf Example Interactive Genome Browser (IGV) 14
  • 15. www.bina.com From “Raw” DNA to “Variants” (Secondary Analysis) 15 … and here are your variants (VCF file)!   http://samtools.github.io/hts-specs/VCFv4.2.pdf
  • 16. www.bina.com What do we do with variant calls then? Zooming in on the Central Dogma of Molecular Biology: • There is redundancy in protein codes. • But a mutation could change the protein coding. 16 Image courtesy of Wikipedia
  • 17. www.bina.com What do we do with variant calls then? Annotation & Interpretation • Functional Annotation  Figure out if the mutation is dangerous (Use SNPEff) • Synonymous • Non-Synonymous • Frame-shift • … • Put in the context of existing findings • dbSNP • ClinVar • COSMIC • ESP • 1000 Genomes • …  http://snpeff.sourceforge.net  http://www.ncbi.nlm.nih.gov/SNP 17
  • 19. www.bina.com Statistics Data AnalyticsBioinformatics Genomics Big Data Technologies Compute and Data Science 19 Bringing three disciplines together
  • 20. www.bina.com Case Study: Bina GMS 20 Sequencing 2º Analysis 3º Analysis Interpretation Meaningful Results & Clinical Relevance 20+ DBs including over 140+ annotations: HGMD // PGMD // Clinvar COSMIC // dbNSFP // TRANSFAC 1000 Genome and more. Tools & Workflows for: WGS // WES // RNAseq Somatic Mutations Multi sample Gene Panels Bina Products are for Research Use Only
  • 21. www.bina.com Bina RAVE Architecture (1) 21 Secure REST Interface Portal Server(s) Portal Backend (Distributed) • Workflow Definition • Templates • QC/Monitoring • System Management/Updates Task Dependency Graphs Distributed Workflow Orchestration Secure Push Interface WorkflowGeneration Interactive UI // Command Line SDK Executor Dynamic Scheduling Local Storage ExecutionEngine Executor Nodes / VMs Network Storage – Input/Output Data Static Scheduling Workflows Tools Commands
  • 22. www.bina.com Bina RAVE Architecture (2) Workflows (DNA, RNA ..) Tools (BWA, GATK, SVs) Services (Logging, Storage, Caching, Streaming) Commands (Samtools, GATK, URL,..) Genome-aware – Workflow Generation Distributed Coordination Task Graph JSON Request (UI/CMD/SDK) Nodes / VMs Executor Dynamic scheduling Graph Triggers Updates Genome aware – Distributed Execution Framework Syncing all Nodes Dependency Graph Task Status Network storage – Input/output data Local storage • Dependency Aware Execution • Locality Aware Execution (Caching) • Streaming Through “Engines” • In-Memory Computation Output (VCF,SV) Input (BAM, FASTQ) Static Scheduling
  • 23. Bina AAiM Architecture Annotation and Indexing Engine Input VCF UI/CMD Clinical Annotations Genomic Context Prediction Func. Impact Population Frequency Distributed Execution Framework Annotation (Join static DBs) Indexing & Functional Filters MapReduce Jobs Analytics Engine NoSQL Data Store Indices Metadata Store Tumor/Normal Pedigree Queries, Filters, Variant Sets, Reports Bina Secondary Cohort Study Proband
  • 24. What next? http://www.genomicsengland.co.uk http://www.personalgenomes.org • Apply this process to different domains and applications • Come up with ways of ranking variants • Keep learning from data • Sequence everyone! • Genomics England 100,000 Genomes Project • Personal Genome Project • Decrease cost • Increase accuracy • Make the technology faster and more usable! Map of sequencers around the globe: http://omicsmaps.com
  • 25. Challenges in Genomics • Accuracy • Gold standard? Which tool is best? There are so many! • NIST, DREAM Challenge • Need to speak the same language… interoperability • Global Alliance • API, format, metadata, … • Regulations • HIPAA, CLIA: security, accuracy, anonymity and encryption • Scalability • Storage • Need terabytes • Each genome could be up to 1 TB • Computation • We still pretty much have no idea what most of DNA is doing… • Can’t run on a single machine. Need to scale to many nodes • Need to leverage cloud technologies • Provenance and auditability • Importance of usability • Different personas • Errors are very expensive (life and death) • Better visualization → faster discovery → faster cure
  • 26. Why should software engineers move to genomics? Because genomics needs you, and you need genomics. Work on something that matters! (#Code2Cure) Things that SWEs do very well: • Automation • Elegant solutions for complex problems • Enabling non-savvy users by making the technology robust and accessible • Scale • Optimization • Building production-grade platforms • Tested • Robust • Secure THESE ARE ALL NEEDED IN GENOMICS YESTERDAY! Image courtesy of http://silvsoul.blogspot.com
  • 27. Open projects/resources to check out/contribute to Projects/Conferences • Galaxy -- http://galaxyproject.org • Arvados -- https://arvados.org • Open Bio Conference -- http://www.open-bio.org • BioVis -- http://www.biovis.net • Biopython -- http://biopython.org • Global Alliance for Genomics and Health -- http://ga4gh.org • Rosalind Project -- http://rosalind.info Blogs/Websites • http://bcb.io • http://nextgenseek.com/ • http://ngs-expert.com/ • http://seqanswers.com/ • http://core-genomics.blogspot.com • http://www.genomesunzipped.org • http://genomeweb.com
  • 28. Thank you. And I hope you consider moving to genomics! http://info.bina.com/code2cure-community @mirkiani Amirhossein Kiani Sr. Lead Software Engineer amir@bina.com
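
Slide 22 lists "Dependency Aware Execution" over a task graph. A minimal sketch of that idea follows, assuming Python 3.9+ and its standard-library `graphlib` module. The task names and the `run_in_dependency_order` helper are hypothetical illustrations and are not taken from the Bina platform.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
# The task names are illustrative only; they are not Bina's actual tasks.
TASK_GRAPH = {
    "align": set(),
    "sort": {"align"},
    "call_variants": {"sort"},
    "call_structural_variants": {"sort"},
    "annotate": {"call_variants", "call_structural_variants"},
}


def run_in_dependency_order(graph, run_task):
    """Run every task only after all of its dependencies have finished.

    Returns the list of "waves". Tasks within one wave have no unmet
    dependencies, so a real scheduler could dispatch them to different
    nodes at the same time.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        for task in ready:
            run_task(task)
            sorter.done(task)  # unlocks tasks that were waiting on this one
        waves.append(ready)
    return waves


executed = []
waves = run_in_dependency_order(TASK_GRAPH, executed.append)
```

Here the two variant-calling tasks land in the same wave because both depend only on `sort`. This sketch runs tasks one after another on a single machine; distributing each wave across nodes is left out.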

Editor's notes

  1. Data scale problem: 1000s of WGS. Customers in research and clinical settings. We are here to introduce our company, products, and team, and to get feedback.
  2. At Bina, we focus on the analysis of next-generation sequencing datasets and provide best-in-class tools for secondary and tertiary analysis of Whole Genome, Whole Exome, RNAseq and targeted panel datasets. By optimizing the workflows, and the tools incorporated in those workflows, to function as efficiently as possible on hardware we supply, we can achieve speed and performance unparalleled in the industry. I will spend the majority of this presentation discussing our analytical workflows, the methods we use to benchmark the tools and workflows, and the performance of those tools running on our appliances.
  3. We at Bina concentrate on exactly this challenge. We approach the problem from a different perspective than most traditional genomics companies. We have expertise across engineering, bioinformatics, software development and, more generally, managing large datasets. For example, we have engineers from companies like Yahoo and Google who are experts at dealing with this large-data challenge, and bioinformatics scientists from Stanford, UCSF, and 23andMe.
  4. At Bina, we focus on the analysis of next-generation sequencing datasets and provide best-in-class tools for secondary and tertiary analysis of Whole Genome, Whole Exome, RNAseq and targeted panel datasets. By optimizing the workflows, and the tools incorporated in those workflows, to function as efficiently as possible on hardware we supply, we can achieve speed and performance unparalleled in the industry. I will spend the majority of this presentation discussing our analytical workflows, the methods we use to benchmark the tools and workflows, and the performance of those tools running on our appliances.
  5. For customers with very large amounts of data and compute demand, or with very sensitive data. Also for countries with no access to secure public clouds.
  6. In-memory computation: alignment and sorting (samsorter) run in memory and in parallel, as opposed to the sequential, on-disk way of doing it (overlapping compute, minimizing I/O).
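
Note 6 describes sorting that overlaps with alignment in memory instead of waiting for an aligned file on disk. The sketch below only illustrates that structure, using Python's standard `queue`, `threading` and `heapq` modules. The `align` stand-in, the `(name, position)` record shape and `align_and_sort` are assumptions for illustration; this is not how samsorter is implemented, and Python threads will not give true CPU parallelism.

```python
import heapq
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def align(reads, out):
    """Stand-in aligner: forwards each (name, position) pair as it is 'aligned'."""
    for name, position in reads:
        out.put((position, name))
    out.put(_DONE)


def align_and_sort(reads):
    """Sort records by position while the aligner is still producing them."""
    stream = queue.Queue(maxsize=1024)  # bounded in-memory buffer, no temp files
    producer = threading.Thread(target=align, args=(reads, stream))
    producer.start()
    heap = []
    while (record := stream.get()) is not _DONE:
        heapq.heappush(heap, record)  # sorting work overlaps with alignment
    producer.join()
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

For example, `align_and_sort([("r1", 300), ("r2", 100)])` returns the records ordered by position. The bounded queue keeps the aligner from running arbitrarily far ahead of the sorter.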