SlideShare a Scribd company logo
1 of 19
Download to read offline
ADAM: Fast, Scalable
Genome Analysis
Frank Austin Nothaft
AMPLab, University of California, Berkeley
with: Matt Massie, André Schumacher, Timothy Danford, Chris
Hartl, Jey Kottalam, Arun Aruha, Neal Sidhwaney, Michael
Linderman, Jeff Hammerbacher, Anthony Joseph, and Dave
Patterson

https://github.com/bigdatagenomics
Wednesday, February 12, 14
Problem
• Whole genome files are large
• Biological systems are complex
• Population analysis requires petabytes of
data

• Analysis time is often a matter of life and
death

Wednesday, February 12, 14
Shredded Book Analogy
Dickens accidentally shreds the first printing of A Tale of Two Cities
Text printed on 5 long spools
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …

Slide credit to Michael Schatz http://schatzlab.cshl.edu/
Wednesday, February 12, 14
Shredded Book Analogy
Dickens accidentally shreds the first printing of A Tale of Two Cities
Text printed on 5 long spools
It was the best of
It was the best
It was the
It was
It

times, it was the worst
of times, it was the

best of times, it was

the best of times, it

was the best of times,

of times, it was the
worst of times, it was

the worst of times, it

was the worst of times,
it was the worst of

age of wisdom, it was
the age of wisdom, it

was the age of wisdom,
it was the age of

times, it was the age

the age of foolishness, …
was the age of foolishness,

it was the age of

wisdom, it was the age
of wisdom, it was the

foolishness, …
of foolishness, …

age of foolishness, …

Slide credit to Michael Schatz http://schatzlab.cshl.edu/
Wednesday, February 12, 14
Shredded Book Analogy
Dickens accidentally shreds the first printing of A Tale of Two Cities
Text printed on 5 long spools
It was the best of
It was the best
It was the
It was
It

times, it was the worst
of times, it was the

best of times, it was

the best of times, it

was the best of times,

of times, it was the
worst of times, it was

the worst of times, it

was the worst of times,
it was the worst of

age of wisdom, it was
the age of wisdom, it

was the age of wisdom,
it was the age of

times, it was the age

the age of foolishness, …
was the age of foolishness,

it was the age of

wisdom, it was the age
of wisdom, it was the

foolishness, …
of foolishness, …

age of foolishness, …

• How can he reconstruct the text?
– 5 copies x 138, 656 words / 5 words per fragment = 138k fragments
– The short fragments from every copy are mixed together
– Some fragments are identical
Slide credit to Michael Schatz http://schatzlab.cshl.edu/
Wednesday, February 12, 14
Shredded Book Analogy
Dickens accidentally shreds the first printing of A Tale of Two Cities
Text printed on 5 long spools
It was the best of
It was the best
It was the
It was
It

times, it was the worst
of times, it was the

best of times, it was

the best of times, it

was the best of times,

of times, it was the
worst of times, it was

the worst of times, it

was the worst of times,
it was the worst of

age of wisdom, it was
the age of wisdom, it

was the age of wisdom,
it was the age of

times, it was the age

the age of foolishness, …
was the age of foolishness,

it was the age of

wisdom, it was the age
of wisdom, it was the

foolishness, …
of foolishness, …

age of foolishness, …

• How can he reconstruct the text?
– 5 copies x 138, 656 words / 5 words per fragment = 138k fragments
– The short fragments from every copy are mixed together
– Some fragments are identical
Slide credit to Michael Schatz http://schatzlab.cshl.edu/
Wednesday, February 12, 14
What is ADAM?
• File formats: columnar file format that

allows efficient parallel access to genomes

• API: interface for transforming, analyzing,
and querying genomic data

• CLI: a handy toolkit for quickly processing
genomes

Wednesday, February 12, 14
SNAP
ADAM

SiRen CAGE
Avocado

MLBase
GraphX

The Broad Institute “Best Practices” Pipeline
Wednesday, February 12, 14
Design Goals
• Develop processing pipeline that enables
efficient, scalable use of cluster/cloud

• Provide data format that has efficient

parallel/distributed access across platforms

• Enhance semantics of data and allow more
flexible data access patterns

Wednesday, February 12, 14
Whole Genome
Data Sizes
Input

Pipeline
Stage

Output

SNAP

1GB Fasta
Alignment
150GB Fastq

250GB BAM

ADAM

250GB BAM Preprocessing

200GB
ADAM

Avocado

200GB
ADAM

10MB ADAM

Variant
Calling

Variants found at about 1 in 1,000 loci
Wednesday, February 12, 14
ADAM Stack
RDD

‣Transform records using Apache Spark
‣Query with SQL using Shark
‣Graph processing with GraphX
‣Machine learning using MLBase

‣Schema-driven records w/ Apache Avro
Record/Split ‣Store and retrieve records using Parquet
‣Read BAM Files using Hadoop-BAM
File/Block
Physical
Wednesday, February 12, 14

‣Hadoop Distributed Filesystem
‣Local Filesystem
‣Commodity Hardware
‣Cloud Systems - Amazon, GCE, Azure
Commits

Implementation

•
•
•
•

Work accelerated at end of September

•

Proof of concept implementations at The Broad
Institute, Duke, Harvard, UCSC

•

Global Alliance looking at ADAM as potential standard

Wednesday, February 12, 14

15K lines of Scala code
100% Apache-licensed open-source
Code contributions from Mt. Sinai, GenomeBridge, The
Broad Institute
AWS EC2 Performance
NA12878 Whole Genome
234GB as BAM, 229 GB as ADAM
Sort

Mark Duplicates

1064
1222

Picard

536

ADAM 1 node

32

ADAM 32 nodes

20
28
1

10

53x
100

Minutes (log scale)
Wednesday, February 12, 14

33x

70

ADAM 100 nodes

2x

1000

10000
Mt. Sinai Performance
NA12878 Whole Genome
234GB as BAM, 229 GB as ADAM
Sort

Mark Duplicates

1064
1222

Picard

29
50
17
34
11
15

ADAM 16 nodes

ADAM 32 nodes

ADAM 64 nodes

8

ADAM 82 nodes
1

10

62x
96x
133x

14
100
Minutes (log scale)

Wednesday, February 12, 14

36x

1000

10000
Scalability

HG00096 à 16GB
NA12878 à 234GB
Wednesday, February 12, 14
Acknowledgements
• UC Berkeley: Matt Massie, André
Schumacher, Jey Kottalam, Christos
Kozanitis

• Mt. Sinai: Arun Ahuja, Neal Sidhwaney,
Michael Linderman, Jeff Hammerbacher

• GenomeBridge: Timothy Danford, Carl
Yeksigian

Wednesday, February 12, 14
Acknowledgements
This research is supported in part by NSF CISE
Expeditions award CCF-1139158 and DARPA
XData Award FA8750-12-2-0331, and gifts from
Amazon Web Services, Google, SAP, Apple, Inc.,
Cisco, Clearstory Data, Cloudera, Ericsson,
Facebook, GameOnTalis, General Electric,
Hortonworks, Huawei, Intel, Microsoft, NetApp,
Oracle, Samsung, Splunk,VMware, WANdisco and
Yahoo!.

Wednesday, February 12, 14
“Before seeing your data, I would not
think sorting could be done that fast.”
- research scientist at the Broad Institute

Wednesday, February 12, 14
Call for contributions
• As an open source project, we welcome
contributions

• We maintain a list of open enhancements at
our Github issue tracker

• Enhancements tagged with “Pick me up!”
don’t require a genomics background

• Github: https://github.com/bigdatagenomics/
Wednesday, February 12, 14

More Related Content

Viewers also liked

Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreUri Laserson
 
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIHadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIAllen Day, PhD
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilSpark Summit
 
Free Code Friday: Genome Resequencing with Spark, Part 1
Free Code Friday: Genome Resequencing with Spark, Part 1Free Code Friday: Genome Resequencing with Spark, Part 1
Free Code Friday: Genome Resequencing with Spark, Part 1MapR Technologies
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Timothy Danford
 
Developing openEHR EHRs - core functionalities
Developing openEHR EHRs - core functionalitiesDeveloping openEHR EHRs - core functionalities
Developing openEHR EHRs - core functionalitiesPablo Pazos
 
Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data scienceAndy Petrella
 
Visualizing the genome: Techniques for presenting genome data and annotations
Visualizing the genome: Techniques for presenting genome data and annotationsVisualizing the genome: Techniques for presenting genome data and annotations
Visualizing the genome: Techniques for presenting genome data and annotationsAnn Loraine
 
Bioinformatics & Genomics December Newsletter
Bioinformatics & Genomics December NewsletterBioinformatics & Genomics December Newsletter
Bioinformatics & Genomics December NewsletterKelly Wickham
 
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Matthieu Schapranow
 
Interviewing - why some questions are off limits
Interviewing - why some questions are off limitsInterviewing - why some questions are off limits
Interviewing - why some questions are off limitsAnn Loraine
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data ChallengesPhilip Bourne
 
Big Data & the networked future of Science (at Ignite Seattle 7)
Big Data & the networked future of Science (at Ignite Seattle 7)Big Data & the networked future of Science (at Ignite Seattle 7)
Big Data & the networked future of Science (at Ignite Seattle 7)Deepak Singh
 
Daily Snapshot - 2nd March 2017
Daily Snapshot - 2nd March 2017Daily Snapshot - 2nd March 2017
Daily Snapshot - 2nd March 2017Tracxn
 
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...Spark Summit
 
Hannes Smarason: Genomics: Forging Patient-Centric Communities
Hannes Smarason: Genomics: Forging Patient-Centric CommunitiesHannes Smarason: Genomics: Forging Patient-Centric Communities
Hannes Smarason: Genomics: Forging Patient-Centric CommunitiesHannes Smárason
 
Hannes Smarason: Progress & Prospects in Genomics
Hannes Smarason: Progress & Prospects in GenomicsHannes Smarason: Progress & Prospects in Genomics
Hannes Smarason: Progress & Prospects in GenomicsHannes Smárason
 
Hannes Smarason: 2015 = An Inflection Point in Genomics
Hannes Smarason: 2015 = An Inflection Point in GenomicsHannes Smarason: 2015 = An Inflection Point in Genomics
Hannes Smarason: 2015 = An Inflection Point in GenomicsHannes Smárason
 
Genomics isn't Special
Genomics isn't SpecialGenomics isn't Special
Genomics isn't SpecialAllen Day, PhD
 

Viewers also liked (20)

Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant Store
 
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIHadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And Toil
 
openEHR sll-2015final
openEHR sll-2015finalopenEHR sll-2015final
openEHR sll-2015final
 
Free Code Friday: Genome Resequencing with Spark, Part 1
Free Code Friday: Genome Resequencing with Spark, Part 1Free Code Friday: Genome Resequencing with Spark, Part 1
Free Code Friday: Genome Resequencing with Spark, Part 1
 
Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?Why is Bioinformatics a Good Fit for Spark?
Why is Bioinformatics a Good Fit for Spark?
 
Developing openEHR EHRs - core functionalities
Developing openEHR EHRs - core functionalitiesDeveloping openEHR EHRs - core functionalities
Developing openEHR EHRs - core functionalities
 
Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data science
 
Visualizing the genome: Techniques for presenting genome data and annotations
Visualizing the genome: Techniques for presenting genome data and annotationsVisualizing the genome: Techniques for presenting genome data and annotations
Visualizing the genome: Techniques for presenting genome data and annotations
 
Bioinformatics & Genomics December Newsletter
Bioinformatics & Genomics December NewsletterBioinformatics & Genomics December Newsletter
Bioinformatics & Genomics December Newsletter
 
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
 
Interviewing - why some questions are off limits
Interviewing - why some questions are off limitsInterviewing - why some questions are off limits
Interviewing - why some questions are off limits
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data Challenges
 
Big Data & the networked future of Science (at Ignite Seattle 7)
Big Data & the networked future of Science (at Ignite Seattle 7)Big Data & the networked future of Science (at Ignite Seattle 7)
Big Data & the networked future of Science (at Ignite Seattle 7)
 
Daily Snapshot - 2nd March 2017
Daily Snapshot - 2nd March 2017Daily Snapshot - 2nd March 2017
Daily Snapshot - 2nd March 2017
 
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
Finding Needles in Genomic Haystacks with “Wide” Random Forest: Spark Summit ...
 
Hannes Smarason: Genomics: Forging Patient-Centric Communities
Hannes Smarason: Genomics: Forging Patient-Centric CommunitiesHannes Smarason: Genomics: Forging Patient-Centric Communities
Hannes Smarason: Genomics: Forging Patient-Centric Communities
 
Hannes Smarason: Progress & Prospects in Genomics
Hannes Smarason: Progress & Prospects in GenomicsHannes Smarason: Progress & Prospects in Genomics
Hannes Smarason: Progress & Prospects in Genomics
 
Hannes Smarason: 2015 = An Inflection Point in Genomics
Hannes Smarason: 2015 = An Inflection Point in GenomicsHannes Smarason: 2015 = An Inflection Point in Genomics
Hannes Smarason: 2015 = An Inflection Point in Genomics
 
Genomics isn't Special
Genomics isn't SpecialGenomics isn't Special
Genomics isn't Special
 

Similar to Strata Big Data Science Talk on ADAM

Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
Life of P FITC & CreateUpstate
Life of P FITC & CreateUpstateLife of P FITC & CreateUpstate
Life of P FITC & CreateUpstateJason Pamental
 
The Life of <p>
The Life of <p>The Life of <p>
The Life of <p>FITC
 
The-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdf
The-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdfThe-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdf
The-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdfjosepulido64
 
Book discussion 101 rev
Book discussion 101 revBook discussion 101 rev
Book discussion 101 revZarah Gagatiga
 
Tools of the trade region x
Tools of the trade region xTools of the trade region x
Tools of the trade region xTeri Lesesne
 
The walkingpeople
The walkingpeopleThe walkingpeople
The walkingpeopleJohn Thomas
 
N E I S D Mar 2015 tools of trade
N E I S D Mar 2015 tools of tradeN E I S D Mar 2015 tools of trade
N E I S D Mar 2015 tools of tradeTeri Lesesne
 
Writing The Science Fiction Film: Where do you get your ideas from?
Writing The Science Fiction Film: Where do you get your ideas from?Writing The Science Fiction Film: Where do you get your ideas from?
Writing The Science Fiction Film: Where do you get your ideas from?robgrant
 
Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it mattersJames Hendler
 
Big Modernism
Big ModernismBig Modernism
Big Modernismjanamu
 
Ecu library summit 2014
Ecu library summit 2014Ecu library summit 2014
Ecu library summit 2014twetherell
 
Pearland I S D 2014
Pearland I S D 2014Pearland I S D 2014
Pearland I S D 2014Teri Lesesne
 
Topics For Exemplification Essays.pdf
Topics For Exemplification Essays.pdfTopics For Exemplification Essays.pdf
Topics For Exemplification Essays.pdfJennifer Triepke
 
Tools of the trade region x 2 kp (1)
Tools of the trade region x 2 kp (1)Tools of the trade region x 2 kp (1)
Tools of the trade region x 2 kp (1)Karin Perry
 
Tools of the trade region X
Tools of the trade region X Tools of the trade region X
Tools of the trade region X Teri Lesesne
 
The hitchhiker's guide to the galaxy by douglas adams slide - ppt
The hitchhiker's guide to the galaxy by  douglas adams   slide - pptThe hitchhiker's guide to the galaxy by  douglas adams   slide - ppt
The hitchhiker's guide to the galaxy by douglas adams slide - pptkawaiiassunantejames
 

Similar to Strata Big Data Science Talk on ADAM (19)

Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Life of p
Life of pLife of p
Life of p
 
Life of P FITC & CreateUpstate
Life of P FITC & CreateUpstateLife of P FITC & CreateUpstate
Life of P FITC & CreateUpstate
 
The Life of <p>
The Life of <p>The Life of <p>
The Life of <p>
 
The-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdf
The-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdfThe-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdf
The-Language-Instinct-How-the-Mind-Creates-Language,-Steven-Pinker.pdf
 
Fake News Essay
Fake News EssayFake News Essay
Fake News Essay
 
Book discussion 101 rev
Book discussion 101 revBook discussion 101 rev
Book discussion 101 rev
 
Tools of the trade region x
Tools of the trade region xTools of the trade region x
Tools of the trade region x
 
The walkingpeople
The walkingpeopleThe walkingpeople
The walkingpeople
 
N E I S D Mar 2015 tools of trade
N E I S D Mar 2015 tools of tradeN E I S D Mar 2015 tools of trade
N E I S D Mar 2015 tools of trade
 
Writing The Science Fiction Film: Where do you get your ideas from?
Writing The Science Fiction Film: Where do you get your ideas from?Writing The Science Fiction Film: Where do you get your ideas from?
Writing The Science Fiction Film: Where do you get your ideas from?
 
Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it matters
 
Big Modernism
Big ModernismBig Modernism
Big Modernism
 
Ecu library summit 2014
Ecu library summit 2014Ecu library summit 2014
Ecu library summit 2014
 
Pearland I S D 2014
Pearland I S D 2014Pearland I S D 2014
Pearland I S D 2014
 
Topics For Exemplification Essays.pdf
Topics For Exemplification Essays.pdfTopics For Exemplification Essays.pdf
Topics For Exemplification Essays.pdf
 
Tools of the trade region x 2 kp (1)
Tools of the trade region x 2 kp (1)Tools of the trade region x 2 kp (1)
Tools of the trade region x 2 kp (1)
 
Tools of the trade region X
Tools of the trade region X Tools of the trade region X
Tools of the trade region X
 
The hitchhiker's guide to the galaxy by douglas adams slide - ppt
The hitchhiker's guide to the galaxy by  douglas adams   slide - pptThe hitchhiker's guide to the galaxy by  douglas adams   slide - ppt
The hitchhiker's guide to the galaxy by douglas adams slide - ppt
 

Recently uploaded

Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbaisonalikaur4
 
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...Miss joya
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Modelssonalikaur4
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Serviceparulsinha
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingNehru place Escorts
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...saminamagar
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowNehru place Escorts
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Hemostasis Physiology and Clinical correlations by Dr Faiza.pdf
Hemostasis Physiology and Clinical correlations by Dr Faiza.pdfHemostasis Physiology and Clinical correlations by Dr Faiza.pdf
Hemostasis Physiology and Clinical correlations by Dr Faiza.pdfMedicoseAcademics
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.MiadAlsulami
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment BookingCall Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Bookingnarwatsonia7
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipurparulsinha
 

Recently uploaded (20)

Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
 
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
 
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...call girls in Connaught Place  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
call girls in Connaught Place DELHI 🔝 >༒9540349809 🔝 genuine Escort Service ...
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
 
Hemostasis Physiology and Clinical correlations by Dr Faiza.pdf
Hemostasis Physiology and Clinical correlations by Dr Faiza.pdfHemostasis Physiology and Clinical correlations by Dr Faiza.pdf
Hemostasis Physiology and Clinical correlations by Dr Faiza.pdf
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Servicesauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment BookingCall Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 

Strata Big Data Science Talk on ADAM

  • 1. ADAM: Fast, Scalable Genome Analysis Frank Austin Nothaft AMPLab, University of California, Berkeley with: Matt Massie, André Schumacher, Timothy Danford, Chris Hartl, Jey Kottalam, Arun Aruha, Neal Sidhwaney, Michael Linderman, Jeff Hammerbacher, Anthony Joseph, and Dave Patterson https://github.com/bigdatagenomics Wednesday, February 12, 14
  • 2. Problem • Whole genome files are large • Biological systems are complex • Population analysis requires petabytes of data • Analysis time is often a matter of life and death Wednesday, February 12, 14
  • 3. Shredded Book Analogy Dickens accidentally shreds the first printing of A Tale of Two Cities Text printed on 5 long spools It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, … It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, … It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, … It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, … It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, … Slide credit to Michael Schatz http://schatzlab.cshl.edu/ Wednesday, February 12, 14
  • 4. Shredded Book Analogy Dickens accidentally shreds the first printing of A Tale of Two Cities Text printed on 5 long spools It was the best of It was the best It was the It was It times, it was the worst of times, it was the best of times, it was the best of times, it was the best of times, of times, it was the worst of times, it was the worst of times, it was the worst of times, it was the worst of age of wisdom, it was the age of wisdom, it was the age of wisdom, it was the age of times, it was the age the age of foolishness, … was the age of foolishness, it was the age of wisdom, it was the age of wisdom, it was the foolishness, … of foolishness, … age of foolishness, … Slide credit to Michael Schatz http://schatzlab.cshl.edu/ Wednesday, February 12, 14
  • 5. Shredded Book Analogy Dickens accidentally shreds the first printing of A Tale of Two Cities Text printed on 5 long spools It was the best of It was the best It was the It was It times, it was the worst of times, it was the best of times, it was the best of times, it was the best of times, of times, it was the worst of times, it was the worst of times, it was the worst of times, it was the worst of age of wisdom, it was the age of wisdom, it was the age of wisdom, it was the age of times, it was the age the age of foolishness, … was the age of foolishness, it was the age of wisdom, it was the age of wisdom, it was the foolishness, … of foolishness, … age of foolishness, … • How can he reconstruct the text? – 5 copies x 138, 656 words / 5 words per fragment = 138k fragments – The short fragments from every copy are mixed together – Some fragments are identical Slide credit to Michael Schatz http://schatzlab.cshl.edu/ Wednesday, February 12, 14
  • 6. Shredded Book Analogy Dickens accidentally shreds the first printing of A Tale of Two Cities Text printed on 5 long spools It was the best of It was the best It was the It was It times, it was the worst of times, it was the best of times, it was the best of times, it was the best of times, of times, it was the worst of times, it was the worst of times, it was the worst of times, it was the worst of age of wisdom, it was the age of wisdom, it was the age of wisdom, it was the age of times, it was the age the age of foolishness, … was the age of foolishness, it was the age of wisdom, it was the age of wisdom, it was the foolishness, … of foolishness, … age of foolishness, … • How can he reconstruct the text? – 5 copies x 138, 656 words / 5 words per fragment = 138k fragments – The short fragments from every copy are mixed together – Some fragments are identical Slide credit to Michael Schatz http://schatzlab.cshl.edu/ Wednesday, February 12, 14
  • 7. What is ADAM? • File formats: columnar file format that allows efficient parallel access to genomes • API: interface for transforming, analyzing, and querying genomic data • CLI: a handy toolkit for quickly processing genomes Wednesday, February 12, 14
  • 8. SNAP ADAM SiRen CAGE Avocado MLBase GraphX The Broad Institute “Best Practices” Pipeline Wednesday, February 12, 14
  • 9. Design Goals • Develop processing pipeline that enables efficient, scalable use of cluster/cloud • Provide data format that has efficient parallel/distributed access across platforms • Enhance semantics of data and allow more flexible data access patterns Wednesday, February 12, 14
  • 10. Whole Genome Data Sizes Input Pipeline Stage Output SNAP 1GB Fasta Alignment 150GB Fastq 250GB BAM ADAM 250GB BAM Preprocessing 200GB ADAM Avocado 200GB ADAM 10MB ADAM Variant Calling Variants found at about 1 in 1,000 loci Wednesday, February 12, 14
  • 11. ADAM Stack RDD ‣Transform records using Apache Spark ‣Query with SQL using Shark ‣Graph processing with GraphX ‣Machine learning using MLBase ‣Schema-driven records w/ Apache Avro Record/Split ‣Store and retrieve records using Parquet ‣Read BAM Files using Hadoop-BAM File/Block Physical Wednesday, February 12, 14 ‣Hadoop Distributed Filesystem ‣Local Filesystem ‣Commodity Hardware ‣Cloud Systems - Amazon, GCE, Azure
  • 12. Commits Implementation • • • • Work accelerated at end of September • Proof of concept implementations at The Broad Institute, Duke, Harvard, UCSC • Global Alliance looking at ADAM as potential standard Wednesday, February 12, 14 15K lines of Scala code 100% Apache-licensed open-source Code contributions from Mt. Sinai, GenomeBridge, The Broad Institute
  • 13. AWS EC2 Performance NA12878 Whole Genome 234GB as BAM, 229 GB as ADAM Sort Mark Duplicates 1064 1222 Picard 536 ADAM 1 node 32 ADAM 32 nodes 20 28 1 10 53x 100 Minutes (log scale) Wednesday, February 12, 14 33x 70 ADAM 100 nodes 2x 1000 10000
  • 14. Mt. Sinai Performance NA12878 Whole Genome 234GB as BAM, 229 GB as ADAM Sort Mark Duplicates 1064 1222 Picard 29 50 17 34 11 15 ADAM 16 nodes ADAM 32 nodes ADAM 64 nodes 8 ADAM 82 nodes 1 10 62x 96x 133x 14 100 Minutes (log scale) Wednesday, February 12, 14 36x 1000 10000
  • 15. Scalability HG00096 à 16GB NA12878 à 234GB Wednesday, February 12, 14
  • 16. Acknowledgements • UC Berkeley: Matt Massie, André Schumacher, Jey Kottalam, Christos Kozanitis • Mt. Sinai: Arun Ahuja, Neal Sidhwaney, Michael Linderman, Jeff Hammerbacher • GenomeBridge: Timothy Danford, Carl Yeksigian Wednesday, February 12, 14
  • 17. Acknowledgements This research is supported in part by NSF CISE Expeditions award CCF-1139158 and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, Apple, Inc., Cisco, Clearstory Data, Cloudera, Ericsson, Facebook, GameOnTalis, General Electric, Hortonworks, Huawei, Intel, Microsoft, NetApp, Oracle, Samsung, Splunk,VMware, WANdisco and Yahoo!. Wednesday, February 12, 14
  • 18. “Before seeing your data, I would not think sorting could be done that fast.” - research scientist at the Broad Institute Wednesday, February 12, 14
  • 19. Call for contributions • As an open source project, we welcome contributions • We maintain a list of open enhancements at our Github issue tracker • Enhancements tagged with “Pick me up!” don’t require a genomics background • Github: https://github.com/bigdatagenomics/ Wednesday, February 12, 14