Big data and Life Sciences
Guy Coates
Wellcome Trust Sanger Institute
gmpc@sanger.ac.uk
The Sanger Institute
Funded by Wellcome Trust.
• 2nd largest research charity in the world.
• ~700 employees.
• Based in Hinxton Genome Campus,
Cambridge, UK.
Large scale genomic research.
• Sequenced 1/3 of the human genome.
(largest single contributor).
• Large scale sequencing with an impact
on human and animal health.
Data is freely available.
• Websites, ftp, direct database access,
programmatic APIs.
• Some restrictions for potentially
identifiable data.
My team:
• Scientific computing systems architects.
DNA Sequencing
Human Genome (3 Gbases)
TCTTTATTTTAGCTGGACCAGACCAATTTTGAGGAAAGGATACAGACAGCGCCTG
AAGGTATGTTCATGTACATTGTTTAGTTGAAGAGAGAAATTCATATTATTAATTA
TGGTGGCTAATGCCTGTAATCCCAACTATTTGGGAGGCCAAGATGAGAGGATTGC
ATAAAAAAGTTAGCTGGGAATGGTAGTGCATGCTTGTATTCCCAGCTACTCAGGAGGCTG
TGCACTCCAGCTTGGGTGACACAG CAACCCTCTCTCTCTAAAAAAAAAAAAAAAAAGG
AAATAATCAGTTTCCTAAGATTTTTTTCCTGAAAAATACACATTTGGTTTCA
ATGAAGTAAATCG ATTTGCTTTCAAAACCTTTATATTTGAATACAAATGTACTCC
250 Million * 75-108 Base fragments
~1 TByte / day / machine
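As a rough check on the figures above, the yield of a run can be worked out from the fragment count and the read length. This is a minimal sketch, assuming the 250 million fragments describe one run of one machine and taking the 3 Gbase genome size from the slide; the function names are illustrative only.

```python
# Rough yield arithmetic for one sequencing run, using the slide's figures.
# Assumption: 250 million fragments is the output of a single run.
FRAGMENTS = 250_000_000        # fragments per run
GENOME_BASES = 3_000_000_000   # human genome, ~3 Gbases


def run_yield_bases(read_length: int, fragments: int = FRAGMENTS) -> int:
    """Total bases produced by `fragments` reads of `read_length` bases."""
    return fragments * read_length


def fold_coverage(read_length: int) -> float:
    """Average number of times each genome position is read."""
    return run_yield_bases(read_length) / GENOME_BASES


for length in (75, 108):
    print(f"{length}-base reads: {run_yield_bases(length) / 1e9:.2f} Gbases, "
          f"{fold_coverage(length):.2f}x coverage")
```

The ~1 TByte per day figure is much larger than the base count alone because raw instrument output carries more than one byte per base.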
Economic Trends:
Cost of sequencing halves every 12
months.
• Wrong side of Moore's Law.
The Human genome project:
• 13 years.
• 23 labs.
• $500 Million.
A Human genome today:
• 3 days.
• 1 machine.
• $8,000.
Trend will continue:
• $1000 genome is probable within 2 years.
• Informatics not included.
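The "wrong side of Moore's Law" point can be put into numbers. The sketch below assumes that compute per unit cost doubles every 24 months (an assumed figure, not from the slides) and that a 12-month halving of sequencing cost means sequence output per unit cost doubles every 12 months.

```python
def growth(years: float, doubling_months: float) -> float:
    """Growth factor after `years` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def data_per_compute(years: float,
                     sequencing_doubling_months: float = 12.0,   # from the slide
                     compute_doubling_months: float = 24.0) -> float:  # assumption
    """How much more sequence data there is per unit of compute, at constant spend."""
    return (growth(years, sequencing_doubling_months)
            / growth(years, compute_doubling_months))


for years in (1, 2, 4, 8):
    print(f"after {years} years: {data_per_compute(years):.1f}x more data per unit of compute")
```

Under these assumptions the gap itself doubles every two years, which is why informatics cost is called out separately.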
The scary graph
Peak yearly capillary sequencing: 30 Gbase
Current weekly sequencing: 7-10 Tbases
Data doubling time: 4-6 months.
Gen III Sequencers this year?
Pbytes!
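To see how quickly a 4-6 month doubling time reaches the next order of magnitude, one can solve for the time needed to grow from the current weekly output to a target. This is a sketch only, assuming the doubling time applies to weekly output and stays constant; the 1,000 Tbase target is an illustrative choice.

```python
import math


def months_to_reach(target: float, current: float, doubling_months: float) -> float:
    """Months needed to grow from `current` to `target` at a fixed doubling time."""
    return doubling_months * math.log2(target / current)


CURRENT_TBASES_PER_WEEK = 10.0    # upper end of the slide's 7-10 Tbases
TARGET_TBASES_PER_WEEK = 1000.0   # hypothetical target: 1 Pbase per week

for doubling in (4.0, 6.0):
    months = months_to_reach(TARGET_TBASES_PER_WEEK, CURRENT_TBASES_PER_WEEK, doubling)
    print(f"doubling every {doubling:.0f} months: {months:.1f} months to reach target")
```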
Sequencing data flow.
• Flow: Sequencer → Processing/QC → Comparative analysis → Archive → Internet.
• Storage: Structured data (databases); Unstructured data (flat files).
• Data sizes: Raw data (10 TB), Sequence (500 GB), Alignments (200 GB), Variation data (1 GB), Feature (3 MB).
A Sequencing Centre Today
CPU
• Generic x86_64 cluster.
• (16,000 cores)
Storage
• ~1 TB per day per sequencer.
• (15 PB disk)
• (Lustre + NFS)
Metadata driven data management
• Only keep our important files.
• Catalogue them, so we can find them!
• Keep the number of copies we want, and no more.
• (iRODS, in house LIMs).
A solved problem; we know how to do this.
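The metadata-driven approach above can be illustrated with a toy catalogue. This is a plain-Python sketch, not the iRODS API or the in-house LIMS; the class names, metadata keys and storage locations are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One catalogued file with its descriptive metadata and known copies."""
    path: str
    metadata: dict[str, str]
    replicas: set[str] = field(default_factory=set)  # storage locations holding a copy


class Catalogue:
    """Toy catalogue: register files, find them by metadata, audit copy counts."""

    def __init__(self, wanted_copies: int) -> None:
        self.wanted_copies = wanted_copies
        self._objects: dict[str, DataObject] = {}

    def register(self, path: str, metadata: dict[str, str], location: str) -> None:
        obj = self._objects.setdefault(path, DataObject(path, dict(metadata)))
        obj.replicas.add(location)

    def find(self, **criteria: str) -> list[str]:
        """Paths whose metadata matches every key=value pair given."""
        return sorted(path for path, obj in self._objects.items()
                      if all(obj.metadata.get(k) == v for k, v in criteria.items()))

    def replica_surplus(self) -> dict[str, int]:
        """Copies held minus copies wanted, for files that are off target."""
        return {path: len(obj.replicas) - self.wanted_copies
                for path, obj in self._objects.items()
                if len(obj.replicas) != self.wanted_copies}


catalogue = Catalogue(wanted_copies=2)
catalogue.register("run1.bam", {"study": "A"}, "site1")
catalogue.register("run1.bam", {"study": "A"}, "site2")
catalogue.register("run2.bam", {"study": "B"}, "site1")
print(catalogue.find(study="A"))
print(catalogue.replica_surplus())
```

Files are located by what they are rather than where they live, and the copy audit flags both missing and surplus replicas.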
This is not big data
This is not big data either...
Proper Big Data
We want to compute across all the data.
• Sequencing data (of course).
• Patient records, treatment and outcomes.
Why?
• Cancer: tie in genetics, patient outcomes and treatments.
• Pharma: high failure rate due to genetic factors in drug response.
• Infectious disease epidemiology.
• Rare genetic diseases.
Many genetic effects are small
• Million member cohorts to get good signal:noise.
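Why small effects push cohort sizes so high can be sketched with the textbook normal-approximation formula for comparing two group means, n = 2(z_(1-α/2) + z_power)² / d² per group. The effect sizes and the 5e-8 significance threshold used below are illustrative assumptions, not figures from the talk.

```python
import math
from statistics import NormalDist


def per_group_sample_size(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.8) -> int:
    """Samples per group needed to detect a standardised mean difference.

    Normal approximation for a two-sided, two-sample comparison.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Assumed values: a stringent threshold and progressively smaller effects.
for d in (0.5, 0.05, 0.01):
    print(f"effect size {d}: {per_group_sample_size(d, alpha=5e-8):,} per group")
```

The required cohort grows with the inverse square of the effect size, so a tenfold smaller effect needs roughly a hundredfold larger cohort.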
Translation: Genomics of drug
sensitivity in Cancer
Pre-treatment
BRAF inhibitor
15 weeks of treatment
Molecular diagnostic
BRAF mutation positive ✔
70% response rate vs 10% for standard chemotherapy
BRAF inhibitors in malignant melanoma
Slide from Mathew Garnet (CGP)
Current Data Archives
EBI ERA / NCBI SRA store
results of all sequencing
experiments.
• Public data availability: A good
thing (tm)
• 1.6 Pbases
Problems
• Archives are “dark”.
• You can put data in, but you can't
do anything with it.
• In order to analyse the data, you
need to download it all.
• 100s of Tbytes
Situation replicated at local
Institute level too.
• eg How does CRI get hold of their
data currently held at Sanger?
The Vision
Global Alliance for sharing genomic and clinical data
• 70 research institutes & hospitals (including Sanger, Broad, EBI, BGI,
Cancer Research UK)
Million cancer genome warehouse
• (UC Berkeley)
To the Cloud!
[Diagram labels: Institute A (Data, Analysis pipeline); Institute B (Data, Analysis pipeline); cloud (Data, Analysis pipelines).]
How do we get there?
Code & Algorithms
Bioinformatics code:
• Integer not FP heavy.
• Single threaded.
• Large memory footprints.
• Interpreted languages.
Not a good fit for future computing architectures.
Expensive to run on public clouds.
• Memory footprint leads to unused cores.
Out of scope for a data talk, but still an important point.
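The "memory footprint leads to unused cores" point is simple arithmetic. The sketch below uses a hypothetical machine shape (16 cores, 64 GB) and a hypothetical 16 GB single-threaded job; none of these numbers come from the talk.

```python
def concurrent_jobs(cores: int, memory_gb: float, job_memory_gb: float) -> int:
    """Single-threaded jobs that fit at once: limited by cores or by memory."""
    return min(cores, int(memory_gb // job_memory_gb))


def idle_core_fraction(cores: int, memory_gb: float, job_memory_gb: float) -> float:
    """Fraction of paid-for cores left idle when memory is the limit."""
    return 1.0 - concurrent_jobs(cores, memory_gb, job_memory_gb) / cores


# Hypothetical instance and job sizes, for illustration only.
CORES, MEMORY_GB, JOB_MEMORY_GB = 16, 64.0, 16.0
jobs = concurrent_jobs(CORES, MEMORY_GB, JOB_MEMORY_GB)
idle = idle_core_fraction(CORES, MEMORY_GB, JOB_MEMORY_GB)
print(f"{jobs} jobs run at once; {idle:.0%} of cores sit idle")
```

Because cloud instances are billed as whole machines, the idle cores are still paid for.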
Architectural differences
Global file system + CPUs: fast network, static nodes
VS
Object store + CPUs: slow network, dynamic nodes
Whose Cloud?
A VM is just a VM, right?
• Clouds are supposed to be
programmable.
• Nobody wants to re-write a pipeline
when they move clouds.
Storage:
• Posix:
• (lustre/GPFS/EMC)?
• Object:
• Low level: AWS S3, Openstack
SWIFT, Ceph/rados
• High level: Data management
layer (eg iRODS)?
Cloud Interoperability?
• Do we need more standards?!
Pragmatic approach:
• First person to make one that
actually works, wins.
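One way to avoid rewriting a pipeline for each storage type is to code against a small interface and swap the backend. The sketch below is an assumption about how that could look, not a description of any real system: `BlobStore`, `PosixStore` and `InMemoryObjectStore` are hypothetical names, and the in-memory class merely stands in for an object store such as S3 or Swift without calling their APIs.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Minimal storage interface a pipeline step depends on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class PosixStore:
    """Keys map to files under a root directory on a POSIX file system."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryObjectStore:
    """Stand-in for an object store: flat key space, whole-object reads and writes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def count_bases(store: BlobStore, key: str) -> int:
    """Toy pipeline step: count A, C, G and T characters in a stored sequence."""
    return sum(1 for byte in store.get(key) if byte in b"ACGT")


with tempfile.TemporaryDirectory() as tmp:
    for store in (PosixStore(Path(tmp)), InMemoryObjectStore()):
        store.put("runs/read1.seq", b"ACGTNACGT\n")
        print(type(store).__name__, count_bases(store, "runs/read1.seq"))
```

The pipeline step never learns which backend it is using, which is the property the slide asks for.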
Moving data
Data still has to get from our instruments
to the Cloud.
Good news:
• Lots of products out there for wide area data
movement.
Bad news:
• We are currently using all of them(!)
Network bandwidth still a problem.
• Research institutes have fast data networks.
• What about your GP's surgery?
UDT / UDR
rsync / ssh
genetorrent
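The bandwidth problem is easy to estimate. This sketch computes the best-case time to move a data set over a link of a given speed; the link speeds, the 100 GB per-genome size and the efficiency factor are assumed values for illustration, and decimal units are used (1 TB = 10^12 bytes).

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move `terabytes` over a link of `gbit_per_s`.

    `efficiency` is the fraction of the nominal link speed actually achieved.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbit_per_s * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


# Assumed scenarios: an archive-scale download and a single genome on a slow link.
print(f"100 TB over 1 Gbit/s:   {transfer_days(100, 1.0):.1f} days")
print(f"100 TB over 10 Gbit/s:  {transfer_days(100, 10.0):.1f} days")
print(f"0.1 TB over 10 Mbit/s:  {transfer_days(0.1, 0.01):.1f} days")
```

Real transfers rarely reach the nominal link speed, so these figures are lower bounds.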
Identity Access
Unlikely that data archives are going to
allow anonymous access.
• Who are you?
Federated identity providers.
• Is everyone signed up to the same federation?
• Does it include the right mix of cross-national co-operation?
• Does your favourite bit of software support
federated IDs?
Janet Moonshot
The LAW
Legal
• Theory: anonymised data can be stored and
accessed without jumping through hoops.
• Practice: Risk of re-identification. Becomes
easier the more data you have.
• Medical records are hard to anonymise
and still be useful.
Ethical
• Medical consent process adds more
restrictions above data-protection law.
• Limits data use & access even if
anonymised.
Controlled data access?
• No ad-hoc analysis.
• Access via restricted API only (“trusted
intermediary model”).
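A trusted intermediary can be pictured as a service that holds the records and answers only pre-approved aggregate questions. The sketch below is a hypothetical illustration: the record fields, the `response_rate` query and the minimum group size of 5 are assumptions, and in practice the boundary would be a network service rather than a Python class.

```python
from typing import Optional


class TrustedIntermediary:
    """Holds patient-level records; callers only ever receive aggregates."""

    def __init__(self, records: list, min_group_size: int = 5) -> None:
        self._records = list(records)          # never returned to callers
        self._min_group_size = min_group_size  # assumed disclosure threshold

    def response_rate(self, mutation: str) -> Optional[float]:
        """Fraction of patients carrying `mutation` who responded to treatment.

        Returns None when the group is too small to report safely.
        """
        group = [r for r in self._records if mutation in r["mutations"]]
        if len(group) < self._min_group_size:
            return None
        return sum(1 for r in group if r["responded"]) / len(group)


# Synthetic records, made up for this example.
records = [{"mutations": {"BRAF"}, "responded": i < 7} for i in range(10)]
records += [{"mutations": {"KRAS"}, "responded": True} for _ in range(2)]

service = TrustedIntermediary(records)
print(service.response_rate("BRAF"))
print(service.response_rate("KRAS"))
```

Refusing to answer for small groups is one simple guard against re-identification; it does not by itself rule out inference from repeated queries.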
Policy development ongoing.
• Cross-jurisdiction for added fun.
Summary
We know where we want to get to.
• No shortage of Vision
There are lots of interesting tools and technologies out
there.
• Getting them to work coherently together will be a challenge.
• Prototyping efforts are underway.
• Need to leverage expertise and experience in other fields.
Not simply technical issues:
• Significant policy issues need to be worked out.
• We have to bring the public along.
Acknowledgements
ISG:
•
• James Beal
• Helen Brimmer
• Pete Clapham
Global Alliance whitepaper:
http://www.sanger.ac.uk/about/press/assets/130605-white-paper.pdf
Million Cancer Genome Warehouse whitepaper
http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-211.html

WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Life sciences big data use cases

  • 1. Big data and Life Sciences Guy Coates Wellcome Trust Sanger Institute gmpc@sanger.ac.uk
  • 2. The Sanger Institute Funded by Wellcome Trust. • 2nd largest research charity in the world. • ~700 employees. • Based in Hinxton Genome Campus, Cambridge, UK. Large scale genomic research. • Sequenced 1/3 of the human genome. (largest single contributor). • Large scale sequencing with an impact on human and animal health. Data is freely available. • Websites, ftp, direct database access, programmatic APIs. • Some restrictions for potentially identifiable data. My team: • Scientific computing systems architects.
  • 3. DNA Sequencing TCTTTATTTTAGCTGGACCAGACCAATTTTGAGGAAAGGATACAGACAGCGCCTG AAGGTATGTTCATGTACATTGTTTAGTTGAAGAGAGAAATTCATATTATTAATTA TGGTGGCTAATGCCTGTAATCCCAACTATTTGGGAGGCCAAGATGAGAGGATTGC ATAAAAAAGTTAGCTGGGAATGGTAGTGCATGCTTGTATTCCCAGCTACTCAGGAGGCTG TGCACTCCAGCTTGGGTGACACAG CAACCCTCTCTCTCTAAAAAAAAAAAAAAAAAGG AAATAATCAGTTTCCTAAGATTTTTTTCCTGAAAAATACACATTTGGTTTCA ATGAAGTAAATCG ATTTGCTTTCAAAACCTTTATATTTGAATACAAATGTACTCC 250 Million * 75-108 Base fragments. ~1 TByte / day / machine. Human Genome (3 Gbases).
  • 4. Economic Trends: Cost of sequencing halves every 12 months. • Wrong side of Moore's Law. The Human genome project: • 13 years. • 23 labs. • $500 Million. A Human genome today: • 3 days. • 1 machine. • $8,000. Trend will continue: • $1000 genome is probable within 2 years. • Informatics not included.
  • 5. The scary graph Peak yearly capillary sequencing: 30 Gbase. Current weekly sequencing: 7-10 Tbases. Data doubling time: 4-6 months.
  • 6. Gen III Sequencers this year?
  • 7. Sequencing data flow. [Diagram] Stages: Sequencer; Processing/QC; Comparative analysis; Archive; Internet. Storage: Unstructured data (flat files); Structured data (databases). Data sizes: Raw data (10 TB); Sequence (500 GB); Alignments (200 GB); Variation data (1 GB); Feature (3 MB). Pbytes!!
  • 8. A Sequencing Centre Today CPU • Generic x86_64 cluster. • (16,000 cores) Storage • ~1 TB per day per sequencer. • (15 PB disk) • (Lustre + NFS) Metadata driven data management • Only keep our important files. • Catalogue them, so we can find them! • Keep the number of copies we want, and no more. • (iRODS, in-house LIMS). A solved problem; we know how to do this.
  • 9. This is not big data
  • 10. This is not big data either...
  • 11. Proper Big Data We want to compute across all the data. • Sequencing data (of course). • Patient records, treatment and outcomes. Why? • Cancer: tie in genetics, patient outcomes and treatments. • Pharma: high failure rate due to genetic factors in drug response. • Infectious disease epidemiology. • Rare genetic diseases. Many genetic effects are small • Million member cohorts to get good signal:noise.
  • 12. Translation: Genomics of drug sensitivity in Cancer. BRAF inhibitors in malignant melanoma. [Figure labels] Pre-treatment; BRAF inhibitor; 15 weeks of treatment; molecular diagnostic; BRAF mutation positive ✔. 70% response rate vs 10% for standard chemotherapy. Slide from Mathew Garnet (CGP)
  • 13. Current Data Archives EBI ERA / NCBI SRA store results of all sequencing experiments. • Public data availability: A good thing (tm) • 1.6 Pbases. Problems • Archives are “dark”. • You can put data in, but you can't do anything with it. • In order to analyse the data, you need to download it all. • 100s of Tbytes. Situation replicated at local institute level too. • e.g. How does CRI get hold of their data currently held at Sanger?
  • 14. The Vision Global Alliance for sharing genomic and clinical data • 70 research institutes & hospitals (including Sanger, Broad, EBI, BGI, Cancer Research UK) Million cancer genome warehouse • (UC Berkeley)
  • 15. To the Cloud! [Diagram labels] Institute A; Institute B; Data; Analysis pipeline.
  • 16. How do we get there?
  • 17. Code & Algorithms Bioinformatics code: • Integer not FP heavy. • Single threaded. • Large memory footprints. • Interpreted languages. Not a good fit for future computing architectures. Expensive to run on public clouds. • Memory footprint leads to unused cores. Out of scope for a data talk, but still an important point.
  • 18. Architectural differences: Global file system, fast network, static nodes VS object store, slow network, dynamic nodes.
  • 19. Whose Cloud? A VM is just a VM, right? • Clouds are supposed to be programmable. • Nobody wants to re-write a pipeline when they move clouds. Storage: • POSIX: • (Lustre/GPFS/EMC)? • Object: • Low level: AWS S3, OpenStack Swift, Ceph/RADOS • High level: Data management layer (e.g. iRODS)? Cloud Interoperability? • Do we need more standards?! Pragmatic approach: • First person to make one that actually works, wins.
  • 20. Moving data Data still has to get from our instruments to the Cloud. Good news: • Lots of products out there for wide area data movement. Bad news: • We are currently using all of them(!) • UDT / UDR • rsync / ssh • GeneTorrent Network bandwidth still a problem. • Research institutes have fast data networks. • What about your GP's surgery?
  • 21. Identity Access Unlikely that data archives are going to allow anonymous access. • Who are you? Federated identity providers. • Is everyone signed up to the same federation? • Does it include the right mix of cross-national co-operation? • Does your favourite bit of software support federated IDs? • Janet Moonshot
  • 22. The LAW Legal • Theory: anonymised data can be stored and accessed without jumping through hoops. • Practice: Risk of re-identification. Becomes easier the more data you have. • Medical records are hard to anonymise and still be useful. Ethical • Medical consent process adds more restrictions above data-protection law. • Limits data use & access even if anonymised. Controlled data access? • No ad-hoc analysis. • Access via restricted API only (“trusted intermediary model”). Policy development ongoing. • Cross-jurisdiction for added fun.
  • 23. Summary We know where we want to get to. • No shortage of vision. There are lots of interesting tools and technologies out there. • Getting them to work coherently together will be a challenge. • Prototyping efforts are underway. • Need to leverage expertise and experience in other fields. Not simply technical issues: • Significant policy issues need to be worked out. • We have to bring the public along.
  • 24. Acknowledgements ISG: • James Beal • Helen Brimmer • Pete Clapham Global Alliance whitepaper: http://www.sanger.ac.uk/about/press/assets/130605-white-paper.pdf Million Cancer Genome Warehouse whitepaper: http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-211.html
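
The growth figures on slides 4 and 5 can be turned into rough numbers. The sketch below assumes steady exponential halving and doubling, which the slides imply but do not state as a model; the function names are illustrative and not from the talk.

```python
import math


def years_to_reach(start_cost, target_cost, halving_months=12.0):
    """Years for a cost to fall from start_cost to target_cost,
    assuming it halves every halving_months (an idealised model)."""
    halvings = math.log2(start_cost / target_cost)
    return halvings * halving_months / 12.0


def halving_months_needed(start_cost, target_cost, years):
    """Halving time (in months) needed to reach target_cost within `years`."""
    halvings = math.log2(start_cost / target_cost)
    return years * 12.0 / halvings


def projected_output(current, months_ahead, doubling_months):
    """Output after months_ahead months, given a fixed doubling time."""
    return current * 2 ** (months_ahead / doubling_months)


# Slide 4: $8,000 per genome today, $1,000 as the target.
print(years_to_reach(8000, 1000))            # 3.0 years at a 12-month halving
print(halving_months_needed(8000, 1000, 2))  # 8.0 months to get there in 2 years

# Slide 5: 7-10 Tbases per week now, doubling every 4-6 months.
print(projected_output(7, 12, 6))    # 28.0 Tbases/week after a year (slow case)
print(projected_output(10, 12, 4))   # 80.0 Tbases/week after a year (fast case)
```

Under these assumptions a strict 12-month halving reaches $1,000 in three years, so the two-year estimate on slide 4 corresponds to a halving time of about eight months.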

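Slide 13 notes that analysing archived data means downloading hundreds of Tbytes, and slide 20 that network bandwidth is still a problem. A back-of-envelope sketch of the transfer time follows; the link speeds and the `efficiency` parameter are assumptions chosen for illustration, not figures from the talk.

```python
def transfer_days(terabytes, gbit_per_s, efficiency=1.0):
    """Days needed to move `terabytes` (decimal TB) over a link of
    `gbit_per_s` Gbit/s, with `efficiency` as the usable fraction
    of the nominal bandwidth (1.0 = ideal, an assumption)."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gbit_per_s * 1e9 * efficiency)
    return seconds / 86400


# Hypothetical links: a 10 Gbit/s research network and a 100 Mbit/s line.
for label, speed in [("10 Gbit/s", 10), ("100 Mbit/s", 0.1)]:
    print(f"100 TB over {label}: {transfer_days(100, speed):.1f} days")
```

Even on an ideal 10 Gbit/s link, 100 TB takes most of a day; on a 100 Mbit/s line it takes about three months, which is the gap behind the question about a GP's surgery.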
Editor's notes

  1. Sequencing: the start of most analysis. People = unmanaged data: data in wrong place, duplicated, nobody can find anything. Inc systems: backups/security. Capacity planning?