Bioinformatics of TB
A case study in big data
Peter van Heusden
pvh@sanbi.ac.za
and Alan Christoffels
South African National Bioinformatics Institute
University of the Western Cape
Bellville, South Africa
January 2015
The plummeting cost of sequencing
M. tuberculosis
Widespread pathogen, responsible for 1.3 million deaths annually
Genome size ~4 megabases
Illumina NGS sequencing run ~2 gigabytes (uncompressed)
Typical student project (2014)
1. Gather data (on hard disk / over network)
2. Run annotation pipeline (compute time < 1 week, disk used 20 to 40 GB)
3. Examine significance of variation compared to “reference sequence”
What’s coming down the pipe
In South Africa alone we have access to samples from several thousand strains of TB
Low cost of sequencing means
1. More depth: capture the population of pathogens in a single patient
2. More length: study the progression of infection in a patient
3. More breadth: build an in-depth regional or global picture of pathogen sequence
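A back-of-envelope sketch of what this scale could mean for storage. It assumes each sample's analysis uses as much disk as the 2014 student project (20 to 40 GB), and it takes 3000 samples as a stand-in for "several thousand"; both are assumptions, and the function name is illustrative.

```python
def storage_range_tb(n_samples, gb_low=20, gb_high=40):
    """Return (low, high) disk use in terabytes for n_samples analyses.

    gb_low and gb_high default to the per-project disk figures quoted
    for the student project; 1 TB is taken as 1000 GB.
    """
    return (n_samples * gb_low / 1000, n_samples * gb_high / 1000)


# Assumed sample count: 3000 stands in for "several thousand strains".
low, high = storage_range_tb(3000)
print(f"{low:.0f} to {high:.0f} TB")  # prints "60 to 120 TB"
```

Adding depth (many reads per patient) or length (repeated sampling over time) would multiply these figures further.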
Mapping a virulent TB strain
“Evolutionary history and global spread of the Mycobacterium tuberculosis Beijing lineage”, Merker et al. (2015)
Beijing lineage strains, associated with multidrug-resistant (MDR) TB, have spread worldwide
Studied 4987 isolates, fully sequenced 110 representatives
Mapped 6 clonal complexes and an ancestral basal sublineage
Paper presents wealth of different data types:
1. DNA reads
2. Genotyping
3. Phylogeny
4. Geospatial
5. Time series data
6. Metadata on samples and experiments
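One way to picture how these six data types hang together is a single record per isolate that links to each of them. The sketch below is a hypothetical schema, not the one used in the paper; every field name is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class IsolateRecord:
    """One isolate with a slot for each data type listed above (hypothetical schema)."""
    isolate_id: str
    reads_path: Optional[str] = None       # 1. DNA reads, e.g. path to a FASTQ file
    genotype: Optional[str] = None         # 2. Genotyping result
    clonal_complex: Optional[str] = None   # 3. Phylogeny: assigned clade
    country: Optional[str] = None          # 4. Geospatial
    collected: Optional[date] = None       # 5. Time series: collection date
    metadata: dict = field(default_factory=dict)  # 6. Sample and experiment metadata

    def missing(self):
        """Return the names of data types not yet attached to this isolate."""
        names = ("reads_path", "genotype", "clonal_complex", "country", "collected")
        return [name for name in names if getattr(self, name) is None]


# Made-up example record: only location and date are known so far.
rec = IsolateRecord("example-001", country="ZA", collected=date(2014, 6, 1))
print(rec.missing())  # prints "['reads_path', 'genotype', 'clonal_complex']"
```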
More data: not more of the same
Existing publishing puts the focus on results, not data
Research data is very seldom FAIR:
1. Findable
2. Accessible
3. Interoperable
4. Reusable
(j.mp/fairdata1)
Change data handling, change research results
In the 21st century, much of the vast volume of scientific
data captured by new instruments on a 24/7 basis, along
with information generated in the artificial worlds of
computer models, is likely to reside forever in a live,
substantially publicly accessible, curated state for the
purposes of continued analysis. This analysis will result in
the development of many new theories! (Jim Gray)
“Big” in “Big Data” is not (only) about data volume
Cheap pathogen sequencing is driving up the complexity of the questions that can be asked of the data
...but only if data is FAIR
Why we’re not all riding to work on unicorns
[W]e now have terrible data management tools for most of
the science disciplines. . . . When you go and look at what
scientists are doing, day in and day out, in terms of data
analysis, it is truly dreadful. (Jim Gray)
Who curates your data?
How is it managed?
Where is it analysed?
And who gets access?
Future directions for SANBI (data management) research
Research programme is necessarily modest:
1. Cross-institution authentication, authorisation and movement of data
2. New storage technologies
3. Data repositories in addition to filesystems
4. Storing and querying data on sequence collections, not individual samples
Individual institutes can only prototype solutions: the scale of the challenge will require much broader collaborative development
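As a minimal sketch of what "querying a collection, not individual samples" might look like, the example below puts sample metadata in a small in-memory SQLite database and asks a question about the whole set. The table layout, column names and rows are all made up for illustration; a real repository would look quite different.

```python
import sqlite3

# In-memory database standing in for a data repository (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (id TEXT PRIMARY KEY, lineage TEXT, country TEXT, year INTEGER)"
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?)",
    [
        ("s1", "Beijing", "ZA", 2012),
        ("s2", "Beijing", "ZA", 2013),
        ("s3", "Beijing", "RU", 2013),
        ("s4", "Other", "ZA", 2013),
    ],
)


def count_by_country(conn, lineage):
    """Count samples of one lineage per country across the whole collection."""
    cursor = conn.execute(
        "SELECT country, COUNT(*) FROM sample "
        "WHERE lineage = ? GROUP BY country ORDER BY country",
        (lineage,),
    )
    return dict(cursor.fetchall())


print(count_by_country(conn, "Beijing"))  # prints "{'RU': 1, 'ZA': 2}"
```

The point of the sketch is that the question is asked of the collection as a whole, which is hard to do when each sample lives in its own directory on a filesystem.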
