The HDF Group




            BioHDF
  Open Binary File Formats for
Next-Generation Sequencing Data
                     Current Status and Future Directions


                                 Dana Robinson
                                The HDF Group
                             derobins@hdfgroup.org

                      Copyright © 2010 The HDF Group. All Rights Reserved
 July 9, 2010                                                          1    www.hdfgroup.org
NGS Data Challenges



                               Very large quantities of data
                               (100s of GB)

                               "Drinking from the firehose"




   Analysis methods vary greatly, so a flexible yet unified
   data store would be useful.


What is Needed

A Data Model
A data model which accurately describes the data and can
be expanded to contain new types of data

A Data Store
A file format or data store which is efficient in access time
and storage size and which scales well

A Toolkit
A flexible software toolkit that can be used to create tools
and pipelines based on the data model and file format



What is BioHDF?
An open-source, community-driven project, funded by an NIH
SBIR grant and led by Geospiza, Inc. in collaboration with
The HDF Group.


BioHDF is a particular arrangement of objects in an HDF5
file (similar to a database schema)


BioHDF is a library and C API which can be used to write
applications (coming soon)


BioHDF is a set of command-line tools for
storing, retrieving, and manipulating data in BioHDF files

HDF = Hierarchical Data Format

An example of how data is stored in HDF5


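[Figure: tree view of somefile.h5. The root group / contains Reads/, Alignments/ and References; is_sorted is shown beside Alignments/. Callouts label the groups, datasets and attributes.]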
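The layout in this figure can be mimicked in a few lines of plain Python. This is only a sketch of the HDF5 data model (groups that contain other objects, datasets that hold values, attributes attached to either). It does not use the HDF5 library, and the `Node` class, the `sequences` dataset and the sample values are hypothetical. Attaching is_sorted to Alignments/ is an assumption based on where the label sits in the figure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Node:
    """A group (has children) or a dataset (has data); both may carry attributes."""
    name: str
    data: Optional[Any] = None
    attrs: Dict[str, Any] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def add(self, child: "Node") -> "Node":
        """Attach a child object to this group and return it."""
        self.children[child.name] = child
        return child

    def get(self, path: str) -> "Node":
        """Resolve a slash-separated path such as '/Reads/sequences'."""
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node


root = Node("/")
reads = root.add(Node("Reads"))
root.add(Node("Alignments", attrs={"is_sorted": True}))  # assumed placement
root.add(Node("References"))
reads.add(Node("sequences", data=["ACGT", "GGCA"]))  # hypothetical dataset

print(root.get("/Alignments").attrs["is_sorted"])
print(root.get("/Reads/sequences").data)
```

Path lookup is the only operation a reader needs here, which is why the schema comparison on the "What is BioHDF?" slide is apt: the arrangement of named objects is the contract.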


Benefits of BioHDF
• Portability and data sharing:
Platform-independent, endian-independent, self-describing,
common data models.

• High performance:
Fast random access and efficient, scalable, petabyte-level
compressed storage.

• Widespread adoption:
MATLAB, IDL, NASA Earth Observing System, Pacific
Biosciences, SOLiD, 100s of products.

• 20-year history:
Robust, performance-tuned, and well supported by The HDF
Group, an independent non-profit entity.

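
Random access and compression can coexist because HDF5 applies compression filters to chunked datasets one chunk at a time, so reading a single element only requires decompressing the chunk that contains it. The sketch below imitates that idea with Python's zlib module. It is not HDF5 code, and the chunk size, function names and sample records are hypothetical.

```python
import zlib


def write_chunks(records, chunk_size):
    """Compress records in independent groups of chunk_size."""
    chunks = []
    for i in range(0, len(records), chunk_size):
        block = "\n".join(records[i:i + chunk_size]).encode("ascii")
        chunks.append(zlib.compress(block))
    return chunks


def read_record(chunks, chunk_size, index):
    """Fetch one record by decompressing only the chunk that holds it."""
    block = zlib.decompress(chunks[index // chunk_size]).decode("ascii")
    return block.split("\n")[index % chunk_size]


reads = ["ACGTACGT", "GGGGCCCC", "TTTTAAAA", "ACACACAC", "GTGTGTGT"]
chunks = write_chunks(reads, chunk_size=2)
print(len(chunks), read_record(chunks, 2, 4))
```

Smaller chunks make single-record reads cheaper but usually compress less well, so the chunk size is a tuning choice rather than a fixed value.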
HDF in Bioinformatics

•   Baylor Imaging Group
•   Life Technologies
•   Pacific Biosciences
•   Oxford Nanopore
•   GenomeData (UW)
•   Geospiza
•   Others




Data Stored

The prototype BioHDF stores

Reads

Alignments

Annotations

Clusters of Aligned Reads

Reference Sequences

Indexes (NCList or simple)
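
The slide does not define the "simple" index, so the following is only one plausible reading: keep alignments sorted by start coordinate and use binary search to narrow a region query. It is not the NCList structure, and the function names, the half-open coordinate convention and the sample records are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_index(alignments):
    """Sort (start, end, name) records by start and remember the longest one."""
    ordered = sorted(alignments)
    starts = [start for start, _, _ in ordered]
    max_len = max((end - start for start, end, _ in ordered), default=0)
    return ordered, starts, max_len


def query(index, q_start, q_end):
    """Return names of alignments overlapping the half-open range [q_start, q_end)."""
    ordered, starts, max_len = index
    # Records starting at or before q_start - max_len end too early to overlap.
    lo = bisect_right(starts, q_start - max_len)
    # Records starting at or after q_end begin too late to overlap.
    hi = bisect_left(starts, q_end)
    return [name for _, end, name in ordered[lo:hi] if end > q_start]


index = build_index([
    (100, 150, "r1"),
    (120, 170, "r2"),
    (300, 350, "r3"),
    (10, 60, "r4"),
])
print(query(index, 140, 160))
```

This approach degrades when a few very long alignments inflate max_len, which is the kind of case a nested containment list is designed to handle better.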

Data Stored

Additional user-specific data can be stored without breaking
the library or tools.



Similar to how adding additional tables to a database schema does not invalidate existing queries.

[Figure: a file containing a "BioHDF Data" block alongside a "User-Specific Data" block.]
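
One way to picture this: a tool that only asks for the paths it knows about is unaffected by anything else stored beside them. The sketch below uses nested dictionaries as a stand-in for an HDF5 file. The `count_reads` function and every path name in it are hypothetical, not part of BioHDF.

```python
def lookup(tree, path):
    """Walk a nested dict using a slash-separated path."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def count_reads(tree):
    """A 'tool' that touches only the path it was written for."""
    return len(lookup(tree, "/Reads/sequences"))


biohdf_only = {"Reads": {"sequences": ["ACGT", "GGCA", "TTAG"]}}

# Same content plus user-specific data the tool has never heard of.
extended = {
    "Reads": {"sequences": ["ACGT", "GGCA", "TTAG"]},
    "MyLab": {"qc_notes": "lane 3 rerun", "plate_map": [1, 2, 3]},
}

print(count_reads(biohdf_only), count_reads(extended))
```

The tool returns the same answer for both files because it never enumerates the whole tree; it only resolves names it already knows.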


Project Stages

A "pipeline prototype" set of tools to demonstrate the
suitability of HDF5 for NGS data storage.


A version 1.0 release of a BioHDF library and C API targeting
the functionality of samtools.


A higher-level C API that abstracts out and hides the
underlying storage technology.




HDF5 API and Applications

[Figure: software stack, top to bottom]

BioHDF Applications and Wrappers (e.g. Perl, Python)
High-Level API
BioHDF API
HDF5 API
Physical Storage

A Higher-Level API

A high-level API will encapsulate and hide the underlying
storage technology.


[Figure: the low-level C APIs (BioHDF API, BAM API) on one side, the high-level C API in the middle, and its clients (samtools, tool, wrapper) on the other side.]
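
A rough illustration of that encapsulation, written in Python rather than C for brevity: a tool is coded against one abstract interface, and the storage backend can be swapped without touching the tool. All names here (`AlignmentStore`, `InMemoryHdfStore`, `InMemoryBamStore`, `coverage`) are hypothetical stand-ins. They are not BioHDF, samtools or BAM API calls, and both backends simply keep their records in memory.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Alignment = Tuple[str, int, int]  # (reference, start, end), half-open


class AlignmentStore(ABC):
    """High-level interface: callers never see how records are stored."""

    @abstractmethod
    def fetch(self, reference: str, start: int, end: int) -> List[Alignment]:
        """Return alignments on `reference` overlapping [start, end)."""


class InMemoryHdfStore(AlignmentStore):
    """Stand-in for an HDF5-backed store: records grouped per reference."""

    def __init__(self, records: List[Alignment]) -> None:
        self._by_ref: Dict[str, List[Alignment]] = {}
        for rec in records:
            self._by_ref.setdefault(rec[0], []).append(rec)

    def fetch(self, reference, start, end):
        return [r for r in self._by_ref.get(reference, [])
                if r[1] < end and r[2] > start]


class InMemoryBamStore(AlignmentStore):
    """Stand-in for a BAM-backed store: one flat list, scanned linearly."""

    def __init__(self, records: List[Alignment]) -> None:
        self._records = list(records)

    def fetch(self, reference, start, end):
        return [r for r in self._records
                if r[0] == reference and r[1] < end and r[2] > start]


def coverage(store: AlignmentStore, reference: str, position: int) -> int:
    """A 'tool' that depends only on the high-level interface."""
    return len(store.fetch(reference, position, position + 1))


records = [("chr1", 100, 150), ("chr1", 120, 170), ("chr2", 100, 150)]
for backend in (InMemoryHdfStore(records), InMemoryBamStore(records)):
    print(type(backend).__name__, coverage(backend, "chr1", 130))
```

Both backends report the same coverage, which is the property the high-level API is meant to guarantee to tools and wrappers.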


Acknowledgements



                        Geospiza
                       Todd Smith
                       Mark Welsh

                      The HDF Group
                        Mike Folk



BioHDF is supported by NIH SBIR Phase II grant HG003792
               awarded to Geospiza, Inc.


The HDF Group




                Thank you for your time!
               If you are interested in using or contributing to
                         BioHDF, please contact us!

                 Dana Robinson (derobins@hdfgroup.org)

                               http://www.biohdf.org

                   BOSC BoF: Friday 5:10-6:00

                   ISMB Poster J18: Monday, July 12: 12:40-2:30

                   ISMB BoF: Tuesday, July 13 1-2 pm, room 306

Weitere ähnliche Inhalte

Was ist angesagt?

Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Dag Endresen
 
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)Dag Endresen
 
Data exchange alternatives, GIGA TAG (2009)
Data exchange alternatives, GIGA TAG (2009)Data exchange alternatives, GIGA TAG (2009)
Data exchange alternatives, GIGA TAG (2009)Dag Endresen
 
GBIF BIFA mentoring, Day 5a Data management, July 2016
GBIF BIFA mentoring, Day 5a Data management, July 2016GBIF BIFA mentoring, Day 5a Data management, July 2016
GBIF BIFA mentoring, Day 5a Data management, July 2016Dag Endresen
 
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014GBIF-Norway status for the 6th European GBIF nodes meeting April 2014
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014Dag Endresen
 
BioCASE web services for germplasm data sets, at FAO, Rome (2006)
BioCASE web services for germplasm data sets, at FAO, Rome (2006)BioCASE web services for germplasm data sets, at FAO, Rome (2006)
BioCASE web services for germplasm data sets, at FAO, Rome (2006)Dag Endresen
 
鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107皓仁 柯
 
DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013Frauke Ziedorn
 
#HepaticaWeek April 2016, GBIF data publishing
#HepaticaWeek April 2016, GBIF data publishing#HepaticaWeek April 2016, GBIF data publishing
#HepaticaWeek April 2016, GBIF data publishingDag Endresen
 
Cross-Community User Requirements and the Biodiversity Heritage Library
Cross-Community User Requirements and the Biodiversity Heritage LibraryCross-Community User Requirements and the Biodiversity Heritage Library
Cross-Community User Requirements and the Biodiversity Heritage LibraryChris Freeland
 
Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)Dag Endresen
 
Workshop about research data archiving and open access publishing at the Rese...
Workshop about research data archiving and open access publishing at the Rese...Workshop about research data archiving and open access publishing at the Rese...
Workshop about research data archiving and open access publishing at the Rese...Dag Endresen
 
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Dag Endresen
 
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Dag Endresen
 
Integrated database biology with well-curated and circulated knowledge
Integrated database biology with well-curated and circulated knowledgeIntegrated database biology with well-curated and circulated knowledge
Integrated database biology with well-curated and circulated knowledgeHidemasa Bono
 
EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)
EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)
EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)Dag Endresen
 
GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...
GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...
GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...Dag Endresen
 
Darwin Core extension for germplasm (11th December 2013)
Darwin Core extension for germplasm (11th December 2013)Darwin Core extension for germplasm (11th December 2013)
Darwin Core extension for germplasm (11th December 2013)Dag Endresen
 

Was ist angesagt? (20)

Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013
 
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)
EURISCO needs and priorities, at CGIAR ICT-KM Workshop, IPGRI, Rome (2005)
 
Data exchange alternatives, GIGA TAG (2009)
Data exchange alternatives, GIGA TAG (2009)Data exchange alternatives, GIGA TAG (2009)
Data exchange alternatives, GIGA TAG (2009)
 
GBIF BIFA mentoring, Day 5a Data management, July 2016
GBIF BIFA mentoring, Day 5a Data management, July 2016GBIF BIFA mentoring, Day 5a Data management, July 2016
GBIF BIFA mentoring, Day 5a Data management, July 2016
 
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014GBIF-Norway status for the 6th European GBIF nodes meeting April 2014
GBIF-Norway status for the 6th European GBIF nodes meeting April 2014
 
BioCASE web services for germplasm data sets, at FAO, Rome (2006)
BioCASE web services for germplasm data sets, at FAO, Rome (2006)BioCASE web services for germplasm data sets, at FAO, Rome (2006)
BioCASE web services for germplasm data sets, at FAO, Rome (2006)
 
鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107
 
DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013DataCite and its DOI infrastructure - IASSIST 2013
DataCite and its DOI infrastructure - IASSIST 2013
 
#HepaticaWeek April 2016, GBIF data publishing
#HepaticaWeek April 2016, GBIF data publishing#HepaticaWeek April 2016, GBIF data publishing
#HepaticaWeek April 2016, GBIF data publishing
 
Cross-Community User Requirements and the Biodiversity Heritage Library
Cross-Community User Requirements and the Biodiversity Heritage LibraryCross-Community User Requirements and the Biodiversity Heritage Library
Cross-Community User Requirements and the Biodiversity Heritage Library
 
Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)Web services for sharing germplasm data sets, at FAO in Rome (2006)
Web services for sharing germplasm data sets, at FAO in Rome (2006)
 
Workshop about research data archiving and open access publishing at the Rese...
Workshop about research data archiving and open access publishing at the Rese...Workshop about research data archiving and open access publishing at the Rese...
Workshop about research data archiving and open access publishing at the Rese...
 
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
 
Spitzer Preprints and the Research Workflow
Spitzer Preprints and the Research WorkflowSpitzer Preprints and the Research Workflow
Spitzer Preprints and the Research Workflow
 
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
 
Integrated database biology with well-curated and circulated knowledge
Integrated database biology with well-curated and circulated knowledgeIntegrated database biology with well-curated and circulated knowledge
Integrated database biology with well-curated and circulated knowledge
 
EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)
EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)
EURISCO and GBIF, at the European genbank network meeting (Bonn, April 2004)
 
GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...
GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...
GBIF registry (GBRDS), at European Nodes meeting in Alicante, Spain (10 March...
 
Darwin Core extension for germplasm (11th December 2013)
Darwin Core extension for germplasm (11th December 2013)Darwin Core extension for germplasm (11th December 2013)
Darwin Core extension for germplasm (11th December 2013)
 
圖書館趨勢觀察
圖書館趨勢觀察圖書館趨勢觀察
圖書館趨勢觀察
 

Andere mochten auch

Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...Luis Cipriani
 
Vänsterpartiet - Tisdagens frukostseminarie i Almedalen
Vänsterpartiet - Tisdagens frukostseminarie i AlmedalenVänsterpartiet - Tisdagens frukostseminarie i Almedalen
Vänsterpartiet - Tisdagens frukostseminarie i AlmedalenInfopaq Sverige
 
605專屬搭畢業特輯
605專屬搭畢業特輯605專屬搭畢業特輯
605專屬搭畢業特輯musicghost
 
Graduate Students Workshop
Graduate Students Workshop Graduate Students Workshop
Graduate Students Workshop Naz Torabi
 
How To Use Your Website to Get Customers
How To Use Your Website to Get CustomersHow To Use Your Website to Get Customers
How To Use Your Website to Get CustomersclickTRUE
 
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development Naz Torabi
 
Making Your Apps More Sociable
Making Your Apps More SociableMaking Your Apps More Sociable
Making Your Apps More SociableSamsung
 
HP Programvare SPOR 3
HP Programvare SPOR 3HP Programvare SPOR 3
HP Programvare SPOR 3HP Norge
 
IPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videosIPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videosKevin Amboe
 
H σαλαμινα στις τεχνες
H σαλαμινα στις τεχνεςH σαλαμινα στις τεχνες
H σαλαμινα στις τεχνεςRallou Thoma
 
Marketing Busuness Art 2012
Marketing Busuness Art 2012Marketing Busuness Art 2012
Marketing Busuness Art 2012Arif Mahmood
 
Benjamín Arditi (Democracia postliberal participativa)
Benjamín Arditi (Democracia postliberal participativa)Benjamín Arditi (Democracia postliberal participativa)
Benjamín Arditi (Democracia postliberal participativa)Adolfo Orive
 
mobility programs for education
mobility programs for educationmobility programs for education
mobility programs for educationRosario Outes
 
Presentation pl
Presentation plPresentation pl
Presentation plAndrzej
 
DiNapoli Family Trip to Italy
DiNapoli Family Trip to ItalyDiNapoli Family Trip to Italy
DiNapoli Family Trip to Italytomdinapoli
 
Portfolio Presentation1
Portfolio Presentation1Portfolio Presentation1
Portfolio Presentation1jamespiatt
 

Andere mochten auch (20)

Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
Explaining A Programming Model for Context-Aware Applications in Large-Scale ...
 
Vänsterpartiet - Tisdagens frukostseminarie i Almedalen
Vänsterpartiet - Tisdagens frukostseminarie i AlmedalenVänsterpartiet - Tisdagens frukostseminarie i Almedalen
Vänsterpartiet - Tisdagens frukostseminarie i Almedalen
 
605專屬搭畢業特輯
605專屬搭畢業特輯605專屬搭畢業特輯
605專屬搭畢業特輯
 
Linked In Power Point 2
Linked In Power Point 2Linked In Power Point 2
Linked In Power Point 2
 
Graduate Students Workshop
Graduate Students Workshop Graduate Students Workshop
Graduate Students Workshop
 
How To Use Your Website to Get Customers
How To Use Your Website to Get CustomersHow To Use Your Website to Get Customers
How To Use Your Website to Get Customers
 
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
 
Making Your Apps More Sociable
Making Your Apps More SociableMaking Your Apps More Sociable
Making Your Apps More Sociable
 
HP Programvare SPOR 3
HP Programvare SPOR 3HP Programvare SPOR 3
HP Programvare SPOR 3
 
IPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videosIPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videos
 
H σαλαμινα στις τεχνες
H σαλαμινα στις τεχνεςH σαλαμινα στις τεχνες
H σαλαμινα στις τεχνες
 
Marketing Busuness Art 2012
Marketing Busuness Art 2012Marketing Busuness Art 2012
Marketing Busuness Art 2012
 
Latest trends in em
Latest trends in emLatest trends in em
Latest trends in em
 
Benjamín Arditi (Democracia postliberal participativa)
Benjamín Arditi (Democracia postliberal participativa)Benjamín Arditi (Democracia postliberal participativa)
Benjamín Arditi (Democracia postliberal participativa)
 
mobility programs for education
mobility programs for educationmobility programs for education
mobility programs for education
 
Presentation pl
Presentation plPresentation pl
Presentation pl
 
DiNapoli Family Trip to Italy
DiNapoli Family Trip to ItalyDiNapoli Family Trip to Italy
DiNapoli Family Trip to Italy
 
4wd coupon
4wd coupon4wd coupon
4wd coupon
 
Portfolio Presentation1
Portfolio Presentation1Portfolio Presentation1
Portfolio Presentation1
 
Cultural diff
Cultural diffCultural diff
Cultural diff
 

Ähnlich wie Robinson bosc2010 bio_hdf

Open@Fao presentation at the EADI Open For Development Project, 2012
Open@Fao presentation at the EADI Open For Development Project, 2012 Open@Fao presentation at the EADI Open For Development Project, 2012
Open@Fao presentation at the EADI Open For Development Project, 2012 Stephen Katz
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
 
University of Minho Data Repository - features to publish & share data and w...
University of Minho Data Repository - features to publish & share data and  w...University of Minho Data Repository - features to publish & share data and  w...
University of Minho Data Repository - features to publish & share data and w...Pedro Príncipe
 
An On-line Collaborative Data Management System
An On-line Collaborative Data Management SystemAn On-line Collaborative Data Management System
An On-line Collaborative Data Management SystemCameron Kiddle
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
GBIF: An infrastructure for infrastructures
GBIF: An infrastructure for infrastructures GBIF: An infrastructure for infrastructures
GBIF: An infrastructure for infrastructures Francisco Pando
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
Brain Imaging Data Structure and Center for Reproducible Neuroscince
Brain Imaging Data Structure and Center for Reproducible NeuroscinceBrain Imaging Data Structure and Center for Reproducible Neuroscince
Brain Imaging Data Structure and Center for Reproducible NeuroscinceKrzysztof Gorgolewski
 
Good (enough) research data management practices
Good (enough) research data management practicesGood (enough) research data management practices
Good (enough) research data management practicesLeon Osinski
 
Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...
Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...
Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...Matthew J Collins
 
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsDiversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsAdila Krisnadhi
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptxvijayapraba1
 
Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Arohi Khandelwal
 
What funders want you to do with your data
What funders want you to do with your dataWhat funders want you to do with your data
What funders want you to do with your dataLeon Osinski
 
White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction   White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction EMC
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 

Ähnlich wie Robinson bosc2010 bio_hdf (20)

Open@Fao presentation at the EADI Open For Development Project, 2012
Open@Fao presentation at the EADI Open For Development Project, 2012 Open@Fao presentation at the EADI Open For Development Project, 2012
Open@Fao presentation at the EADI Open For Development Project, 2012
 
Hadoop.powerpoint.pptx
Hadoop.powerpoint.pptxHadoop.powerpoint.pptx
Hadoop.powerpoint.pptx
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
University of Minho Data Repository - features to publish & share data and w...
University of Minho Data Repository - features to publish & share data and  w...University of Minho Data Repository - features to publish & share data and  w...
University of Minho Data Repository - features to publish & share data and w...
 
An On-line Collaborative Data Management System
An On-line Collaborative Data Management SystemAn On-line Collaborative Data Management System
An On-line Collaborative Data Management System
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
GBIF: An infrastructure for infrastructures
GBIF: An infrastructure for infrastructures GBIF: An infrastructure for infrastructures
GBIF: An infrastructure for infrastructures
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Brain Imaging Data Structure and Center for Reproducible Neuroscince
Brain Imaging Data Structure and Center for Reproducible NeuroscinceBrain Imaging Data Structure and Center for Reproducible Neuroscince
Brain Imaging Data Structure and Center for Reproducible Neuroscince
 
Good (enough) research data management practices
Good (enough) research data management practicesGood (enough) research data management practices
Good (enough) research data management practices
 
Elsevier1 vc
Elsevier1 vcElsevier1 vc
Elsevier1 vc
 
Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...
Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...
Accessing Digital Collections Data Sources for Research: A Tour of iDigBio Da...
 
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsDiversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx
 
big data
big databig data
big data
 
Introduction of Big data and Hadoop
Introduction of Big data and Hadoop Introduction of Big data and Hadoop
Introduction of Big data and Hadoop
 
What funders want you to do with your data
What funders want you to do with your dataWhat funders want you to do with your data
What funders want you to do with your data
 
White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction   White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
Setting up a data repository, what does it entail?
Setting up a data repository, what does it entail?Setting up a data repository, what does it entail?
Setting up a data repository, what does it entail?
 

Mehr von BOSC 2010

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkBOSC 2010
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsBOSC 2010
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesBOSC 2010
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenisBOSC 2010
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 embossBOSC 2010
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evokerBOSC 2010
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorBOSC 2010
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisBOSC 2010
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorBOSC 2010
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfBOSC 2010
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsBOSC 2010
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perlBOSC 2010
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopythonBOSC 2010
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBOSC 2010
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaBOSC 2010
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytowebBOSC 2010
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloBOSC 2010
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptxBOSC 2010
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiBOSC 2010
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitBOSC 2010
 

Robinson bosc2010 bio_hdf

  • 1. The HDF Group BioHDF Open Binary File Formats for Next-Generation Sequencing Data Current Status and Future Directions Dana Robinson The HDF Group derobins@hdfgroup.org Copyright © 2010 The HDF Group. All Rights Reserved July 9, 2010 1 www.hdfgroup.org
  • 2. NGS Data Challenges. Very large quantities of data (100s of GB): "drinking from the firehose". Analysis methods vary greatly, so a flexible yet unified data store would be useful.
  • 3. What is Needed. A Data Model: a data model which accurately describes the data and can be expanded to contain new types of data. A Data Store: a file format or data store which is efficient in access time and storage size and which scales well. A Toolkit: a flexible software toolkit that can be used to create tools and pipelines based on the data model and file format.
  • 4. What is BioHDF? An open-source, community-driven project, funded by an NIH SBIR grant and led by Geospiza, Inc. in collaboration with The HDF Group. BioHDF is a particular arrangement of objects in an HDF5 file (similar to a database schema). BioHDF is a library and C API which can be used to write applications (coming soon). BioHDF is a set of command line tools for storing, retrieving and manipulating data in BioHDF files.
  • 5. HDF = Hierarchical Data Format. An example of how data is stored in HDF5. Diagram: a file somefile.h5 containing /, Reads/, Alignments/, is_sorted and References, with callouts labeling groups, datasets and attributes.
  • 6. Benefits of BioHDF. Portability and data sharing: platform independent, endian independent, self describing, common data models. High performance: fast random access and efficient, scalable, petabyte-level compressed storage. Widespread adoption: MATLAB, IDL, NASA Earth Observing System, Pacific Biosciences, SOLiD, 100s of products. 20-year history: robust, performance tuned, and well supported by The HDF Group, an independent non-profit entity.
  • 7. HDF in Bioinformatics. Baylor Imaging Group; Life Technologies; Pacific Biosciences; Oxford Nanopore; GenomeData (UW); Geospiza; others.
  • 8. Data Stored. The prototype BioHDF stores: reads; alignments; annotations; clusters of aligned reads; reference sequences; indexes (NCList or simple).
  • 9. Data Stored. Additional user-specific data can be stored without breaking the library or tools, similar to how adding additional tables to a database schema does not invalidate existing queries. Diagram labels: BioHDF Data; User-Specific Data.
  • 10. Project Stages. A "pipeline prototype" set of tools to demonstrate the suitability of HDF5 for NGS data storage. A version 1.0 release of a BioHDF library and C API targeting the functionality of samtools. A higher-level C API that abstracts out and hides the underlying storage technology.
  • 11. HDF5 API and Applications. Layer diagram: BioHDF Applications and Wrappers (e.g. Perl, Python); High-Level API; BioHDF API; HDF5 API; Physical Storage.
  • 12. A Higher-Level API. A high-level API will encapsulate and hide the underlying storage technology. Diagram labels: low-level C APIs; samtools; BioHDF API; high-level tool; C API; BAM wrapper API.
  • 13. Acknowledgements. Geospiza: Todd Smith, Mark Welsh. The HDF Group: Mike Folk. BioHDF is supported by NIH SBIR Phase II grant HG003792 awarded to Geospiza, Inc.
  • 14. The HDF Group. Thank you for your time! If you are interested in using or contributing to BioHDF, please contact us! Dana Robinson (derobins@hdfgroup.org), http://www.biohdf.org. BOSC BoF: Friday 5:10-6:00. ISMB Poster J18: Monday, July 12, 12:40-2:30. ISMB BoF: Tuesday, July 13, 1-2 pm, room 306.
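
Slides 5 and 9 describe a hierarchy of groups, datasets and attributes, and claim that user-specific data can be added without breaking existing tools. The sketch below illustrates that idea with a toy in-memory model in plain Python. It is not the HDF5 library and not BioHDF code. The class names, the `sequences` and `my_scores` datasets, the `UserData` group, and the placement of `is_sorted` on `Alignments` are all assumptions made for illustration.

```python
# Toy in-memory model of an HDF5-style hierarchy (NOT the HDF5 library).
# All names below are hypothetical and chosen only for illustration.
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named array of values, analogous to an HDF5 dataset."""
    data: list
    attrs: dict = field(default_factory=dict)


@dataclass
class Group:
    """A container of groups and datasets, analogous to an HDF5 group."""
    children: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)

    def create_group(self, name):
        group = Group()
        self.children[name] = group
        return group

    def create_dataset(self, name, data):
        dataset = Dataset(list(data))
        self.children[name] = dataset
        return dataset

    def lookup(self, path):
        """Resolve a slash-separated path such as 'Reads/sequences'."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node


def count_reads(root):
    """A 'tool' that only knows about the Reads group."""
    return len(root.lookup("Reads/sequences").data)


root = Group()  # plays the role of "/" in somefile.h5
reads = root.create_group("Reads")
reads.create_dataset("sequences", ["ACGT", "GGCA", "TTAG"])
alignments = root.create_group("Alignments")
alignments.attrs["is_sorted"] = True  # assumed to be an attribute of Alignments
root.create_dataset("References", ["chr1", "chr2"])

before = count_reads(root)

# Adding user-specific data leaves the existing reader untouched.
user = root.create_group("UserData")
user.create_dataset("my_scores", [0.9, 0.4, 0.7])
after = count_reads(root)

print(before, after)
```

Because `count_reads` addresses its data by path, objects added elsewhere in the hierarchy do not affect it, which is the same property the slide attributes to adding tables to a database schema.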

Editor's Notes

  1. My goal here is to show people how data is stored in HDF5 (groups, datasets, attributes), not to speak about NGS data storage in BioHDF. I get the impression that people have little understanding of what HDF5 is so I'd like to give them a bare-bones overview.
  2. The reason people will be discouraged from using the HDF5 API directly is that doing so would encourage them to meddle with low-level data elements that can change. This would make their software more brittle.
  3. A first implementation of this will probably be at the linker level (e.g. samtools-biohdf and samtools-bam). Further down the road, we might implement a plugin architecture to handle this.
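
The notes above argue that tools should talk to a high-level API rather than to the storage layer, so that the backend can be swapped. A minimal sketch of that design in plain Python follows. It is not the BioHDF or samtools API. `AlignmentStore`, `ListStore`, `BucketStore` and `reads_overlapping` are hypothetical names, and the two backends are simple stand-ins for "two different storage technologies", not models of BAM or HDF5.

```python
# Sketch of a storage-agnostic high-level API (all names are hypothetical).
from abc import ABC, abstractmethod


class AlignmentStore(ABC):
    """High-level interface; tools depend only on these two methods."""

    @abstractmethod
    def add_alignment(self, read_id, position):
        ...

    @abstractmethod
    def alignments_in(self, start, end):
        """Return sorted read IDs with start <= position < end."""


class ListStore(AlignmentStore):
    """Stand-in backend: an append-only list, scanned on every query."""

    def __init__(self):
        self._rows = []

    def add_alignment(self, read_id, position):
        self._rows.append((position, read_id))

    def alignments_in(self, start, end):
        return sorted(r for p, r in self._rows if start <= p < end)


class BucketStore(AlignmentStore):
    """Stand-in backend: alignments grouped into fixed-size position bins."""

    def __init__(self, bin_size=100):
        self._bin_size = bin_size
        self._bins = {}

    def add_alignment(self, read_id, position):
        key = position // self._bin_size
        self._bins.setdefault(key, []).append((position, read_id))

    def alignments_in(self, start, end):
        hits = []
        first = start // self._bin_size
        last = (end - 1) // self._bin_size
        for key in range(first, last + 1):
            for position, read_id in self._bins.get(key, []):
                if start <= position < end:
                    hits.append(read_id)
        return sorted(hits)


def reads_overlapping(store, start, end):
    """A 'tool' written once against the high-level interface."""
    return store.alignments_in(start, end)


results = []
for store in (ListStore(), BucketStore()):
    for read_id, position in [("r1", 10), ("r2", 250), ("r3", 120)]:
        store.add_alignment(read_id, position)
    results.append(reads_overlapping(store, 100, 300))

print(results)
```

The tool function never touches `_rows` or `_bins`, so either backend can change its internal layout without breaking callers. Choosing the backend at link time, as the notes suggest for a first implementation, would correspond to selecting which concrete class is constructed.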