Statistical and Visualization Methods for Metagenomic Analysis
Héctor Corrada Bravo
Center for Bioinformatics and Computational Biology
• metagenomeSeq
– 16S differential abundance
– R/Bioconductor infrastructure for
metagenomic assays
– Longitudinal data
• metagenomeFeatures
– Early attempt at standardizing 16S feature
annotations in R/Bioconductor
– E.g., greengenes13.5MgDb
• msd16s
– Example data, as infrastructure object
R/Bioconductor Strengths
• Infrastructure objects
– Interoperability, speed up startup time for method development
• Strict development practices
– Documentation, use cases, vignettes
• Annotation infrastructure
– Again, interoperability across experiments and data types
• Exploratory analysis
• Reproducibility
– Vignettes, Rmarkdown, etc.
• Recently, exploratory and interactive visualization
– Shiny, epiviz
Integrative, visual and computational
exploratory analysis of genomic data
• Browser-based
• Interactive
• Integration of data
• Reproducible dissemination
• Communication with R/Bioconductor: epivizr package
software systems to support creative exploratory analysis of large genome-wide datasets...
• Computed Measurements: create new measurements from
integrated measurements and visualize
• Summarization: summarize integrated measurements
(computed on data subsets)
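
The two ideas above can be illustrated with a minimal Python sketch. It is not Epiviz or epivizr code: the sample names, region names, values, and the helper functions `computed_measurement` and `summarize` are all hypothetical and exist only to show a new measurement being derived from two integrated ones and then summarized over a data subset.

```python
from statistics import mean

# Hypothetical integrated measurements: one value per region, per sample.
measurements = {
    "sample_a": {"region1": 8.0, "region2": 5.0, "region3": 2.0},
    "sample_b": {"region1": 6.0, "region2": 5.5, "region3": 4.0},
}


def computed_measurement(a, b):
    """Create a new measurement as the per-region difference a - b."""
    return {region: a[region] - b[region] for region in a if region in b}


def summarize(measurement, subset):
    """Summarize a measurement (here, its mean) over a subset of regions."""
    return mean(measurement[region] for region in subset)


# Computed measurement: difference between the two samples.
diff = computed_measurement(measurements["sample_a"], measurements["sample_b"])

# Summarization: mean difference over a chosen subset of regions.
subset_mean = summarize(diff, ["region1", "region3"])
```

The difference is only one possible computed measurement; any function of the integrated measurements could take its place.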
Dynamically extensible: Easily integrate new data sources, data
types and add new visualizations.
Data providers define coordinate
space
One interpretation of Big Data is many sources of relevant
contextual data
• Easily access/integrate contextual data
• Driven by exploratory analysis of immediate
data
• Iterative process
• Visual and computational exploration go
hand in hand
Visualization design goals
Context
• Integrate and align multiple data sources;
navigate; search
• Connect: brushing
• Encode: map visualization properties to
data on the fly
• Reconfigure: multiple views of the same
data
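
As a rough illustration of the "Connect: brushing" goal, the sketch below links views through a shared coordinate space: brushing a range selects the overlapping items in every view. The `Feature` class, the `brush` function, and the example views are hypothetical and do not reflect how Epiviz implements brushing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    start: int
    end: int  # inclusive end coordinate


def brush(views, start, end):
    """Return, per view, the names of features overlapping [start, end]."""
    return {
        view: [f.name for f in feats if f.start <= end and f.end >= start]
        for view, feats in views.items()
    }


# Two hypothetical views aligned on the same coordinate space.
views = {
    "genes": [Feature("geneA", 100, 200), Feature("geneB", 300, 400)],
    "peaks": [Feature("peak1", 150, 160), Feature("peak2", 390, 450)],
}

# Brushing the range 140-210 highlights the linked items in both views.
highlighted = brush(views, 140, 210)
```

Because every view shares one coordinate space, a selection made in any single view can be expressed as a range and applied to all the others.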
Visualization design goals
Data
• Select and filter: tight-knit integration with
R/Bioconductor
• (current work) filters on visualization
propagate to data environment
Model
• New 'measurements' are the result of
modeling; suggested by data context
Metagenomic Visualization
• How to effectively navigate large datasets
where features are organized hierarchically?
• Metaviz: browser-based, interactive
exploratory analysis of metagenomic
data
• Connection to R/Bioconductor with
metavizr package
• Built on metagenomeSeq and
metagenomeFeatures infrastructure
Metaviz
• Exploration of hierarchically organized
features
• Geared towards 16S for now
– Hierarchical organization is also relevant to WGS (whole-genome shotgun) data
• Integration is a big part of design
– Framework designed for data integration
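
One way to picture navigating hierarchically organized features is to aggregate feature counts at a chosen taxonomic level. The Python sketch below is an assumption-laden illustration, not Metaviz or metavizr code: the OTU identifiers, the counts, the three-level hierarchy, and the `aggregate` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical three-level hierarchy; real annotations have more levels.
LEVELS = ("phylum", "class", "genus")

# Hypothetical feature annotations: OTU -> lineage, one entry per level.
taxonomy = {
    "OTU1": ("Firmicutes", "Bacilli", "Lactobacillus"),
    "OTU2": ("Firmicutes", "Bacilli", "Streptococcus"),
    "OTU3": ("Firmicutes", "Clostridia", "Clostridium"),
    "OTU4": ("Proteobacteria", "Gammaproteobacteria", "Escherichia"),
}

# Hypothetical counts for a single sample.
counts = {"OTU1": 10, "OTU2": 5, "OTU3": 7, "OTU4": 3}


def aggregate(counts, taxonomy, level):
    """Sum counts over all features sharing a lineage down to `level`."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for otu, n in counts.items():
        totals[taxonomy[otu][:depth]] += n
    return dict(totals)


by_phylum = aggregate(counts, taxonomy, "phylum")
by_class = aggregate(counts, taxonomy, "class")
```

Keying the totals by lineage prefix rather than by name alone keeps nodes distinct when the same name appears under different parents.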
Acknowledgements
Brianna Lindsey, O. Colin Stine, Owen White, Anup Mahurkar: University of Maryland Baltimore
Jim Nataro: University of Virginia
NIGMS, Genentech
Florin Chelaru
(now @ MIT)
Joseph Paulson
(now @ Harvard)
Mihai Pop
(@ UMD)