MPDB - Integrated system for storage and analysis of metabolomic data
Design and implementation of the data acquisition and analysis pipeline
Alexander Raskind, SFRES MTU
Omics data availability
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
Transcriptomics data:
ArrayExpress – 3670 experiments,
109666 hybridizations
http://www.ebi.ac.uk/microarray-as/aer/
Proteomics data:
PRIDE – 3,537 Experiments
645,869 Identified Proteins
http://www.ebi.ac.uk/pride/
Metabolomics data:
MMCD – 20,306 compounds
http://mmcd.nmrfam.wisc.edu/
Human Metabolome Database –
2500 compounds
http://www.hmdb.ca/
Shifting research paradigm
From targeted analysis to high-throughput analysis
[Images: genome.uiowa.edu, http://www.shimadzu.com]
Populus as model system
• Wide ecological range
• Small genome relative to other trees
• Relatively easy transformation and cloning
• Belongs to Salicaceae (the willow family) and produces large amounts of phenolic compounds that may influence carbon sequestration
Project rationale
• Affordable equipment generates a limited amount of metabolomic data of modest quality
• Proper information storage and maximal extraction of useful information are essential
• A free, open-source laboratory information system tailored to the metabolomics workflow would benefit a large scientific community
System requirements
• Easy access to large arrays of analytical
results and biological metadata
• Tools for data analysis
• Addition of analysis modules
• Accommodation of other types of analytical
data
• USER FRIENDLY
Analysis workflow
Major analytical problems
• Chemical complexity of the sample
o Human metabolome: about 2500 metabolites; plants: many more
• Wide dynamic range of response
o The difference between the most and least abundant components may be more than 10,000-fold
• Biological variation
• Matrix effects
o Interactions between sample components shift retention time and detection sensitivity relative to pure compounds
• Instrument effects
o Shifting retention time (column wear and maintenance)
o Changes in sensitivity
Data analysis pipeline
• Raw data cleanup, peak detection,
deconvolution and quantification
• Compound identification (library search)
• Export of analysis results and biological
metadata to the database
• Peak alignment and normalization
• Final data analysis
System Outline
[Diagram: GC/MS or LC/MS raw data; Analyzer-Pro; Result (XML format); MP-align; MPDB; Biological information; Data analysis. Stages are labeled Offline and Online.]
Compound identification
• NIST 2002 database for GC/MS (MS only, ~140,000 entries)
• In-house database of essential metabolites
(MS and retention time, ~200 entries)
Why we need alignment
[Figure panels: Single batch; Multiple batches]
Spectra similarity
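The slide names spectral similarity without stating which measure is used. As an illustration only, one common way to compare two mass spectra is the cosine (normalized dot product) of their intensity vectors. The sketch below assumes spectra are given as `{m/z: intensity}` dictionaries with integer m/z values; the function name and data layout are hypothetical and are not taken from MP-align.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts.

    Returns a value between 0.0 (no shared ions) and 1.0 (identical
    relative intensities) for non-negative intensities.
    """
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

Because the measure depends only on relative intensities, the same compound measured at two different concentrations still scores close to 1.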
Alignment algorithm
[Flowchart: Peak list; RI; MS; Group consistency; Aligned groups]
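The flowchart lists the stages (peak list, RI, MS, group consistency, aligned groups) but not their details. The following is a minimal sketch of one way such an alignment could work, not the MP-align implementation: peaks are matched first by retention index (RI) within a tolerance, then by mass-spectral similarity, and a group may hold at most one peak per sample. That last rule is an assumed reading of "group consistency". The names `align_peaks`, `ri_tol`, and `min_sim`, and their default values, are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two {m/z: intensity} spectra."""
    dot = sum(a[mz] * b[mz] for mz in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def align_peaks(peaks, ri_tol=5.0, min_sim=0.8):
    """Greedily group peaks from several samples.

    peaks: list of dicts with keys "sample", "ri", "spectrum".
    A peak joins the first group whose reference (first) peak is within
    ri_tol RI units, has spectral similarity >= min_sim, and which does
    not already contain a peak from the same sample.
    """
    groups = []
    for peak in sorted(peaks, key=lambda p: p["ri"]):
        for group in groups:
            ref = group[0]
            if (abs(peak["ri"] - ref["ri"]) <= ri_tol
                    and cosine(peak["spectrum"], ref["spectrum"]) >= min_sim
                    and all(m["sample"] != peak["sample"] for m in group)):
                group.append(peak)
                break
        else:
            # No compatible group found: start a new one.
            groups.append([peak])
    return groups
```

Checking RI before the spectrum keeps coeluting but chemically different compounds apart, while the spectral check prevents two unrelated peaks with similar RI from being merged.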
Signal normalization
[Figure panels: Raw data; Normalized to TIC]
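Normalizing to TIC (total ion current) generally means dividing each peak's signal by the total signal of its sample, so that samples with different injected amounts or detector sensitivity become comparable. A minimal sketch, assuming each sample is a dictionary of peak areas and that the sum of those areas stands in for the TIC; the function name is hypothetical.

```python
def normalize_to_tic(peak_areas):
    """Divide each peak area by the sample's summed signal.

    peak_areas: {compound or peak id: area}. The sum of all areas is
    used here as a stand-in for the total ion current (an assumption).
    """
    tic = sum(peak_areas.values())
    if tic == 0:
        raise ValueError("sample has zero total signal")
    return {name: area / tic for name, area in peak_areas.items()}
```

After normalization the values of one sample sum to 1, so two injections of the same extract at different amounts give the same profile.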
User interface - tasks
• Data entry
• New analysis
• Review analysis
• Quality control
• Help
Data set definition
Sample groups review and annotation
Alignment results
Data export
Data sorting and filtering
Data assessment and analysis
• Data for individual compound groups
• Data for individual samples and compounds
• Principal component analysis
• Clustering of samples and compounds
• Graphical maps of compound ratios
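The slides do not say how compound ratios are computed before they are mapped. One common convention, shown here purely as an assumption, is the log2 ratio of group means, which makes up- and down-regulation symmetric around zero. The function name and input layout are hypothetical.

```python
import math
from statistics import mean

def log2_ratios(treated, control):
    """log2(mean treated / mean control) per compound.

    treated, control: {compound: [normalized signals across replicates]}.
    Compounds absent from either group, or with a zero mean in either
    group, are skipped because the ratio is undefined.
    """
    ratios = {}
    for compound in treated.keys() & control.keys():
        t = mean(treated[compound])
        c = mean(control[compound])
        if t > 0 and c > 0:
            ratios[compound] = math.log2(t / c)
    return ratios
```

A value of +1 then means a doubling in the treated group and -1 a halving.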
Individual compound group data
Mass spectral data for the group
Individual sample and peak details
PCA
Clustering
Compound ratios
Quality control
Sample analysis – effects of nitrogen stress on Populus leaf metabolism
• Plants grown hydroponically
• N-stress for 8 weeks
• Samples taken from leaves at different developmental stages (lamina and midvein)
• Metabolites fractionated by SPE
• Hydrophilic fractions additionally analyzed at 1:20 dilution
• Fractions were also subjected to glucosidase hydrolysis and LPE
• 3-5 biological and 1-2 technical replicates
Leaf hydrophilic fraction
• Up-regulated by N-stress:
o Galacturonic acid (X7), D-Arabinonate
o Turanose, Syringin
o Ribose(?), methyl-Galactoside, 3-Hydroxy-3-methylglutaric acid (HMGA), D-(-)-3-Phosphoglyceric acid
Leaf hydrophilic fraction
• Down-regulated by N-stress:
o Most free amino acids and polyamines were below the detection level or strongly reduced (also some sugars and polyols, but these were not clearly identified)
o Small organic acids (fumaric, succinic, threonic,
citric, malic, oxaloacetic)
o Sugar phosphates (glucose, fructose)
o Xylose, melibiose, cellobiose
Acknowledgements
• Prof. Scott Harding
• Prof. Chung-Jui Tsai
• Dr. Changyu Hu
• Prof. Meir Edelman (WIS)
