This document describes the design and implementation of an integrated system called MPDB for the storage and analysis of metabolomics data. MPDB was created as a free open-source laboratory information system tailored for the metabolomics workflow. It includes tools for raw data cleanup, compound identification, peak alignment across samples, data normalization, and statistical analysis. The system pipeline allows users to efficiently store large amounts of analytical results and associated biological metadata, perform multi-sample analysis and data mining, and gain new biological insights from metabolomics experiments. As an example application, the document outlines a study analyzing the effects of nitrogen stress on the leaf metabolism of Populus trees using MPDB.
MPDB Presentation
1. MPDB - Integrated system for storage
and analysis of metabolomic data
Design and implementation of the
data acquisition and analysis
pipeline
Alexander Raskind, SFRES MTU
4. Populus as model system
• Wide ecological range
• Small genome relative to other trees
• Relatively easy transformation and cloning
• Belongs to Salicaceae – Willow family,
produces large amount of phenolic
compounds that may influence carbon
sequestration
5. Project rationale
• Affordable equipment generates a limited
amount of metabolomic data of modest
quality
• Proper information storage and maximal
extraction of useful information are essential
• A free, open-source laboratory information
system tailored to the metabolomics workflow
would benefit a large scientific community
6. System requirements
• Easy access to large arrays of analytical
results and biological metadata
• Tools for data analysis
• Addition of analysis modules
• Accommodation of other types of analytical
data
• USER FRIENDLY
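The first requirement, easy access to analytical results together with biological metadata, can be illustrated with a small relational layout. The slides do not describe MPDB's actual schema, so the sketch below is only an assumption: the table names (`sample`, `peak`), the column names, and the helper functions are all hypothetical, and SQLite is used purely because it ships with Python.

```python
import sqlite3

# Hypothetical two-table layout: one row per sample (biological metadata),
# one row per detected peak (analytical result). Not MPDB's real schema.
SCHEMA = """
CREATE TABLE sample (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    tissue    TEXT,
    treatment TEXT
);
CREATE TABLE peak (
    id             INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES sample(id),
    compound       TEXT,
    retention_time REAL NOT NULL,
    area           REAL NOT NULL
);
"""


def create_database(path=":memory:"):
    """Open a SQLite database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn, name, tissue, treatment, peaks):
    """Store one sample and its peaks; peaks is a list of
    (compound, retention_time, area) tuples."""
    cur = conn.execute(
        "INSERT INTO sample (name, tissue, treatment) VALUES (?, ?, ?)",
        (name, tissue, treatment),
    )
    sample_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO peak (sample_id, compound, retention_time, area) "
        "VALUES (?, ?, ?, ?)",
        [(sample_id, c, rt, a) for c, rt, a in peaks],
    )
    conn.commit()
    return sample_id


def areas_by_treatment(conn, compound):
    """Return (treatment, area) pairs for one compound, ordered by sample name."""
    rows = conn.execute(
        "SELECT s.treatment, p.area FROM peak p "
        "JOIN sample s ON s.id = p.sample_id "
        "WHERE p.compound = ? ORDER BY s.name",
        (compound,),
    )
    return rows.fetchall()
```

Keeping metadata and peaks in separate tables joined by a sample key is what lets a single query pull one compound across treatments or tissues.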
8. Major analytical problems
• Chemical complexity of the sample
o human metabolome: 2500 metabolites; plants: many more
• Wide dynamic range of response
o difference between the most and least abundant components may be more
than 10,000-fold
• Biological variation
• Matrix effects
o Interactions between sample components leading to shifts in retention time
and sensitivity of detection compared with pure compounds
• Instrument effects
o Shifting retention time (column wearing out and maintenance)
o Changes in sensitivity
9. Data analysis pipeline
• Raw data cleanup, peak detection,
deconvolution and quantification
• Compound identification (library search)
• Export of analysis results and biological
metadata to the database
• Peak alignment and normalization
• Final data analysis
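The peak alignment and normalization step can be sketched as follows. This is a minimal illustration, not MPDB's algorithm: it assumes peaks are matched across samples by a fixed retention-time tolerance and that each sample is normalized to its total peak area. The function names, the `rt_tolerance` default, and the input layout are all hypothetical.

```python
def align_peaks(samples, rt_tolerance=0.05):
    """Group peaks from several samples whose retention times agree.

    samples: dict mapping sample name -> list of (retention_time, area).
    Returns a list of groups: {"rt": mean retention time,
                               "areas": {sample name: area}}.
    """
    # Pool all peaks and walk through them in retention-time order.
    peaks = sorted(
        (rt, area, name)
        for name, peak_list in samples.items()
        for rt, area in peak_list
    )
    groups = []
    for rt, area, name in peaks:
        last = groups[-1] if groups else None
        # Join the current group only if the peak is close enough and this
        # sample has not already contributed a peak to the group.
        if (last is not None
                and abs(rt - last["rt"]) <= rt_tolerance
                and name not in last["areas"]):
            last["areas"][name] = area
            count = len(last["areas"])
            last["rt"] += (rt - last["rt"]) / count  # running mean
        else:
            groups.append({"rt": rt, "areas": {name: area}})
    return groups


def normalize_to_total(groups, sample_names):
    """Express each aligned area as a fraction of its sample's total area.

    Peaks missing from a sample are reported as 0.0.
    """
    totals = {
        s: sum(g["areas"].get(s, 0.0) for g in groups) for s in sample_names
    }
    table = []
    for g in groups:
        row = {
            s: (g["areas"].get(s, 0.0) / totals[s]) if totals[s] else 0.0
            for s in sample_names
        }
        table.append((g["rt"], row))
    return table
```

A fixed tolerance is the simplest choice and breaks down when retention times drift over a run, as described under instrument effects. Total-area normalization is likewise only one option; normalizing to an internal standard is a common alternative.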
22. Data assessment and analysis
• Data for individual compound groups
• Data for individual samples and compounds
• Principal component analysis
• Clustering of samples and compounds
• Graphical maps of compound ratios
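To make the principal component analysis item concrete, here is a minimal, dependency-free sketch that extracts only the first principal component by power iteration on the covariance matrix. It is an illustration under stated assumptions, not the method MPDB uses: the function name and input layout (one row per sample, one column per compound) are hypothetical, and a full analysis would normally use a numerical library and compute several components.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: list of samples, each a list of compound intensities
          (at least two samples, all rows the same length).
    """
    n, m = len(rows), len(rows[0])

    # Center every compound (column) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]

    # Sample covariance matrix between compounds (m x m).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize. The start vector is deliberately non-uniform; a start
    # exactly orthogonal to the leading direction would not converge to it.
    vec = [float(j + 1) for j in range(m)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance in the data
            break
        vec = [x / norm for x in nxt]

    # Project each centered sample onto the component to get its score.
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return vec, scores
```

The scores give one coordinate per sample and the loadings show how strongly each compound contributes to that direction. The sign of a principal component is arbitrary, so only relative positions of the samples are meaningful.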
30. Sample analysis – effects of nitrogen
stress on the Populus leaf metabolism
• Plants grown hydroponically
• N-stress for 8 weeks
• Samples taken from leaves at different
developmental stages (lamina and mid-vein)
• Metabolites fractionated by SPE
• Hydrophilic fractions additionally analyzed at 1:20
dilution
• Fractions were also subjected to glucosidase
hydrolysis and LPE
• 3-5 biological and 1-2 technical replicates
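The slides do not say how the undiluted and 1:20 diluted runs were combined. One plausible reading, given the wide dynamic range listed among the analytical problems, is that the diluted run provides usable values for compounds that overload the detector when undiluted. The sketch below assumes exactly that; the function name, the `saturation_limit` parameter, and the source labels are hypothetical.

```python
def combine_dilutions(undiluted, diluted, saturation_limit, dilution_factor=20):
    """Merge peak areas from an undiluted run and a diluted run.

    undiluted, diluted: dict mapping compound -> peak area.
    saturation_limit:   area above which the undiluted reading is treated
                        as overloaded (an assumed, instrument-specific value).
    Returns dict mapping compound -> (area, source), where source is
    "undiluted", "diluted" (scaled back by the dilution factor), or
    "saturated" (overloaded with no diluted reading to fall back on).
    """
    combined = {}
    for compound in set(undiluted) | set(diluted):
        neat = undiluted.get(compound)
        dil = diluted.get(compound)
        if neat is not None and neat < saturation_limit:
            # Undiluted reading is within the assumed linear range.
            combined[compound] = (neat, "undiluted")
        elif dil is not None:
            # Scale the diluted reading back to the undiluted scale.
            combined[compound] = (dil * dilution_factor, "diluted")
        else:
            # Keep the value but flag it as unreliable.
            combined[compound] = (neat, "saturated")
    return combined
```

Recording the source alongside each value keeps the choice visible downstream, so scaled or flagged readings can be treated differently in the statistics.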
31. Leaf hydrophilic fraction
• Up-regulated by N-stress:
o Galacturonic acid (X7), D-Arabinonate
o Turanose, Syringin
o Ribose(?), methyl-Galactoside, 3-Hydroxy-3-methylglutaric acid (HMGA),
D-(-)-3-Phosphoglyceric acid
32. Leaf hydrophilic fraction
• Down-regulated by N-stress:
o Most free amino acids and polyamines are below
detection level or strongly reduced (also some
sugars and polyols, but not clearly identified)
o Small organic acids (fumaric, succinic, threonic,
citric, malic, oxaloacetic)
o Sugar phosphates (glucose, fructose)
o Xylose, melibiose, cellobiose
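The slides list compounds as up- or down-regulated but do not state the criterion used. A common way to make such a call is a log2 fold change between the replicate means of the two groups, sketched below. The threshold of 1.0 (a twofold change), the `detection_limit` floor for compounds that drop below detection, and the function name are all assumptions; a real analysis would also test statistical significance across replicates.

```python
import math
from statistics import mean


def classify_response(control, stressed, min_log2_fc=1.0, detection_limit=1.0):
    """Label each compound as "up", "down", or "unchanged".

    control, stressed: dict mapping compound -> list of replicate areas.
    Means below detection_limit are raised to it, so a compound that
    vanishes under stress yields a large negative fold change instead of
    a division by zero or log of zero.
    Returns dict mapping compound -> (log2 fold change, label).
    """
    result = {}
    for compound in control.keys() & stressed.keys():
        c = max(mean(control[compound]), detection_limit)
        s = max(mean(stressed[compound]), detection_limit)
        log2_fc = math.log2(s / c)
        if log2_fc >= min_log2_fc:
            label = "up"
        elif log2_fc <= -min_log2_fc:
            label = "down"
        else:
            label = "unchanged"
        result[compound] = (log2_fc, label)
    return result
```

For example, a compound with replicate means of 110 in control and 440 under stress has a log2 fold change of 2 and is labeled "up". These numbers are made up for illustration and are not results from the study.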