4. Institute for Computing in Science (ICiS) 2010 Summer Session, Snowbird, Utah. July 17-24 | Computational Methods and Terabase Metagenomics | J. Gilbert, F. Meyer, R. Stevens. Participants: 13 University, 9 Government and 3 Industry; 13 sessions. These discussions became the first meeting of the Earth Microbiome Project and enabled the definition of a working committee, an implementation group, and a three-year plan. July 24-31 | Future of the Field | F. Streitz and A. White. Participants: 18 University, 4 Government and 6 Industry; 15 sessions. Steering committee members and a select group of participants met to assess the state of the art in scientific computing and identified areas for future programs. July 31-Aug. 7 | Optimization in Energy Systems | M. Anitescu and J. Meza. Participants: 16 University, 10 Government and 3 Industry; 24 sessions. Researchers from different areas discussed the major challenges facing the energy sector, in particular problems arising in optimization. Aug. 7-14 | Integrating, Representing, and Reasoning over Human Knowledge | J. Evans, I. Foster, A. Rzhetsky. Participants: 18 University, 4 Government and 6 Industry; 16 sessions. Participants were encouraged to think broadly about opportunities for transformative changes in knowledge that may become possible as data, computing, and collaboration are harnessed at exceptionally large scales.
5. A Core Group Emerged Jack Gilbert Folker Meyer Rob Knight Jonathan Eisen Jed Fuhrman Janet Jansson Bin Hu Mark Bailey Rick Stevens
6. We need a new Idea Sequencing is getting cheap... VERY cheap Terabase projects becoming increasingly feasible Diversity studies are limited by sampling depth Need combination of breadth and depth Computing is scaling up to handle large data Supercomputing capabilities will keep scaling for a while Interest in a range of metagenomics questions Thousands of uncoordinated studies Crowdsourcing of samples increasingly feasible But how to agree on protocols?
7. EMP High-Level Concept Goal: A community approach to systematically characterizing microbial life on Earth Strategy: a combination of extremely deep metagenomics sequencing and very large-scale horizontal surveys to refine our understanding of: Global microbial diversity, dispersion and biogeography Microbial community structure and dynamics Microbial contributions to the global nutrient cycles
8. Big Science? Earth Microbiome Project: Map % fraction of microbiological habitats Volume > 100x larger > 1 PB of data ~1M samples > 100K new genomes Millions of novel proteins Largest reference collection of metagenomics, a field guide to the microbial universe used by scientists for decades to come. Sloan Digital Sky Survey: Mapped ¼ of the sky Volume 100x larger 15 TB data Position/Brightness of > 100M objects Distance to 100K quasars New types of objects The SDSS will be a new reference point, a field guide to the universe that will be used by scientists for decades to come.
9. But it's not a Complete Parallel EMP will have distributed sampling EMP will have distributed sequencing EMP will have distributed analysis EMP will have common protocols EMP will have common standards EMP might have a centralized archive of data EMP might have a repository of samples
10. What is the EMP model? A framework of standard practices that enables massively comparable meta-analyses of independent projects A network-oriented organizational model to advance large-scale microbial ecology research – establishing and coordinating projects proposed by the community which can be advanced using the EMP framework of standards and access to partner Centers
13. Common standards for: Sampling -> Methods tailored to environment Georeferenced metadata DNA Extraction -> MoBio kit Sequencing -> 515/806 for 16S, Illumina PE Analysis -> QIIME (16S), MG-RAST/IMG, etc. Concept: begin with defined, open (though imperfect) protocols, bless with “EMP seal of approval” new protocols that show equivalence
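One way to picture the "georeferenced metadata" standard is as a small record that is checked before a sample is accepted. The sketch below is only an illustration under assumptions: the field names (`sample_id`, `latitude`, `longitude`, `collected_on`, `environment`) and the checks are hypothetical and are not taken from any EMP specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical minimal metadata record; field names are illustrative only."""
    sample_id: str
    latitude: float     # decimal degrees, expected in [-90, 90]
    longitude: float    # decimal degrees, expected in [-180, 180]
    collected_on: date
    environment: str    # free-text habitat label, e.g. "soil"


def validation_errors(record: SampleRecord) -> List[str]:
    """Return human-readable problems; an empty list means the record passes."""
    errors = []
    if not record.sample_id.strip():
        errors.append("sample_id is empty")
    if not -90.0 <= record.latitude <= 90.0:
        errors.append(f"latitude {record.latitude} outside [-90, 90]")
    if not -180.0 <= record.longitude <= 180.0:
        errors.append(f"longitude {record.longitude} outside [-180, 180]")
    if record.collected_on > date.today():
        errors.append("collected_on is in the future")
    if not record.environment.strip():
        errors.append("environment is empty")
    return errors
```

A record with an empty ID and a latitude of 123.0 would come back with two errors; a well-formed record returns an empty list. A real standard would presumably carry many more fields (depth, pH, extraction kit, and so on).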
14. Why do we need the EMP? Microbial life is vast 10^30 organisms on Earth 10^6 – 10^9 or more species, massive gene/protein diversity Requires a systematic approach with a common framework Reduce duplication, maximize coverage, improve comparability between studies Structures existing studies led by different PIs into clusters of Driving Projects EMP standard protocols allow much better comparability between projects Leverage community structures and crowdsourcing
15. EMP Pilot Projects High-Impact science targets Large-scale survey projects to identify diversity hotspots and plan deeper studies Small number of very deep demonstrations Hypothesis-driven programmatic problems Technical targets to debug the EMP approach Community sourcing with standard protocols High levels of multiplexed sequencing Environmental parameter characterization Metadata and sample database Analysis pipelines
16. Earth Microbiome Project: Attacking Basic Science Questions Coordination of community efforts to address long-standing issues in environmental microbiology How much diversity is there, what is driving it and where do we find it? Are there diversity hotspots? Does microbial biogeography exist, and if so, what patterns are present and can we predict them? Are some taxa endemic, and if so, how unique are they? Does global dispersal happen, how much and between where, and is there support for the Baas Becking hypothesis? Are the long tails of community distributions convergent in taxa? Are rare taxa somewhere abundant? How many places do we have to look to capture X diversity? How do the patterns in microbial communities relate to macroecological patterns?
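The question "How many places do we have to look to capture X diversity?" can be made concrete with a taxon accumulation curve. The following is a minimal sketch, not a method stated in the slides: it assumes each site is summarized as a set of observed taxon labels, and it measures coverage only against the taxa actually observed across all sites, not against true (unknown) diversity. The function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Set


def accumulation_curve(site_taxa: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Cumulative number of distinct taxa seen after visiting sites in `order`."""
    seen: Set[str] = set()
    curve = []
    for site in order:
        seen |= site_taxa[site]
        curve.append(len(seen))
    return curve


def sites_needed(site_taxa: Dict[str, Set[str]], fraction: float,
                 n_permutations: int = 200, seed: int = 0) -> float:
    """Mean number of sites, over random visiting orders, needed to reach
    `fraction` (0 < fraction <= 1) of all taxa observed across the sites."""
    if not site_taxa:
        raise ValueError("site_taxa must contain at least one site")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    sites = sorted(site_taxa)
    total = len(set().union(*site_taxa.values()))
    # Small tolerance guards against float round-off in fraction * total.
    target = math.ceil(fraction * total - 1e-9)
    counts = []
    for _ in range(n_permutations):
        order = sites[:]
        rng.shuffle(order)
        curve = accumulation_curve(site_taxa, order)
        counts.append(next(i + 1 for i, c in enumerate(curve) if c >= target))
    return sum(counts) / len(counts)
```

Averaging over random site orders removes the dependence on the order in which sites happen to be listed. Where the curve flattens early, a few sites capture most of the observed taxa; a curve that keeps climbing suggests more places still need to be sampled.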
17. Curtis and Sloan on Microbial Diversity Perhaps patterns in global microbial diversity affect community composition, stability and functionality at a local level. If, as we argue, diversity matters, then patterns in global diversity could have a substantial effect on studies that seek to link community function and structure, strategies for seeking new drugs, for probiotics, bioaugmentation or studies to determine the persistence of chemicals. Curtis and Sloan, Current Opinion in Microbiology 2004, 7:221-226
18. Curtis and Sloan, Continued: To understand a microbial system at a local level we will have to understand something of the metacommunity from which it is drawn. Moreover, we will have to correctly understand the relationship between random factors and deterministic factors. Curtis and Sloan, Current Opinion in Microbiology 2004, 7:221-226
19. What can we learn from extremely Deep Sequencing? Latitude, pH, Mineral Content, Rainfall, Mean Temperature, Insolation, etc.
20. Estimates of Global Diversity NT/Nmax ~ 10 for soil NT/Nmax ~ 4 for aquatic Curtis, T.P. et al. (2002) Estimating prokaryotic diversity and its limits. Proc. Natl. Acad. Sci. U. S. A. 99, 10494–10499
21. Pedros-Alio 2006 Are Most microbial taxa rare? Possibly Inactive?
22. Does a microbial biogeography exist? If yes can we map it? From Martiny et al 2006 “Microbial biogeography Review”
23. How Cosmopolitan are Microbes? From Martiny et al 2006 “Microbial biogeography Review”
24. Earth Microbiome Project: Attacking Programmatic Questions Improve understanding of microbial processes underlying the global carbon and nitrogen cycles Support process model development and uncertainty analysis for DOE mission-critical environments (e.g. permafrost, oceans, subsurface) Discovery of novel microbially mediated global carbon pathways Improve our understanding of community structure/diversity/productivity/stability relationships Support community engineering and community design for targeted applications Search for novel biological functions relevant to bioprocessing, biofuels and bioremediation Targeted searching for organisms and communities containing DOE-relevant synthesis and degradation pathways Novel pathway discovery
25. The abundance of prokaryotic carbon and other elements may be compared with the statement of Kluyver that about one-half of the ‘‘living protoplasm’’ on earth is microbial (2). Because most of the plant biomass is made up of extracellular material such as cell walls and structural polymers, the protoplasmic biomass of prokaryotes probably far exceeds that of plants, and Kluyver’s well-accepted estimate is probably much too conservative.
28. RMF returns a list of metabolites and whether those metabolites are more or less likely to be consumed or synthesized in one environment relative to another.
31. EMP at the Right Time Leverages the availability of continued advances in sequencing capacity Terabases to Petabases and beyond Evolution of sequencing center Models Push towards aggregation of projects (i.e. scale up) Community driven but coordinated Open, Real-time coordination, immediate data availability Novel approaches to address the scaling issues in sample collection and prep Crowd sourced samples, distributed prep? Targets both wide survey and deep sampling “Mapping” followed by targeted attacks
32. EMP Products and Deliverables Metagenomics datasets from many thousands of environments with standardized metadata Georeferenced inventory of global microbial 16S sequences Reference genomes recovered from the shotgun metagenomics datasets Community structure profiles for many thousands of communities Microbial protein catalog capturing global protein and gene diversity
36. Temperature – Antarctica, Brazil, North America, Arctic Tundra, Hydrothermal vents
37. Light availability – Water columns in the Pacific and Atlantic from surface to the abyssal plain.
38. pH – UK, North America, China biogeographic soils
40. Projects request deep 16S rRNA sequencing of representative samples:
41. Globally distributed soil samples from China, Australia, India, Argentina, Peru, USA and Antarctica
42. Globally distributed time series samples from English Channel, Barrier Reef in Australia, Bermudan North Atlantic, Temperate Pacific and Tropical Pacific
44. Projects using deep shotgun metagenomics to explore modeled metabolomics:
45. Temporally and spatially distributed samples from the Gulf oil spill
46. Samples spanning the northern tundra belt from Canada, USA, Russia, Sweden
49. At the bottom: individual or consortium-led, hypothesis-driven proposals
52. Earth Microbiome Project Potential Dataflows Annotation & Statistical Analysis 16S/18S rRNA Metagenomics Metatranscriptomics Genome Assembly modelSEED & RMF Environmental Parameters Metagenome Datasets (1,000’s of Campaigns) Provision of targets for novel enzymes Model Metabolome Characterization of Novel Proteins Metametabolomics GC/MS & NMR Gap-filling for model
53. EMP needs new kinds of interfaces to Sequencing workflows Large-scale community projects will by necessity develop internal tracking systems Sampling, LIMS etc. Transacting with Seq Centers could be enhanced by interfacing between the internal/external tracking and LIMS systems Large-scale EMP pilots could help develop this Services partners will also need this type of interface
54. What would change this strategy? Availability of “direct” interrogation of complex microbial environments Geochemical environmental mapping (nm->um) Environmental metabolomics and proteomics Roving cellular scale reporters and probes Dramatic improvements in microbial microcosm experimental capabilities Artificial community construction Time dependent high-resolution measurements
55. Phases of EMP Timeline 2011: Expert-Group consensus on EMP standards: sampling, extraction, sequencing, informatics Building the Global Environmental Sample Database (GESD) Pilot Project: 10,000 samples acquired, extracted, sequenced and analyzed by five core centers (ANL, LBNL, UC-Boulder, JGI, and BGI). 2012 and beyond: Ongoing EMP: Biological Driver Projects “collect” individual science-driven sequencing proposals (e.g. JGI-CSP, BGI, ANL, etc.) EMP acts as a conceptual framework to allow comparative analysis within and between Driver Projects.
56. Thanks to the EMP Leadership Jack Gilbert Folker Meyer Rob Knight Jonathan Eisen Jed Fuhrman Janet Jansson Bin Hu Mark Bailey