1. Metagenomics: a gene-centric approach to human gut microbiome research
Masahira HATTORI
Center for Omics and Bioinformatics / Dept. of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo (the Kashiwa campus)
http://www.cb.k.u-tokyo.ac.jp/hattorilab
・ Human-associated pathogens and commensals, including intestinal microbiota
・ Symbionts in insects
・ Host-microbial interactions
2. The human microbiota
An enormous number of microorganisms, most of them bacterial species, colonize various sites of the human body and form complex communities (called the human microbiota). The total number of these bacterial cells is estimated to be more than 10^14, about 10 times the total number of eukaryotic cells that make up a human individual. Among these communities, the largest and most complex is the gut microbiota, which is composed of more than 1,000 different intestinal microbes.

Body site                 Bacteria / ml or gram   # species?
Nose                      10^3 - 10^4
Oral                      10^10 total             >700
  Saliva                  10^8 - 10^10
  Gingival crevice        10^12
  Tooth surface           10^11
Gastrointestinal tract    10^14 total             >1000
  Stomach                 10^0 - 10^4
  Small intestines        10^4 - 10^7
  Colon (feces)           10^11 - 10^12
Skin                      10^12 total             ?
  Surface                 10^5
Urogenital                10^12 total             ??
  Vagina                  10^9
Human cells               10^13 total

(Figure: body sites labeled Oral, Gastrointestinal, Skin, Urogenital, Nasal.)
3. The human gut microbiota has a strong impact on human physiology
* Many metabolic capabilities (mutualism between the microbes and the human host)
* Proliferation and differentiation of host intestinal epithelial cells
* Maturation of the host immune system (maintenance of homeostasis)
* Protection against infectious pathogens
* Imbalance of the gut microbiota composition predisposes individuals to a variety of disease states, ranging from inflammatory bowel diseases such as Crohn’s disease and ulcerative colitis to allergy, colon cancer, obesity and diabetes.
4. The process of metagenomic analysis of the human gut microbiota
Gut microbiota
→ Lysis of microbiota
→ Microbial DNA (metagenomic DNA)
→ Fragmentation of DNA to ~3 kb
→ Shotgun library
→ Sequencer
→ Shotgun reads (~800 bases), e.g. …GGATCCATCGTACCGATTC…
→ Assembly into contigs and singletons
→ Non-redundant sequence of microbial DNA
→ Intensive analysis of the sequences by bioinformatics
5. Metagenomics: a gene-centric analysis to explore the biological nature of the microbiome, the collective genomes of the microbiota
Contigs and singletons (non-redundant microbial sequences)
→ Gene set
→ Clustering and similarity search (COG assignment; COGs: clusters of orthologous groups)
→ Classification of COGs into functional categories (e.g. replication, transcription, amino acid metabolism, carbohydrate metabolism, lipid metabolism), plus novel genes
→ Functional profile of the microbiome
Importantly, the functional profile becomes constant and unique to the community once the amount of sequence exceeds a threshold that depends on the complexity of the microbial composition.
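The last two steps above (COG assignment, then classification into functional categories) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the gene names, the COG labels `COG_A`/`COG_B` and the two lookup dictionaries are hypothetical inputs, and genes without a COG assignment are simply left out of the profile.

```python
from collections import Counter


def functional_profile(gene_to_cog, cog_to_category):
    """Return the fraction of COG-assigned genes falling in each functional category.

    gene_to_cog:     dict mapping a gene ID to its assigned COG ID (hypothetical input)
    cog_to_category: dict mapping a COG ID to a functional category name
    Genes whose COG has no known category are ignored.
    """
    counts = Counter()
    for cog in gene_to_cog.values():
        category = cog_to_category.get(cog)
        if category is not None:
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Toy data: labels are placeholders, not real COG identifiers.
    genes = {"g1": "COG_A", "g2": "COG_A", "g3": "COG_B", "g4": "COG_X"}
    categories = {"COG_A": "Carbohydrate metabolism", "COG_B": "Lipid metabolism"}
    print(functional_profile(genes, categories))
```

Normalizing to fractions rather than raw counts is what makes profiles from samples of different sequencing depth comparable.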
6. Comparative metagenomics between different microbiomes is a powerful way to identify genes enriched or depleted in an individual microbiome
(Figure: frequency, from low to high, of each COG across various environmental microbiomes A to H. Highlighted examples: COGs commonly enriched among all microbiomes, COGs enriched only in microbiome H, and COGs depleted in microbiome A.)
7. Timeline of sequence-based metagenome projects since 2003
(Figure from Hugenholtz P and Tyson GW: Nature 455, 481-483 (2008). Colors: 3730 dye-terminator shotgun sequencing (black), fosmid library sequencing (pink), 454 pyrosequencing (green).)
200 projects, Sep. 09
8. Metagenomics of 13 healthy Japanese gut microbiomes
Subjects: 13 healthy Japanese individuals, from 3 months to 45 years old, including 7 adults, 2 weaned children and 4 unweaned infants, and 2 unrelated families.
Kurokawa K et al. DNA Res. 14, 169-181 (2007).
9. Metagenomics of 13 Japanese gut microbiomes
Gut microbiota → Bacterial DNA → DNA sequencing → Further analyses
・ About 500 Mb of assembled unique sequence from about 730 Mb of data
・ 660,000 genes found, of which 160,000 were novel gene candidates
Kurokawa K et al. DNA Res. 14, 169-181 (2007).
10. Metagenomics of 13 healthy Japanese gut microbiomes
Sequencing
・ Total: 1,065,392 reads (727 Mb) / 13 samples
・ 80,000 Sanger reads (55 Mb) / sample
Gene identification in 479 Mb of non-redundant sequence
・ 20,063-67,740 genes (≥ 20 a.a.) / sample
・ 662,548 genes / 13 samples
Clustering and similarity search / COG assignment of genes
・ 1,617-2,921 COGs / sample
・ 3,268 COGs / 13 samples
・ 162,647 novel gene candidates (25%)
Kurokawa K et al. DNA Res. 14, 169-181 (2007).
11. Protein-coding gene prediction
Program: MetaGene, based on a hidden Markov model (HMM) algorithm. Noguchi H, Takagi T et al. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. NAR 34, 5623-5630 (2006).
Genes were predicted from ORFs of ≥ 20 a.a. in the non-redundant sequences.
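To make the "ORFs of ≥ 20 a.a." cutoff concrete, here is a deliberately naive ORF scan. It is not MetaGene and does not use an HMM: it only looks for ATG-to-stop open reading frames on the forward strand, whereas a real metagenomic gene finder also handles the reverse strand, alternative start codons and partial genes that run off the end of a fragment. The function name `find_orfs` and its parameters are assumptions made for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=20):
    """Return sorted (start, end) pairs of forward-strand ORFs encoding >= min_aa residues.

    An ORF here runs from the first ATG in a frame to the next in-frame stop codon.
    `end` is exclusive and includes the stop codon; the residue count includes the
    initial methionine and excludes the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 20 + "TAA"  # 21 codons before the stop
    print(find_orfs(toy))
```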
12. Survey of enriched COGs in human gut microbiomes
Genes in the 13 human gut microbiomes were compared with those in Ref-DB (constructed from the genes of 243 microbes, excluding gut microbes), matching orthologous genes that belong to the same COG.
Normalized ratio (NR) of a COG = number of genes in the COG / total number of genes
Enrichment value of a COG = NR in the gut microbiome / NR in Ref-DB
Enriched COGs: COGs that contain orthologous genes at a significantly higher frequency in the human gut microbiome than in Ref-DB (enrichment value ≥ 2).
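The enrichment calculation on this slide can be written out as a short sketch. The per-COG gene counts below are made-up toy numbers and the COG labels are placeholders. The ≥ 2 cutoff comes from this slide and the < 0.3 cutoff for depleted COGs from the following slide; the label "intermediate" for everything in between is an assumption of this example.

```python
def normalized_ratio(cog_counts):
    """NR of each COG = number of genes in the COG / total number of genes."""
    total = sum(cog_counts.values())
    return {cog: n / total for cog, n in cog_counts.items()}


def enrichment_values(gut_counts, ref_counts):
    """Enrichment value = NR in the gut microbiome / NR in Ref-DB.

    COGs absent from the reference set are skipped to avoid division by zero.
    """
    nr_gut = normalized_ratio(gut_counts)
    nr_ref = normalized_ratio(ref_counts)
    return {
        cog: nr_gut[cog] / nr_ref[cog]
        for cog in nr_gut
        if nr_ref.get(cog, 0) > 0
    }


def classify(value, enriched_min=2.0, depleted_max=0.3):
    """Label a COG by its enrichment value using the cutoffs quoted on the slides."""
    if value >= enriched_min:
        return "enriched"
    if value < depleted_max:
        return "depleted"
    return "intermediate"


if __name__ == "__main__":
    gut = {"COG_A": 40, "COG_B": 60}   # toy gene counts per COG
    ref = {"COG_A": 10, "COG_B": 90}
    for cog, value in enrichment_values(gut, ref).items():
        print(cog, round(value, 2), classify(value))
```

Dividing by the total gene count before taking the ratio is what lets a 13-sample metagenome be compared against a reference set of a very different size.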
13. Distribution of COG enrichment values for 126 eCOGs clustered by 150 essential genes of E. coli and B. subtilis
・ Enriched COGs: enrichment value ≥ 2
・ 0.3 ≤ enrichment values of 125 eCOGs ≤ 1.9 (ave. 0.9)
・ Depleted COGs: enrichment value < 0.3
14. Identification of 315 gut-enriched COGs in 13 human intestinal microbiomes
・ Adult-type: 237 COGs (179 adult-type only)
・ Infant-type: 136 COGs (78 infant-type only)
・ Shared by both types: 58 COGs
・ Total: 315 COGs
15. Functional categories of the 315 gut-enriched COGs identified in 13 human microbiomes
Adult-type (237), infant-type (136), overlapped (58); total: 315 COGs
(Figure: category breakdown. Labeled categories: carbohydrate, amino acid, inorganic ion, energy production, cell wall/membrane, repair/modification, conserved but function unknown. Labeled shares: 20%, 33%.)
16.
17. Content of gut-enriched genes (adult) in the sequenced genomes of 371 representative microbes isolated from various environments
(Figure: group averages of 9.2%, 4.0% and 2.7%.)
Human gut microbes may have evolved by acquiring and accumulating genes adaptive to the gut habitat.
18. Genes that varied remarkably in frequency among individual microbiomes: deconjugation and vitamin B12 biosynthesis
・ Deconjugation (glucuronidated conjugates, sulfonated conjugates): COG3250, COG3119, COG4225
・ Vitamin B12 biosynthesis: COG1270, COG1010
(Figure: relative frequencies and max/min ratios. Color key: lower than DB, >2-fold higher than DB, >4-fold higher than DB, >10-fold higher than DB.)
19. Comparison of the KEGG pathways between the human gut and the sea surface Sea-specific pathways Gut-specific pathways Sphingolipid metabolism Arachidonic/linoleic acid metabolism
20. The adult- and infant-types
Overall sequence similarity of genes between individual microbiomes, by reciprocal pairwise blastp analyses (groups compared: adults/children, unweaned infants, Americans, soil, sea, whale fall)
・ Relatively high similarity among adults and weaned children
・ Relatively high variation among unweaned infants
・ Adult-type: stable, robust to environment
・ Infant-type: unstable, sensitive to environment
・ Weaning may be the time when the microbiota changes from the infant-type to the adult-type.
・ No strong association was found within family samples. The gut microbiota may be unique to each individual.
21.
22. Metagenomic sequencing of gut microbiomes by 454 FLX Ti, based on pyrosequencing
Sample: a human gut microbiome

                     ABI 3730xl (Sanger)   Roche 454 FLX Ti (1 run)
Read#                79,163                1,166,204
Total bases          54.9 Mb               433 Mb
Read length (bases)  700                   371.3
Production           30 days               5 days
Relative cost        1                     0.1
Gene#                40,300                186,000

454: no cloning process, no bacterial culture
23. Metagenomic sequencing of human gut microbiomes by 454 GS FLX Titanium
Problem in 454 data: artifact reads = reads having the same starting base (*Reason: multiple beads for one DNA molecule in one emulsion)

Sample     Total num of reads   Human sequences   Artifact reads   Unique reads
APr01S00   1,423,122            0.40%             18.24%           81.36%
APr09S00   1,133,611            0.45%             14.57%           84.98%
APr16S00   818,894              0.55%             22.25%           77.19%
APr20S00   1,044,786            0.47%             16.94%           82.58%
APr29S00   1,117,685            0.39%             27.18%           72.42%

454 data (1 run)   Total num of reads   Num of unique reads
APr01S00           1,423,122            1,157,883
APr06S00           1,133,611            963,351
APr16S00           818,894              632,118
APr20S00           1,044,786            862,794
APr29S00           1,117,685            809,466
24. Sequencing of human microbes
By 3730xl only or 3730xl + 454 FLX
・ Human microbes (in-house): 56 strains
・ HMP: 247 strains (draft) released
HMP: International Human Microbiome Project
25. Mapping to reference genomes
Shotgun reads or genes identified in individual samples are mapped to reference genomes (Genome 1 to Genome 5 in the figure), giving accurate assignment of shotgun reads or metagenomic genes to bacterial genomes.
26. Mapping of metagenomic reads on 1,236 reference bacterial genomes (including 247 HMP and 56 in-house strains)
Mapping criteria: ≥ 90% identity, ≥ 100 bases
Mapped reads: 47% (average)

454 data (1 run)   Num of unique reads
APr01S00           1,157,883
APr06S00           963,351
APr16S00           632,118
APr20S00           862,794
APr29S00           809,466

(Figure: mapped vs. unmapped reads per sample.)
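The mapping criteria on this slide (≥ 90% identity over ≥ 100 bases) amount to a filter on alignment hits followed by a per-read assignment. The sketch below assumes a hypothetical input of `(read_id, genome_id, percent_identity, aligned_length)` tuples, as one might parse from an aligner's tabular output. The slide does not say how reads hitting several genomes were resolved, so keeping the single best hit per read is an assumption.

```python
def assign_reads(alignments, min_identity=90.0, min_length=100):
    """Assign each read to the genome of its best hit that passes both cutoffs.

    alignments: iterable of (read_id, genome_id, percent_identity, aligned_length).
    "Best" means highest identity, with ties broken by longer alignment.
    Returns a dict mapping read_id to genome_id for mapped reads only.
    """
    best = {}
    for read_id, genome_id, identity, length in alignments:
        if identity < min_identity or length < min_length:
            continue
        current = best.get(read_id)
        if current is None or (identity, length) > current[1:]:
            best[read_id] = (genome_id, identity, length)
    return {read_id: hit[0] for read_id, hit in best.items()}


def mapped_fraction(assignments, total_reads):
    """Fraction of all unique reads that were assigned to a reference genome."""
    return len(assignments) / total_reads if total_reads else 0.0


if __name__ == "__main__":
    hits = [
        ("r1", "genome_A", 95.0, 150),
        ("r1", "genome_B", 99.0, 120),
        ("r2", "genome_A", 85.0, 300),  # fails the identity cutoff
        ("r3", "genome_B", 92.0, 80),   # fails the length cutoff
        ("r4", "genome_A", 90.0, 100),
    ]
    assigned = assign_reads(hits)
    print(assigned, mapped_fraction(assigned, total_reads=4))
```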
27. Taxonomic analysis of the Japanese gut microbiota based on mapping of metagenomic reads (Phylum level) Actinobacteria Bacteroidetes Firmicutes
28. The genomes of 27 Bacteroides species have been sequenced. Bacterial composition at the species level in the same genus by mapping of metagenomic reads
29. Bacterial composition at the species level within the genus Bifidobacterium
The genomes of 14 Bifidobacterium species have been sequenced.
30. The microbial composition is highly varied but the functionality is uniform between individuals. Turnbaugh PJ et al. Nature 2009
31. Conclusion
1. The functionality of the gut microbiome is largely affected by diet.
2. Intestinal microbes may have evolved to acquire and accumulate functions advantageous for colonizing the gut habitat, while eliminating undesired appendages that could be sensed by the host and trigger pro-inflammatory responses, thereby helping to maintain host homeostasis.
3. Many genes of unknown function are conserved and present in intestinal microbes.
4. Microbial diversity varies greatly between individuals, but the functionality is similar.
5. The gut microbiota may be unique to each individual, and the origin of the intestinal microbiota is unknown. (No strong association of the microbiota was found within families.)
32.
33. Our goal is…
To explore and identify both host and bacterial genes, or their products, as genetic and environmental factors involved in health promotion and maintenance as well as in the etiology of diseases such as IBD (Crohn’s disease and ulcerative colitis) and allergy.
・ Host genetic factors: human genome, genetic variation (whole genome sequencing)
・ Environmental factors: intestinal microbiome, genetic diversity (sequence-based metagenomics)
・ Interactions between the two
・ High-throughput sequencing technology + bioinformatics
34. International Human Microbiome Project
International Human Microbiome Consortium (IHMC), launched in 2008: Australia, Canada, China, France (as EU), Ireland, Japan, Korea, Singapore, UK and US
・ Sampling of microbiota from the gastrointestinal and urogenital tracts, nose, mouth and skin of several hundred healthy and disease-afflicted subjects
・ 16S sequencing: microbial diversity
・ Metagenomic sequencing: genetic and functional diversity
・ Genome sequencing: sequencing of >1,000 species as reference genomes
・ Integrated database of human microbiomes and microbes + metadata of the subjects
35. Members of the Human MetaGenome Consortium Japan (HMGJ)
26 persons / 10 universities and institutes
・ Graduate School of Information Science, Tokyo Institute of Technology: Ken Kurokawa, Hiroshi Mori, Takehiko Itoh, Hideki Noguchi
・ Institute of Health Biosciences, University of Tokushima Graduate School: Tomomi Kuwahara
・ Frontier Science Research Center, University of Miyazaki: Tetsuya Hayashi, Yoshitoshi Ogura
・ RIKEN Genomic Sciences Center: Hidehiro Toh, Atsushi Toyoda, Vineet K. Sharma, Tulika P. Srivastava, Todd D. Taylor, Yoshiyuki Sakaki
・ Japan Agency for Marine-Earth Science and Technology: Hideto Takami
・ Graduate School of Frontier Sciences, University of Tokyo: Kenshiro Oshima, Kim Sok-Won, Chie Yoshino, Hiromi Inaba, Keiko Furuya, Yasue Hattori, Erika Iioka, Kanako Motomura, and Masahira Hattori
・ School of Veterinary Medicine, Azabu University: Hidetoshi Morita
・ Graduate School of Agricultural and Life Sciences, University of Tokyo: Kikuji Itoh
・ RIKEN Center for Allergy & Immunology: Hiroshi Ohno, Shinji Fukuda
Editor's Notes
From this slide on, I talk about the next-generation sequencer Roche 454, which is based on pyrosequencing. This is one example of metagenomic sequencing of a human gut microbiome by 454, compared with Sanger sequencing on the ABI 3730xl. As shown here, one 454 run produced more than 1 million reads with a read length of about 370 bases, for a total of 433 Mb, about 8 times more bases than the Sanger data. Production took 5 days on the 454, which is 6 times faster than on the ABI 3730xl. We could also identify 4-5 times more genes in the metagenomic sequences produced by 454 than in the previous analysis by Sanger sequencing. At the same time, the cost of 454 sequencing was one tenth that of Sanger sequencing. And we no longer need a cloning process, a colony picker or bacterial culture. From this result, it is clear that the 454 sequencer enables much deeper sequence-based metagenomics than the previous Sanger sequencing.
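The fold differences quoted in this note can be checked against the figures on the comparison slide. The snippet below is only an arithmetic sanity check; the dictionary layout and function names are choices made for this example.

```python
# Figures copied from the Sanger vs. 454 comparison slide.
SANGER = {"reads": 79_163, "bases_mb": 54.9, "days": 30, "genes": 40_300}
FLX_TI = {"reads": 1_166_204, "bases_mb": 433.0, "days": 5, "genes": 186_000}


def fold_change(metric):
    """How many times larger the 454 value is than the Sanger value."""
    return FLX_TI[metric] / SANGER[metric]


def speedup():
    """How many times shorter the 454 production time is."""
    return SANGER["days"] / FLX_TI["days"]


if __name__ == "__main__":
    print("bases:", round(fold_change("bases_mb"), 1))
    print("genes:", round(fold_change("genes"), 1))
    print("reads:", round(fold_change("reads"), 1))
    print("production speed-up:", speedup())
```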
We have sequenced human microbiomes by 454 and obtained about 1 million reads per run without any problem. But the 454 reads contained many so-called artifact reads, which share the same starting base and have almost the same sequence as each other. In these samples, roughly 15-27% of the reads were filtered out as artifact reads. Artifact reads do not affect assembly, but they seriously bias quantitative mapping of 454 reads to the reference genomes, so we need to remove them before the mapping analysis. Artifact reads may be generated by an incorrect ratio between the number of beads and the number of DNA molecules. Normally each emulsion droplet contains one bead and one DNA molecule, but when multiple beads are present in a droplet containing one DNA molecule, every bead carries the same template and so produces the same sequence in each of the wells in which those beads settle.
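A simple way to approximate this filtering step is to collapse reads that begin with the same sequence. The sketch below is a simplification and not the procedure used for the data in the table: it treats reads as artifacts of one another when their first `prefix_len` bases are identical (the 20-base default is an arbitrary assumption) and keeps the longest read of each group, whereas "almost the same sequence" would in practice call for a tolerance to sequencing errors.

```python
def remove_artifact_reads(reads, prefix_len=20):
    """Collapse reads sharing an identical starting prefix, keeping the longest one.

    reads: iterable of (read_id, sequence) pairs.
    Returns a list of (read_id, sequence), one representative per distinct prefix,
    in order of first appearance of each prefix.
    """
    representative = {}
    for read_id, seq in reads:
        key = seq[:prefix_len].upper()
        kept = representative.get(key)
        if kept is None or len(seq) > len(kept[1]):
            representative[key] = (read_id, seq)
    return list(representative.values())


if __name__ == "__main__":
    toy_reads = [
        ("r1", "ACGTACGTAA"),
        ("r2", "ACGTACGTAAGG"),  # same start as r1, longer: kept instead of r1
        ("r3", "TTTTACGTAA"),
    ]
    unique = remove_artifact_reads(toy_reads, prefix_len=8)
    print(unique)
    print("unique fraction:", len(unique) / len(toy_reads))
```

Running the deduplication before mapping matters because each duplicated template would otherwise be counted several times when estimating the abundance of a genome.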
We have sequenced more than 50 strains so far by Sanger only or Sanger + 454. HMP has also released 247 draft genomes, sequenced mainly by 454, to the NCBI database. My group releases only finished data to the public domain, simply because high-quality data should be more useful for researchers.
Sequencing of individual human microbes provides reference genomes, which are very useful for interpreting metagenomic data in depth. For example, we can simply, accurately and directly map metagenomic shotgun reads, or genes identified in the metagenomic data, to bacterial genomes to assign them to species. Very few genomes of human microbes had been sequenced before the HMP started, but the number has been increasing dramatically since next-generation sequencers were commercialized. Our group is also working on the sequencing of human microbes, particularly microbes isolated from Japanese individuals.