Lisbon genome diversity
1. Human genome diversity: Frequently asked questions. Guido Barbujani, Dipartimento di Biologia ed Evoluzione, Università di Ferrara. [email_address]
2. A few human genome statistics
Total size (haploid): 3 272 480 987
N of protein-coding genes: 22 320
N of RNA-coding genes: 9 922
N of gene exons: 530 906
N of transcripts: 142 707
N of segregating sites: 15 040 632
Nucleotide differences with chimp: 1.23%
Chimp orthologue genes: 13 454
Human genes missing in chimp: 36 totally, 17 largely
Classes of genes with max. differences: immune response, reproduction, olfaction
From www.ensembl.org version 57.37b (Jan. 2010)
5. Individual genetic diversity among humans is the lowest of all primates. Phylogenetic tree of human (n=70), chimpanzee (n=30), bonobo (n=5), gorilla (n=11) and orang-utan (n=14), based on 10,000 bp sequences of a noncoding Xq13.3 region. Kaessmann et al. (2001).
6. Genomic estimates of FST for the global human population are 0.12
Human populations display 12% of the maximum possible diversity, given their allele frequencies.
N of markers | Samples | FST | Reference
599,356 SNPs | 209 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.13 | Weir et al. 2005
1,034,741 SNPs | 71 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.10 | Weir et al. 2005
1,007,329 SNPs | 269 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.12 | International HapMap Consortium 2005
443,434 SNPs | 3845 worldwide distributed individuals | 0.052 | Auton et al. 2009
2,841,354 SNPs | 210 individuals from 4 populations: Caucasian, Chinese, Japanese, Yoruba | 0.11 | Barreiro et al. 2008
243,855 SNPs | 554 individuals from 27 worldwide populations | 0.123 | Xing et al. 2009
100 Alu insertions | 710 individuals from 23 worldwide populations | 0.095 | Watkins et al. 2008
67 CNVs | 270 individuals from 4 populations with ancestry in Europe, Africa or Asia | 0.11 | Redon et al. 2006
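To make the meaning of FST concrete, here is a minimal sketch of the textbook heterozygosity-based definition, FST = (HT - HS) / HT, for a single biallelic locus. It assumes equally weighted populations and known allele frequencies; it is not the estimator used in any of the studies listed above, and the function name `fst_biallelic` and the example frequencies are made up for illustration.

```python
def fst_biallelic(freqs):
    """FST for one biallelic locus from per-population allele frequencies.

    Assumption: populations are equally weighted and frequencies are known
    exactly (no sampling correction), so this is only an illustration.
    """
    n = len(freqs)
    # HS: mean expected heterozygosity within populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    # HT: expected heterozygosity of the pooled population
    p_bar = sum(freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic everywhere
    return (h_t - h_s) / h_t


# Hypothetical frequencies: identical populations versus strongly diverged ones
same = fst_biallelic([0.5, 0.5])
diverged = fst_biallelic([0.2, 0.8])
print(same, diverged)
```

With identical frequencies the statistic is 0. A value near 0.12, as reported on this slide, sits much closer to that end of the scale than to complete differentiation.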
7. Genetic diversity among human populations is the lowest of all primates
FST values shown in the figure: 0.38, 0.32, 0.12
Factors listed at one end of the FST scale: geographically-variable selection, small population sizes, little gene flow, isolation
Factors listed at the other end: stabilizing selection, large population sizes, extensive gene flow, admixture
8. Clinal variation in geographical space is the rule for human populations. Li et al. (2009); Cavalli-Sforza et al. (1994)
9. Methods 1: Estimating variances from sequence comparisons
- TACGAACATCAGGC -
- TATGAACATCAGGC -
- TATGAACATCGGGC -
Polymorphic DNA sites: positions 3 (C/T) and 11 (A/G)
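As a minimal sketch of the sequence comparison on this slide, the snippet below finds the polymorphic sites in the three aligned sequences and counts pairwise differences. The function names are illustrative, and the sequences are assumed to be aligned and of equal length, as they appear on the slide.

```python
from itertools import combinations

# The three aligned sequences shown on the slide
sequences = [
    "TACGAACATCAGGC",
    "TATGAACATCAGGC",
    "TATGAACATCGGGC",
]


def polymorphic_sites(seqs):
    """Return 1-based positions where not all sequences carry the same base."""
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def pairwise_differences(a, b):
    """Count positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


sites = polymorphic_sites(sequences)
diffs = [pairwise_differences(a, b) for a, b in combinations(sequences, 2)]
mean_diff = sum(diffs) / len(diffs)

print(sites)      # [3, 11]
print(diffs)      # [1, 2, 1]
print(mean_diff)  # average number of pairwise differences
```

Averages of such pairwise differences, taken within and between populations, are the raw material for the variance estimates discussed in the following slides.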
10. Genetic variances within and between populations
Figure: Population 1 and Population 2, with the variance between populations shown at 100%, 19% and 0%
11. Independent studies of genetic variances yield very similar results: 85, 5, 10
Study | N of loci | Within populations | Between populations | Between races or continents
Lewontin (1972) | 17 | 85% | 8% | 6%
Latter (1973) | 18 | 86% | 5% | 9%
Barbujani et al. (1997) | 109 | 85% | 5% | 10%
Jorde et al. (2000) | 100 | 85% | 2% | 13%
Romualdi et al. (2002) | 32 | 83% | 8% | 9%
Rosenberg et al. (2002) | 377 | 93% | 3% | 4%
Excoffier & Hamilton (2003) | 377 | 88% | 3% | 9%
Ramachandran et al. (2005) | 17 | 90% | 5% | 5%
Bastos-Rodriguez et al. (2006) | 40 | 86% | 2% | 12%
Li et al. (2008) | 650 000 | 89% | 2% | 9%
Median | | 85% | 5% | 10%
12. What does it mean, in practice?
Members of our community are only slightly less different from us than members of distant populations.
Figure labels: 100% and 85%
13. Mind the numbers
Humans and chimps share >98% of their genomes.
Among the 1.8% differences, 1.7% are fixed differences within species.
The remaining fraction, 0.1%, contains all human genomic variation.
The differences among the main continental groups represent 10% of 0.1% of the total, that is, 0.01%.
But 0.1% of >3 billion DNA sites means >3 million polymorphic DNA sites (3,213,401 according to Levy et al. 2007).
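The arithmetic on this slide can be checked in a few lines. This is only a back-of-the-envelope sketch: it takes the haploid genome size from the Ensembl statistics on slide 2 and treats the rounded percentages (0.1% and 10%) as exact, which they are not.

```python
GENOME_SIZE = 3_272_480_987       # haploid genome size quoted on slide 2
WITHIN_HUMAN_FRACTION = 0.001     # 0.1% of sites vary among humans
BETWEEN_CONTINENTS_SHARE = 0.10   # 10% of that variation is between continental groups

# Fraction of the whole genome that differs among continental groups
continental_fraction = BETWEEN_CONTINENTS_SHARE * WITHIN_HUMAN_FRACTION

# Absolute numbers of sites implied by those fractions
polymorphic_sites = GENOME_SIZE * WITHIN_HUMAN_FRACTION
continental_sites = GENOME_SIZE * continental_fraction

print(f"{continental_fraction:.2%}")   # 0.01%
print(f"{polymorphic_sites:,.0f}")     # roughly 3.27 million sites
print(f"{continental_sites:,.0f}")     # roughly 0.33 million sites
```

The result, about 3.27 million polymorphic sites, is of the same order as the 3,213,401 reported by Levy et al. (2007).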
14. Methods 2: Clustering genotypes or haplotypes
Figure panels: K=3, K=4. Rosenberg et al. (2002)
15. Structure inferred from SNPs and haplotypes differs from that inferred from Copy Number Variation (CNV). Jakobsson et al. (2008)
16. Genes, as well as morphology, suggest inconsistent clusterings of genotypes
Studies compared: Y chromosome (Romualdi et al. 2002); X chromosome (Wilson et al. 2001); 377 STR loci (Barbujani and Belle 2006); 377 STR loci (Rosenberg et al. 2002)
Cluster labels in the figure: Europe, Ethiopia; S. Africa; N. Guinea; Asia; Africa; Asia, Europe, Australia, Americas; Americas; Melanesia; Eurasia; N Africa; N America; Maya; S. Africa; E Africa; C Africa; Piapoco; Suruì; Karitiana; Kalash; W. Eurasia; E. Asia; Africa; Americas; Oceania
22. Two persons from the same continent may share fewer SNPs than persons of different continents
23. 81% of SNPs are cosmopolitan. Alleles present in one continent only: 0.91% in Africa, 0.75% in Asia, practically 0 elsewhere. Jakobsson et al. 2008 (525,910 SNPs, 396 CNVs)
24. Greater differences between Africans than between Europeans and Asians. In the 117 megabases (Mb) of sequenced exome-containing intervals, the average rate of nucleotide difference between a pair of the Bushmen was 1.2 per kb, compared to an average of 1.0 per kb between a European and Asian individual. Schuster et al. (2010)
25. Genetic diversity out of Africa is often a subset of the African genetic diversity. Tishkoff et al. (1998)
26. Linkage disequilibrium (LD) decreases with physical distance between loci and with geographic distance from East Africa. Jakobsson et al. (2008)
27. Gene diversity declines as a function of distance from Africa. Best fit of the model for an African exit 56,000 years ago. Liu et al. (2006)
28. Patterns of morphological and genetic variation are compatible with the effects of dispersal from Africa. Manica et al. (2007)
29. Models with an African population replacing previous human continental groups explain the data better than any alternative models. Fagundes et al. (2007)
38. Divergence from modern humans
Neandertals fall inside the variation of present-day humans. Overall divergence is greater for the three Neandertal genomes (modes ~11%), whereas the San mode is ~9% and that of the other present-day humans is ~8%. For the Neandertals, 13% of windows have a divergence above 20%, whereas this is the case for 2.5% to 3.7% of windows in present-day humans.
40.
  1. Comparison with the HapMap sequences and 5 newly sequenced individuals
  2. No comparison within Eurasia (Papuan-French-Han) or within Africa (Yoruba-San) shows significant skews in D
  3. All comparisons of non-Africans and Africans show that the Neandertal is closer to the non-Africans
  4. All or almost all the gene flow detected was from Neandertals into modern humans
  5. Some old haplotypes most likely owe their presence in non-Africans to gene flow from Neandertals
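The "skews in D" mentioned above refer to a statistic that compares two site patterns across four aligned genomes. The sketch below shows one common way to define it, D = (nABBA - nBABA) / (nABBA + nBABA), with an outgroup defining the ancestral allele. It is an illustration on invented toy sequences, not the pipeline of the Neandertal genome study; the function name, the sign convention (positive D means the second genome shares more derived alleles with the third) and the data are all assumptions.

```python
def d_statistic(p1, p2, p3, outgroup):
    """ABBA-BABA D statistic for four aligned sequences.

    Assumed convention: the outgroup allele is ancestral (A). ABBA sites are
    those where p2 and p3 share the derived allele; BABA sites are those
    where p1 and p3 share it. D > 0 means p3 is closer to p2 than to p1.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    if abba + baba == 0:
        return 0.0  # no informative sites
    return (abba - baba) / (abba + baba)


# Invented toy alignment: three ABBA sites, one BABA site, two uninformative sites
chimp       = "AAAAAA"  # outgroup
african     = "AAAGGA"  # p1
non_african = "GGGAGA"  # p2
neandertal  = "GGGGAG"  # p3

d = d_statistic(african, non_african, neandertal, chimp)
print(d)  # 0.5
```

In real analyses D is computed over very many sites and its significance is assessed statistically; a value indistinguishable from zero corresponds to the "no significant skew" result in point 2.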
41. Four processes potentially accounting for the data
Between 1 and 4% of the genomes of people in Eurasia are derived from Neandertals.
  1. From erectus to Neandertal
  2. From late Neandertals into the first Europeans
  3. From early Neandertals into the first Eurasians
  4. Ancient population structure, preserved from before the Neandertal – modern sapiens separation
42. Enza Colonna. And if you want to read more about all this: Trends in Genetics, July 2010