2. “If we possessed a perfect pedigree of
mankind, a genealogical arrangement
of the races of man would afford the
best classification of the various
languages now spoken throughout the
world...”
-Charles Darwin, The Origin of Species,
1859
3. In turn, language data are useful to help us understand
biological diversity and migration processes
[Figure labels: A, C, B; History]
4. A common language frequently signifies a
common origin and a related language
indicates a common origin further back in
time. Such commonality of origin should
be reflected by genetic relationship,
despite several complicating factors.
Robert R. Sokal (1988) Proc. Natl. Acad. Sci. USA
5. Summary
1. Migrations are population, not molecular, processes
2. Classical comparisons of genes and languages
3. The trouble with the vocabulary and an alternative approach
4. Comparing genes across language families
6. Fig. 1. The first principal component of gene frequencies from 38 independent alleles at
the human loci: ABO, Rh, MNS, Le, Fy, Hp, PGM1, HLA-A, and HLA-B. Shades
indicate different intensities of the first principal component, which accounts for 27
percent of the total variation.
It all began from this
P. Menozzi, A. Piazza & L.L. Cavalli-Sforza (1978) Science
8. Diffusion of Neolithic artifacts in Europe
P. Balaresque et al. (2010) PLoS Biology, interpolated from data by R. Pinhasi et al.
9. Rationale for the proposal of a Neolithic demic diffusion
European genetic diversity distributed in gradients. Only gene flow can generate such
patterns on the continental scale
No documented migration in post-Neolithic times spanning the area from the Levant
to the Atlantic coasts
Neolithic technologies may have spread by cultural contact or by migration (most
likely, by a combination thereof)
Diffusion of Neolithic artifacts cannot produce genetic clines if it is caused only by
cultural contacts
Demic diffusion: expanding Neolithic people carried their know-how, their genes,
and perhaps their languages too, into Europe.
10. E. Kitchen et al. (2009) Proc. R. Soc. B
Their languages too?
11. C. Renfrew (1987) Archaeology and language: The puzzle of Indo-European origins
Their languages too?
12. Conditions for the origin of genetic gradients
by demic diffusion
1. Demographic growth of farmers
2. Diffusion, incomplete
admixture
3. Farmers continue to grow in
numbers, hunter-gatherers don’t
A.J. Ammerman & L.L. Cavalli-Sforza (1984) The Neolithic
Transition and the Genetics of Populations in Europe
But…
0. Low population density
13. In the first DNA studies (mtDNA)
very old ages are estimated for the
main European mutations
“Each cluster can be assigned, in its entirety, to
one of the proposed migration phases; the
age of each cluster approximates very
closely the timing of the migratory event”
“The main mitochondrial variants in Europe
predate the Neolithic expansion”
M. Richards et al. (1996, 2000) Am. J. Hum. Genet.
14. Estimated ages of mitochondrial haplogroups (x 1000 years)
Haplogroup  Richards et al. 1996  Sykes 1999   Richards et al. 2000
H           23.5                  11.0-14.0    15.0-17.2
J           23.5                  8.5          6.9-10.9
T           35.5                  11.0-14.0    9.6-17.7
IWX         50.5                  11.0-14.0
                                  X: 20.0      I: 19.9-32.7
K           17.5                  11.0-14.0    10.0-15.5
U           36.5                  5: 50.0      44.6-54.4
Neolithic contribution overestimated in pre-DNA studies? (Hans Bandelt)
Haplogroup H, “the signature of the
Paleolithic expansion in Europe”
15. Two basic models
Palaeolithic model: cultural diffusion of food-production technologies
Neolithic model: demic diffusion of food-production technologies
G. Barbujani (2012) Curr. Biol.
16. Ok folks, all those with haplogroup H
come with me, let’s do the Paleolithic
migration. No way Steve, not you. You’re
a J, damn it, a J! Wait until the Neolithic!
“Each cluster can be assigned, in its entirety,
to one of the proposed migration phases; the age of each cluster
approximates very closely the timing of the migratory event”
17. Ancient DNA evidence: Neolithic Europeans did not only carry
the J hg, no evidence of the H hg in Paleolithic Europeans
[Figure: haplogroup estimated ages shown: 20,000; 55,000; 45,000; 7,700; 22,700; 13,600; 12,000]
Samples: 21 pre-Neolithic hunter-gatherers, 105 Neolithic farmers
18. Posterior probability of Model B: 1,655 to 2,691 times as
high as the posterior probability of Model A
Genetic continuity since Paleolithic times very unlikely in ABC
analyses of mtDNA
2 individuals from the Upper Paleolithic, 43 from
the Mesolithic (including the two La Braña
specimens) and 121 from the Neolithic
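The logic of comparing two models by approximate Bayesian computation (ABC) can be illustrated with a toy rejection sampler. This is only a sketch under invented assumptions: the labels "A" and "B", the Gaussian summary statistic, the tolerance and the observed value are all hypothetical, and none of them reproduces the demographic models or the mtDNA summary statistics used in the analysis cited on the slide.

```python
import random

def simulate(model, rng):
    """Draw one hypothetical summary statistic under a toy model.

    Assumption: each model is reduced to a Gaussian with its own mean.
    Real ABC would simulate genetic data under a demographic model instead.
    """
    mean = {"A": 0.0, "B": 1.0}[model]
    return rng.gauss(mean, 0.3)

def abc_model_choice(observed, n_sims=20000, tolerance=0.1, seed=1):
    """Rejection ABC: keep simulations whose statistic is close to the observed one."""
    rng = random.Random(seed)
    accepted = {"A": 0, "B": 0}
    for _ in range(n_sims):
        model = rng.choice(["A", "B"])  # equal prior probability for both models
        if abs(simulate(model, rng) - observed) <= tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    # Posterior probability of each model = its share of the accepted simulations
    return {m: count / total for m, count in accepted.items()}

posterior = abc_model_choice(observed=0.9)
print(posterior)
```

The ratio of the two posterior probabilities is the kind of quantity summarized on the slide as "times as high".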
19. It is people who migrate, not haplogroups
Haplogroup ages are not estimates of migration times
20. Summary
1. Migrations are population, not molecular, processes
2. Classical comparisons of genes and languages
3. The trouble with the vocabulary and an alternative approach
4. Comparing genes across language families
21. Often, genetic isolates are also linguistic isolates
F. Calafell & J. Bertranpetit (1993) Am. J. Phys. Anthropol.
22. In Europe, linguistically-related populations are
genetically closer than unrelated populations
separated by the same geographic distance
Correlations Positive, significant
• GEO,LANG 26 / 26
• GEO,GEN 22 / 26
• GEN,LANG 16 / 26
• GEN,LANG.GEO 11 / 26
R.R. Sokal (1988) Proc. Natl. Acad. Sci. USA
23. In agreement with Renfrew’s predictions, four African-
Eurasian gradients corresponding to four language families
G. Barbujani & A. Pilastro (1993) Proc. Natl. Acad. Sci. USA
24. R.D. Gray & Q.D. Atkinson
(2003) Nature
In agreement with Renfrew’s
predictions, estimated
divergence between Indo-
European languages
between 7,800 and 9,500
years BP
25. R. Bouckaert et al. (2012) Science
In agreement with Renfrew’s predictions, geographic origin of the
Indo-European family inferred in Anatolia
27. A simple, global correspondence
between genetic and linguistic
diversity?
1. Do we speak different languages
because our genes influence
language learning?
2. Do we carry different alleles because
we speak different languages?
L.L. Cavalli-Sforza et al. (1988) Proc. Natl. Acad. Sci. USA
28. Summary
1. Migrations are population, not molecular, processes
2. Classical comparisons of genes and languages
3. The trouble with the vocabulary and an alternative approach
4. Comparing genes across language families
29. Many linguists disagree
Controversial linguistic classifications
Random similarities due to the limited number
of sounds humans can produce
Impossibility of telling random from significant
correspondences if etymologies cannot be
traced
Overlapping cultural boundaries
31. An alternative to vocabulary comparisons: Structural
features of languages in grammar and syntax
Word order  English equivalent  Proportion of languages  Example languages
SOV         "She him loves."    45%                      Pashto, Japanese, Afrikaans
SVO         "She loves him."    42%                      English, Hausa, Mandarin
VSO         "Loves she him."    9%                       Hebrew, Tuareg, Zapotec
VOS         "Loves him she."    3%                       Malagasy, Baure
OVS         "Him loves she."    1%                       Hixkaryana
OSV         "Him she loves."    <1%                      Warao
The Parametric Comparison Method
G. Longobardi & C. Guardiano (2009) Lingua
32. Summary
1. Migrations are population, not molecular, processes
2. Classical comparisons of genes and languages
3. The trouble with the vocabulary and an alternative approach
4. Comparing genes across language families
33. Genetic data
5,886 subjects genotyped at 500,568 loci using the Affymetrix 500K single nucleotide polymorphism
(SNP) chip.
POPRES populations that match our linguistic database in Europe:
England, France, Germany, Greece, Hungary, Ireland,
Italy, Poland, Portugal, Romania,
Russia, Serbia, Croatia, Spain
34. + Basque: 20 Spanish Basques
+ Finnish: 93 Finns
Final sample size:
805 individuals for ~220,000 SNPs
(MAF > 0.01 and genotyping rate > 98%)
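The two quality filters named above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes that genotypes are coded as counts of one allele (0, 1 or 2, with None for a missing call) and that the genotyping-rate threshold is applied per SNP. The SNP names and data are invented.

```python
def minor_allele_frequency(genotypes):
    """MAF from allele counts (0/1/2); missing calls (None) are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1 - allele_freq)

def genotyping_rate(genotypes):
    """Fraction of individuals with a non-missing call at this SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def passes_qc(genotypes, min_maf=0.01, min_rate=0.98):
    """Apply both thresholds from the slide (MAF > 0.01, genotyping rate > 98%)."""
    return (minor_allele_frequency(genotypes) > min_maf
            and genotyping_rate(genotypes) > min_rate)

# Hypothetical toy data: 120 individuals per SNP
snps = {
    "snp_common": [0, 1, 2, 1, 0, 1] * 20,        # polymorphic, fully genotyped
    "snp_monomorphic": [0] * 120,                 # fails the MAF filter
    "snp_sparse": [0, 1, None, None, 2, 1] * 20,  # fails the genotyping-rate filter
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)
```

In practice such filters are usually applied with dedicated tools rather than hand-written code; the sketch only makes the two criteria explicit.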
36. A matrix summarizing structural variation in 15 European languages
Principal component analysis (PCA) of languages
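A matrix of structural parameters can be converted into pairwise language distances before any clustering or ordination. The sketch below assumes a coding with "+" and "-" for the two values of a parameter and "0" for a parameter that is not applicable in a language, and it assumes a Jaccard-type distance (differences divided by the number of comparable parameters). Both the coding and the formula are stated here as assumptions about how such a comparison can be done, and the language names and parameter strings are invented.

```python
def parametric_distance(a, b):
    """Share of differing values among parameters set ("+" or "-") in both languages."""
    identities = differences = 0
    for x, y in zip(a, b):
        if x == "0" or y == "0":
            continue  # parameter not comparable for this pair
        if x == y:
            identities += 1
        else:
            differences += 1
    compared = identities + differences
    return differences / compared if compared else 0.0

# Hypothetical parameter strings for three hypothetical languages
languages = {
    "LangA": "++-0+-",
    "LangB": "++-+--",
    "LangC": "-+0+-+",
}

names = sorted(languages)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = parametric_distance(languages[first], languages[second])
        print(first, second, round(d, 3))
```

The resulting distance matrix is the kind of input that PCA, UPGMA or a Mantel test would then take.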
37. Common elements and differences between PCA plots of genomic
and linguistic diversity
[Panels: Language diversity; Genomic diversity]
Main inconsistencies:
1. Hungarian genomes close to those
of Indo-European speakers
2. Romanian genomes close to those
of their geographical, non-Romance-speaking
neighbours
38. Among Indo-European languages, distances inferred from
vocabulary and syntax suggest similar clusterings
Vocabulary Syntax
39. In Europe, distances inferred from syntax and DNA suggest similar
clusterings
Syntax Genetic distances
40. Path difference distance between
linguistic and genetic UPGMA trees
Comparison with the distances obtained in
100,000 pairs of random topologies
drawn, with replacement, from the
total set of possible topologies for
15 taxa
Probability of obtaining a distance smaller
than the observed one: P < 0.004
The close relationship between trees inferred from linguistic and
genetic distances is very unlikely to have arisen just by chance
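A path difference distance between two tree topologies can be sketched as below. The sketch assumes rooted trees written as nested tuples, counts the edges separating each pair of taxa in each tree, and takes the square root of the summed squared differences. The four-taxon trees are invented, and the random-topology null distribution described on the slide is not reproduced here.

```python
import math
from itertools import combinations

def leaf_paths(tree, path=()):
    """Map each leaf name to the sequence of branch choices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths

def edge_count(path_a, path_b):
    """Number of edges between two leaves, given their root-to-leaf paths."""
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)

def path_difference(tree1, tree2):
    """Euclidean distance between the two vectors of pairwise edge counts."""
    p1, p2 = leaf_paths(tree1), leaf_paths(tree2)
    if set(p1) != set(p2):
        raise ValueError("trees must have the same taxa")
    total = 0
    for a, b in combinations(sorted(p1), 2):
        diff = edge_count(p1[a], p1[b]) - edge_count(p2[a], p2[b])
        total += diff * diff
    return math.sqrt(total)

# Hypothetical four-taxon trees standing in for a linguistic and a genetic tree
linguistic_tree = ((("A", "B"), "C"), "D")
genetic_tree = ((("A", "C"), "B"), "D")
print(path_difference(linguistic_tree, genetic_tree))
```

A significance test like the one on the slide would compare the observed value with the distribution of the same statistic over many pairs of random topologies.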
42. Mantel and partial Mantel correlations between distance matrices
Bonferroni P=0.006
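A Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The sketch below is a minimal pure-Python version with a one-tailed permutation P value, plus the standard first-order partial correlation that a partial Mantel test uses to hold a third matrix (for example geography) constant. The matrices are invented, and the permutation scheme for the partial test is not implemented.

```python
import math
import random
from itertools import combinations

def upper(matrix):
    """Off-diagonal upper-triangle entries of a square matrix."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(m1, m2, n_perm=999, seed=0):
    """Return (r, one-tailed P) by permuting rows and columns of m1 together."""
    rng = random.Random(seed)
    fixed = upper(m2)
    observed = pearson(upper(m1), fixed)
    order = list(range(len(m1)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[m1[i][j] for j in order] for i in order]
        if pearson(upper(permuted), fixed) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def dist_matrix(points):
    return [[abs(a - b) for b in points] for a in points]

# Hypothetical distances among five populations
gen = dist_matrix([0.0, 1.0, 2.0, 4.0, 7.0])
lang = dist_matrix([0.0, 1.5, 2.0, 5.0, 6.0])
geo = dist_matrix([0.0, 2.0, 1.0, 3.0, 8.0])

r_gl, p_gl = mantel(gen, lang)
r_gg, _ = mantel(gen, geo)
r_lg, _ = mantel(lang, geo)
print(r_gl, p_gl, partial_r(r_gl, r_gg, r_lg))
```

With several matrix comparisons, a Bonferroni correction multiplies each P value by the number of tests performed.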
43. Main inconsistencies:
1. Hungarian genomes close to
those of Indo-European
speakers
2. Romanian genomes close to
those of their geographical,
non-Romance-speaking
neighbours
Recent admixture accounts for some PCA inconsistencies
44. To summarize:
1. Within the Indo-European family, similar trees are inferred from vocabulary and
syntactic comparisons
2. European populations speaking similar languages also tend to resemble each
other at the genomic level
3. Syntax appears to offer a better prediction of genomic distances than geography
4. Contacts between populations after their separation from a common ancestor
can be recognized, and better accounted for, by comparing genomic and linguistic
patterns of variation
46. Henn et al. (2012) Proc Natl
Acad Sci USA
Scally and Durbin (2012)
Nature Rev Genet
47. Several open questions
Fossil, archaeological and genomic evidence place divergence among continental
populations in the interval 120-60 thousand years ago. When did the main language
families diverge?
Correlation suggests, but does not prove, common causation. Could the same
geographic constraints have led to parallel genetic and linguistic change
at different times?
Darwin had in mind population trees; but how sure are we that genetic evolution
and linguistic change really occurred in a tree-like fashion?
Indo-European Documentation Center, University of Texas at Austin
48. Silvietta Ghirotto
Francesca Tassi
Pino Longobardi
York University
Davide Pettener
University of Bologna
http://www.langelin.org/
Cristina Guardiano
University of Modena