Visualising Typological Relationships: Plotting WALS with Heat Maps
1. Visualising Typological Relationships: Plotting WALS with Heat Maps
Richard Littauer¹, Rory Turnbull², Alexis Palmer¹
¹ Universität des Saarlandes
² Ohio State University
2. Why?
• Data deluge in science
• Typology has been shown to be useful for linguistic studies (Greenberg 1963, Chomsky 2000, Dunn et al. 2001).
• Showing typological diversity visually can help cut down on research time and illuminate new areas of possible research.
3. Basic Overview
• Our visualisation technique combines:
– geographic
– phylogenetic
– linguistic data.
World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2011).
4. Previous Work
Similar visualisation work:
- Language Typology: Mayer et al., 2010; Rohrdantz et al., 2010
- Phylogeny: Multitree, 2009
- Geographical variation: Wieling et al., 2011
Work with WALS:
- Daumé & Campbell 2007, Daumé 2009
5. Pruning
WALS:
– 2,678 languages
– 192 feature options (out of 144 features)
– 16% of the data filled
Pruning:
– 372 languages
– Average of 96 features
– Only languages with 30% or more of their features filled
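The pruning rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 30% threshold comes from the slide, while the data layout (a dictionary per language with None for unfilled features), the function names, and the toy data are all assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical layout: language name -> {feature name -> value, or None if unfilled}.
LanguageTable = Dict[str, Dict[str, Optional[str]]]


def fill_ratio(features: Dict[str, Optional[str]]) -> float:
    """Fraction of a language's features that have a recorded value."""
    if not features:
        return 0.0
    filled = sum(1 for value in features.values() if value is not None)
    return filled / len(features)


def prune(table: LanguageTable, min_fill: float = 0.30) -> LanguageTable:
    """Keep only languages whose fill ratio is at least min_fill."""
    return {
        name: features
        for name, features in table.items()
        if fill_ratio(features) >= min_fill
    }


# Toy data for illustration only; not taken from WALS.
toy: LanguageTable = {
    "A": {"f1": "x", "f2": "y", "f3": None, "f4": None},  # 50% filled
    "B": {"f1": None, "f2": None, "f3": None, "f4": "z"},  # 25% filled
}
kept = prune(toy)  # only "A" passes the 30% threshold
```

A stricter threshold leaves fewer languages but denser rows, which is the trade-off the slide's numbers reflect.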
6. Phylogenetic Distance
WALS’ Tree Hierarchy:
– Three different levels:
• Family: ‘Sino-Tibetan’;
• Sub-family: ‘Tibeto-Burman’;
• Genus: ‘Northern Naga’.
– Doesn’t take into account language contact.
– We used geographical proximity as a proxy for language contact.
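The slides do not say how a distance is derived from the three-level hierarchy. One simple possibility, shown here purely as an assumed sketch, is to count how many levels separate two languages from their deepest shared node. The Classification type and tree_distance function are hypothetical names, and the bracketed values are placeholders rather than real WALS entries.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    """A language's position in a three-level tree (family, sub-family, genus)."""
    family: str
    subfamily: str
    genus: str


def tree_distance(a: Classification, b: Classification) -> int:
    """Return 0 for same genus, 1 for same sub-family only,
    2 for same family only, and 3 for different families."""
    # Compare from the top of the tree down; stop at the first mismatch.
    for depth, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return len(a) - depth
    return 0


# The first entry uses the example from the slide; the others are placeholders.
northern_naga = Classification("Sino-Tibetan", "Tibeto-Burman", "Northern Naga")
sister_genus = Classification("Sino-Tibetan", "Tibeto-Burman", "(another genus)")
unrelated = Classification("(another family)", "(n/a)", "(n/a)")

d_same = tree_distance(northern_naga, northern_naga)
d_sister = tree_distance(northern_naga, sister_genus)
d_unrelated = tree_distance(northern_naga, unrelated)
```

As the slide notes, a tree-based measure like this ignores contact, which is why geographical proximity is brought in separately.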
7. Geographical Proximity Filtering
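• Each language in WALS is associated with a geographical coordinate.
• Haversine formula for the great-circle distance between two coordinates.
• Filtering works within limits set by geography and by how fully each language is filled in WALS.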
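For reference, the haversine formula gives the great-circle distance between two latitude/longitude points on a sphere. A minimal sketch follows; the function name and the mean Earth radius of 6371 km are assumptions for the example, not values taken from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# A quarter of the equator: roughly 10,007.5 km with the radius assumed above.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
```

The formula treats the Earth as a perfect sphere, which is accurate enough for deciding whether two languages are a few hundred kilometres apart.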
8. Geographical Proximity Filtering
• First approach:
– Arbitrary radius from centroid in order to create a decision boundary for clustering neighbouring languages.
– 500 kilometres provided a sufficient number of examples after cleaning WALS.
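The radius approach can be sketched as a simple filter over language coordinates. The 500 km limit is from the slide; everything else here (the function names, treating the centre as a single coordinate, the 6371 km Earth radius, and the made-up coordinates) is an assumption for illustration.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord, radius_km: float = 6371.0) -> float:
    """Great-circle distance in kilometres between two coordinates."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(b[1] - a[1])
    h = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def within_radius(
    centre: Coord, languages: Dict[str, Coord], limit_km: float = 500.0
) -> List[str]:
    """Names of languages no further than limit_km from centre, nearest first."""
    distances = sorted(
        (haversine_km(centre, coord), name) for name, coord in languages.items()
    )
    return [name for distance, name in distances if distance <= limit_km]


# Made-up coordinates for illustration; not real WALS entries.
toy = {
    "near": (0.0, 1.0),   # about 111 km from the centre
    "mid": (0.0, 4.0),    # about 445 km
    "far": (0.0, 10.0),   # about 1,112 km
}
neighbours = within_radius((0.0, 0.0), toy)
```

Returning the names nearest first is a convenience that also gives an ordering for laying out columns later.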
9. Geographical Proximity Filtering
• Second approach:
– Arbitrary lower bound for near languages.
– Sufficient remainder.
– Under-representative of contact languages.
– Not as good as the radius method.
10. WALS Languages and Sparsity
[Figure: scatter plot of WALS languages. Axis labels: Longitude and Latitude (tick ranges -8 to -3 and 140 to 146). Legend, Language Family: Kiwaian, Kwomtari-Baibai, Lower Sepik-Ramu, Other, Trans-New Guinea. Legend, Feature density: 0.1 to 0.7.]
11. Geographically Focused Map
[Figure: heat map with WALS features as rows and languages as columns.]
Features (rows): Alignment of Verbal Person Marking; O & V Ordering and the Adposition & NP Ordering; Person Marking on Adpositions; Gender Distinctions in Independent Personal Pronouns; O & V Ordering and the Adj & N Ordering; Order of Adjective and Noun; Order of Adposition and Noun Phrase; Position of Tense-Aspect Affixes; Order of Genitive and Noun; Negative Morphemes; Position of Negative Word With Respect to S, O, & V; Postverbal Negative Morphemes; Preverbal Negative Morphemes; Order of Negative Morpheme and Verb; Order of Object and Verb.
Languages (columns): Arapesh (Mountain), Una, Imonda, Waskia, Amele, Usan, Kobon, Yimas, Alamblak, Kewa, Tauya, Hua, Yagaria, Dumo, Awtuw, Hamtai, Sentani.
12. Phylogenetic Focused Map
[Figure: heat map with WALS features as rows and languages as columns; horizontal axis marked W to E.]
Features (rows): Comparative Constructions; Order of Degree Word and Adjective; Adjoined relative clauses; SVNegO Order; SNegVO Order; NegSVO Order; Optional Double Negation in SVO languages; Different word order in negative clauses; Order of Person Markers on the Verb; Position of Polar Question Particles; Correlative relative clauses; Reciprocal Constructions; Double-headed relative clauses; Postnominal relative clauses; Order of Numeral and Noun.
Languages (columns, in extracted order): Temne, Kisi, Grebo, Bambara, Supyire, Akan, Koromfe, Dagbani, Ewe, Yoruba, Gwari, Igbo, Babungo, Mumuye, Ewondo, Doyayo, Kongo, Gbeya Bossangoa, Sango, Luvale, Nkore-Kiga, Zulu, Swahili, Wolof, Ijo (Kolokuma), Fula (Nigerian), Diola-Fogny, Birom, Fyem.
13. More Maps
[Figure: six further heat maps over the same 15 features as slide 11, each showing a different subset and column ordering of the following languages: Alamblak, Amele, Arapesh (Mountain), Asmat, Awtuw, Dani (Lower Grand Valley), Dumo, Hamtai, Hua, Imonda, Kewa, Kobon, Marind, Sentani, Suena, Tauya, Una, Usan, Waskia, Yagaria, Yimas.]
14. Conclusion
• A newly applied method for looking at sparse data
• Combines phylogenetic, geographic, and typological data
15. Final Remarks
Future work:
• Integrating Ethnologue or Multitree for language families.
• Further exploration showing more natural organisation of the linguistic features
All code and visualisations available here:
https://github.com/RichardLitt/visualizing-language
Editor's notes
The map is centred on Yimas, a language spoken in New Guinea. The bar at the top of the image represents the language family of the language in that column: Pink = Border; Red = Trans-New Guinea; Blue = Sepik; Brown = Lower Sepik-Ramu; Purple = Torricelli; Green = Skou; and Orange = Sentani. The colour of each cell represents the normalised value of a linguistic feature according to WALS. Languages with the same colour in a given row have the same value for that typological feature. We graphed only the most commonly occurring features across the selected set of languages. We first centred the source language in the map, so languages near the centre are more likely to be close to each other than those at the extremes.
Insights: Most prominently, these languages are quite homogeneous for the selected features, which is expected, as most are related. In the 5th row (‘O&V Ordering and the Adj&N Ordering’), we see via the cluster of red cells a partial grouping of languages close to Yimas, with less similarity at a greater distance. The nearly alternating pattern we see for ‘Position of Negative Word With Respect to S,O,&V’ may suggest areal groups that have been split by the data-centring function. Also, the checkerboard pattern for this feature and the one below (‘Postverbal Negative Morphemes’) suggests a possible negative correlation between these two linguistic features.
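The notes do not spell out how the data-centring function works. One plausible reading, sketched here as an assumption rather than the authors' implementation, is to take the languages sorted by distance from the source and place them alternately to the right and left of it. The function name centre_order and the placeholder labels are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


def centre_order(nearest_first: Sequence[str]) -> List[str]:
    """Arrange names so the first item sits in the middle and later
    items alternate right and left, moving outward."""
    columns: Deque[str] = deque()
    for rank, name in enumerate(nearest_first):
        if rank % 2 == 0:
            columns.append(name)      # even ranks go to the right edge
        else:
            columns.appendleft(name)  # odd ranks go to the left edge
    return list(columns)


# Placeholder labels: "source" is the centred language, n1 its nearest neighbour.
order = centre_order(["source", "n1", "n2", "n3", "n4"])
# order == ["n3", "n1", "source", "n2", "n4"]
```

Under this reading, two languages that are close to each other can land on opposite sides of the source, which would be consistent with the note that areal groups may be split by the centring.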
Here, we have the Niger-Congo family, arranged from east to west. This gets rid of some of the problems with the centred map: we can look at this and see more clearly what the geographical distribution is. Of course, this only works on selected areas.
A number of the western languages show red cells for features related to relative clauses; these can be compared to mostly blue cells in the eastern languages. We also see some apparent groupings for variable word order in negative clauses (red cells in western languages) and for NegSVO Order (purple cells in western languages). For some pairs of adjacent languages (most notably Bambara and Supyire), we see clusters of shared features.
Why is this important? Bambara has been used to dispute claims about the Chomskyan language hierarchy (Culy, 1985), so this graph is an excellent example of visualisation pointing out an intriguing area for closer analysis: we should look more closely at Supyire to see whether it is similar. And that is the whole point of this: identifying related languages and seeing diversity clearly. We hope that more of these can be used.