1. Surnames as Indicators of Cultural Regions James Cheshire PhD Supervisors: Prof. Paul Longley, Dr Pablo Mateos Department of Geography, University College London Research Blog: jamescheshire.co.uk Email: james.cheshire@ucl.ac.uk
6. Data
2001 Enhanced Electoral Roll: 45.6 million people; 1,597,805 surnames (1,457,681 with < 10 occurrences); 1.5 million postcodes; 436 districts
1881 Census: 29 million people; 425,793 surnames (345,781 with < 10 occurrences); 657 districts
Worldnames Database: approx. 300 million individuals; 26 countries
8. Creating Regions: Aggregating Surname Data - Each district in Britain is assigned a position in "surname space" based on a matrix of Lasker's Distances (District x Lasker's Distance).

2001 Matrix
        95Z       99ZZ      OOLN
00BL    7.520982  7.336616  7.219516
00BM    7.428889  7.315671  7.425037
00BN    7.347616  7.356772  7.394888
00BP    7.452982  7.299915  7.330886
00BQ    7.410027  7.300150  7.387787

1881 Matrix
             Yarmouth  Yeovil    York
Aberayron    6.389540  6.289929  6.438361
Aberdeen     6.356152  7.019357  6.213222
Abergavenny  6.412893  6.361753  6.566717
Aberystwith  6.327093  6.319481  6.467985
Abingdon     6.353814  6.559106  6.621873
9. Creating Regions: Grouping Lasker's Distance (District x Lasker's Distance) - Multidimensional Scaling - Clustering: Ward's Hierarchical Clustering, K-Means
10. Creating Regions: Multidimensional Scaling (1881) http://www.let.rug.nl/~kleiweg/indexs.html [Plot legend: North East, North West, Yorkshire and the Humber, East Midlands, West Midlands, East of England, South East, South West, Wales, Scotland, Northern Ireland]
16. Corby: A Scottish Town? In 1932 Stewarts and Lloyds built a new iron and steel works in Corby. The workforce was sourced from closing Scottish steelworks, mainly in Lanarkshire. Into the 1970s, 50% of the incoming population was Scottish. The population grew from 1,500 to 34,000. The town holds an annual Highland Games.
21. References
Lasker Distance: Lasker, G. W. and C. G. N. Mascie-Taylor (2001). "The genetic structure of English villages: surname diversity changes between 1976 and 1997." Annals of Human Biology 28(5): 546-553.
K-Means: Adnan, M., Singleton, A. D., Brunsdon, C. and Longley, P. A. (2009). "Moving to Real-Time Segmentation: Efficient Computation of Geodemographic Classification." GISRUK 2009.
Multidimensional Scaling Plots: Kleiweg, P.: http://www.let.rug.nl/~kleiweg/L04/
Monmonier Algorithm: Manni, F., E. Guerard, et al. (2004). "Geographic Patterns of (Genetic, Morphologic, Linguistic) Variation: How Barriers Can Be Detected by Using Monmonier's Algorithm." Human Biology 76(2): 173-190.
KDE: Crimestat Workbook: http://www.icpsrdirect.org/CRIMESTAT/workbook/CrimeStat_III_Workbook_PowerPoint.ppt
R Packages: Adegenet, cluster, maptools, rgl, sm, spdep, splancs from http://cran.r-project.org
iL04_1.13 from http://www.let.rug.nl/~kleiweg/L04/
All boundary data from the maps Crown Copyright Ordnance Survey 2009.
It is worth noting that the trends and regions I am seeking to identify are best represented in "Anglo-Saxon" names or those with origins in Britain. Migrant names, although interesting in their own right, are included in the calculations but do not exert a significant influence on regional characteristics. The exception is London in the 2001 data.
Kernel Density Estimation maps show the areas of highest frequency of a particular name in Britain. Two extremely common names are at the top, two rarer names at the bottom.
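To make the idea behind these maps concrete, here is a minimal sketch of a two-dimensional Gaussian kernel density estimate evaluated on a grid. It is not the CrimeStat or R routine used for the slides; the function name, the coordinates, the bearer counts and the bandwidth are all hypothetical and chosen only for illustration.

```python
import math


def gaussian_kde_grid(points, grid_x, grid_y, bandwidth):
    """Evaluate a weighted 2D Gaussian kernel density estimate on a grid.

    points: list of (x, y, weight) tuples, e.g. a location and the number
    of bearers of one surname recorded there (hypothetical input format).
    Returns a list of rows, one row per value in grid_y.
    """
    total_weight = sum(w for _, _, w in points)
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * total_weight)
    surface = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            density = 0.0
            for x, y, w in points:
                d2 = (gx - x) ** 2 + (gy - y) ** 2
                density += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * density)
        surface.append(row)
    return surface


# Invented bearer counts at (easting, northing) positions in arbitrary units.
bearers = [(2.0, 2.0, 30), (3.0, 2.0, 20), (8.0, 7.0, 5)]
grid_x = [float(i) for i in range(11)]
grid_y = [float(i) for i in range(11)]
surface = gaussian_kde_grid(bearers, grid_x, grid_y, bandwidth=1.0)

# Locate the grid cell with the highest estimated density.
peak_value, peak_x, peak_y = max(
    (surface[r][c], grid_x[c], grid_y[r])
    for r in range(len(grid_y))
    for c in range(len(grid_x))
)
print("highest density near", peak_x, peak_y)
```

The bandwidth controls how far each location's influence spreads: a small value gives a spiky map, a large value a smoother one, so in practice it would need tuning for rare versus common names.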
The data are analyzed at district level in this study. The two years are kept separate and provide interesting comparisons of the changing regions. No previous study has utilized this volume of data in the mapping or regionalizing of British surnames. Only a couple have attempted a study on the national scale, and none have attempted comparisons with a historical dataset.
Lasker's Coefficient of Isonymy is widely used for surname studies and extends the idea of monophyly (sharing a single common ancestor) between two populations. The measure can be explained as the probability of members of two populations or subpopulations having genes in common by descent, as estimated from their sharing the same surnames. It is not the intention of this talk to go into significant depth regarding this measure.
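For readers who want to see the measure at work, the sketch below uses the form of the Lasker distance commonly given in the isonymy literature, L = -ln(sum over surnames of p_ik * p_jk), where p_ik is the relative frequency of surname k in district i. Equivalently, if the coefficient is written with the conventional factor of one half, R = sum / 2, then L = -ln(2R). That this is exactly the variant used for the matrices in this talk is an assumption; the function names and the toy districts are hypothetical.

```python
import math
from collections import Counter


def surname_frequencies(surnames):
    """Relative frequency of each surname in one district."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def lasker_distance(freq_i, freq_j):
    """-ln of the summed products of shared surname frequencies."""
    shared = set(freq_i) & set(freq_j)
    isonymy = sum(freq_i[s] * freq_j[s] for s in shared)
    if isonymy == 0:
        return math.inf  # no surnames in common
    return -math.log(isonymy)


# Invented toy districts, ten people each.
district_a = ["Jones"] * 5 + ["Evans"] * 3 + ["Smith"] * 2
district_b = ["Jones"] * 4 + ["Davies"] * 4 + ["Smith"] * 2
district_c = ["Campbell"] * 6 + ["Smith"] * 3 + ["Jones"] * 1

freq_a = surname_frequencies(district_a)
freq_b = surname_frequencies(district_b)
freq_c = surname_frequencies(district_c)

d_ab = lasker_distance(freq_a, freq_b)
d_ac = lasker_distance(freq_a, freq_c)
print(round(d_ab, 3), round(d_ac, 3))
```

A smaller value means the two districts share more of their surname structure, so in this toy case district A is closer to B than to C.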
In this example, you can see from the 1881 Matrix that Yeovil is more similar in surname structure to Aberayron (distance 6.29) than to Aberdeen (distance 7.02). The diagram represents how the districts would look if projected into Lasker's Distance space. Clustering (represented by dashed circles) enables groups of districts with similar names to be regionalized.
K-means is stochastic. Because of time constraints in this talk, maps produced from K-means results are shown for only a couple of cases, from the Corby and Audlem examples.
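The stochastic behaviour comes from the random choice of starting centroids, which is why K-means is usually run several times and the best run kept. The sketch below is a plain Lloyd-style K-means with seeded restarts; it is not the implementation cited in the references, and the points, the seeds and the helper names are hypothetical. It assumes each district is already described by coordinates (for example from an MDS), which the talk does not specify.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iterations=100):
    """One K-means run; the seed fixes the random starting centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def best_of_restarts(points, k, seeds):
    """Run K-means once per seed and keep the lowest within-cluster sum of squares."""
    runs = [kmeans(points, k, seed) for seed in seeds]
    return min(runs, key=lambda run: run[1])


# Invented 2D coordinates forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, wcss = best_of_restarts(points, k=2, seeds=range(10))
print(labels, round(wcss, 3))
```

Fixing the seed makes a single run repeatable, while comparing several seeds shows whether the partition is stable or depends on the starting centroids.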
The animated cube on the left shows the relative position of each of the 2001 districts following the MDS applied to the Lasker's Distances between districts. One can clearly see that the clusters in the cube represent geographical locations. The animated colour cube on the right illustrates how each district's MDS coordinates are converted into colour values according to the axes of the right-hand cube. These values allow colours to be assigned to districts to produce the choropleth map in the next slide. Districts that are more similar have coordinates that place them closer together in the colour cube, so they receive similar colour values.
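One simple way to perform this kind of conversion is to rescale each of the three MDS axes to the 0-255 range and read the resulting triple as red, green and blue. The sketch below shows that idea only; the exact mapping used for the slides is not stated here, and the district names and coordinates are invented.

```python
def mds_to_rgb(coords):
    """Rescale each MDS axis to 0-255 and read each point as (R, G, B).

    coords: dict mapping a district name to a 3-tuple of MDS coordinates.
    """
    axes = list(zip(*coords.values()))
    lows = [min(axis) for axis in axes]
    highs = [max(axis) for axis in axes]
    colours = {}
    for district, point in coords.items():
        rgb = []
        for value, low, high in zip(point, lows, highs):
            span = high - low
            scaled = 0.5 if span == 0 else (value - low) / span
            rgb.append(round(scaled * 255))
        colours[district] = tuple(rgb)
    return colours


def colour_gap(c1, c2):
    """Squared distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


# Invented 3D MDS coordinates for three hypothetical districts.
coords = {
    "District A": (-1.0, 0.0, 2.0),
    "District B": (-0.8, 0.1, 1.8),
    "District C": (1.0, 1.0, -2.0),
}
colours = mds_to_rgb(coords)
for district, rgb in colours.items():
    print(district, "#%02x%02x%02x" % rgb)
```

Because the rescaling is monotonic on each axis, districts that sit close together in the MDS cube end up with similar colours, which is what produces the gradual shading on the map.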
Results from the MDS analysis of the Lasker's Distances for 1881 (left) and 2001 (right). The more gradual change for 2001 clearly shows isolation of names by distance, but also a more homogeneous Britain than the one mapped for 1881.
Explain what the tree means from the top down. In 1881 the first split occurs between England and Wales, followed by a north/south split in England, then a split between northern England and Scotland. In 2001 the first split occurs between England and Scotland, then Wales separates from England (giving England, Wales and Scotland), and then England splits north/south.
Map of Ward's Clustering, splitting Britain into 15 clusters. Although no information about the geographical locations of the districts was included in the clustering, and no contiguity constraints were applied, the resulting regions at 15 clusters are surprisingly spatially coherent.
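As an illustration of how Ward's method can build such groups from a distance matrix alone, the sketch below merges clusters using the Lance-Williams update for Ward's criterion on squared distances. It is not the R routine used for the maps, and the distance values are invented; note that no coordinates or neighbour information enter the calculation.

```python
def pair(a, b):
    """Order a pair of cluster ids so it can be used as a dictionary key."""
    return (a, b) if a < b else (b, a)


def ward_clusters(dist, n_clusters):
    """Agglomerative clustering with Ward's criterion (Lance-Williams form).

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    n = len(dist)
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    next_id = n
    while len(members) > n_clusters:
        # Merge the two clusters with the smallest Ward dissimilarity.
        (a, b), d_ab = min(d2.items(), key=lambda item: item[1])
        size_a, size_b = len(members[a]), len(members[b])
        for k in members:
            if k in (a, b):
                continue
            size_k = len(members[k])
            d_ka = d2.pop(pair(k, a))
            d_kb = d2.pop(pair(k, b))
            d2[pair(k, next_id)] = (
                (size_a + size_k) * d_ka
                + (size_b + size_k) * d_kb
                - size_k * d_ab
            ) / (size_a + size_b + size_k)
        del d2[(a, b)]
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Invented distances between five hypothetical districts (rows 0 to 4).
dist = [
    [0.0, 6.1, 7.2, 7.3, 6.9],
    [6.1, 0.0, 7.1, 7.4, 6.8],
    [7.2, 7.1, 0.0, 6.0, 6.7],
    [7.3, 7.4, 6.0, 0.0, 6.6],
    [6.9, 6.8, 6.7, 6.6, 0.0],
]
three = ward_clusters(dist, 3)
two = ward_clusters(dist, 2)
print(three)
print(two)
```

Stopping the merging at different cluster counts corresponds to cutting the tree at different heights, which is how a 15-cluster map would be read off a full hierarchy.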
The town of Corby is consistently clustered or highlighted as a Scottish district in 2001, not as a central England district as would be expected given its location in Northamptonshire. This is not the case with the 1881 data, suggesting a Scottish migration into the area.
This migration theory appears to be plausible.
Finally, the town that voted to be Welsh. Do the surnames of its population get clustered into the Welsh group or an English one?
Political motives, such as free prescriptions, rather than genealogical or cultural motives appear to be driving the locals to vote to be Welsh. It could of course also have been tongue in cheek!
I suggest that MDS is a more elegant way of mapping surnames. It does not require a preconception of the number of clusters, and it also conveys gradual change in surname structure, if such change exists, rather than the abrupt boundaries inevitably implied by Ward's and K-means clustering.