Mining and mapping places with multiple names
James Butler & Christopher Donaldson
Lancaster University
Corpus of Lake District Literature
• 80 texts, comprising more than 1,500,000 words
• Mixture of canonical and non-canonical literature about the Lake District, mainly from c18 and c19 (78 out of 80 works)
• Mixture of genres, including guidebooks, travelogues, novels, poems, journals, and private letters
[Timeline figure: date markers 1688, 1789, 1837 and 1901, with three period subtotals: 34 texts, 650K words; 22 texts, 250K words; 22 texts, 613K words]
Sample sentence collocation: beautiful
‘Again entering the boat, we passed up the channel between Lord’s
Island [and] the shore, from whence beautiful prospects are obtained of the
majestic form of Skiddaw, with the woods of Castlehead and
Cockshot Park in the foreground.’ (Edward Baines, A Companion to the
Lakes [1829] 121.)
±5 tokens: No place-names identified
±10 tokens: 2 place-names identified – Lord’s Island & Skiddaw
Within sentence: 4 place-names identified – Lord’s Island, Skiddaw, Castlehead &
Cockshot Park.
Average sentence length
Lake District corpus = 29.8 words
British National Corpus (BNC) = 16 words
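The window comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tooling: the tokenisation rule (punctuation marks counted as tokens), the function name places_near and the hard-coded list of names are all assumptions, chosen because they reproduce the counts given on the slide for the Baines sentence.

```python
import re

# Assumed tokenisation: words (keeping internal apostrophes) and
# punctuation marks each count as one token.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def places_near(text, node, names, span=None):
    """Return the names found wholly within +/- span tokens of the node word.

    span=None searches the whole sentence.
    """
    lowered = [t.lower() for t in tokenize(text)]
    node_idx = lowered.index(node.lower())
    if span is None:
        lo, hi = 0, len(lowered) - 1
    else:
        lo = max(0, node_idx - span)
        hi = min(len(lowered) - 1, node_idx + span)
    found = []
    for name in names:
        name_toks = [t.lower() for t in tokenize(name)]
        n = len(name_toks)
        # Every token of a multi-word name must fall inside the window.
        for start in range(lo, hi - n + 2):
            if lowered[start:start + n] == name_toks:
                found.append(name)
                break
    return found


# Sample sentence from the slide (Baines, 1829).
SENTENCE = (
    "Again entering the boat, we passed up the channel between Lord’s "
    "Island the shore, from whence beautiful prospects are obtained of the "
    "majestic form of Skiddaw, with the woods of Castlehead and "
    "Cockshot Park in the foreground."
)
# Hypothetical mini-gazetteer covering only this sentence.
NAMES = ["Lord’s Island", "Skiddaw", "Castlehead", "Cockshot Park"]

print(places_near(SENTENCE, "beautiful", NAMES, span=5))   # []
print(places_near(SENTENCE, "beautiful", NAMES, span=10))  # Lord’s Island, Skiddaw
print(places_near(SENTENCE, "beautiful", NAMES))           # all four names
```

With an average sentence length of 29.8 words, a fixed ±5 or ±10 window will often cut off place-names that the sentence clearly associates with the node word, which is the point the slide makes.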
from C. Grover, et al., ‘Use of the Edinburgh Geoparser for Georeferencing Digitized
Historical Collections’, Phil. Trans. R. Soc. A 368 (2010) 3875–89.
Diagram of the Edinburgh Geoparser System
Example of input/output from the Edinburgh Geoparser
System
Geo-referenced Data from the Edinburgh Geoparser
Geo-referenced Data, Corrected
Bowness: ‘the curved headland’, from ON bogi/OE boga ‘bow’ and ON nes/OE naess
‘headland’
*Variant Historical Spellings: Bownus, Bawnas, Bonas, Bonus, Boulness
cf. D. Whaley, A Dictionary of Lake District Place Names
(Nottingham: English Place-Name Society, 2006), 42.
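To illustrate why such variants defeat a lookup against a modern gazetteer, the sketch below compares each historical spelling of Bowness with the modern form. It uses Python's standard difflib.SequenceMatcher purely as an illustration. Fuzzy matching is a swapped-in technique here, not something the slides say the authors used, and the variable and function names are hypothetical.

```python
from difflib import SequenceMatcher

MODERN = "Bowness"
# Variant historical spellings listed on the slide.
VARIANTS = ["Bownus", "Bawnas", "Bonas", "Bonus", "Boulness"]


def similarity(a, b):
    """Case-insensitive similarity ratio in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# An exact string comparison finds none of the variants.
exact_hits = [v for v in VARIANTS if v == MODERN]

# A similarity score shows they are related, but only loosely.
scores = {v: round(similarity(v, MODERN), 2) for v in VARIANTS}

print("exact matches:", exact_hits)
for variant, score in scores.items():
    print(f"{variant:10s} {score}")
```

Any similarity threshold loose enough to catch all of these spellings would probably also admit unrelated names, which is one argument for recording variants explicitly.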
Some common issues when geo-referencing against a generic gazetteer…
• Spatial misattribution
• Onomastic misassumption
• Incorrect weighting
And that is just for the items that are found!
An extract of our custom manually-collected gazetteer for the corpus
Columns: Unique ID | Topog. Cat. | Primary Name | Secondary Names | Regional Placement
CONISTON (lake):
Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone
Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston,
Conis- ton, Conyngs Tun, Conyngeston, Thorstane's watter, Turstinus.
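A gazetteer entry of this shape can be sketched as a small record plus a lookup index from every recorded name to its entry. This is a minimal sketch under stated assumptions: the class and function names and the ID value are hypothetical, and the Regional Placement column is omitted because the extract does not show its values.

```python
from dataclasses import dataclass, field


@dataclass
class GazetteerEntry:
    unique_id: str       # hypothetical ID format
    category: str        # topographical category, e.g. "lake"
    primary_name: str
    secondary_names: list = field(default_factory=list)


def build_index(entries):
    """Map every primary and secondary name (case-folded) to its entry."""
    index = {}
    for entry in entries:
        for name in [entry.primary_name, *entry.secondary_names]:
            index[name.casefold()] = entry
    return index


coniston = GazetteerEntry(
    unique_id="EX-0001",  # invented for illustration
    category="lake",
    primary_name="Coniston",
    secondary_names=[
        "Thurstan", "Coniston Lake", "Coniston Water", "Thurston",
        "Conistone", "Conistone Lake", "Cunnistone Lake", "Thurston Lake",
        "Coniston Mere", "Lake of Coniston", "Conis- ton", "Conyngs Tun",
        "Conyngeston", "Thorstane's watter", "Turstinus",
    ],
)

index = build_index([coniston])


def resolve(name):
    """Return the entry for a name as it appears in a text, or None."""
    return index.get(name.casefold())


print(resolve("Thurston Lake").primary_name)
print(resolve("Windermere"))
```

A name shared by several features (a lake and a village, say) would need the index to hold a list of candidate entries per name, with the topographical category used to choose between them.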
Geospatial categories chosen for flexibility and degree of universal referential
specificity
An extract from the latest iteration of the corpus - allowing referential
relationships to be analysed on a whole new level.
Lake, Vale, Specific - Farm, Waterfall


Editor's notes

  1. Overview of corpus…
  2. Our interest in finding what attributes are given to places mentioned…
  3. The Edinburgh Geoparser: NLP tool on which we’ve relied
  4. What the Geoparser does…
  5. The Geoparser output is a bit ropey…
  6. Much correction required…
  7. One of the chief reasons for the poor performance of the Geoparser is place-name variation…
  8. Geospatial relationships between environmental types as well as connective strengths between any paired locations.
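
One simple way to put a number on the "connective strength" between two locations is to count how often they are mentioned in the same sentence. The sketch below assumes that reading; the notes do not say how the project actually measures it. The function name and the toy input are hypothetical: the first list comes from the Baines sample sentence, and the other two are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_strengths(sentences_places):
    """Count, for each pair of places, the sentences that mention both."""
    counts = Counter()
    for places in sentences_places:
        # Sort so each pair has one canonical ordering; use a set so a
        # place repeated within a sentence is counted once.
        for a, b in combinations(sorted(set(places)), 2):
            counts[(a, b)] += 1
    return counts


# Toy input: one list of identified place-names per sentence.
sentences_places = [
    ["Lord’s Island", "Skiddaw", "Castlehead", "Cockshot Park"],  # Baines sentence
    ["Skiddaw", "Castlehead"],                                    # invented
    ["Coniston"],                                                 # invented
]

counts = pair_strengths(sentences_places)
for (a, b), n in counts.most_common(3):
    print(f"{a} <-> {b}: {n}")
```

The same counting could be repeated over topographical categories (lake, vale, farm, waterfall) instead of individual names to compare environmental types.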