1. AddressingHistory –
Lessons and Messages
Stuart Macdonald
Associate Data Librarian
EDINA & Data Library
University of Edinburgh
stuart.macdonald@ed.ac.uk
Association of American Geographers Annual Meeting - Working Digitally with Historical Maps, New York Public Library, 25 Feb. 2012
2. Phase 1
JISC-funded Community Content
project
6 months (April 2010 – September
2010)
Partner with National Library of
Scotland
Advisory Board
3. To create an online crowdsourcing tool which will combine
data from digitised historical Scottish Post Office
Directories (PODs) with contemporaneous historical maps
Similar to Australian Historic Newspapers project
provided by National Library of Australia where members
of the public correct and improve OCR’d text of old
newspapers - http://www.nla.gov.au/ndp/project_details/
JISC-funded Great War Archive (Univ. Oxford) that asked
members of the general public to digitise any First World
War artefacts and upload them to a purpose built website.
4. PODs offer a fine-grained spatial
and temporal view on social,
economic and demographic
circumstances
They provide residential names,
occupations, and addresses.
Each contains 3 sub-directories:
general, street, and trades
May also contain misc. trade
directories e.g. banking,
education, law, insurance,
medical
5. Phase 1 focused on 3 vols. of
Edinburgh PODs: 1784-5; 1865;
1905-6
Historic Scottish maps geo-
referenced by NLS
PODs digitised by NLS in
conjunction with the Internet
Archive
c.700 PODs (1773 to 1911)
covering 28 of Scotland's towns
and counties now online
Creative Commons licensed (CC BY-NC-SA 2.5 UK: Scotland)
6. Using OpenLayers as the web-
based mapping client
Tool allows ‘the crowd’ to
georeference a POD entry by
moving a ‘map pin’ on a
digitised map, thus facilitating
the addition of a grid reference
to the OCR’d POD held in XML
format in a database structure
(PostgreSQL)
API available allowing web
developers access to the raw
data in multiple output formats
(JSON, XML, CSV)
Parsed POD addresses are
geocoded against the Google
geocoder
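A minimal sketch of how one georeferenced POD entry could be serialised into the three output formats the API offers (JSON, XML, CSV). The record, its field names and the function names are hypothetical, chosen for illustration only; they are not the project's actual schema or API.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical POD entry: the values and field names are made up for illustration.
ENTRY = {
    "name": "John Smith",
    "profession": "Baker",
    "address": "12 High Street",
    "year": "1865",
    "lat": "55.9500",
    "lon": "-3.1900",
}


def to_json(entries):
    """Serialise a list of entry dicts as a JSON array."""
    return json.dumps(entries, indent=2)


def to_csv(entries):
    """Serialise entries as CSV with a header row taken from the first entry."""
    if not entries:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(entries[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


def to_xml(entries):
    """Serialise entries as <entries><entry><field>value</field>...</entry></entries>."""
    root = ET.Element("entries")
    for entry in entries:
        node = ET.SubElement(root, "entry")
        for key, value in entry.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Keeping the record as a plain dictionary and writing one small function per format means a further output format could be added without touching the stored data.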
7. Interface had to be easy-to-use for a
range of users
Robust and scalable to accommodate
c.700 digitised Scottish PODs
Mechanism to check user-generated
content such as geo-references,
name or address edits/annotations
Crowdsourcing of geo-coded grid
references
View original scanned directory page
Amplification of tool and API via
Social Media Channels – Facebook,
Twitter, Blog, Flickr, YouTube
8. Search people, place, profession
Historic Map
overlay
selected
Record edits
by the ‘crowd’
View original
Search
results
Download
options
9. Phase 2 sought to develop functionality to resonate with JISC’s
vision to build sustainable and durable deliverables and to
complement Phase 1 by broadening both geographic and temporal
coverage
Feb. – Sept. 2011 (EDINA
Sustainability Funding)
New content (Aberdeen, Glasgow,
Edinburgh for 1881 & 1891)
Re-evaluate (and enhance) parsing
tool performance
Old parser:
• Exact geotag – 60%
• Professions – 25%
New parser (no configuration file):
• Exact geotag – 72%
• Professions – 76%
New parser (with configuration file):
• Exact geotag – 88%
• Professions – 82%
10. Phase 2
Other additional features include:
• Spatial searching (bounding box)
• Associate map pin with search
results
• Search across multiple addresses
• Aid searching by applying Standard
Industrial Classification (SIC) codes
to Professions
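The bounding-box search listed above can be sketched as a simple filter over georeferenced entries. This is an illustrative sketch only: the `Entry` class, the function names and the sample records are assumptions, and a production system would more likely run the query inside the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical georeferenced POD entry (fields are illustrative)."""
    name: str
    profession: str
    lat: float
    lon: float


def in_bbox(entry, west, south, east, north):
    """True if the entry lies inside the box (edges included).

    Assumes the box does not cross the antimeridian, which holds for Scotland.
    """
    return west <= entry.lon <= east and south <= entry.lat <= north


def search_bbox(entries, west, south, east, north):
    """Return the entries that fall inside the bounding box, in input order."""
    return [e for e in entries if in_bbox(e, west, south, east, north)]


# Made-up sample records placed roughly in Edinburgh, Glasgow and Aberdeen.
SAMPLE = [
    Entry("Entry A", "Baker", 55.95, -3.19),
    Entry("Entry B", "Grocer", 55.86, -4.25),
    Entry("Entry C", "Surgeon", 57.15, -2.09),
]

# A box drawn around central Edinburgh keeps only the first record.
edinburgh_hits = search_bbox(SAMPLE, west=-3.30, south=55.90, east=-3.10, north=56.00)
```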
11. Augmented Reality
An AddressingHistory layer has
been created and published for
use with the ‘Layar’ Application
for either iPhone or Android
Geo-referenced Points of
Interest (POIs) are uploaded
into the BuildAR CMS
Each POI (e.g. each profession or
SIC Code) has an image
associated with it
The App allows users to compare their current location (from phone)
with the geo-referenced AH records in order to establish which names
and professions are located in the local vicinity
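The comparison of a phone's location with georeferenced records can be illustrated with a great-circle distance check. This is a sketch under stated assumptions, not a description of how Layar or the AH layer computes proximity: the haversine formula is swapped in as one common approach, and the record structure and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(records, lat, lon, radius_m):
    """Return the records within radius_m of (lat, lon), nearest first."""
    measured = [(haversine_m(lat, lon, r["lat"], r["lon"]), r) for r in records]
    measured.sort(key=lambda pair: pair[0])
    return [r for distance, r in measured if distance <= radius_m]


# Made-up records: two close together and one far away.
RECORDS = [
    {"name": "B", "profession": "Grocer", "lat": 55.9510, "lon": -3.1900},
    {"name": "A", "profession": "Baker", "lat": 55.9500, "lon": -3.1900},
    {"name": "C", "profession": "Surgeon", "lat": 55.8600, "lon": -4.2500},
]

# A user standing at record A's location, looking within 200 metres.
local = nearby(RECORDS, 55.9500, -3.1900, radius_m=200)
```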
12.
13. Lessons Learned
Critical mass – does geographic & temporal coverage attract and
engage the crowd?
Separate out parsing from interface and back end
storage - to allow any refinements to be implemented without
impacting on tool and API
Externalise ‘configuration’ files – editable XML-based files
that accommodate recurring OCR errors and content inconsistencies –
these are run in conjunction with the POD parser to refine the parsed
content and hence improve searching
Parsing and refining process is almost unending -
Identify what is realistically achievable with available resources
and time constraints
- i.e. perform proper requirements analysis
Consult with others - involved in digitising and parsing
city/town/post office directories e.g. Richard Marciano
(UNC Chapel-Hill), Matt Knutzen (NYPL)
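The lesson about externalised configuration files can be illustrated with a small sketch in which search patterns and their replace strings live in an editable XML file rather than in the parser code. The XML layout, the element and attribute names, and the sample rules are all assumptions made for illustration; the project's real configuration format is not shown in these slides.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration: each rule pairs a regex with its replacement.
CONFIG_XML = r"""
<rules field="profession">
  <rule search="\bgrocor\b" replace="grocer"/>
  <rule search="\bEdinhurgh\b" replace="Edinburgh"/>
</rules>
"""


def load_rules(xml_text):
    """Parse the XML config into a list of (compiled pattern, replacement)."""
    root = ET.fromstring(xml_text.strip())
    return [
        (re.compile(rule.get("search"), re.IGNORECASE), rule.get("replace"))
        for rule in root.iter("rule")
    ]


def apply_rules(text, rules):
    """Apply every search/replace rule, in file order, to one OCR'd line."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


rules = load_rules(CONFIG_XML)
cleaned = apply_rules("Smith John, grocor, 12 High St, Edinhurgh", rules)
```

Because the rules sit outside the code, a recurring OCR error can be handled by editing the XML file and re-running the parser, with no change to the tool or the API.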
14. Sustainability
Given the broad applicability of the
resource a range of communities may be
interested in the longer term curation of
the project tools e.g. the Open Street Map
community, NLS
Evaluation of possible business models
for sustainability:
• revenue generation via online donations
• subscription model (e.g. per annum, per
month, per use)
• ‘freemium model’ (e.g. free API
download of a certain number of records
with payment for further downloads)
• academic advertising.
15. Second last slide…
New content and features to be made available start of
March 2012
Gauging the success of the project goes beyond the
delivery of engaging and innovative online tools. It will
ultimately be measured by continual and extended
use within the wider community.
16. Website:
http://addressinghistory.edina.ac.uk/
THANKING YOU!
Credits:
Image by aroid - http://www.flickr.com/photos/selago/34843234/ - CC BY 2.0
Image by konqui - http://www.flickr.com/photos/konqui/2301314089/ - CC BY-NC 2.0
Image by mosilager - http://www.flickr.com/photos/mosilager/2260598271/ - CC BY-NC-SA 2.0
Image by racoles - http://www.flickr.com/photos/racoles/5719938981/ - CC BY-NC 2.0
Image by James Bowe - http://www.flickr.com/photos/jamesrbowe/3351247547/ - CC BY 2.0
Image by yelnoc - http://www.flickr.com/photos/yelnoc/361303918/ - CC BY-NC-SA 2.0
Image by epSos.de - http://www.flickr.com/photos/epsos/3384297473/ - CC BY 2.0
Image by bek30 - http://www.flickr.com/photos/bek30/6107854810/ - CC BY-NC 2.0
Image by karen horton - http://www.flickr.com/photos/karenhorton/3261277303/ - CC BY-NC 2.0
Image by lofaesofa - http://www.flickr.com/photos/lofaesofa/227019975/ - CC BY 2.0
Image by Psycho Delia - http://www.flickr.com/photos/24557420@N05/5588473657/ - CC BY-NC 2.0
Image by wdj(0) - http://www.flickr.com/photos/davidjoyner/534893725/ - CC BY-SA 2.0
Image by Symic - http://www.flickr.com/photos/symic/2870349309/ - CC BY-SA 2.0
Image by ~milj - http://www.flickr.com/photos/21989292@N07/4938052014/ - CC BY-NC-SA 2.0
Acknowledgements:
JISC - http://www.jisc.ac.uk/
NLS Geo-referenced maps and applications - http://geo.nls.uk/
Visualising Urban Geographies (VUG) project – http://geo.nls.uk/urbhist/
Edinburgh City Libraries – http://www.edinburgh.gov.uk/libraries/
Editor's notes
UK digitisation programme. Developing Community Content strand of the JISC Digitisation and e-Content programme. Welsh Voices of the Great War in Wales – Cardiff University.
Online engagement tool based on web 2.0 principles. Galaxy Zoo is an online astronomy project which invites members of the public to assist in classifying over sixty million galaxies. Old Weather is a web-based effort to transcribe weather observations made by Royal Navy ships around the time of World War I. The Great War Archive was a 2008 project led by the University of Oxford that asked members of the general public to digitise any artefacts they held relating to the First World War and upload them to a purpose-built website.
Bank directory listing banks and banking companies. Educational directory listing educational institutions and teachers by their subject. Law directory listing juridical institutions and practitioners. Medical directory listing medical and surgical institutions and practitioners. Insurance directory listing insurance companies. Rich source of adverts which give an idea as to lifestyles and spending habits. Of interest to genealogists, local or family historians, academic researchers.
44,000 historical maps of Scotland – county maps, town plans, admiralty charts (coastline), military maps, historic OS series. Plus 600 of Edinburgh and its environs. Images, OCR text. Creative Commons licences - IPR free - Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland. Internet Archive team based at the National Library of Scotland for scanning the Scottish Post Office Directories used in the project.
Registered users. Google Geocoding API assigns a georeference with scales of accuracy – from town to street to intersection to building.
Identify and fix line returns; identify which fields belong to which column. Fix OCR errors – list of search patterns and their replace strings (XML files for names, professions, addresses). Name stop words to remove commercial entries.
POIs in this case are POD entries – namely address, name and profession
Critical mass – it could be argued that the geographical & temporal coverage provided by AH doesn’t provide the critical mass of content required to attract and engage ‘the crowd’. This is borne out in our usage stats (and registered users), which, whilst not small, were modest.