Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Project Overview
JISC/SSHRC Digging into Data Challenge II
Jan 2012 - Dec 2013
Text mining, data extraction and information
visualisation to explore big historical datasets.
Focus on how commodities were traded across
the globe in the 19th century.
Help historians to discover novel patterns and
explore new research questions.
Project Team
Ewan Klein, Bea Alex, Claire Grover, Richard
Tobin: text mining
Colin Coates, Andrew Watson and
Jim Clifford: historical analysis
James Reid, Nicola Osborne: data
management, social media
Aaron Quigley, Uta Hinrichs: information
visualisation
Traditional Historical Research
Gillow and the Use of Mahogany in the Eighteenth
Century, Adam Bowett, Regional Furniture, v.XII,
1998.
Global Fats Supply 1894-98
Document Collections
Collection                                        # of Documents   # of Images
House of Commons Parliamentary Papers (ProQuest)  118,526          6,448,739
Early Canadiana Online                            83,016           3,938,758
Directors’ Letters of Correspondence (Kew)        14,340           n/a
Confidential Prints (Adam Matthew)                1,315            140,010
Foreign and Commonwealth Office Collection        1,000            41,611
Asia and the West (Gale)                          4,725            948,773 (OCRed: 450,841)
Over 10 million document pages and over 7 billion word tokens.
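The collection figures can be tallied as a quick sanity check on the "over 10 million pages" summary. This is only a sketch: it assumes that one image corresponds to roughly one page, which the slide does not state, and it skips the Kew letters because no image count is given for them. The short dictionary keys are illustrative labels, not official collection identifiers.

```python
# (documents, images) per collection, copied from the table above.
# None marks the image count the slide reports as n/a.
collections = {
    "House of Commons Parliamentary Papers": (118_526, 6_448_739),
    "Early Canadiana Online": (83_016, 3_938_758),
    "Directors' Letters of Correspondence": (14_340, None),
    "Confidential Prints": (1_315, 140_010),
    "Foreign and Commonwealth Office Collection": (1_000, 41_611),
    "Asia and the West": (4_725, 948_773),
}

total_documents = sum(docs for docs, _ in collections.values())
total_images = sum(imgs for _, imgs in collections.values() if imgs is not None)

# Variant that counts only the OCRed images for Asia and the West.
total_images_ocred = total_images - 948_773 + 450_841

print(f"{total_documents:,} documents")
print(f"{total_images:,} images ({total_images_ocred:,} counting OCRed images only)")
```

Either way of counting stays above 10 million images, in line with the summary figure.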
System Architecture
Mined Information
Example sentence:
Normalised and grounded entities:
commodity: cassia bark [concept: Cinnamomum cassia]
date: 1871 (year=1871)
location: Padang (lat=-0.94924;long=100.35427;country=ID)
location: America (lat=39.76;long=-98.50;country=n/a)
quantity + unit: 6,127 piculs
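One way to picture the normalised and grounded entities listed above is as small typed records. The sketch below is illustrative only: the class names, field names and the `parse_quantity` helper are assumptions made for this example and do not reflect the project's actual schema or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commodity:
    mention: str   # surface form found in the text
    concept: str   # concept the mention is normalised to


@dataclass(frozen=True)
class Location:
    name: str
    lat: float
    long: float
    country: Optional[str]  # country code, or None where the slide shows n/a


@dataclass(frozen=True)
class Quantity:
    value: int
    unit: str


def parse_quantity(text: str) -> Quantity:
    """Split a string such as '6,127 piculs' into a number and a unit."""
    number, unit = text.split(maxsplit=1)
    return Quantity(int(number.replace(",", "")), unit)


# Values taken from the slide.
commodity = Commodity("cassia bark", "Cinnamomum cassia")
year = 1871
padang = Location("Padang", -0.94924, 100.35427, "ID")
america = Location("America", 39.76, -98.50, None)
quantity = parse_quantity("6,127 piculs")
```

Keeping the coordinates and country code on the location record is what makes it possible to place a mention on a map later.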
Mined Information
Example sentence:
Extracted entity attributes and relations:
origin location: Padang
destination location: America
commodity–date relation: cassia bark – 1871
commodity–location relation: cassia bark – Padang
commodity–location relation: cassia bark – America
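The attributes and relations above can be combined into a candidate trade route. The following is a minimal sketch under stated assumptions: the `Relation` and `LocationAttribute` records and the `trade_routes` function are hypothetical names invented for this example, not the project's data model or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationAttribute:
    location: str
    role: str  # "origin" or "destination"


@dataclass(frozen=True)
class Relation:
    kind: str       # "commodity-date" or "commodity-location"
    commodity: str
    value: str


# Values taken from the slide.
attributes = [
    LocationAttribute("Padang", "origin"),
    LocationAttribute("America", "destination"),
]
relations = [
    Relation("commodity-date", "cassia bark", "1871"),
    Relation("commodity-location", "cassia bark", "Padang"),
    Relation("commodity-location", "cassia bark", "America"),
]


def trade_routes(relations, attributes):
    """Pair origin and destination locations related to the same commodity."""
    roles = {a.location: a.role for a in attributes}
    places_by_commodity = {}
    for rel in relations:
        if rel.kind == "commodity-location":
            places_by_commodity.setdefault(rel.commodity, []).append(rel.value)

    routes = []
    for commodity, places in places_by_commodity.items():
        origins = [p for p in places if roles.get(p) == "origin"]
        destinations = [p for p in places if roles.get(p) == "destination"]
        routes.extend((commodity, o, d) for o in origins for d in destinations)
    return routes


routes = trade_routes(relations, attributes)
print(routes)
```

For the slide's example this yields a single route for cassia bark from Padang to America.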
Edinburgh Geoparser
Text Mining Output
Over 31 million commodity mentions in 7 billion words.
Over 15 million commodity-location relations.
The 100 most frequent commodities are each mentioned over
210,000 times on average (68% of all mentions).
The 1,775 commodities mentioned at least 100 times account
for 99.8% of all mentions.
All information is stored in the Trading Consequences
database (150 GB).
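The 68% figure can be checked with simple arithmetic. The sketch below treats the slide's rounded "over ..." values as exact, so the results are approximations rather than the project's precise counts.

```python
total_words = 7_000_000_000    # "7 billion words"
total_mentions = 31_000_000    # "over 31 million", used as a lower bound
top_n = 100
avg_mentions_top = 210_000     # "over 210,000", used as a lower bound

# Mentions covered by the 100 most frequent commodities.
top_mentions = top_n * avg_mentions_top
share = top_mentions / total_mentions

# Roughly how many words of text there are per commodity mention.
words_per_mention = total_words / total_mentions

print(f"{top_mentions:,} mentions, about {share:.0%} of the total")
print(f"about one commodity mention every {words_per_mention:.0f} words")
```

With these inputs the top 100 commodities cover 21 million mentions, which rounds to the 68% reported on the slide.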
OCR Errors
Extract of Early Canadiana Online document 9_00952_3, p. vi.
Lessons Learned
Importance of two-way collaboration between
technology and humanities experts in digital HSS
projects.
Value of iterative development and rapid prototyping.
Geo-referencing text is very important for historical
analysis.
Most OCR errors are noise in big data but HSS
scholars need to be made more aware of OCR errors
affecting their search results for historical collections.
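The last point can be illustrated with a small search example: an exact-match query misses tokens that OCR has garbled, while approximate matching recovers them. This is a generic sketch using Python's standard `difflib`, not the project's method. The garbled variants and the 0.8 threshold are invented for illustration and are not taken from the collections.

```python
from difflib import SequenceMatcher


def fuzzy_hits(query, tokens, threshold=0.8):
    """Return tokens whose similarity ratio to the query meets the threshold."""
    q = query.lower()
    return [
        t for t in tokens
        if SequenceMatcher(None, q, t.lower()).ratio() >= threshold
    ]


# Hypothetical tokens: two OCR-style misreadings of "cinnamon" plus unrelated words.
tokens = ["cinnamon", "cinnarnon", "cinuamon", "sugar", "cotton"]

exact = [t for t in tokens if t == "cinnamon"]
fuzzy = fuzzy_hits("cinnamon", tokens)

print("exact:", exact)
print("fuzzy:", fuzzy)
```

A lower threshold finds more garbled variants but also more false positives, so any real cut-off would need tuning against the collection being searched.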
Thank You
Contact: balex@inf.ed.ac.uk
Website: http://tradingconsequences.blogs.edina.ac.uk/
Launch of web-based user interface: March 2014.

Trading Consequences - Bea Alex
