Trading Consequences - Bea Alex
Scotland’s National Collections and the Digital Humanities, Edinburgh, 14/02/2014
Project Overview
JISC/SSHRC Digging into Data Challenge II
Jan 2012 - Dec 2013
Text mining, data extraction and information visualisation to explore big historical datasets.
Focus on how commodities were traded across the globe in the 19th century.
Help historians to discover novel patterns and explore new research questions.
Project Team
Ewan Klein, Bea Alex, Claire Grover, Richard Tobin: text mining
Colin Coates, Andrew Watson and Jim Clifford: historical analysis
James Reid, Nicola Osborne: data management, social media
Aaron Quigley, Uta Hinrichs: information visualisation
Traditional Historical Research
Adam Bowett, "Gillow and the Use of Mahogany in the Eighteenth Century", Regional Furniture, vol. XII, 1998.
Global Fats Supply 1894-98
Document Collections
Collection | Documents | Images
House of Commons Parliamentary Papers (ProQuest) | 118,526 | 6,448,739
Early Canadiana Online | 83,016 | 3,938,758
Directors’ Letters of Correspondence (Kew) | 14,340 | n/a
Confidential Prints (Adam Matthew) | 1,315 | 140,010
Foreign and Commonwealth Office Collection | 1,000 | 41,611
Asia and the West (Gale) | 4,725 | 948,773 (OCRed: 450,841)
In total: over 10 million document pages and over 7 billion word tokens.
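As a quick cross-check of the totals, the per-collection figures from the Document Collections table can be summed. This is a minimal sketch that simply copies the numbers from the table; it uses the full image count for Asia and the West (not the OCRed subset) and skips the Kew letters, which have no image count.

```python
# Per-collection counts copied from the table: (documents, images or None for n/a).
collections = {
    "House of Commons Parliamentary Papers": (118_526, 6_448_739),
    "Early Canadiana Online": (83_016, 3_938_758),
    "Directors' Letters of Correspondence": (14_340, None),
    "Confidential Prints": (1_315, 140_010),
    "Foreign and Commonwealth Office Collection": (1_000, 41_611),
    "Asia and the West": (4_725, 948_773),
}

total_docs = sum(docs for docs, _ in collections.values())
# Skip collections whose image count is n/a.
total_images = sum(imgs for _, imgs in collections.values() if imgs is not None)

print(f"documents: {total_docs:,}")
print(f"images:    {total_images:,}")
```

Assuming roughly one image per page, the image total (about 11.5 million) is consistent with the "over 10 million document pages" figure.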
System Architecture
Mined Information
Example sentence:
Normalised and grounded entities:
commodity: cassia bark [concept: Cinnamomum cassia]
date: 1871 (year=1871)
location: Padang (lat=-0.94924;long=100.35427;country=ID)
location: America (lat=39.76;long=-98.50;country=n/a)
quantity + unit: 6,127 piculs
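The normalised and grounded entities can be thought of as typed records with key=value attributes. The sketch below is a hypothetical Python representation (the `Entity` class and its field names are illustrative assumptions, not the project's actual schema) that reproduces the notation used for the date and location lines; the commodity concept and quantity lines use a different notation and are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical record for one mined entity: type, surface text, normalised attributes."""
    etype: str
    text: str
    attrs: dict[str, str] = field(default_factory=dict)

    def __str__(self) -> str:
        # Render attributes as key=value pairs joined by ';'.
        if not self.attrs:
            return f"{self.etype}: {self.text}"
        attr_str = ";".join(f"{k}={v}" for k, v in self.attrs.items())
        return f"{self.etype}: {self.text} ({attr_str})"


entities = [
    Entity("date", "1871", {"year": "1871"}),
    Entity("location", "Padang", {"lat": "-0.94924", "long": "100.35427", "country": "ID"}),
    Entity("location", "America", {"lat": "39.76", "long": "-98.50", "country": "n/a"}),
]

for e in entities:
    print(e)
```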
Mined Information
Example sentence:
Extracted entity attributes and relations:
origin location: Padang
destination location: America
commodity–date relation: cassia bark – 1871
commodity–location relation: cassia bark – Padang
commodity–location relation: cassia bark – America
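One simple way to obtain commodity relations like these is to pair every commodity with every date and location mentioned in the same sentence. The sketch below assumes that sentence-level co-occurrence rule purely for illustration; it is not necessarily how the Trading Consequences pipeline extracts relations, and `pair_relations` is a hypothetical name. Assigning origin and destination roles (Padang vs. America) would need extra cues, such as the words "from" and "to", which this sketch does not model.

```python
from itertools import product


def pair_relations(entities: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Pair each commodity with each date and location found in the same sentence.

    `entities` holds (entity_type, text) tuples for a single sentence.
    Returns (relation_type, commodity, other_entity) triples.
    """
    commodities = [text for etype, text in entities if etype == "commodity"]
    others = [(etype, text) for etype, text in entities if etype in ("date", "location")]
    return [
        (f"commodity-{etype}", commodity, text)
        for commodity, (etype, text) in product(commodities, others)
    ]


sentence_entities = [
    ("commodity", "cassia bark"),
    ("date", "1871"),
    ("location", "Padang"),
    ("location", "America"),
]

for rel, commodity, other in pair_relations(sentence_entities):
    print(f"{rel} relation: {commodity} - {other}")
```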
Edinburgh Geoparser
Text Mining Output
Over 31 million commodity mentions in 7 billion words.
Over 15 million commodity-location relations.
Each of the 100 most frequent commodities is mentioned over 210,000 times on average; together they account for 68% of all mentions.
The 1,775 commodities mentioned at least 100 times account for 99.8% of all mentions.
All extracted information is stored in the Trading Consequences database (150 GB).
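These coverage figures are simple frequency statistics over commodity mentions. A minimal sketch, assuming the mentions are available as a list of normalised commodity names (the helper names `top_k_share` and `share_at_min_count` are hypothetical, and the counts below are toy data, not project data):

```python
from collections import Counter


def top_k_share(counts: Counter, k: int) -> float:
    """Fraction of all mentions covered by the k most frequent commodities."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


def share_at_min_count(counts: Counter, min_count: int) -> float:
    """Fraction of all mentions covered by commodities mentioned at least min_count times."""
    total = sum(counts.values())
    return sum(c for c in counts.values() if c >= min_count) / total


# Toy data only.
mentions = ["coal"] * 6 + ["cotton"] * 3 + ["cassia bark"] * 1
counts = Counter(mentions)
print(top_k_share(counts, 2))         # 0.9
print(share_at_min_count(counts, 5))  # 0.6

# Cross-check of the slide's figures: 100 commodities x ~210,000 mentions each,
# out of ~31 million mentions in total.
print(round(100 * 210_000 / 31_000_000, 2))  # 0.68
```

The last line shows that the "100 most frequent commodities" and "68% of mentions" figures agree with each other: 21 million of 31 million mentions is about 68%.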
OCR Errors
Extract of Early Canadiana Online document 9_00952_3, p. vi.
Lessons Learned
Importance of two-way collaboration between technology and humanities experts in digital humanities and social sciences (HSS) projects.
Value of iterative development and rapid prototyping.
Geo-referencing text is very important for historical analysis.
Most OCR errors are noise in big data, but HSS scholars need to be made more aware of how OCR errors affect their search results in historical collections.
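To illustrate how OCR errors can affect search results, the sketch below compares exact matching with approximate matching over a handful of tokens. The misspellings are invented for illustration, Python's standard `difflib.get_close_matches` stands in for whatever fuzzy matching a real search system might use, and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib

# Invented OCR output: "cinnamon" garbled in two different ways.
ocr_tokens = ["cinnamon", "cinuamon", "ciunamon", "coffee", "cassia"]
query = "cinnamon"

# Exact search only finds the correctly recognised token.
exact_hits = [t for t in ocr_tokens if t == query]

# Approximate search also recovers the garbled variants (similarity ratio >= 0.8).
fuzzy_hits = difflib.get_close_matches(query, ocr_tokens, n=10, cutoff=0.8)

print(exact_hits)          # ['cinnamon']
print(sorted(fuzzy_hits))  # ['cinnamon', 'cinuamon', 'ciunamon']
```

A researcher relying on exact search here would see one hit instead of three, which is the kind of silent undercounting the lesson above warns about.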
Thank You
Contact: balex@inf.ed.ac.uk
Website: http://tradingconsequences.blogs.edina.ac.uk/
Launch of web-based user interface: March 2014.