Digital History Seminar and Archives and Society Seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
1. NLP and Data Mining:
From Chartex to Traces Through Time
and beyond
Dr Roger Evans
Natural Language Technology Group &
Cultural Informatics Research Group
University of Brighton
2. One man, two guvnors
ChartEx TTT
‘Deep’ processing
3. Two men, two guvnors
ChartEx TTT
Natural language processing
Data mining
4. Two men, two guvnors
ChartEx TTT
Natural language processing
Data mining
Brighton
Leiden
5. ChartEx Architecture
[Diagram labels: 1000s of charters; Virtual workbench (VWB); Data mining (DM); Natural language processing (NLP); DM development; NLP development; 5-10 charters; Markup scheme; Expert elicitation; 100-200 charters; Manual markup; Marked-up charters; ChartEx repository; VWB development; VWB requirements; Repository development]
6. ChartEx Architecture
[Diagram as on slide 5, with one added label: Runtime architecture]
13. What can Computer Science do?
• State of the art is broadly based on statistics
• Answers are always only approximate
• Different kinds of approximation:
• Precision: focus on making sure answers are right (but may miss some)
• Recall: focus on getting as many right answers as possible (but may give some wrong answers too)
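The two measures above can be sketched in a few lines. This is a minimal illustration that assumes the standard information-retrieval definitions (precision = right answers returned / all answers returned; recall = right answers returned / all right answers). The function name and the charter identifiers are hypothetical and do not come from ChartEx or TTT.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for the answers a system returned,
    measured against the full set of right answers."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)  # right answers that were returned
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 answers returned, 3 of them right, 6 right answers in total.
returned = {"c1", "c2", "c3", "c9"}
relevant = {"c1", "c2", "c3", "c4", "c5", "c6"}
p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Here most of what the system returns is right (high precision), but half of the right answers are missed (lower recall).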
15. What does Digital Humanities want?
• Perfect results?
• How do you respond if we say we can’t do that?
• Control over tradeoff?
• How easy is it to understand what control you have?
• Does this help you interpret the results you get?
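One way the control over the tradeoff might be exposed is a confidence threshold: raising it favours precision, lowering it favours recall. The sketch below assumes the system attaches a score between 0 and 1 to each candidate answer. The scores, item names, and the `sweep` function are all hypothetical, and nothing here describes how ChartEx or TTT actually work.

```python
def sweep(scored, relevant, thresholds):
    """For each threshold, accept items scoring at or above it and
    report (threshold, precision, recall)."""
    rows = []
    for t in thresholds:
        accepted = {item for item, score in scored.items() if score >= t}
        hits = len(accepted & relevant)
        # Assumed convention: precision is 1.0 when nothing is accepted.
        precision = hits / len(accepted) if accepted else 1.0
        recall = hits / len(relevant) if relevant else 0.0
        rows.append((t, precision, recall))
    return rows


# Hypothetical confidence scores for five candidate answers; three are right.
scored = {"a": 0.95, "b": 0.80, "c": 0.60, "d": 0.40, "e": 0.20}
relevant = {"a", "b", "d"}
rows = sweep(scored, relevant, [0.9, 0.5, 0.1])
for t, precision, recall in rows:
    print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, the strictest threshold returns only right answers but misses two of the three, while the loosest finds all three at the cost of two wrong answers.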
16. Where are we now, and where are we going?
• Human in the loop
• Tools always require human interpretation of results
• Is this really just a cop-out by computer scientists?
• Or just a pragmatic expression of the state of the art?
• Deskilling
• Do we really mean an expert in the loop?
• Conversations
• Are we really only just at the point of negotiating what is possible and what is required?