Slides for HNR2020 Keynote presentation
Abstract:
Digitised sources are a treasure trove for scholars, but accessing the information contained in them is far from trivial. At this scale, traditional methods are insufficient to analyse the big data coming from these sources, so computational methods look to be the solution. Indeed, computational methods can be used to identify and model concepts in large digital datasets; however, the nature of these datasets, as well as that of humanities research questions, requires caution. In particular, the effects of time and location on understanding concepts should not be underestimated.
In this talk, Marieke will present ongoing work on computationally tracing concepts through time and across geography using language and semantic web technology. The work illustrates that seemingly simple concepts (e.g. sugar) prove to be much more complex than expected. We discuss the importance of semantics in helping not only to deal with this complexity but also to reify it, so that it can be interrogated both computationally and via expert analysis.
Slides 5, 8, 11, 12, 15, 16, 17, 18, 19, 20 are based on the presentation Tabea Tietz gave for the paper "Challenges of Knowledge Graph Evolution from an NLP Perspective" at the WHiSe Workshop @ ESWC 2020 (2 June 2020).
http://hnr2020.historicalnetworkresearch.org/
2. DIGITAL HUMANITIES LAB
Overview of this talk
• Big (text) Data & Humanities
• Tracing concepts
• Entity spaces
• New horizons
• Wrapping up
3. Big Data & Humanities
• Digitised archives are enabling new types
of research
• Dutch National Library: 100+ million
newspaper, book & magazine pages
• Chronicling America: 100,000
newspaper pages
• Amsterdam City Archives: 160,000
notary deeds
• Bibliothèque Nationale de Luxembourg:
800,000 pages
• & many more sources
4. Zooming in & Zooming out
• Qualitative methods often filter down
to individual records or pages
• Quantitative methods have started
scratching the surface
• KNAW HuC focuses on bridging the
gap between quantitative &
qualitative analyses through
advancing natural language
processing and semantic web
methods
Image source: https://upload.wikimedia.org/wikipedia/commons/b/b5/MediaWiki_flame_graph_screenshot_2014-12-15_22.png
5. Digital Humanities
• Involves the understanding of
these cultural heritage data.
• Methods involving Natural
Language Processing supported
by Knowledge Graphs have
entered the humanities research
community (Meroño-Peñuela et
al.)
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
6. Who has the biggest sweet tooth?
• Sugar consumption patterns
are difficult to trace
• Historical apple pie recipes can
serve as a proxy
• Apple pastries are common in
many cultures
Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of
apple pie recipes. (Forthcoming)
7. Analysing historical recipes
• Differences in availability of
digitised sources
• Digitisation artefacts hamper
automatic analysis
• Normalisation of quantities is
needed
• Combine quantitative &
qualitative methods
Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of
apple pie recipes. (Forthcoming)
Image source: https://en.wikipedia.org/wiki/Apple_pie#/media/File:For_to_Make_Tartys_in_Applis_(1381).gif
8. Comparing Ingredients in Dutch and American Apple Pie Recipes
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
10. Comparing sugar quantities in Dutch, American, French and German apple pie recipes
Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of apple pie recipes. (Forthcoming)
11. What is an apple pie?
• The real world is constantly
changing
• Knowledge that was considered
true at one point in time in a
specific cultural and spatial
setting may not be true in
another context
• Concepts evolve
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
12. Cultural Context
● What is considered true in
one cultural setting may not be
true in another.
● Apfelstrudel == apple pie?
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
13. How can we store this type of information at scale?
14. Concept modelling
• Computer Science: Knowledge
Representation/Semantic Web
• Long history: at least since
Aristotle
• Machine-readable knowledge was
Sir Tim Berners-Lee’s intent when
he developed the World Wide Web
• To date, we have several
large-scale knowledge graphs such as
DBpedia and Wikidata
Image source: https://upload.wikimedia.org/wikipedia/commons/c/c6/Complexity_vs._orderliness.png
15. Knowledge Graphs
• Represent what we consider
true about parts of the world
• Are created and maintained to
continuously compose
knowledge (Bonatti et al.).
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
16. But:
• Knowledge Graphs are often
static and only reflect one
snippet of reality
• This static representation of the
real world is a problem when
attempting to understand
historical descriptions of
concepts (Bonatti et al., Tasnim
et al.)
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
17. Concepts
• Are manifested in our cultures’
norms and values
• Are documented through
photographs, newspapers,
books, music, film,
advertisements.
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
18. Spatio-temporal context
● Distinguish the spatio-
temporal metadata of the
concept itself and the
metadata of its source
● Trace the evolution of the
concept over time and
geographic regions
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
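One way to picture the distinction between the two kinds of metadata is the minimal Python sketch below. It is an illustration only, not the data model used in the cited work: the class names, the `trace` helper, and all records are hypothetical and invented for this example. Each mention carries one context for the concept itself and a separate one for the source that describes it, and the concept is traced by grouping on the concept's own place and decade.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A place and a year; both values are illustrative."""
    place: str
    year: int


@dataclass(frozen=True)
class ConceptMention:
    concept: str              # e.g. "apple pie"
    concept_context: Context  # where and when the described practice applies
    source_context: Context   # where and when the describing source was published
    sugar_grams: float        # hypothetical normalised quantity


def trace(mentions, concept):
    """Group mentions of one concept by (place, decade) of the concept itself."""
    timeline = defaultdict(list)
    for m in mentions:
        if m.concept == concept:
            decade = m.concept_context.year // 10 * 10
            timeline[(m.concept_context.place, decade)].append(m.sugar_grams)
    return dict(sorted(timeline.items()))


# Invented example records, not data from the talk.
mentions = [
    ConceptMention("apple pie", Context("Netherlands", 1885), Context("Netherlands", 1910), 100.0),
    ConceptMention("apple pie", Context("United States", 1887), Context("United States", 1887), 150.0),
    ConceptMention("apple pie", Context("Netherlands", 1889), Context("Netherlands", 1889), 120.0),
]
timeline = trace(mentions, "apple pie")
```

Note that the first record is grouped under the 1880s even though its source dates from 1910: grouping on the source context instead would place it in a different decade.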
19. Units
● Modern units
○ imperial vs. metric system (lbs,
kg)
● Historical units
○ ell, zentner
● Natural language description of
measurements
○ “a load of butter”, “a plate of
apples”
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
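The three kinds of measurement above could be handled along the lines of the following Python sketch. It is a simplified illustration rather than the normalisation used in the cited work: the `to_grams` function and the conversion table are hypothetical, and the Zentner factor (50 kg) is an assumption, since historical units varied by region and period. Quantities that are not weights (ell) or are given only in natural language are returned as unresolved instead of being guessed.

```python
import re

# Illustrative conversion factors to grams. Historical units varied by
# region and period, so the Zentner value here (50 kg) is an assumption.
GRAMS_PER_UNIT = {
    "g": 1.0,
    "kg": 1000.0,
    "lb": 453.59237,
    "lbs": 453.59237,
    "zentner": 50_000.0,
}

# A number followed by a single unit word, e.g. "2 kg" or "1.5 lbs".
QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def to_grams(text):
    """Return the quantity in grams, or None if it cannot be normalised."""
    match = QUANTITY.match(text)
    if match is None:
        return None  # natural language description, e.g. "a load of butter"
    amount, unit = float(match.group(1)), match.group(2).lower()
    factor = GRAMS_PER_UNIT.get(unit)
    # Unknown or non-weight units (e.g. "ell", a length) stay unresolved.
    return None if factor is None else amount * factor
```

Returning `None` keeps the unresolved cases visible, so that they can be passed on for expert analysis.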
20. Concept modelling
● How broadly or narrowly should
the ontology be modelled to fit
the concept while still capturing
its changes over time?
● What are the properties that
define a concept across the
spatio-temporal and cultural
context?
Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
22. Language & Meaning
• Human language is incredibly flexible and
efficient
• We can use the term ‘sugar’ to refer to:
• the sugar industry (a sour day for sugar)
• particular instances of sugar (shall I
put some sugar in?)
• nutritional information (sugar and fiber
intake)
• commodities (grain and sugar are
produced)
• How can computers make sense of this?
Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation
Conference (LREC’2020)
23. Proxy for Entity Spaces
Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020)
24. Tolerant Entity Linking
• Not every meaning of an entity
or concept is represented in a
knowledge base
• We argue that a link to an entity
space is better than no link
• ‘good enough
interpretation’ (Poesio et al.)
• Proof of concept shows increase
in recall for 8 out of 13 datasets
Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation
Conference (LREC’2020)
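The effect of tolerant linking on recall can be illustrated with the Python sketch below. It is not the evaluation procedure from the cited paper: the `recall` function, the representation of an entity space as a plain set of identifiers, and all data are assumptions made for this example. Under strict scoring only the exact gold entity counts as a hit. Under tolerant scoring any member of the gold entity's space counts.

```python
def recall(gold, predicted, spaces=None):
    """Fraction of gold mentions that receive an acceptable link.

    gold and predicted map mention ids to entity identifiers.
    spaces maps an entity identifier to the set of identifiers in its
    entity space; when given, any member of the gold entity's space counts.
    """
    if not gold:
        return 0.0
    hits = 0
    for mention, gold_entity in gold.items():
        acceptable = {gold_entity}
        if spaces is not None:
            acceptable |= spaces.get(gold_entity, set())
        if predicted.get(mention) in acceptable:
            hits += 1
    return hits / len(gold)


# Invented toy data; all identifiers are hypothetical.
gold = {"m1": "Sugar", "m2": "Sugar_industry", "m3": "Apple_pie"}
predicted = {"m1": "Sugar", "m2": "Sugar", "m3": "Strudel"}
spaces = {"Sugar_industry": {"Sugar", "Sugarcane"}}

strict = recall(gold, predicted)
tolerant = recall(gold, predicted, spaces)
```

In this toy data the link for `m2` misses the exact gold entity but lands inside its entity space, so it counts only under tolerant scoring.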
25. Next steps
• Extending entity spaces beyond
Wikipedia
• Structuring concepts within
entity spaces
• Adding a temporal dimension
• Intangible concepts
• Scaling up
Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation
Conference (LREC’2020)
26. New Horizons
• Complex concepts have
multiple dimensions
• Dimensions may go beyond a
single discipline
• Recognising, modelling & using
concepts and knowledge
graphs require teamwork
27. Unexpected Crews
• Within the KNAW Humanities
Cluster, we harbour
(computational) linguists,
historians, literary scholars,
ethnologists, developers, network
specialists, digital humanists…
• Different disciplines find each
other at the intersection of topics,
data and methods
• Use your network!
28. Wrapping Up
• Text analysis and knowledge
representation are becoming
more important to humanities
research
• Big challenges for complex
information extraction and
modelling
• Interdisciplinary collaboration is
needed
29. http://dhlab.nl
Acknowledgments:
Adina Nerghes, Eleonora Marzi,
Fabio Mariani, Harald Sack,
ISWS Summer School, Lientje
Maas, Mehwish Alam, Melvin
Wevers, Mortaza Alinam, Paul
Groth, Tabea Tietz, Ulbe Bosma
& Wouter van den Berg
30. References
• Tabea Tietz, Mehwish Alam, Harald Sack and Marieke van Erp (2020) Challenges of Knowledge
Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
• Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language
Resources and Evaluation Conference (LREC’2020)
• Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the
Industrial Revolution: an analysis on the basis of apple pie recipes. (Forthcoming)
• Piero Andrea Bonatti, Stefan Decker, Axel Polleres and Valentina Presutti (2019) Knowledge Graphs:
New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371).
Dagstuhl Reports 8(9), 29–111. https://doi.org/10.4230/DagRep.8.9.29
• Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure,
Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen (2015) Semantic technologies for
historical research: A survey. In: Semantic Web Journal
• Mayesha Tasnim, Diego Collarana, Damien Graux, Fabrizio Orlandi and Maria-Esther Vidal (2019)
Summarizing Entity Temporal Evolution in Knowledge Graphs. In: Companion Proceedings of The
2019 World Wide Web Conference