Capturing the semantics of documentary
evidence for humanities research
DBpedia Day, NLP & DBpedia
09 / 09 / 2021,
Semantics 2021
Amsterdam (& online)
Enrico Daga
The Open University
@enridaga | www.enridaga.net
Motivation
The identification and cataloguing of documentary evidence
from textual corpora is an important part of empirical research in the
humanities (e.g. historiographic methodology).
Semantic databases of documentary evidence: a recent trend
• The Listening Experience Database Project (LED) (over 10,000 unique
experiences) - https://led.kmi.open.ac.uk/ (2 UK AHRC 2012-2019)
• READ-IT: Reading Europe Advanced Data Investigation Tool - https://readit-project.eu/ (2018-2020)
• Polifonia: Knowledge Graph of Musical Cultural Heritage, with pilots
focusing on scholars in the musical heritage domain - http://polifonia-project.eu (2021-2023)
Two problems:
• Identification -> find evidence in texts
• Cataloguing -> curate a database of evidence
Identification
Identifying pieces of evidence in books is manual work, which may
involve free-text search tools (e.g. PDF viewers)
Problems: the activity (a) requires effort / time, (b) is not systematic, (c) is
prone to errors, and (d) follows a methodology that is (often) not documented
"Capturing themed evidence, a hybrid approach."
Enrico Daga and Enrico Motta
In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019.
• Focus on Identification
• We coin the expression themed evidence to refer to (direct or indirect)
traces of a fact or situation relevant to a theme of interest, and study the
problem of identifying such traces in texts.
• The task of identifying themed evidence is at the intersection between
topical text classification (finding texts relevant to a certain theme) and
event retrieval (finding events mentioned in texts).
• Not all topical texts are themed evidence and the nature of the event itself
is often assumed, implicit, and left to the reader
Paper: http://oro.open.ac.uk/67961/
Finding Listening Experiences (theme: music)
• RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of
amateurs who perform admirably the best orchestral works. The usual supper
followed. After propitiating me with a trio from ’Cosi Fan Tutte’, they drew me to
the piano.
• MASONB-31, positive: In the evening we went to Rev. Baptist Noel’s chapel,
where one is always sure of edification from the sermon if not from the psalms.
• MASONB-88, negative: Flags and pendants were suspended from the
windows, [...] the colors of the German States were waving harmoniously
together, and the banners of the Fine Arts, with appropriate inscriptions,
particularly those of music, poetry and painting, were especially honored, and
floated triumphant amidst the standards of electorates, dukedoms, and
kingdoms.
Entity boost: promote terms mapped to entities
PoS filter: demote terms other than verbs and
nouns, to privilege factual statements
1) Statistical Relatedness Analysis
2) Themed entity detection
3) Hybridisation
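A minimal sketch of how the three steps might be combined, assuming each term already carries a relatedness value (from step 1) and a flag marking it as a themed entity (from step 2). The `Term` class, the penalty and boost weights, the averaging and the threshold are all hypothetical choices made for illustration; this is not the scoring function used in the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Term:
    text: str
    pos: str                     # coarse PoS tag, e.g. "NOUN", "VERB", "ADJ"
    relatedness: float           # statistical relatedness to the theme, in [0, 1]
    themed_entity: bool = False  # True if the term maps to an entity of the theme


def term_score(term: Term, pos_penalty: float = 0.5, entity_boost: float = 2.0) -> float:
    """Apply the PoS filter (demotion) and the entity boost (promotion) to one term."""
    score = term.relatedness
    if term.pos not in {"NOUN", "VERB"}:
        score *= pos_penalty     # demote terms other than verbs and nouns
    if term.themed_entity:
        score *= entity_boost    # promote terms mapped to themed entities
    return score


def text_score(terms: Iterable[Term]) -> float:
    """Average the adjusted term scores over the whole excerpt."""
    terms = list(terms)
    if not terms:
        return 0.0
    return sum(term_score(t) for t in terms) / len(terms)


def is_themed_evidence(terms: Iterable[Term], threshold: float = 0.3) -> bool:
    """Classify an excerpt as positive when its score reaches the (hypothetical) threshold."""
    return text_score(terms) >= threshold
```

With these toy weights, an excerpt that mentions a few themed entities can score as positive even when most of its other terms are only weakly related to the theme, which is the intuition behind the entity boost.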
RECMUS-619, positive: Introduced to the
Anacreontic Society, consisting of
amateurs who perform admirably the best
orchestral works. The usual supper
followed. After propitiating me with a trio
from 'Cosi Fan Tutte', they drew me to the
piano.
http://dbpedia.org/resource/Anacreontic_Society
http://dbpedia.org/resource/Orchestra
http://dbpedia.org/resource/Trio_(music)
http://dbpedia.org/resource/Così_fan_tutte
http://dbpedia.org/resource/Piano
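One way to decide whether a linked entity such as those above belongs to the theme is to walk up its category hierarchy until a core category is reached. The sketch below assumes this strategy and uses a small in-memory toy graph; the category names, the `max_depth` limit and the function name are illustrative assumptions, not data retrieved from DBpedia and not necessarily the procedure used in the paper.

```python
from collections import deque
from typing import Dict, List


def reaches_core(entity: str, core: str,
                 subjects: Dict[str, List[str]],
                 broader: Dict[str, List[str]],
                 max_depth: int = 4) -> bool:
    """Breadth-first search from the entity's categories up the broader links."""
    frontier = deque((cat, 0) for cat in subjects.get(entity, ()))
    seen = set()
    while frontier:
        cat, depth = frontier.popleft()
        if cat == core:
            return True
        if cat in seen or depth >= max_depth:
            continue  # already visited, or too far from the entity
        seen.add(cat)
        for parent in broader.get(cat, ()):
            frontier.append((parent, depth + 1))
    return False


# Toy data (hypothetical): entity -> categories, category -> broader categories.
SUBJECTS = {"Piano": ["Keyboard_instruments"], "Flag": ["Vexillology"]}
BROADER = {
    "Keyboard_instruments": ["Musical_instruments"],
    "Musical_instruments": ["Music"],
}
```

Here `reaches_core("Piano", "Music", SUBJECTS, BROADER)` succeeds after two broader steps, while `"Flag"` never reaches the core category. Bounding the depth is a design choice to avoid drifting into unrelated categories.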
http://led.kmi.open.ac.uk/discovery/findler
MASONB-31, positive: In the
evening we went to Rev. Baptist
Noel's chapel, where one is
always sure of edification from the
sermon if not from the psalms.
http://dbpedia.org/resource/Evening_Prayer_(Anglican)
http://dbpedia.org/resource/Psalms
MASONB-88, negative: Flags and
pendants were suspended from the
windows, [...] the colours of the
German States were waving
harmoniously together, and the
banners of the Fine Arts, with
appropriate inscriptions, particularly
those of music, poetry and painting,
were especially honored, and floated
triumphant amidst the standards of
electorates, dukedoms, and
kingdoms.
http://dbpedia.org/resource/Music
Evaluation
The results are very good: 87% F-Measure & Accuracy
Baseline methods:
• Fo: Random Forest classifier. High precision, low recall, accuracy slightly
above random (on a training/test split it reached 80% accuracy, which
suggests the gold standard is robust)
• ST: Statistical baseline, using a dictionary from Gutenberg’s Music shelf
and average TF-IDF
Variants on our method:
• Em: Statistical relatedness component only (Embeddings)
• En: Themed entity detection component only (Entity). Slightly above
random: the gold standard is pessimistic / robust
• Em+F: Statistical relatedness + PoS filter (Embeddings - Filtered)
• Hy-F: Entity boost only, no PoS filter (Hybrid - Unfiltered). Without
noise correction (PoS filter), precision is generally lower; this variant
shows the impact of entity detection on recall
• Hy: best of both worlds. Substantial agreement with annotators (Cohen’s
kappa)
Our method on an alternative case study:
• Hy/R: Our hybrid approach on the Reading Experience Database (to
test portability). Core concept: book[n]; core entity: dbc:Literature.
The approach is applicable to other domains with little configuration.
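For reference, the measures named in this evaluation (precision, recall, F-measure, accuracy and Cohen's kappa) can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage note are hypothetical and are not results from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard measures for a binary classifier compared with a gold standard."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total ** 2)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }
```

For example, `binary_metrics(tp=40, fp=10, fn=10, tn=40)` describes a balanced test set on which a classifier agrees with the annotators in 80 of 100 cases; because half of that agreement would be expected by chance, kappa comes out lower than accuracy.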
Cataloguing
“Challenging knowledge extraction to support the curation
of documentary evidence in the humanities.”
Enrico Daga and Enrico Motta
In: Third International Workshop on Capturing Scientific Knowledge (Sciknow). @K-CAP 2019
• Bet: metadata curation could be supported by Knowledge Extraction (KE)
• “Slot filling”
• Approaches in the literature vary in task / scope:
• (Named) Entity Recognition and Classification
• Entity Linking: encyclopedic (DBpedia, WikiData), domain specific (Gazetteers)
• Relation Extraction (e.g. listener of, in place)
• Event extraction (e.g. Performance)
• Semantic Role Labelling, Machine reading, …
• Assumption: the information is IN the text. Is that a valid assumption?
Paper: http://oro.open.ac.uk/67961/
Example #1
"I then went to Amsterdam to conduct Oedipus at the
Concertgebouw, which was celebrating its fortieth
anniversary by a series of sumptuous musical
productions. The fine Concertgebouw orchestra,
always at the same high level, the magnificent male
choruses from the Royal Apollo Society, soloists of
the first rank - among them Mme Hélène Sadoven as
Jocasta, Louis van Tulder as Oedipus, and Paul Huf,
an excellent reader - and the way in which my work
was received by the public, have left a particularly
precious memory that I recall with much enjoyment."
listener: Igor Strawinsky
time: in the beginning of 1928
place: Amsterdam
opera: Oedipus Rex
/by: Igor Strawinsky
performer: Concertgebouw orch.
environment: Public
Igor Stravinsky
An Autobiography (1936), p. 139.
https://led.kmi.open.ac.uk/entity/lexp/1435674909834
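The record above can be read as a slot-filling target. The sketch below models it as a dataclass and checks which slot fillers occur literally in the excerpt, which is a crude way of probing the assumption that the information is in the text. The class, the field names (`work` and `composer` stand in for the "opera" and "/by" slots) and the literal string matching are assumptions for illustration, not the LED data model.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class ListeningExperience:
    excerpt: str
    listener: Optional[str] = None
    time: Optional[str] = None
    place: Optional[str] = None
    work: Optional[str] = None
    composer: Optional[str] = None
    performer: Optional[str] = None
    environment: Optional[str] = None


def unfilled_slots(record: ListeningExperience) -> List[str]:
    """Names of the slots a curator (or an extractor) still has to fill."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


def slots_in_text(record: ListeningExperience) -> Dict[str, bool]:
    """For each filled slot, whether its value appears verbatim in the excerpt."""
    text = record.excerpt.lower()
    return {
        f.name: getattr(record, f.name).lower() in text
        for f in fields(record)
        if f.name != "excerpt" and getattr(record, f.name) is not None
    }
```

Applied to the first sentence of the excerpt with listener "Igor Strawinsky", place "Amsterdam" and work "Oedipus Rex", only the place is found verbatim: the listener is the unnamed first-person narrator and the work is mentioned as "Oedipus".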
Example #2
"Music is certainly a pleasure that may be
reckoned intellectual, and we shall never again
have it in the perfection it is this year, because
Mr. Handel will not compose any more!
Oratorios begin next week, to my great joy, for
they are the highest entertainment to me."
listener: Mrs Delany
time: March, 1737
place: London
opera: Operas and Oratorios
/by: G. F. Handel
environment: Public
From: Mary Granville, and Augusta Hall (ed.),
Autobiography and Correspondence of Mary
Granville, Mrs Delany: with interesting
Reminiscences of King George the Third and Queen
Charlotte, volume 1 (London, 1861), p. 594.
https://led.kmi.open.ac.uk/entity/lexp/1444424772006
Experiments
• Focus on Entity Recognition: Listener & Place
• Scope: the 7.3% of LED whose sources are available (archive.org) and which
include DBpedia entities as place or agent: 690 excerpts from 26 books.
1. Find the position of the evidence text back in the original source
2. Check where the DBpedia entity (listener or place) is mentioned
• Details of the experiments are in the paper
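The two steps above can be sketched as follows, assuming the source is plain text with blank lines between paragraphs, that the excerpt fits inside a single paragraph, and that an entity is "mentioned" when its label occurs as a substring. These simplifications and all function names are assumptions for illustration; the actual procedure is described in the paper.

```python
import re
from typing import List, Optional


def split_paragraphs(source: str) -> List[str]:
    """Split a plain-text source on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", source) if p.strip()]


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line wrapping does not matter."""
    return " ".join(text.lower().split())


def locate_excerpt(paragraphs: List[str], excerpt: str) -> Optional[int]:
    """Step 1: index of the paragraph containing the excerpt, if any."""
    needle = normalise(excerpt)
    for index, paragraph in enumerate(paragraphs):
        if needle in normalise(paragraph):
            return index
    return None


def mention_distance(paragraphs: List[str], excerpt_index: int,
                     label: str) -> Optional[int]:
    """Step 2: distance in paragraphs to the nearest mention of the entity label."""
    needle = normalise(label)
    hits = [i for i, p in enumerate(paragraphs) if needle in normalise(p)]
    if not hits:
        return None  # the entity is never mentioned in the source
    return min(abs(i - excerpt_index) for i in hits)
```

A distance of 0 means the entity is mentioned in the paragraph that contains the excerpt, and `None` means it was not found anywhere in the source.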
Analysis
• Q1 - in the excerpt? The place is mentioned in the excerpt in
25.9% of cases, the listener in only 13.4%.
• Q2 - near the excerpt? In only 10% of cases is the place mentioned
less than 5 paragraphs from the excerpt; for the agent, in 4% of
cases.
• Q3 - in the source? In 83.2% of cases the place is mentioned at
least once in the source. In 11.4% the place was not found.
• Q4 - in the meta? 64.8% of the listeners are also the authors of
the text - 5874 cases in LED.
Figure: distance of entity (in number of paragraphs)
Lessons learnt
• Implicit information, based on inference
requiring expertise (e.g. Mr Handel is G.F.
Handel, Oedipus is “Oedipus Rex”)
• The role of contextual knowledge is key to:
• (1) identify the entities (e.g. metadata);
• (2) common sense reasoning (“the next
year”, "in the beginning of 1928")
• Entities can exist in distributed, heterogeneous
resources (encyclopaedic KBs, domain-specific
taxonomies, gazetteers, …)
• Machine reading generates an ontology
formalising the discourse in the text, reducing the
task to one of ontology alignment (not a
simplification!)
• AI / Knowledge Extraction research is often
focused on common sense & encyclopaedic
knowledge
• Documentary evidence is heavily domain-specific
• Problem: humanities scholars coin novel
concepts, e.g. LED, READ-IT
• Sitting Experience in Portraiture History (OU
Arts History PhD)
• Polifonia / CHILD pilot: music of/for children
• Polifonia / MEETUPS pilot: encounters and
exchange of ideas
This research has partly received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 870811.
The communication reflects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.
Thank you
Questions?
@enridaga | www.enridaga.net

 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 

Capturing the semantics of documentary evidence for humanities research

  • 1. Capturing the semantics of documentary evidence for humanities research DBpedia Day, NLP & DBpedia 09 / 09 / 2021, Semantics 2021 Amsterdam (& online) Enrico Daga The Open University @enridaga | www.enridaga.net
  • 2. Motivation The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research in the humanities (e.g. historiographic methodology). Semantic databases of documentary evidence: a recent trend • The Listening Experience Database Project (LED) (over 10,000 unique experiences) - https://led.kmi.open.ac.uk/ (2 UK AHRC 2012-2019) • READ-IT: Reading Europe Advanced Data Investigation Tool - https://readit-project.eu/ (2018-2020) • Polifonia: Knowledge Graph of Musical Cultural Heritage, with pilots focusing on scholars in the musical heritage domain - http://polifonia-project.eu (2021-2023) Two problems: • Identification -> find evidence in texts • Cataloguing -> curate a database of evidence
  • 3. Identification Identifying pieces of evidence in books is manual work, which may include relying on free-text search tools (e.g. PDF viewers). Problems: the activity (a) requires effort / time, (b) is not systematic, (c) is prone to errors, and (d) the methodology is (often) not documented
  • 4. "Capturing themed evidence, a hybrid approach." Enrico Daga and Enrico Motta In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019. • Focus on Identification • We coin the expression themed evidence to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest, and study the problem of identifying them in texts. • The task of identifying themed evidence is at the intersection between topical text classification (finding texts relevant to a certain theme) and event retrieval (finding events mentioned in texts). • Not all topical texts are themed evidence, and the nature of the event itself is often assumed, implicit, and left to the reader Paper: http://oro.open.ac.uk/67961/
  • 5. Finding Listening Experiences (theme: music) • RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the best orchestral works. The usual supper followed. After propitiating me with a trio from 'Cosi Fan Tutte', they drew me to the piano. • MASONB-31, positive: In the evening we went to Rev. Baptist Noel's chapel, where one is always sure of edification from the sermon if not from the psalms. • MASONB-88, negative: Flags and pendants were suspended from the windows, [...] the colors of the German States were waving harmoniously together, and the banners of the Fine Arts, with appropriate inscriptions, particularly those of music, poetry and painting, were especially honored, and floated triumphant amidst the standards of electorates, dukedoms, and kingdoms. Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019.
  • 6. 1) Statistical Relatedness Analysis 2) Themed entity detection 3) Hybridisation. Entity boost: promote terms mapped to entities. PoS filter: demote terms other than verbs and nouns, to privilege factual statements. Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019.
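The interplay of the three components can be illustrated with a minimal sketch. It is not the scoring function from the paper: the `Term` structure, the averaging, the `entity_boost` and `pos_penalty` values, and the decision threshold are all hypothetical choices made only to show how a PoS filter can demote terms and an entity boost can promote them.

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    pos: str                        # coarse part-of-speech tag, e.g. "NOUN", "VERB", "ADJ"
    relatedness: float              # statistical relatedness to the theme, assumed in [0, 1]
    is_themed_entity: bool = False  # True if the term was linked to a theme-relevant entity


def hybrid_score(terms, entity_boost=1.0, pos_penalty=0.5):
    """Average per-term score after PoS demotion and entity promotion (illustrative only)."""
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        score = term.relatedness
        if term.pos not in {"NOUN", "VERB"}:
            score *= pos_penalty              # PoS filter: demote terms other than verbs and nouns
        if term.is_themed_entity:
            score = max(score, entity_boost)  # entity boost: promote terms mapped to entities
        total += score
    return total / len(terms)


def is_themed_evidence(terms, threshold=0.5):
    """Classify an excerpt as themed evidence when its score reaches a hypothetical threshold."""
    return hybrid_score(terms) >= threshold


# Toy excerpts with made-up relatedness values.
positive = [
    Term("trio", "NOUN", 0.6, is_themed_entity=True),
    Term("piano", "NOUN", 0.8, is_themed_entity=True),
    Term("usual", "ADJ", 0.4),
]
negative = [
    Term("flags", "NOUN", 0.1),
    Term("harmoniously", "ADV", 0.6),
]
```

With these toy values the entity-linked nouns dominate the first excerpt, while the second is pulled down because its only strongly related term is an adverb and no entity is linked.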
  • 7. RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the best orchestral works. The usual supper followed. After propitiating me with a trio from 'Cosi Fan Tutte', they drew me to the piano. http://dbpedia.org/resource/Anacreontic_Society http://dbpedia.org/resource/Orchestra http://dbpedia.org/resource/Trio_(music) http://dbpedia.org/resource/Così_fan_tutte http://dbpedia.org/resource/Piano Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019. http://led.kmi.open.ac.uk/discovery/findler
  • 8. MASONB-31, positive: In the evening we went to Rev. Baptist Noel's chapel, where one is always sure of edification from the sermon if not from the psalms. http://dbpedia.org/resource/Evening_Prayer_(Anglican) http://dbpedia.org/resource/Psalms Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019. MASONB-88, negative: Flags and pendants were suspended from the windows, [...] the colours of the German States were waving harmoniously together, and the banners of the Fine Arts, with appropriate inscriptions, particularly those of music, poetry and painting, were especially honored, and floated triumphant amidst the standards of electorates, dukedoms, and kingdoms. http://dbpedia.org/resource/Music
  • 9. Evaluation The results are very good: 87% F-Measure & Accuracy. Baseline methods: • Fo: Random Forest Classifier: high precision, low recall, accuracy slightly above random (on training/test it reached 80% accuracy: robust gold standard) • ST: Statistical: a dictionary from Gutenberg's Music shelf, average TF/IDF. Variants on our method: • Em: Statistical relatedness component only (Embeddings) • En: Themed entity detection component (Entity): slightly above random, so the gold standard is pessimistic / robust • Em+F: Statistical relatedness + PoS Filter (Embeddings - Filtered) • Hy-F: No filter, only entity boost (Hybrid - Unfiltered). Without applying noise correction (PoS filter), precision is generally lower; this shows the impact of entity detection on recall • Hy: best of both worlds. Substantial agreement with annotators (Cohen's K). Our method on an alternative case study: • Hy/R: Our Hybrid approach on the Reading Experience Database (to test portability). Core concept: book[n] and core entity: dbc:Literature. The approach is applicable to other domains with small configuration changes. Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th International Conference on Knowledge Capture, pp. 93-100. 2019.
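For readers less familiar with the measures named on this slide, the sketch below computes precision, recall, F-measure, accuracy and Cohen's kappa for a binary classifier against gold labels. The function name and the toy labels are assumptions for illustration; they are not the evaluation code or data of the paper.

```python
def binary_metrics(gold, predicted):
    """Precision, recall, F-measure, accuracy and Cohen's kappa for binary labels."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    n = len(pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Kappa is undefined when chance agreement is 1; it is reported as 0.0 here.
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Toy labels (1 = themed evidence, 0 = not), invented for the example.
gold_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(gold_labels, predicted_labels)
```

On a commonly used interpretation scale (Landis and Koch), kappa values between 0.61 and 0.80 are labelled "substantial" agreement.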
  • 11. "Challenging knowledge extraction to support the curation of documentary evidence in the humanities." Enrico Daga and Enrico Motta In: Third International Workshop on Capturing Scientific Knowledge (Sciknow). @K-CAP 2019 • Bet: metadata curation could be supported by Knowledge Extraction (KE) • "Slot filling" • Approaches in the literature vary in task / scope: • (Named) Entity Recognition and Classification • Entity Linking: encyclopedic (DBpedia, WikiData), domain specific (Gazetteers) • Relation Extraction (e.g. listener of, in place) • Event extraction (e.g. Performance) • Semantic Role Labelling, Machine reading, … • Assumption: the information is IN the text. Is that a valid assumption? Paper: http://oro.open.ac.uk/67961/
  • 12. Example #1 "I then went to Amsterdam to conduct Oedipus at the Concertgebouw, which was celebrating its fortieth anniversary by a series of sumptuous musical productions. The fine Concertgebouw orchestra, always at the same high level, the magnificent male choruses from the Royal Apollo Society, soloists of the first rank - among them Mme Hélène Sadoven as Jocasta, Louis van Tulder as Oedipus, and Paul Huf, an excellent reader - and the way in which my work was received by the public, have left a particularly precious memory that I recall with much enjoyment." listener: Igor Strawinsky time: in the beginning of 1928 place: Amsterdam opera: Oedipus Rex /by: Igor Strawinsky performer: Concertgebouw orch. environment: Public From: Igor Stravinsky, An Autobiography (1936), p. 139. https://led.kmi.open.ac.uk/entity/lexp/1435674909834 Daga, Enrico and Motta, Enrico (2019). Challenging knowledge extraction to support the curation of documentary evidence in the humanities. In: Third International Workshop on Capturing Scientific Knowledge (Sciknow). Collocated with the K-CAP conference.
  • 13. Example #2 "Music is certainly a pleasure that may be reckoned intellectual, and we shall never again have it in the perfection it is this year, because Mr. Handel will not compose any more! Oratorios begin next week, to my great joy, for they are the highest entertainment to me." listener: Mrs Delany time: March, 1737 place: London opera: Operas and Oratorios /by: G. F. Handel environment: Public From: Mary Granville, and Augusta Hall (ed.), Autobiography and Correspondence of Mary Granville, Mrs Delany: with interesting Reminiscences of King George the Third and Queen Charlotte, volume 1 (London, 1861), p. 594. https://led.kmi.open.ac.uk/entity/lexp/1444424772006 Daga, Enrico and Motta, Enrico (2019). Challenging knowledge extraction to support the curation of documentary evidence in the humanities. In: Third International Workshop on Capturing Scientific Knowledge (Sciknow). Collocated with the K-CAP conference.
  • 14. Experiments • Focus on Entity Recognition: Listener & Place • Scope: 7.3% of the LED with sources available (archive.org) and including DBpedia entities as place or agent, 690 excerpts from 26 books. 1. Find the position of the evidence text back in the original source 2. Check where the DBpedia entity (listener or place) is mentioned • Details of the experiments are in the paper Daga, Enrico and Motta, Enrico (2019). Challenging knowledge extraction to support the curation of documentary evidence in the humanities. In: Third International Workshop on Capturing Scientific Knowledge (Sciknow). Collocated with the K-CAP conference.
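The two experimental steps (locate the excerpt in the source, then check where the entity is mentioned) can be sketched as follows. This is an illustrative simplification, not the procedure from the paper: it assumes the source is plain text with blank-line paragraph breaks, that the excerpt sits inside a single paragraph, and that an entity mention is an exact, case-insensitive match of one of its labels. All function names are hypothetical.

```python
import re


def split_paragraphs(source_text):
    """Split a plain-text source into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", source_text) if p.strip()]


def normalise(text):
    """Collapse whitespace and lowercase, to make substring matching less brittle."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_excerpt(paragraphs, excerpt):
    """Step 1: index of the first paragraph containing the excerpt, or None."""
    needle = normalise(excerpt)
    for index, paragraph in enumerate(paragraphs):
        if needle in normalise(paragraph):
            return index
    return None


def nearest_mention_distance(paragraphs, excerpt_index, labels):
    """Step 2: distance in paragraphs to the closest mention of any label, or None."""
    needles = [normalise(label) for label in labels]
    distances = [
        abs(index - excerpt_index)
        for index, paragraph in enumerate(paragraphs)
        if any(needle in normalise(paragraph) for needle in needles)
    ]
    return min(distances) if distances else None


# Invented toy source, used only to exercise the functions.
toy_source = (
    "I arrived in Amsterdam in the winter.\n\n"
    "The hall was full.\n\n"
    "The orchestra played with great energy and I was moved."
)
toy_paragraphs = split_paragraphs(toy_source)
excerpt_at = find_excerpt(toy_paragraphs, "the orchestra  played with great energy")
place_distance = nearest_mention_distance(toy_paragraphs, excerpt_at, ["Amsterdam"])
```

A distance of 0 means the entity is mentioned in the excerpt's own paragraph; `None` means no label was found anywhere in the source. Digitised books would likely need fuzzy matching rather than exact substrings.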
  • 15. Analysis • Q1 - in the excerpt? The place is mentioned in the excerpt in 25.9% of cases. The listener only in 13.4%. • Q2 - near the excerpt? Only 10% of the time is the place mentioned less than 5 paragraphs from the excerpt. The agent, in 4% of the cases. • Q3 - in the source? 83.2% of the time the place is mentioned at least once in the source. In 11.4% of cases the place was not found. • Q4 - in the meta? 64.8% of the listeners are also the authors of the text - 5874 cases in LED. [Figure: distance of entity from the excerpt (in number of paragraphs)] Daga, Enrico and Motta, Enrico (2019). Challenging knowledge extraction to support the curation of documentary evidence in the humanities. In: Third International Workshop on Capturing Scientific Knowledge (Sciknow). Collocated with the K-CAP conference.
  • 16. Lessons learnt • Implicit information, based on inference requiring expertise (e.g. Mr Handel is G.F. Handel, Oedipus is "Oedipus Rex") • The role of contextual knowledge is key to: (1) identifying the entities (e.g. metadata); (2) common sense reasoning ("the next year", "in the beginning of 1928") • Entities can exist in distributed, heterogeneous resources (encyclopaedic KBs, domain-specific taxonomies, gazetteers, …) • Machine reading generates an ontology formalising the discourse in the text, reducing the task to one of ontology alignment (not a simplification!) • AI / Knowledge Extraction research is often focused on common sense & encyclopaedic knowledge • Documentary evidence is heavily domain-specific • Problem: humanities scholars coin novel concepts, e.g. LED, READ-IT • Sitting Experience in Portraiture History (OU Arts History PhD) • Polifonia / CHILD pilot: music of/for children • Polifonia / MEETUPS pilot: encounters and exchange of ideas. This research has partly received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 870811. The communication reflects only the author's view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.