Bibliotheca Digitalis
Reconstitution of Early Modern Cultural Networks
From Primary Source to Data
DARIAH / Biblissima Summer School
Le Mans, 4-8 July 2017
Beyond the Page:
enriching the digital library
Lou Burnard
1st day, July 4th – Digital sources: theoretical fundamentals
Beyond the Page : enriching the digital library
Lou Burnard
1/32
2/32
The Textual Trinity
A document can be described in terms of...
its physical state (because texts are made up of glyphs arranged in particular ways)
its linguistic nature (because texts are made of words used in particular ways)
its intentions (because texts are supposed to tell us something about the world)
(Burnard 1987, Burnard 1989, Burnard & Greenstein 1994)
3/32
(Or maybe it’s more than a trinity)
4/32
Software families
Existing software systems tend to specialize ...
document management and production systems
image management and production systems
linguistic analysis and management
database systems
5/32
Convergence
But convergence is now on everyone’s digital agenda. When you make a mashup combining
a GIS database about places in the Aegean Sea
a historical gazetteer of placenames in the same area
a corpus of texts mentioning those placenames
you need to combine the strengths of a database with tools for linguistic analysis and with tools for rendering spatial information.
A few examples:
https://pleiades.stoa.org/places/109236
http://www.mappingpaintings.org
https://mapoflondon.uvic.ca/map.htm
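A minimal sketch of such a mashup in Python, for illustration only: the place names, IDs and approximate coordinates are invented, and plain dictionaries stand in for the GIS database and the gazetteer. It finds gazetteer names in a small corpus, groups variant spellings under one place ID, and emits GeoJSON-style features that a mapping tool could render.

```python
import re

# Hypothetical gazetteer: variant spellings mapped to one place ID.
GAZETTEER = {"Delos": "P1", "Dhilos": "P1", "Naxos": "P2"}

# Hypothetical spatial table: place ID -> (latitude, longitude), approximate.
COORDS = {"P1": (37.40, 25.27), "P2": (37.10, 25.38)}

# Invented mini-corpus.
CORPUS = [
    "The fleet sailed from Naxos at dawn.",
    "Pilgrims came to Delos for the festival.",
    "Travellers still call the island Dhilos.",
]


def find_mentions(corpus, gazetteer):
    """Yield (sentence index, name, place ID) for every gazetteer name found."""
    # One alternation pattern; \b stops a name matching inside a longer word.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, gazetteer)) + r")\b")
    for i, sentence in enumerate(corpus):
        for match in pattern.finditer(sentence):
            name = match.group(1)
            yield i, name, gazetteer[name]


def to_features(mentions, coords):
    """Group mentions by place and attach coordinates, GeoJSON-style."""
    by_place = {}
    for i, name, pid in mentions:
        entry = by_place.setdefault(pid, {"names": set(), "sentences": []})
        entry["names"].add(name)
        entry["sentences"].append(i)
    features = []
    for pid, entry in sorted(by_place.items()):
        lat, lon = coords[pid]
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": pid, "names": sorted(entry["names"]),
                           "sentences": entry["sentences"]},
        })
    return features


features = to_features(find_mentions(CORPUS, GAZETTEER), COORDS)
```

Each resource keeps its own strength here: the gazetteer handles naming, the coordinate table handles space, and the corpus supplies the textual evidence.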
6/32
The problem
Today’s digital library applications still focus on serving up virtual pages for the reader: the metaphor of the book is so pervasive that we can barely see it.
Self-evidently, digitization makes it possible to offer cheaper and more accessible simulations of printed or written pages.
But this is not enough... digital texts should aim to go ‘beyond the page’.
7/32
What use is a digital text ?
Digital applications enable us to do more with a text, and especially with a collection of texts!
more than simply read it from beginning to end
more than attach annotations to it for others to read
more than perform brute-force “text mining” on it.
The content of the digital library must therefore be enriched, even if this requires the use of techniques which are not currently automatable.
8/32
What’s that noise in the digital library?
A digital edition should capture the intentions and meaning of a text, not simply its appearance.
Otherwise, there can be no analysis beyond the documentary level, no ‘conversation between books’.
9/32
Enrichment or Representation?
When we go from this... ... to this, what is happening?
10/32
Editing
It’s customary to distinguish (at least) these types or levels of interpretation:
paleographic level: identifying the characters and other graphemic components
documentary or diplomatic level: determining what was originally written
editorial or semantic level: determining how it ought to be read
Digitization provides an opportunity to make each step explicit, complex, and reversible.
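The documentary and editorial levels can be kept side by side in a single encoding. A minimal sketch, assuming an invented sample sentence and omitting the TEI namespace for brevity: TEI <choice> groups alternatives such as <orig>/<reg> and <abbr>/<expan>, and the hypothetical function reading() below derives either a diplomatic or an editorial text from the same source, so each editorial step stays explicit and reversible.

```python
import xml.etree.ElementTree as ET

# Children of <choice> recording what the source shows vs. what an editor supplies.
DIPLOMATIC = {"orig", "abbr", "sic"}
EDITORIAL = {"reg", "expan", "corr"}

# Invented sample sentence, TEI namespace omitted for brevity.
FRAGMENT = ("<p>The <choice><orig>Kyng</orig><reg>King</reg></choice> granted "
            "the charter to <choice><abbr>Wm.</abbr><expan>William</expan>"
            "</choice> Marshal.</p>")


def reading(elem, keep):
    """Flatten elem to text, taking from each <choice> only children in `keep`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            parts.extend(reading(alt, keep) for alt in child if alt.tag in keep)
        else:
            parts.append(reading(child, keep))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
print(reading(root, DIPLOMATIC))  # The Kyng granted the charter to Wm. Marshal.
print(reading(root, EDITORIAL))   # The King granted the charter to William Marshal.
```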
11/32
The hermeneutic circle of digital enrichment
12/32
Enrichment
Adding markup to a document determines how it can be processed. Markup can concern many different aspects:
the presentation of the document – its use of writing styles or typefaces, its rendering and layout
the rhetorical organization of a document – its sections and subsections, its paragraphs and lists and headings and footnotes
metatextual aspects of the document – its corrections and additions and deletions and errors and lacunae
linguistic properties of a document – its syntax and morphology and semantics
the document as an object – information about its origins and custodial history, its transmission and reception, its social function and category...
and many others.
13/32
Let’s focus on just one aspect: the treatment of names occurring in a document.
14/32
Some background theory
Reference is a fundamental semiotic concept.
Natural languages often distinguish words associated with abstract concepts from words associated with (concepts concerning) specific objects.
Proper names, technical terms, etc. behave differently from other kinds of word and often have a different linguistic status:
they do not appear in lexicons
they are often ‘non-translatable’
What distinguishes them is chiefly their association with real (or fictive) entities. ‘king’ is a noun with no particular referent; ‘Martin Luther King’ refers to a specific person, as does (in context) ‘the king’.
Likewise with places: ‘city’ refers to a type of place, not a particular one; ‘City of London’ refers to a particular place, as does (in context) ‘the city’.
15/32
Named entity recognition is a multi-stage operation:
decide which input strings refer to named entities
decide which particular entities are intended
(optionally) assemble and associate other information about each referenced entity
Only the first of these is (more or less) automatable, despite decades of research.
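The three stages can be separated in code, which makes it clear where the hard part lies. A minimal Python sketch: stage 1 uses a crude capitalisation heuristic, while stages 2 and 3 work only because the knowledge base is hypothetical and hand-built; the functions detect(), resolve() and enrich() are illustrative names, not an existing library.

```python
import re

# Hypothetical hand-built knowledge base: entity ID -> known facts.
KB = {
    "CS": {"names": {"Clara Schumann", "Clara Wieck"}, "type": "person",
           "born": "1819-09-13"},
    "LEIPZIG": {"names": {"Leipzig"}, "type": "place"},
}


def detect(text):
    """Stage 1: propose candidate strings (runs of capitalised words).

    A crude heuristic: it misses lower-case names and over-generates on
    sentence-initial words.
    """
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)


def resolve(candidate, kb):
    """Stage 2: return the ID of the entity bearing this name, or None."""
    for eid, record in kb.items():
        if candidate in record["names"]:
            return eid
    return None


def enrich(eid, kb):
    """Stage 3: attach everything else the knowledge base records."""
    return {key: value for key, value in kb[eid].items() if key != "names"}


TEXT = "Clara Wieck gave her first concert in Leipzig."
results = []
for candidate in detect(TEXT):
    eid = resolve(candidate, KB)
    results.append((candidate, eid, enrich(eid, KB) if eid else None))
```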
16/32
The NLP (MUC) ‘Named Entity Recognition’ paradigm
input strings are linguistically analysed (parsed, morphologically analysed, etc.) for candidate tokens
candidates are resolved and disambiguated using a (pre-existing) ‘knowledge base’ such as Wikipedia
data mining and language modelling systems work similarly, though the knowledge base may be less structured
The real challenge is to build the knowledge base ...
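To make the disambiguation step concrete, here is a crude context-overlap heuristic (in the spirit of the Lesk algorithm, not a reconstruction of the MUC systems), sketched in Python. The knowledge-base entries and their context word lists are hypothetical; a tie, including the case where nothing overlaps, is left unresolved rather than guessed.

```python
import re

# Hypothetical knowledge base: two entities share the name form "Nancy".
KB = {
    "PERS123": {"names": {"Nancy"},
                "context": {"linguist", "corpus", "encoding", "professor"}},
    "PLACE123": {"names": {"Nancy"},
                 "context": {"city", "lorraine", "france", "duke"}},
}


def disambiguate(name, sentence, kb):
    """Pick the entry whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = [eid for eid, rec in kb.items() if name in rec["names"]]
    if not candidates:
        return None
    scores = {eid: len(words & kb[eid]["context"]) for eid in candidates}
    best = max(scores.values())
    winners = [eid for eid, score in scores.items() if score == best]
    # A tie (including "no overlap at all") is left for a human to decide.
    return winners[0] if len(winners) == 1 else None
```

Usage: disambiguate("Nancy", "Jean likes Nancy, the city in Lorraine", KB) selects PLACE123, while the bare "Jean likes Nancy" yields None, which is exactly the case where the knowledge base (or the encoder) must supply more.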
17/32
Kinds of entity
persons, historical or fictional: ‘Lou Burnard’, ‘Harry Potter’, ‘Pseudo-Dionysius the Areopagite’
named places, of any kind: ‘Le Mans’, ‘Atlantis’, ‘Prussia’, ‘the Eiffel Tower’
named groupings of people: ‘The Drones’, ‘Gallimard’, ‘the Thracians’
physical objects, works of art, etc.: ‘the Alfred Jewel’, ‘Excalibur’, ‘the Mona Lisa’
etc. (Are animals objects or people?)
18/32
Entity properties
What might you want to know about an entity? Some things are obvious, but the list is in principle unbounded:
the various names associated with them at different times
their chronology (birth, death, creation, etc.)
their composition, dimensions, classifications, etc.
their associations with other entities
identifiers used for them in standard authority control lists
The last is particularly important for work in the Linked Open Data (LOD) paradigm.
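One way to model such an entity as data, sketched with Python dataclasses. The class and field names are assumptions; not_before and not_after only loosely mirror TEI's @notBefore and @notAfter, and the spouse ID "RS" is hypothetical. The dates and VIAF number are those given for Clara Schumann in the encoding example of this talk.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Name:
    form: str
    not_before: Optional[date] = None  # loosely modelled on TEI @notBefore
    not_after: Optional[date] = None   # loosely modelled on TEI @notAfter

    def valid_on(self, day: date) -> bool:
        """True if nothing rules out this name being in use on `day`."""
        return ((self.not_before is None or day >= self.not_before)
                and (self.not_after is None or day <= self.not_after))


@dataclass
class Entity:
    names: List[Name]
    born: Optional[date] = None
    associations: Dict[str, str] = field(default_factory=dict)  # relation -> entity ID
    identifiers: Dict[str, str] = field(default_factory=dict)   # authority -> ID

    def names_on(self, day: date) -> List[str]:
        """Name forms that may have been in use on a given day."""
        return [n.form for n in self.names if n.valid_on(day)]


clara = Entity(
    names=[Name("Clara Wieck", not_after=date(1840, 9, 12)),
           Name("Clara Schumann", not_before=date(1840, 9, 12))],
    born=date(1819, 9, 13),
    associations={"spouse": "RS"},     # hypothetical entity ID
    identifiers={"VIAF": "44499359"},
)
```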
19/32
Kinds of entity reference
TEI provides several elements for the markup of names and nominal expressions:
<rs> (‘referring string’) – any phrase which refers to a person or place, e.g. ‘the girl you mentioned’, ‘10 miles Northeast of Attica’ ...
<name> – any lexical item recognized as a proper name, e.g. ‘Budleigh Salterton’, ‘Bouallebec’, ‘John Doe’ ...
<persName>, <placeName>, <orgName>: specific types of name, ‘syntactic sugar’ for <name type="person"> etc.
TEI also offers a rich set of proposals for the components of such elements.
A project must decide which approach best suits its needs.
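The ‘syntactic sugar’ relation can be applied mechanically. A minimal Python sketch using the standard xml.etree.ElementTree module, with the TEI namespace omitted: the type values "person", "place" and "org" and the function desugar() are assumptions for illustration, and any existing @type is moved to @subtype so that finer classifications are not lost.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from specialised elements to generic <name> types.
SUGAR = {"persName": "person", "placeName": "place", "orgName": "org"}


def desugar(root):
    """Rewrite specialised name elements in place as <name type="...">."""
    for elem in root.iter():
        if elem.tag in SUGAR:
            old_type = elem.get("type")
            if old_type is not None:
                elem.set("subtype", old_type)  # keep any finer classification
            elem.set("type", SUGAR[elem.tag])
            elem.tag = "name"
    return root


doc = ET.fromstring("<p><persName>John Doe</persName> visited "
                    "<placeName>Budleigh Salterton</placeName>.</p>")
# prints: <p><name type="person">John Doe</name> visited <name type="place">Budleigh Salterton</name>.</p>
print(ET.tostring(desugar(doc), encoding="unicode"))
```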
20/32
Nominal expressions
often have internal structure
are sometimes ambiguous (same expression, different targets)
are often multiform (different expressions, same target)
TEI XML markup can help...
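Both properties can be computed from (form, target) pairs, such as those harvested from @ref values in an encoded text. A small Python sketch with hypothetical pairs drawn from the examples in this talk: a form with several targets is ambiguous, a target with several forms is multiform.

```python
from collections import defaultdict

# Hypothetical (name form, @ref target) pairs.
PAIRS = [
    ("Nancy", "#PERS123"), ("Nancy", "#PLACE123"),
    ("Clara Schumann", "#CS"), ("Clara", "#CS"), ("Frau Schumann", "#CS"),
]


def ambiguity_report(pairs):
    """Return (ambiguous forms -> targets, multiform targets -> forms)."""
    targets_by_form = defaultdict(set)
    forms_by_target = defaultdict(set)
    for form, target in pairs:
        targets_by_form[form].add(target)
        forms_by_target[target].add(form)
    ambiguous = {f: t for f, t in targets_by_form.items() if len(t) > 1}
    multiform = {t: f for t, f in forms_by_target.items() if len(f) > 1}
    return ambiguous, multiform


ambiguous, multiform = ambiguity_report(PAIRS)
```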
21/32
Components of personal names
<persName xml:lang="de">
<forename type="first">Johann</forename>
<forename type="middle">Sebastian</forename>
<surname>Bach</surname>
</persName>
<persName xml:lang="fr">
<forename type="composé">Jean-Sébastien</forename>
<surname>Bach</surname>
</persName>
Not to mention... <roleName> (‘Emperor’, ‘conseiller’), <genName> (‘the Elder’), <addName> (‘Hammer of the Scots’), <nameLink> (‘van der’) ...
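Because the components are marked up, different name forms can be generated rather than typed. A minimal sketch (Python, ElementTree, TEI namespace omitted) that builds a ‘Surname, Forenames’ sorting form and a natural-order display form from the two encodings above; the function names are hypothetical, and components such as <nameLink> or <roleName> are simply ignored here.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

NAMES = [
    '<persName xml:lang="de"><forename type="first">Johann</forename> '
    '<forename type="middle">Sebastian</forename> <surname>Bach</surname></persName>',
    '<persName xml:lang="fr"><forename type="composé">Jean-Sébastien</forename> '
    '<surname>Bach</surname></persName>',
]


def parts(pers, tag):
    """Text of every child component with this tag, in document order."""
    return [(el.text or "").strip() for el in pers.findall(tag)]


def sort_form(pers):
    """'Surname, Forenames', e.g. for an index."""
    return ", ".join([" ".join(parts(pers, "surname")),
                      " ".join(parts(pers, "forename"))])


def display_form(pers):
    """Forenames followed by surname."""
    return " ".join(parts(pers, "forename") + parts(pers, "surname"))


forms = {}
for source in NAMES:
    pers = ET.fromstring(source)
    forms[pers.get(XML_LANG)] = sort_form(pers)
```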
22/32
Components of place names
names of a specific geo-political type (<district>, <settlement>, <region>, <country>, <bloc>)
<placeName>
<district>6ème arr.</district>
<settlement type="city">Paris, </settlement>
<country>France</country>
</placeName>
names of geographical features such as mountains or rivers, and terms for such features (<geogName> and <geogFeat>)
<placeName>
<geogFeat>Mont</geogFeat>
<geogName>Blanc</geogName>
</placeName>
a relational expression
<rs type="place">
<measure>10 miles</measure>
<offset>Northeast of</offset>
<settlement>Attica</settlement>
</rs>
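Once its parts are tagged, the relational expression can even be turned into an approximate point. This is a sketch under loud assumptions: a single hypothetical reference point stands in for Attica (roughly the coordinates of Athens), the offset phrases are mapped to compass bearings by a hand-made table, and the destination is computed with the standard great-circle formula for a spherical Earth.

```python
import math
import re
import xml.etree.ElementTree as ET

RS = """<rs type="place">
<measure>10 miles</measure>
<offset>Northeast of</offset>
<settlement>Attica</settlement>
</rs>"""

# Hypothetical lookup tables; all values are assumptions for illustration.
GAZETTEER = {"Attica": (37.98, 23.73)}  # one point, roughly Athens
BEARINGS = {"north of": 0, "northeast of": 45, "east of": 90,
            "southeast of": 135, "south of": 180, "southwest of": 225,
            "west of": 270, "northwest of": 315}
UNITS_KM = {"mile": 1.609344, "miles": 1.609344, "km": 1.0}
EARTH_RADIUS_KM = 6371.0


def destination(lat, lon, bearing_deg, distance_km):
    """Great-circle destination from (lat, lon) along a bearing."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)


def resolve_rs(xml):
    """Combine <measure>, <offset> and the named place into a point."""
    rs = ET.fromstring(xml)
    amount, unit = re.match(r"\s*([\d.]+)\s*(\w+)", rs.findtext("measure")).groups()
    km = float(amount) * UNITS_KM[unit]
    bearing = BEARINGS[rs.findtext("offset").strip().lower()]
    lat, lon = GAZETTEER[rs.findtext("settlement").strip()]
    return destination(lat, lon, bearing, km)


point = resolve_rs(RS)
```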
23/32
Resolving referents
Within a single language, in a single document, the same person may be referred to in different ways:
<persName>Clara Schumann</persName> ....
<persName>Clara</persName> ....
<persName>Frau Schumann</persName>
The @ref attribute can be used to show that these are all references to the same person:
<persName ref="#CS">Clara Schumann</persName> ....
<persName ref="#CS">Clara</persName> .... <persName>Clara Wieck</persName> ...
<persName ref="#CS">Frau Schumann</persName>
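Once @ref is present, grouping mentions by referent is straightforward. A minimal Python sketch (ElementTree, TEI namespace omitted) using the fragment above, with a wrapping <p> added so that it parses; the function name is hypothetical. Note that ‘Clara Wieck’, which carries no @ref, stays unresolved even though a reader knows it names the same person.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

FRAGMENT = """<p><persName ref="#CS">Clara Schumann</persName> ....
<persName ref="#CS">Clara</persName> .... <persName>Clara Wieck</persName> ...
<persName ref="#CS">Frau Schumann</persName></p>"""


def group_by_referent(root):
    """Collect name forms per @ref value; names without @ref are unresolved."""
    groups = defaultdict(list)
    unresolved = []
    for pers in root.iter("persName"):
        ref = pers.get("ref")
        if ref:
            groups[ref].append(pers.text)
        else:
            unresolved.append(pers.text)
    return dict(groups), unresolved


groups, unresolved = group_by_referent(ET.fromstring(FRAGMENT))
```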
24/32
Associating reference and entity
the value of @ref can be any form of URI, pointing to a place where more information about this entity is provided, either locally or externally:
<persName ref="https://en.wikipedia.org/wiki/Clara_Schumann">Clara Schumann</persName>
<persName ref="#CS">Clara Schumann</persName>
<persName ref="myBib:CS">Clara Schumann</persName>
All we want to say about CS can be provided using a <person> element somewhere:
<person xml:id="CS">
<persName notAfter="1840-09-12">Clara Wieck</persName>
<birth when="1819-09-13">
<placeName>Leipzig</placeName>
</birth>
<ref type="VIAF"
target="http://viaf.org/viaf/44499359"/>
<idno type="ISNI">ISN:0000000121305653</idno>
<!--etc -->
</person>
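Resolving the three kinds of @ref value shown above could look like this. A sketch only: the prefix table imitates TEI's <prefixDef> mechanism (a matchPattern plus a replacementPattern in which $1 stands for the first captured group), but the base file name people.xml and the example.org bibliography URL are hypothetical placeholders.

```python
import re

# Hypothetical prefix table imitating TEI <prefixDef>: for each private scheme,
# a matchPattern and a replacementPattern in which $1 means "first group".
PREFIX_DEFS = {
    "myBib": (r"([A-Za-z0-9]+)", "https://example.org/bibliography.xml#$1"),
}


def expand(ref, base="people.xml"):
    """Turn a @ref value into a dereferenceable URI (sketch only)."""
    if ref.startswith("#"):
        return base + ref  # local pointer into a (hypothetical) people file
    scheme, sep, rest = ref.partition(":")
    if sep and scheme in PREFIX_DEFS:
        pattern, replacement = PREFIX_DEFS[scheme]
        match = re.fullmatch(pattern, rest)
        if match:
            # Substitute $1, $2, ... with the corresponding captured groups.
            return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))),
                          replacement)
    return ref  # anything else (e.g. an https: URI) is used as it stands


for value in ["https://en.wikipedia.org/wiki/Clara_Schumann", "#CS", "myBib:CS"]:
    print(value, "->", expand(value))
```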
25/32
Resolving ambiguity
Person or place?
<s>Jean likes
<name>Nancy</name>
</s>
We could clarify this by using a more precise tag (<persName> or <placeName>) rather than <name>. Or we could resolve it by supplying the appropriate target for the @ref attribute on <name>:
<s>Jean likes
<name ref="#PLACE123">Nancy</name>
</s>
<!-- ... -->
<person xml:id="PERS123">
<persName>
<forename>Nancy</forename>
<surname>Ide</surname>
</persName>
<!-- ... -->
</person>
<place xml:id="PLACE123">
<placeName notBefore="1400">Nancy</placeName>
<placeName notAfter="0056">Nantium</placeName>
<!-- ... -->
</place>
26/32
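Supplying the appropriate target lets software settle the person-or-place question: looking up the element a <name> points to tells us what kind of entity it is. A minimal Python sketch, using a simplified wrapper document built from the Nancy example (not schema-valid TEI, namespace omitted); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<TEI>
<s>Jean likes <name ref="#PLACE123">Nancy</name></s>
<listPerson>
  <person xml:id="PERS123">
    <persName><forename>Nancy</forename><surname>Ide</surname></persName>
  </person>
</listPerson>
<listPlace>
  <place xml:id="PLACE123">
    <placeName notBefore="1400">Nancy</placeName>
    <placeName notAfter="0056">Nantium</placeName>
  </place>
</listPlace>
</TEI>"""


def entity_kinds(root):
    """Pair each <name> with the tag of the element its @ref points to."""
    index = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    kinds = []
    for name in root.iter("name"):
        target = index.get(name.get("ref", "").lstrip("#"))
        kinds.append((name.text, target.tag if target is not None else None))
    return kinds


print(entity_kinds(ET.fromstring(DOC)))  # [('Nancy', 'place')]
```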
Data vs. Text
TEI distinguishes names from things.
The assumption is that names are found in source texts, whereas things exist in the real world and are described by additional data.
Data can take a semi-textual form structured in XML, though it need
not do so.
‘Text is not a special type of data; data is a special type of text.’
27/32
For example
Extract from Histoire Chronologique de la Chancelerie de France..., p. 5
personal names (Odolric, Adalric, Gezon, Lothaire, Adaleron, Arnoul) ...
names of social positions (Grand Chancelier, Secretaire, Roi...)
a nickname (‘dit Le Faineant’)
titles of other sources (pour la donation de l’Abbaie de Bonneval, Antiquitez de Troyes)
explicit quotation (‘Sinum Lotarii gloriosissimi Regis... ’)
The formatting helps... but only a bit: we need to make these things explicit.
28/32
Another example: Paris, BnF, ms. français 16753
First page of Registres de permis d’imprimer...
29/32
One possible encoding...
This seems to be text as data...
30/32
.... continued
... and this seems to be data as text...
31/32
Tentative conclusions, intended to provoke debate
reading a text involves identifying and understanding its data
reading many texts at a distance contributes to, but does not replace, an understanding of the data they represent
data is itself a kind of text, requiring the same nuanced interpretive judgment
32/32

More Related Content

Similar to Bibliotheca Digitalis Summer school: Beyond the Page: enriching the digital library - Lou Burnard

Ontologies and the humanities: some issues affecting the design of digital in...
Ontologies and the humanities: some issues affecting the design of digital in...Ontologies and the humanities: some issues affecting the design of digital in...
Ontologies and the humanities: some issues affecting the design of digital in...Toby Burrows
 
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...OpenEdition
 
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Rinke Hoekstra
 
Methods and experiences in cultural heritage enhancement
Methods and experiences in cultural heritage enhancementMethods and experiences in cultural heritage enhancement
Methods and experiences in cultural heritage enhancementFrancesca Tomasi
 
Promoting Digital Humanities in the Philippines
Promoting Digital Humanities in the PhilippinesPromoting Digital Humanities in the Philippines
Promoting Digital Humanities in the PhilippinesDave Marcial
 
Data versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationData versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationLou Burnard
 
Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Giovanni Colavizza
 
Ontologies introduction - ecoOnto meeting
Ontologies introduction - ecoOnto meetingOntologies introduction - ecoOnto meeting
Ontologies introduction - ecoOnto meetingjchabalier
 
Essay On Library Museum And Archive
Essay On Library Museum And ArchiveEssay On Library Museum And Archive
Essay On Library Museum And ArchiveJessica Rinehart
 
All the world exists to end up in a dictionary
All the world exists to end up in a dictionaryAll the world exists to end up in a dictionary
All the world exists to end up in a dictionaryRossellaDH
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applicationsVasileios Lampos
 
Notational systems and the abstract built environment
Notational systems and the abstract built environmentNotational systems and the abstract built environment
Notational systems and the abstract built environmentJeff Long
 
Linked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenLinked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenStefan Gradmann
 
Putrajaya Talk
Putrajaya TalkPutrajaya Talk
Putrajaya Talkazlaini
 
CIDOC CRM Tutorial
CIDOC CRM TutorialCIDOC CRM Tutorial
CIDOC CRM TutorialISLCCIFORTH
 

Similar to Bibliotheca Digitalis Summer school: Beyond the Page: enriching the digital library - Lou Burnard (20)

Ontologies and the humanities: some issues affecting the design of digital in...
Ontologies and the humanities: some issues affecting the design of digital in...Ontologies and the humanities: some issues affecting the design of digital in...
Ontologies and the humanities: some issues affecting the design of digital in...
 
Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005
 
Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005
 
Advances In Wsd Acl 2005
Advances In Wsd Acl 2005Advances In Wsd Acl 2005
Advances In Wsd Acl 2005
 
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
 
Corpus
CorpusCorpus
Corpus
 
Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04Lri Owl And Ontologies 04 04
Lri Owl And Ontologies 04 04
 
The CIDOC CRM Family and LOD
The CIDOC CRM Family and LODThe CIDOC CRM Family and LOD
The CIDOC CRM Family and LOD
 
Methods and experiences in cultural heritage enhancement
Methods and experiences in cultural heritage enhancementMethods and experiences in cultural heritage enhancement
Methods and experiences in cultural heritage enhancement
 
Promoting Digital Humanities in the Philippines
Promoting Digital Humanities in the PhilippinesPromoting Digital Humanities in the Philippines
Promoting Digital Humanities in the Philippines
 
Data versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontationData versus Text: 30 years of confrontation
Data versus Text: 30 years of confrontation
 
Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013Leipzig Functional Categorisation 11/12/2013
Leipzig Functional Categorisation 11/12/2013
 
Ontologies introduction - ecoOnto meeting
Ontologies introduction - ecoOnto meetingOntologies introduction - ecoOnto meeting
Ontologies introduction - ecoOnto meeting
 
Essay On Library Museum And Archive
Essay On Library Museum And ArchiveEssay On Library Museum And Archive
Essay On Library Museum And Archive
 
All the world exists to end up in a dictionary
All the world exists to end up in a dictionaryAll the world exists to end up in a dictionary
All the world exists to end up in a dictionary
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 
Notational systems and the abstract built environment
Notational systems and the abstract built environmentNotational systems and the abstract built environment
Notational systems and the abstract built environment
 
Linked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenLinked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the Citizen
 
Putrajaya Talk
Putrajaya TalkPutrajaya Talk
Putrajaya Talk
 
CIDOC CRM Tutorial
CIDOC CRM TutorialCIDOC CRM Tutorial
CIDOC CRM Tutorial
 

More from Bibliothèques Virtuelles Humanistes - CESR, Université de Tours, UMR 7323

More from Bibliothèques Virtuelles Humanistes - CESR, Université de Tours, UMR 7323 (20)

Montaigne : derniers développements sur les travaux éditoriaux
Montaigne : derniers développements sur les travaux éditoriauxMontaigne : derniers développements sur les travaux éditoriaux
Montaigne : derniers développements sur les travaux éditoriaux
 
Les BVH & l’étude des matériels d’imprimerie anciens
 Les BVH & l’étude des matériels d’imprimerie anciens Les BVH & l’étude des matériels d’imprimerie anciens
Les BVH & l’étude des matériels d’imprimerie anciens
 
Évolutions de l’infrastructure & de la bibliothèque numérique
Évolutions de l’infrastructure & de la bibliothèque numériqueÉvolutions de l’infrastructure & de la bibliothèque numérique
Évolutions de l’infrastructure & de la bibliothèque numérique
 
Les « Bibliotheques françoises » (BibFr) – Avancée de l’indexation de La Croi...
Les « Bibliotheques françoises » (BibFr) – Avancée de l’indexation de La Croi...Les « Bibliotheques françoises » (BibFr) – Avancée de l’indexation de La Croi...
Les « Bibliotheques françoises » (BibFr) – Avancée de l’indexation de La Croi...
 
Édition numérique et valorisation du livre de compte de la reine Marguerite d...
Édition numérique et valorisation du livre de compte de la reine Marguerite d...Édition numérique et valorisation du livre de compte de la reine Marguerite d...
Édition numérique et valorisation du livre de compte de la reine Marguerite d...
 
Catalogues régionaux des Incunables des bibliothèques publiques de France
Catalogues régionaux des Incunables des bibliothèques publiques de FranceCatalogues régionaux des Incunables des bibliothèques publiques de France
Catalogues régionaux des Incunables des bibliothèques publiques de France
 
Une nouvelle base de données, Scripta Manent : le “Facebook” des années 1530-...
Une nouvelle base de données, Scripta Manent : le “Facebook” des années 1530-...Une nouvelle base de données, Scripta Manent : le “Facebook” des années 1530-...
Une nouvelle base de données, Scripta Manent : le “Facebook” des années 1530-...
 
Bilan 2022 & perspectives du programme de recherche BVH
Bilan 2022 & perspectives du programme de recherche BVHBilan 2022 & perspectives du programme de recherche BVH
Bilan 2022 & perspectives du programme de recherche BVH
 
Catalogues régionaux des Incunables des bibliothèques publiques de France : S...
Catalogues régionaux des Incunables des bibliothèques publiques de France : S...Catalogues régionaux des Incunables des bibliothèques publiques de France : S...
Catalogues régionaux des Incunables des bibliothèques publiques de France : S...
 
Architecture de la bibliothèque numérique : Déploiement du protocole IIIF - A...
Architecture de la bibliothèque numérique : Déploiement du protocole IIIF - A...Architecture de la bibliothèque numérique : Déploiement du protocole IIIF - A...
Architecture de la bibliothèque numérique : Déploiement du protocole IIIF - A...
 
Autour du projet BiRayMa : "Bibliothèque de Raymond Marcel" (CollEx-Persée) -...
Autour du projet BiRayMa : "Bibliothèque de Raymond Marcel" (CollEx-Persée) -...Autour du projet BiRayMa : "Bibliothèque de Raymond Marcel" (CollEx-Persée) -...
Autour du projet BiRayMa : "Bibliothèque de Raymond Marcel" (CollEx-Persée) -...
 
Rabelais : Les documents de Berne et l'Almanach d'Alessandria - Assemblée gén...
Rabelais : Les documents de Berne et l'Almanach d'Alessandria - Assemblée gén...Rabelais : Les documents de Berne et l'Almanach d'Alessandria - Assemblée gén...
Rabelais : Les documents de Berne et l'Almanach d'Alessandria - Assemblée gén...
 
Projet Scripta Manent : Une nouvelle base de données : les relations sociales...
Projet Scripta Manent : Une nouvelle base de données : les relations sociales...Projet Scripta Manent : Une nouvelle base de données : les relations sociales...
Projet Scripta Manent : Une nouvelle base de données : les relations sociales...
 
Projet Les Bibliotheques françoises de La Croix du Maine et de Du Verdier - A...
Projet Les Bibliotheques françoises de La Croix du Maine et de Du Verdier - A...Projet Les Bibliotheques françoises de La Croix du Maine et de Du Verdier - A...
Projet Les Bibliotheques françoises de La Croix du Maine et de Du Verdier - A...
 
Architecture de la bibliothèque numérique : Modélisation en XML-TEI - Assembl...
Architecture de la bibliothèque numérique : Modélisation en XML-TEI - Assembl...Architecture de la bibliothèque numérique : Modélisation en XML-TEI - Assembl...
Architecture de la bibliothèque numérique : Modélisation en XML-TEI - Assembl...
 
Architecture de la bibliothèque numérique : Veille fonctionnelle et technique...
Architecture de la bibliothèque numérique : Veille fonctionnelle et technique...Architecture de la bibliothèque numérique : Veille fonctionnelle et technique...
Architecture de la bibliothèque numérique : Veille fonctionnelle et technique...
 
Architecture de la bibliothèque numérique : Modélisation et migrations de don...
Architecture de la bibliothèque numérique : Modélisation et migrations de don...Architecture de la bibliothèque numérique : Modélisation et migrations de don...
Architecture de la bibliothèque numérique : Modélisation et migrations de don...
 
Production BVH : Epistemon (éditions numériques TEI-Renaissance) - Assemblée ...
Production BVH : Epistemon (éditions numériques TEI-Renaissance) - Assemblée ...Production BVH : Epistemon (éditions numériques TEI-Renaissance) - Assemblée ...
Production BVH : Epistemon (éditions numériques TEI-Renaissance) - Assemblée ...
 
Production BVH : Fac-similés (Numérisations) - Assemblée générale 2021, Progr...
Production BVH : Fac-similés (Numérisations) - Assemblée générale 2021, Progr...Production BVH : Fac-similés (Numérisations) - Assemblée générale 2021, Progr...
Production BVH : Fac-similés (Numérisations) - Assemblée générale 2021, Progr...
 
Bilan 2020-2021 & perspectives 2022+ Assemblée générale 2021, Programme de re...
Bilan 2020-2021 & perspectives 2022+ Assemblée générale 2021, Programme de re...Bilan 2020-2021 & perspectives 2022+ Assemblée générale 2021, Programme de re...
Bilan 2020-2021 & perspectives 2022+ Assemblée générale 2021, Programme de re...
 

Recently uploaded

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus

Bibliotheca Digitalis Summer school: Beyond the Page: enriching the digital library - Lou Burnard

  • 1. Bibliotheca Digitalis: Reconstitution of Early Modern Cultural Networks. From Primary Source to Data. DARIAH / Biblissima Summer School, Le Mans, 4-8 July 2017. Beyond the Page: enriching the digital library. Lou Burnard. 1st day, July 4th – Digital sources: theoretical fundamentals
  • 2. Beyond the Page: enriching the digital library. Lou Burnard
  • 4. The Textual Trinity. A document can be described in terms of: its physical state (because texts are made up of glyphs arranged in particular ways); its linguistic nature (because texts are made of words used in particular ways); its intentions (because texts are supposed to tell us something about the world). (Burnard 1987, Burnard 1989, Burnard & Greenstein 1994)
  • 5. (Or maybe it’s more than a trinity)
  • 6. Software families. Existing software systems tend to specialize: document management and production systems; image management and production systems; linguistic analysis and management; database systems.
  • 7. Convergence. But convergence is now on everyone’s digital agenda. When you make a mashup combining a GIS database about places in the Aegean Sea, a historical gazetteer of placenames in the same area, and a corpus of texts mentioning those placenames, you need to combine the strengths of a database with tools for linguistic analysis and with tools for rendering spatial information. A few examples: https://pleiades.stoa.org/places/109236, http://www.mappingpaintings.org, https://mapoflondon.uvic.ca/map.htm
  • 8. The problem. Today’s digital library applications still focus on serving up virtual pages for the reader: the metaphor of the book is so pervasive that we can barely see it. Self-evidently, digitization makes it possible to offer cheaper and more accessible simulations of printed or written pages. But this is not enough: digital texts should aim to go ‘beyond the page’.
  • 9. What use is a digital text? Digital applications enable us to do more with a text, and especially with a collection of texts: more than simply read it from beginning to end, more than attach annotations to it for others to read, more than perform brute-force “text mining” on it. The content of the digital library must therefore be enriched, even if this requires techniques which are not currently automatable.
  • 10. What’s that noise in the digital library? A digital edition should capture the intentions and meaning of a text, not simply its appearance. Otherwise, there can be no analysis beyond the documentary level, no ‘conversation between books’.
  • 11. Enrichment or Representation? When we go from this ... to this, what is happening?
  • 12. Editing. It’s customary to distinguish (at least) these types or levels of interpretation: paleographic level: identifying the characters and other graphemic components; documentary or diplomatic level: determining what was originally written; editorial or semantic level: determining how it ought to be read. Digitization provides an opportunity to make each step explicit, complex, and reversible.
  • 13. The hermeneutic circle of digital enrichment
  • 14. Enrichment. Adding markup to a document determines how it can be processed. It can concern many different aspects: the presentation of the document – its use of writing styles or typefaces, its rendering and layout; the rhetorical organization of a document – its sections and subsections, its paragraphs and lists and headings and footnotes; metatextual aspects of the document – its corrections and additions and deletions and errors and lacunae; linguistic properties of a document – its syntax and morphology and semantics; the document as an object – information about its origins and custodial history, its transmission and reception, its social function and category...; and many others.
  • 15. Let’s focus on just one aspect: the treatment of names occurring in a document.
  • 16. Some background theory. Reference is a fundamental semiotic concept. Natural languages often distinguish words associated with abstract concepts from words associated with (concepts concerning) specific objects. Proper names, technical terms, etc. behave differently from other kinds of word and often have a different linguistic status: they do not appear in lexicons; they are often ‘non-translatable’. What distinguishes them is chiefly their association with real (or fictive) entities. ‘king’ is a noun with no particular referent; ‘Martin Luther King’ refers to a specific person, as does (in context) ‘the king’. Likewise with places: ‘city’ refers to a type of place, not a particular one; ‘City of London’ refers to a particular place, as does (in context) ‘the city’.
  • 17. Named entity recognition is a multi-stage operation: decide which input strings reference named entities; decide which particular entities are intended; (optionally) assemble and associate other information about each referenced entity. Only the first of these is (more or less) automatable, despite decades of research.
  • 18. The NLP (MUC) ‘Named Entity Recognition’ paradigm. Input strings are linguistically analysed (parsed, morphologically analysed, etc.) for candidate tokens; candidates are resolved and disambiguated using a (pre-existing) ‘knowledge base’ such as Wikipedia; data mining and language modelling systems work similarly, though the knowledge base may be less structured. The real challenge is to build the knowledge base ...
  • 19. Kinds of entity. Persons, historical or fictional: ‘Lou Burnard’, ‘Harry Potter’, ‘Pseudo-Dionysius the Areopagite’; named places of any kind: ‘Le Mans’, ‘Atlantis’, ‘Prussia’, ‘the Eiffel Tower’; named groupings of people: ‘The Drones’, ‘Gallimard’, ‘the Thracians’; physical objects, works of art, etc.: ‘the Alfred Jewel’, ‘Excalibur’, ‘the Mona Lisa’, etc. (Are animals objects or people?)
  • 20. Entity properties. What might you want to know about an entity? Some things are obvious, but the list is in principle unbounded: the various names associated with them at different times; their chronology (birth, death, creation, etc.); their composition, dimensions, classifications, etc.; their associations with other entities; identifiers used for them in standard authority control lists. The last is particularly important for work in the LOD (Linked Open Data) paradigm.
  • 21. Kinds of entity reference. TEI provides several elements for the markup of names and nominal expressions: <rs> (‘referring string’) – any phrase which refers to a person or place, e.g. ‘the girl you mentioned’, ‘10 miles Northeast of Attica’ ...; <name> – any lexical item recognized as a proper name, e.g. ‘Budleigh Salterton’, ‘Bouallebec’, ‘John Doe’ ...; <persName>, <placeName>, <orgName> – specific types of name, ‘syntactic sugar’ for <name type="person"> etc. There is a rich set of proposals for the components of such elements. A project must decide which approach best suits its needs.
  • 22. Nominal expressions often have internal structure, are sometimes ambiguous (same referent, different target), and are often multiform (different referent, same target). TEI XML markup can help...
  • 23. Components of personal names. <persName xml:lang="de"> <forename type="first">Johann</forename> <forename type="middle">Sebastian</forename> <surname>Bach</surname> </persName> <persName xml:lang="fr"> <forename type="composé">Jean-Sébastien</forename> <surname>Bach</surname> </persName> Not to mention... <roleName> (‘Emperor’, ‘conseiller’), <genName> (‘the Elder’), <addName> (‘Hammer of the Scots’), <nameLink> (‘van der’) ...
  • 24. Components of place names. Names of a specific geo-political type (<district>, <settlement>, <region>, <country>, <bloc>): <placeName> <district>6ème arr.</district> <settlement type="city">Paris, </settlement> <country>France</country> </placeName> Names of geographical features such as mountains or rivers, and terms for such features (<geogName> and <geogFeat>): <placeName> <geogFeat>Mont</geogFeat> <geogName>Blanc</geogName> </placeName> A relational expression: <rs type="place"> <measure>10 miles</measure> <offset>Northeast of</offset> <settlement>Attica</settlement> </rs>
  • 25. Resolving referents. Within a single language, in a single document, the same person is referred to in different ways: <persName>Clara Schumann</persName> .... <persName>Clara</persName> .... <persName>Frau Schumann</persName> The @ref attribute can be used to show that these are all references to the same person: <persName ref="#CS">Clara Schumann</persName> .... <persName ref="#CS">Clara</persName> .... <persName>Clara Wieck</persName> ... <persName ref="#CS">Frau Schumann</persName>
  • 26. Associating reference and entity. The value of @ref can be any form of URI, pointing to a place where there is more information about this entity, provided locally or externally: <persName ref="https://en.wikipedia.org/wiki/Clara_Schumann"> Clara Schumann</persName> <persName ref="#CS">Clara Schumann</persName> <persName ref="myBib:CS">Clara Schumann</persName> Everything we want to say about CS can be provided using a <person> element elsewhere: <person xml:id="CS"> <persName notAfter="1840-09-12">Clara Wieck</persName> <birth when="1819-09-13"> <placeName>Leipzig</placeName> </birth> <ref type="VIAF" target="http://viaf.org/viaf/44499359"/> <idno type="ISNI">ISN:0000000121305653</idno> <!--etc --> </person>
  • 27. Resolving ambiguity. Person or place? <s>Jean likes <name>Nancy</name> </s> We could clarify this by using a more precise tag (<persName> or <placeName>) rather than <name>. Or we could resolve it by supplying the appropriate target for the @ref attribute on <name>: <s>Jean likes <name ref="#PLACE123">Nancy</name> </s> <!-- ... --> <person xml:id="PERS123"> <persName> <forename>Nancy</forename> <surname>Ide</surname> </persName> <!-- ... --> </person> <place xml:id="PLACE123"> <placeName notBefore="1400">Nancy</placeName> <placeName notAfter="0056">Nantium</placeName> <!-- ... -->
  • 28. Data vs. Text. TEI distinguishes names from things. The assumption is that names are found in source texts, whereas things exist in the real world and are described by additional data. Data can take a semi-textual form structured in XML, though it need not do so. ‘Text is not a special type of data; data is a special type of text.’
  • 29. For example: an extract from Histoire Chronologique de la Chancelerie de France..., p. 5, containing personal names (Odolric, Adalric, Gezon, Lothaire, Adaleron, Arnoul) ...; names of social positions (Grand Chancelier, Secretaire, Roi...); a nickname (‘dit Le Faineant’); titles of other sources (pour la donation de l’Abbaie de Bonneval, Antiquitez de Troyes); explicit quotation (‘Sinum Lotarii gloriosissimi Regis... ’). The formatting helps... but only a bit: we need to make these things explicit.
  • 30. Another example: Paris, BnF, ms. français 16753. First page of Registres de permis d’imprimer...
  • 31. One possible encoding... This seems to be text as data...
  • 32. ... continued: and this seems to be data as text...
  • 33. Tentative conclusions, intended to provoke debate: reading a text involves identifying and understanding its data; reading many texts at a distance contributes to, but does not replace, an understanding of the data they represent; data is itself a kind of text, requiring the same nuanced interpretive judgment.
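The three stages listed on the ‘named entity recognition is a multi-stage operation’ slide, together with the knowledge-base lookup of the MUC paradigm, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real NER system: the tiny knowledge base, its identifiers (`place:lemans`, `pers:nancy-ide`) and the rule "accept a referent only when it is unique" are all hypothetical. Stage 1 is a naive gazetteer match; stage 2 deliberately leaves an ambiguous string such as ‘Nancy’ unresolved, echoing the point that only the first stage is more or less automatable.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical miniature knowledge base: identifiers and glosses are
# illustrative only. A real project would draw on an authority file.
KNOWLEDGE_BASE = {
    "Le Mans": [{"id": "place:lemans", "kind": "place", "info": "city in France"}],
    "Nancy": [
        {"id": "place:nancy", "kind": "place", "info": "city in Lorraine"},
        {"id": "pers:nancy-ide", "kind": "person", "info": "Nancy Ide"},
    ],
}


@dataclass
class Mention:
    text: str                      # the string found in the source
    start: int                     # character offset in the source
    candidates: list               # possible referents from the knowledge base
    chosen: Optional[dict] = None  # the referent, once decided


def find_mentions(text):
    """Stage 1: find strings that may name an entity (simple gazetteer lookup)."""
    found = []
    for name, candidates in KNOWLEDGE_BASE.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Mention(name, match.start(), candidates))
    return sorted(found, key=lambda m: m.start)


def disambiguate(mention):
    """Stage 2: pick a referent only when exactly one candidate exists.

    Ambiguous mentions are left unresolved for a human editor."""
    if len(mention.candidates) == 1:
        mention.chosen = mention.candidates[0]
    return mention


def enrich(mention):
    """Stage 3 (optional): return what the knowledge base says about the referent."""
    return mention.chosen["info"] if mention.chosen else None


sentence = "Jean likes Nancy, but she lives in Le Mans."
for m in map(disambiguate, find_mentions(sentence)):
    label = m.chosen["id"] if m.chosen else "UNRESOLVED"
    print(f"{m.text!r} at {m.start}: {label} ({enrich(m)})")
```

Running it resolves ‘Le Mans’ but reports ‘Nancy’ as UNRESOLVED: the sketch finds the string, while the choice between the city and the person is left to an editor.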
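The components tagged on the ‘Components of personal names’ slide are what make it possible to derive other forms of a name mechanically. The sketch below, in Python with the standard-library ElementTree module, builds a "Surname, Forenames" sorting form from the two Bach encodings on that slide. The wrapper element `<names>`, the function name `sort_form` and the omission of the TEI namespace are assumptions made for brevity.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the predefined XML namespace, so ElementTree reports
# it under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The two encodings from the slide inside a hypothetical <names> wrapper.
# Real TEI files put these elements in the TEI namespace
# (http://www.tei-c.org/ns/1.0); it is omitted here to keep paths short.
SAMPLE = """<names>
  <persName xml:lang="de">
    <forename type="first">Johann</forename>
    <forename type="middle">Sebastian</forename>
    <surname>Bach</surname>
  </persName>
  <persName xml:lang="fr">
    <forename type="composé">Jean-Sébastien</forename>
    <surname>Bach</surname>
  </persName>
</names>"""


def sort_form(pers_name):
    """Build a 'Surname, Forename(s)' string from the children of a <persName>."""
    surnames = " ".join(s.text.strip() for s in pers_name.findall("surname"))
    forenames = " ".join(f.text.strip() for f in pers_name.findall("forename"))
    # Skip an empty part so a surname-only name does not end with ", ".
    return ", ".join(part for part in (surnames, forenames) if part)


for pn in ET.fromstring(SAMPLE).findall("persName"):
    print(pn.get(XML_LANG), sort_form(pn))
```

Because the components are tagged, the same function serves both the German and the French form; with an untagged string, splitting ‘Jean-Sébastien Bach’ into forename and surname would require guesswork.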
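The linking described on the ‘Resolving referents’ and ‘Associating reference and entity’ slides can be followed by a program. This Python sketch (standard-library ElementTree, TEI namespace omitted, hypothetical `<doc>` and `<p>` wrappers) groups the mentions of Clara Schumann by their @ref value and then dereferences ‘#CS’ to the `<person>` record. Only local ‘#’ pointers are resolved; following a Wikipedia URL or a ‘myBib:’ prefix would need additional, project-specific code, and the helper names `group_mentions` and `resolve` are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Running text and <person> record from the slides, in a hypothetical
# <doc> wrapper; the TEI namespace is omitted for brevity.
DOC = """<doc>
  <p>
    <persName ref="#CS">Clara Schumann</persName> ....
    <persName ref="#CS">Clara</persName> ....
    <persName>Clara Wieck</persName> ...
    <persName ref="#CS">Frau Schumann</persName>
  </p>
  <listPerson>
    <person xml:id="CS">
      <persName notAfter="1840-09-12">Clara Wieck</persName>
      <birth when="1819-09-13"><placeName>Leipzig</placeName></birth>
      <ref type="VIAF" target="http://viaf.org/viaf/44499359"/>
    </person>
  </listPerson>
</doc>"""


def group_mentions(text_el):
    """Group the <persName> strings in a text by @ref (None when absent)."""
    groups = defaultdict(list)
    for pn in text_el.iter("persName"):
        groups[pn.get("ref")].append("".join(pn.itertext()).strip())
    return groups


def resolve(root, ref):
    """Return the element whose xml:id matches a local pointer such as '#CS'.

    External URIs are not followed in this sketch."""
    if not ref or not ref.startswith("#"):
        return None
    for el in root.iter():
        if el.get(XML_ID) == ref[1:]:
            return el
    return None


root = ET.fromstring(DOC)
mentions = group_mentions(root.find("p"))
print(mentions["#CS"])  # ['Clara Schumann', 'Clara', 'Frau Schumann']
print(mentions[None])   # ['Clara Wieck'] (no @ref, so not linked)
clara = resolve(root, "#CS")
print(clara.find("birth").get("when"), clara.find("birth/placeName").text)
print(clara.find("ref[@type='VIAF']").get("target"))
```

The unlinked ‘Clara Wieck’ shows up in its own group, which is a convenient way to spot mentions that an editor has not yet attached to a referent.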
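The ‘Resolving ambiguity’ slide settles whether ‘Nancy’ is a person or a place through the target of @ref rather than through the tag name. A minimal Python sketch of that idea follows, assuming the slide’s fragment is completed with a closing `</place>` tag and wrapped in a hypothetical `<doc>` element (TEI namespace omitted): index every xml:id, then classify a `<name>` by the element type of the record it points to. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's example; the <doc> wrapper and closing </place> are assumptions.
DOC = """<doc>
  <s>Jean likes <name ref="#PLACE123">Nancy</name></s>
  <person xml:id="PERS123">
    <persName><forename>Nancy</forename><surname>Ide</surname></persName>
  </person>
  <place xml:id="PLACE123">
    <placeName notBefore="1400">Nancy</placeName>
    <placeName notAfter="0056">Nantium</placeName>
  </place>
</doc>"""


def build_index(root):
    """Map every xml:id in the document to the element that carries it."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}


def entity_kind(index, name_el):
    """Classify a generic <name> by the tag of the record its @ref points to.

    Returns e.g. 'person' or 'place', or None if the pointer is unresolvable."""
    ref = name_el.get("ref", "")
    target = index.get(ref[1:]) if ref.startswith("#") else None
    return None if target is None else target.tag


root = ET.fromstring(DOC)
index = build_index(root)
name = root.find("s/name")
print(name.text, "->", entity_kind(index, name))  # Nancy -> place

# Re-pointing @ref is all it takes to change the reading.
name.set("ref", "#PERS123")
print(name.text, "->", entity_kind(index, name))  # Nancy -> person
```

The text of the sentence never changes; only the pointer does, which is why the slide can keep the neutral `<name>` element and defer the person-or-place decision to the linked record.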