Leipzig eHumanities Seminars	

11/12/2013

Giovanni Colavizza

Functional Categorisation for Historical Place Types

Giovanni Colavizza
Leibniz-Institut für Europäische Geschichte (IEG), Mainz
Colavizza@ieg-mainz.de

Section 1: introduction and motivations


The topic
Controlled vocabulary: “a pre-selected list of terms used for categorisation.” or “an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching.” @Patricia Harpring

Gazetteer: “a geographical dictionary or directory used in conjunction with a map or atlas.” @Wikipedia

Focus for this talk:
Controlled vocabularies of concepts, not proper names.
Historical place types.

Examples
Terms (label-concept relations) are often not defined.
Natural language is context- and interpretation-specific.

@Dalia Varanka, A topographic feature taxonomy for a US national topographic mapping ontology, 2009.

@Excerpt from LinkedGeoData ontology.

Motivations
Quantitative analysis ↦ Classification

Classification for quantitative analysis ↦ unambiguous, consistent, shared

Controlled vocabularies for place types at the moment:
• grow out of necessity and are project-specific
• have a high degree of ambiguity
• lack explicit (formal) definitions of terms
• are not designed for portability and reuse

Basic definitions - I
Taxonomy: “a semantic network of concepts (referred to, or labeled, via a controlled vocabulary), linked by hierarchical relationships. A taxonomy is thus a limited thesaurus.”

Thesaurus: “a semantic network of concepts (referred to, or labeled, via a controlled vocabulary), linked by equivalence, hierarchical and associative relationships.”

Ontology: “formal and explicit specification of a shared conceptualisation.” @Studer, 1998 and Guarino, 2009

Basic definitions - II
Taxonomy: contains categories organised hierarchically. Used to classify.
E.g. “vehicles” ↦ “terrestrial vehicles” ↦ “car”.

Thesaurus: contains concepts and labels for them, organised relationally. Used to index and search.
E.g. “terrestrial vehicles” ↦ “car”@en (alternatives: “macchina”@it, “voiture”@fr, …; relates_to: “car park”, “highway”, …).

Ontology: contains classes, properties and logical rules, and possibly instances of classes. Used to instantiate and reason.
E.g. “car” is_subclass_of “vehicle”. “has_horsepower” is a property between an instance of class “car” and a positive integer. “Audi RS Q3” is_a “car”. And so on.

@Thomas Francart
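The difference between the three structures in “Basic definitions - II” can be made concrete with a small sketch. This is only an illustration of the vehicle example, not code from the talk: the names `taxonomy`, `Concept`, `ancestors`, `is_a` and `set_horsepower` are assumptions, and the horsepower value used in the usage note is a made-up placeholder.

```python
from dataclasses import dataclass, field

# Taxonomy: only hierarchical links (broader category -> narrower categories).
taxonomy = {
    "vehicles": ["terrestrial vehicles"],
    "terrestrial vehicles": ["car"],
    "car": [],
}


def ancestors(category, tree):
    """Return the chain of broader categories, nearest first."""
    chain = []
    current = category
    while True:
        parent = next((p for p, kids in tree.items() if current in kids), None)
        if parent is None:
            return chain
        chain.append(parent)
        current = parent


# Thesaurus: a concept with labels per language plus associative links.
@dataclass
class Concept:
    pref_label: str
    alt_labels: dict = field(default_factory=dict)  # language tag -> label
    broader: list = field(default_factory=list)
    related: list = field(default_factory=list)


car = Concept(
    pref_label="car",
    alt_labels={"it": "macchina", "fr": "voiture"},
    broader=["terrestrial vehicles"],
    related=["car park", "highway"],
)

# Ontology: classes, instances and a rule attached to a property.
subclass_of = {"car": "vehicle"}
instance_of = {"Audi RS Q3": "car"}


def is_a(instance, cls):
    """True if the instance belongs to cls directly or through subclassing."""
    current = instance_of.get(instance)
    while current is not None:
        if current == cls:
            return True
        current = subclass_of.get(current)
    return False


def set_horsepower(store, instance, value):
    """has_horsepower links an instance of 'car' to a positive integer."""
    if not is_a(instance, "car"):
        raise ValueError(f"{instance} is not a car")
    if not isinstance(value, int) or value <= 0:
        raise ValueError("has_horsepower must be a positive integer")
    store[instance] = value
```

The taxonomy can only answer “what is this under?” (`ancestors("car", taxonomy)`), the thesaurus adds labels and associations for indexing and search, and the ontology can reject invalid statements, e.g. `set_horsepower({}, "Audi RS Q3", -1)` raises an error.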

Getty’s TGN - I
Getty’s Thesaurus of Geographic Names: “a database of places in context.”	

Target: professionals in the heritage sector. Always growing by design.

Structure of a record: (figure not reproduced)

Getty’s TGN - II
Possible tensions from two directions:
• The mix of physical features and administrative entities in the hierarchies, since “a geographic place is an administrative entity or a physical feature with a name”.
• The need to account for both current and historical places, types and hierarchies.

Good ideas:
• Instances of administrative entities, e.g. Ancient Egypt (former nation) is the predecessor of Egypt (modern nation).
• Instances have time spans.

Getty’s TGN - III
Place type: “a term that characterises a significant aspect of the place, including its role, function, political anatomy, size, or physical characteristics.” @TGNGuidelines, section 3.6.1.1

The preferred place type is the foundation for the hierarchy of every TGN record.
Place types are organised in flat general categories (Christian types, Physical features types, etc.).

Most place types can be assigned to three macro-areas: physical features, administrative divisions (geopolitics and internal state structure) and functions (religious, economic, social, etc.).

Getty’s TGN - IV
Terminological issues.
Guideline: prefer the local terminology.
The USA has states and counties; Italy has regions and provinces.
The Italian region is merged with region (a generic administrative entity) and generic region (another, more generic entity).

These lead to ontological issues.
Place types are not themselves structured into a defined thesaurus, nor are they formally distinguished into different domains with specific rules to disambiguate them.

Section 2: proposal


Desiderata
• Allow for comparison beyond a single project (data integration)
• Interoperability and portability
• Scalability
• More accurate retrieval
• Reasoning…
• Essentially: make vocabularies more machine-actionable

One possible solution: integrate a stricter knowledge model in the backend of controlled vocabularies. Express it via thesauri of concepts built in conformance with ontologies.

Standards already exist: ISO 25964 (data model), SKOS (ontology)

An example - I
List of monasteries in France:

Id   Name             Type            …
1    Manlieu Abbey    tgn:monastery   …
2    Argentan Abbey   tgn:monastery   …
…    …                …               …

Can we improve on the simple tag “monastery”?

An example - II
Thesaurus of concepts:	


<skos:Concept rdf:about="labelling.org/function-concept-10">
  <skos:prefLabel xml:lang="en">worship</skos:prefLabel>
</skos:Concept>
<skos:Concept rdf:about="labelling.org/function-concept-11">
  <skos:prefLabel xml:lang="en">estate administration</skos:prefLabel>
</skos:Concept>

Controlled vocabulary of place types:

<skos:Concept rdf:about="labelling.org/voc7/label-33">
  <skos:prefLabel xml:lang="en">monastery</skos:prefLabel>
  <skos:related rdf:resource="labelling.org/function-concept-10"/>
  <skos:related rdf:resource="labelling.org/function-concept-11"/>
</skos:Concept>

In the database:

Id   Name             Type
1    Manlieu Abbey    voc7/label-33:monastery
2    Argentan Abbey   voc7/label-33:monastery
…    …                …
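To show what the extra layer in “An example - II” buys, here is a minimal sketch that reads the SKOS concepts and resolves the tag stored in the database to the functions behind it. It uses only Python's standard XML parser. The `rdf:RDF` wrapper with its namespace declarations, and the helper names `load_concepts` and `functions_of`, are assumptions added so the fragment parses; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The concepts from the slide, wrapped in an assumed rdf:RDF root element.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="labelling.org/function-concept-10">
    <skos:prefLabel xml:lang="en">worship</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="labelling.org/function-concept-11">
    <skos:prefLabel xml:lang="en">estate administration</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="labelling.org/voc7/label-33">
    <skos:prefLabel xml:lang="en">monastery</skos:prefLabel>
    <skos:related rdf:resource="labelling.org/function-concept-10"/>
    <skos:related rdf:resource="labelling.org/function-concept-11"/>
  </skos:Concept>
</rdf:RDF>"""


def load_concepts(xml_text):
    """Map each concept URI to its labels (by language) and related URIs."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for node in root.findall(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        labels = {
            el.get(XML_LANG): el.text
            for el in node.findall(f"{{{SKOS}}}prefLabel")
        }
        related = [
            el.get(f"{{{RDF}}}resource")
            for el in node.findall(f"{{{SKOS}}}related")
        ]
        concepts[uri] = {"labels": labels, "related": related}
    return concepts


def functions_of(tag_uri, concepts, lang="en"):
    """Return the labels of the function concepts a place-type tag points to."""
    return [concepts[uri]["labels"][lang] for uri in concepts[tag_uri]["related"]]


concepts = load_concepts(DOC)
monastery_functions = functions_of("labelling.org/voc7/label-33", concepts)
```

With the plain `tgn:monastery` tag, a query can only match the string; with the linked version, both abbeys can also be retrieved by function, since `monastery_functions` holds “worship” and “estate administration”.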

Idea - concept
An integrated approach:
1. develop back-end thesauri
2. build vocabularies as needed, in natural language, associating tags with formally defined concepts (avoid late integration)

n-m mapping between vocabularies and ontologies.
Focus on what’s shared. Add details to the backend.
Pareto principle: 80% of the effects (the tags we need) come from 20% of the causes (concepts).
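The n-m mapping can be sketched as a plain many-to-many index between vocabulary tags and back-end concepts. This is a hedged illustration only: the tag `voc9/label-2:abbey` is a hypothetical entry from a second, invented vocabulary, and the function names are assumptions.

```python
from collections import defaultdict

# (vocabulary tag, back-end concept) pairs. One tag may point to many
# concepts, and one concept may be shared by tags of different vocabularies.
links = [
    ("voc7/label-33:monastery", "function-concept-10"),
    ("voc7/label-33:monastery", "function-concept-11"),
    ("voc9/label-2:abbey", "function-concept-10"),  # hypothetical second vocabulary
]

tag_to_concepts = defaultdict(set)
concept_to_tags = defaultdict(set)
for tag, concept in links:
    tag_to_concepts[tag].add(concept)
    concept_to_tags[concept].add(tag)


def shared_concepts(tag_a, tag_b):
    """Concepts that two tags have in common (empty set if none)."""
    return tag_to_concepts.get(tag_a, set()) & tag_to_concepts.get(tag_b, set())


def comparable_tags(tag):
    """Tags from any vocabulary that share at least one concept with `tag`."""
    found = set()
    for concept in tag_to_concepts.get(tag, set()):
        found |= concept_to_tags[concept]
    found.discard(tag)
    return found
```

Because integration happens through the shared concepts, two projects can keep their own natural-language tags and still be compared: `comparable_tags("voc7/label-33:monastery")` finds the tag of the second vocabulary without any late, pairwise alignment step.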

Historical place types - I
Quite problematic:
• The same nouns mean different things across space, time and culture
• Generic tags for specific meanings lead to ambiguity
• Layers of knowledge: historical agents, socio-political contexts, historians’ interpretations, etc.

Example: “palazzo” in Medieval and Early Modern Venice.
For contemporaries: the Doge’s palace. Other nobles’ palaces had proper names, e.g. Ca’ Foscari means House Foscari.
For us: a category of (historical) buildings, usually former nobles’ residences, OR a more generic category of somewhat big buildings.

Functional categorisation - I
Historical knowledge is mostly about events and processes, which drive the production of evidence (sources).

@Grossner, Representing Historical Knowledge in Geographic Information Systems, 2010.

Functional categorisation - II
We model representations more than real objects, and we study humans: purpose and function are the main concern.

From nouns to verbs:
• Most vocabularies of place types/features are already loosely classified by functionality (economic activity, leisure facility, place of culture, etc.)
• There are fewer verbs than nouns (WordNet synsets: ~82k nouns, ~14k verbs)
• Verbs act as bridges between concepts in natural language, linked data triples, etc.

Not the only perspective (e.g. natural features, institutions), but a starting point.

Example: Barber-shops in Venice
@Filippo De Vivo, Patrizi, informatori, barbieri. Politica e comunicazione a Venezia nella prima età moderna. Milan: Feltrinelli, 2012. In English: id., Information and communication in Venice: Rethinking Early Modern Politics. Oxford: Oxford University Press, 2007.

Historical place types - II
Problems:
1. The same nouns mean different things across space, time and culture
2. Generic tags for specific meanings lead to ambiguity
3. Layers of knowledge: historical agents, socio-political contexts, historians’ interpretations, etc.

Expected outcomes:
1. We can add specifications of space, time and culture to the concepts defining a term
2. Generic tags can be linked to specific concepts
3. The process of linking vocabulary terms to concepts helps historians clarify their reasoning and the layer of knowledge they are representing

Historical place types - III
Solving the “palazzo” problem:	

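One way to read expected outcomes 1 and 2 for “palazzo” is sketched below: the generic tag stays as it is, but it links to several concepts, each scoped by place, period and layer of knowledge. This is not the model shown in the talk; the class `ScopedConcept`, its fields and the scope values are assumptions, and the definitions simply paraphrase the “palazzo” example given earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedConcept:
    definition: str
    place: Optional[str] = None
    period: Optional[str] = None
    perspective: Optional[str] = None  # the layer of knowledge represented

    def matches(self, **context):
        """A scope left as None is unconstrained and matches any value."""
        return all(
            getattr(self, key) in (None, value) for key, value in context.items()
        )


# The generic tag "palazzo" linked to specific, scoped concepts.
PALAZZO = [
    ScopedConcept(
        definition="the Doge's palace",
        place="Venice",
        period="Medieval and Early Modern",
        perspective="contemporaries",
    ),
    ScopedConcept(
        definition="historical building, usually a former nobles' residence",
        place="Venice",
        perspective="present-day historians",
    ),
    ScopedConcept(
        definition="somewhat big building",
        perspective="present-day historians",
    ),
]


def resolve(concepts, **context):
    """Return the definitions of a tag that are valid in the given context."""
    return [c.definition for c in concepts if c.matches(**context)]
```

Calling `resolve(PALAZZO, perspective="contemporaries")` keeps only the Doge's palace, while `resolve(PALAZZO, perspective="present-day historians")` returns the two modern readings. Choosing the scope values is also where the historian has to make the represented layer of knowledge explicit (outcome 3).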

What is a place - conceptual model I


What is a place - conceptual model II


Section 3: implementation


How?
• Build thesauri of functions with a bottom-up approach, from sources
• Build vocabularies when needed, reusing existing ones where possible
• Develop software to integrate such thesauri with the creation and reuse of controlled vocabularies
• Raise and foster a community of interest and work together

Let’s break down each part…

Building thesauri of functions - I
Good starting point: Getty’s AAT function facets:
http://www.getty.edu/vow/AATHierarchy?find=&logic=AND&note=&subjectid=300054593

They provide a general framework, i.e. functional domains and upper layers: economics, government, social, education, etc.

Small teams of historians and ontologists start from sources and make explicit part of the knowledge entailed in them.
This is a process of abstraction from detail and generalisation.

Let’s see an example…

Building thesauri of functions - II


Building thesauri of functions - III
Giovanni Bartolomeo da Gabiano, bookseller, publisher and entrepreneur in Venice.

Business letters from which we can infer the activities going on at his shop at Rialto.

“Data in mane de Messer Ioanne Bertolamie a la libraria da la Fontana in Venecia”
“Given into the hands of Mr Giovanni Bartolomeo, at the bookshop at the Fountain [insigna] in Venice”

Building thesauri of functions - IV

This letter mentions new editions being made (apparently the market was good for medical treatises: Avicenna, Aliabate, etc.) and engravings ordered for them.

Various activities, today usually considered separate:
• book-selling, accounting, warehousing, etc.
• publishing and sometimes printing
• patronage and other social activities
• …

Building vocabularies - I
Similar to building thesauri of functions, but without supervision.

Essential to:
• permit the use of the same tags we currently find in controlled vocabularies, i.e. natural language and possibly no definitions
• allow for intuitive linkage with thesauri, and suggest existing vocabulary tags that are close in meaning
• design for continuous growth: term merge or split, sub-categories, …

Building vocabularies - II
An example: Venetian fiscal declarations, 1514.
Rented “flats” (lit. small houses: ‘chaseta’) for residence: ‘flat’ (in vocabulary).
Possible functions of interest according to the source: ‘renting’, ‘lodging’/‘dwelling’ under ‘economic functions’.
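The requirement to “suggest vocabulary tags already built and close in meaning” could, for instance, be approximated by comparing the sets of functions that tags are linked to. The sketch below uses Jaccard similarity for this. It is an assumption, not the method of the Labelling system; the tags ‘inn’ and ‘warehouse’, their functions and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Share of functions two tags have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(new_functions, vocabulary, threshold=0.3):
    """Return existing tags whose linked functions overlap enough, best first."""
    scored = [
        (jaccard(new_functions, functions), tag)
        for tag, functions in vocabulary.items()
    ]
    return [tag for score, tag in sorted(scored, reverse=True) if score >= threshold]


# Existing vocabulary: tag -> functions it is linked to in the thesaurus.
vocabulary = {
    "flat": {"renting", "lodging"},        # from the 1514 fiscal declarations
    "inn": {"lodging", "catering"},        # hypothetical
    "warehouse": {"warehousing"},          # hypothetical
}

# A historian describes a new term by the functions found in the source.
suggestions = suggest({"renting", "lodging"}, vocabulary)
```

Here `suggestions` lists ‘flat’ first and ‘inn’ second, so the user can reuse an existing tag, or decide that a new one (or a split of an existing one) is justified.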

Building vocabularies - III

@Luzzatto, Sergio, Pedullà, Gabriele (editors), Atlante della letteratura italiana, vol. 1, Torino, Einaudi, 2010.

The software: Labelling system - I
Design requirements:
• Build thesauri of concepts in the back-end.
• Build controlled vocabularies, for users.
• Query the system for such contents (for every agent, openly).
• Administer and link all these tasks and users in a single system.
• Provide transparent management of the most used data formats.
• Reuse open source solutions whenever possible.
• Be very intuitive and easy to use.

Waiting on a possible grant for this…

The software: Labelling system - II


The software: Labelling system - III

http://www.vocabularyserver.com/

The software: Labelling system - IV
Implementation is key:
• we are struggling to have several people from different backgrounds work with standards such as SKOS and RDF
• we need a common entry point, as transparent as possible
• we need to differentiate vocabulary building and thesauri of functions concretely

The community - I
Work on integration and alignment requires a community of interest and a long time: growth is slow.

Experts’ workshop on Controlled Vocabularies, Mainz 10-11/10/2013:
• gathered experts from different fields (history, IT, geography, …)
• discussed place types and functional categorisation extensively
• established a working group to start the process

As of today:
• circa 30 experts
• wiki space and mailing list within DARIAH-DE

The community - II
Space on the DARIAH-DE wiki.
Already populated with references, vocabularies and first alignment projects, plus the RDF (with SKOS) version of Getty’s AAT function facets.

Send me an e-mail to join us :)

Summary
1. Motivation: controlled vocabularies are ambiguous and lack definitions
2. Object: historical place types
3. Proposal: use functional categorisation to overcome limitations
4. Implementation: community of interest, reuse of standards, ad-hoc software, bottom-up source-based approach

Future directions
Short-term priority:
• development of the Labelling system

Long term:
• engage with more researchers and projects
• test the method in different settings
• steadily grow the vocabulary base
• integrate existing vocabularies in the system

Thanks!	

Giovanni Colavizza
Leibniz-Institut für Europäische Geschichte (IEG), Mainz
Colavizza@ieg-mainz.de

Weitere ähnliche Inhalte

Ähnlich wie Leipzig Functional Categorisation 11/12/2013

Introducing CIDOC-CRM (Cch KR workshop #2.1)
Introducing CIDOC-CRM (Cch KR workshop #2.1)Introducing CIDOC-CRM (Cch KR workshop #2.1)
Introducing CIDOC-CRM (Cch KR workshop #2.1)
Michele Pasin
 
Knowledge = Information + Context
Knowledge = Information + ContextKnowledge = Information + Context
Knowledge = Information + Context
Stefan Gradmann
 
Digital Humanities 2009 - Laying out the conceptual foundations for data inte...
Digital Humanities 2009 - Laying out the conceptual foundations for data inte...Digital Humanities 2009 - Laying out the conceptual foundations for data inte...
Digital Humanities 2009 - Laying out the conceptual foundations for data inte...
Michele Pasin
 
Li 804 alphild dick
Li 804 alphild dickLi 804 alphild dick
Li 804 alphild dick
Alphild Dick
 
LIS 653 Posters
LIS 653 PostersLIS 653 Posters
LIS 653 Posters
PrattSILS
 

Ähnlich wie Leipzig Functional Categorisation 11/12/2013 (20)

Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
 
Thatcamp recap
Thatcamp recapThatcamp recap
Thatcamp recap
 
Introducing CIDOC-CRM (Cch KR workshop #2.1)
Introducing CIDOC-CRM (Cch KR workshop #2.1)Introducing CIDOC-CRM (Cch KR workshop #2.1)
Introducing CIDOC-CRM (Cch KR workshop #2.1)
 
20110929 tpdl2011 dl-research-humboldt
20110929 tpdl2011 dl-research-humboldt20110929 tpdl2011 dl-research-humboldt
20110929 tpdl2011 dl-research-humboldt
 
Knowledge = Information + Context
Knowledge = Information + ContextKnowledge = Information + Context
Knowledge = Information + Context
 
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a...
 
Share: discovery: a focus on papers
Share: discovery: a focus on papersShare: discovery: a focus on papers
Share: discovery: a focus on papers
 
An Outline Of Type-Theoretical Approaches To Lexical Semantics
An Outline Of Type-Theoretical Approaches To Lexical SemanticsAn Outline Of Type-Theoretical Approaches To Lexical Semantics
An Outline Of Type-Theoretical Approaches To Lexical Semantics
 
Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)
 
Digital Humanities 2009 - Laying out the conceptual foundations for data inte...
Digital Humanities 2009 - Laying out the conceptual foundations for data inte...Digital Humanities 2009 - Laying out the conceptual foundations for data inte...
Digital Humanities 2009 - Laying out the conceptual foundations for data inte...
 
Beyond the Silos of the LAMs
Beyond the Silos of the LAMsBeyond the Silos of the LAMs
Beyond the Silos of the LAMs
 
How a Prototype Argues
How a Prototype ArguesHow a Prototype Argues
How a Prototype Argues
 
Computing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approachComputing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approach
 
Li 804 alphild dick
Li 804 alphild dickLi 804 alphild dick
Li 804 alphild dick
 
Linking Data the ALM way (Boris Zetterlund)
Linking Data the ALM way (Boris Zetterlund)Linking Data the ALM way (Boris Zetterlund)
Linking Data the ALM way (Boris Zetterlund)
 
Mortenson Distinguished Lecture2010
Mortenson Distinguished Lecture2010Mortenson Distinguished Lecture2010
Mortenson Distinguished Lecture2010
 
An Ontology For Historical Research Documents
An Ontology For Historical Research DocumentsAn Ontology For Historical Research Documents
An Ontology For Historical Research Documents
 
Material Cultures2010 Alexandre Monnin
Material Cultures2010 Alexandre MonninMaterial Cultures2010 Alexandre Monnin
Material Cultures2010 Alexandre Monnin
 
Essay On Library Museum And Archive
Essay On Library Museum And ArchiveEssay On Library Museum And Archive
Essay On Library Museum And Archive
 
LIS 653 Posters
LIS 653 PostersLIS 653 Posters
LIS 653 Posters
 

Mehr von Giovanni Colavizza

Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013
Giovanni Colavizza
 
Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013
Giovanni Colavizza
 

Mehr von Giovanni Colavizza (14)

Sul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital HumanitiesSul ruolo dell’umanista nelle Digital Humanities
Sul ruolo dell’umanista nelle Digital Humanities
 
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
La Venice Time Machine e alcune sfide dei progetti “Big Science” nelle discip...
 
The References of References: Enriching Library Catalogs via Domain-Specific ...
The References of References: Enriching Library Catalogs via Domain-Specific ...The References of References: Enriching Library Catalogs via Domain-Specific ...
The References of References: Enriching Library Catalogs via Domain-Specific ...
 
A Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni databaseA Cliometrics’ view on the Garzoni database
A Cliometrics’ view on the Garzoni database
 
Venice 1740 Reconstruction
Venice 1740 ReconstructionVenice 1740 Reconstruction
Venice 1740 Reconstruction
 
On Mining Citations to Primary and Secondary Sources in Historiography
On Mining Citations to Primary and Secondary Sources in HistoriographyOn Mining Citations to Primary and Secondary Sources in Historiography
On Mining Citations to Primary and Secondary Sources in Historiography
 
Notes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliensNotes de bas de page: d’un outil savant aux hyperliens
Notes de bas de page: d’un outil savant aux hyperliens
 
Mapping Early Modern News Networks
Mapping Early Modern News NetworksMapping Early Modern News Networks
Mapping Early Modern News Networks
 
Report on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTMReport on Ongoing Digitisation and Information System Design for VTM
Report on Ongoing Digitisation and Information System Design for VTM
 
Mapping the News Networks in XVII Italy
Mapping the News Networks in XVII ItalyMapping the News Networks in XVII Italy
Mapping the News Networks in XVII Italy
 
Garzoni conference 11 October 2014
Garzoni conference 11 October 2014Garzoni conference 11 October 2014
Garzoni conference 11 October 2014
 
Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014Linked Books - DH Venice Fall School 2014
Linked Books - DH Venice Fall School 2014
 
Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013Udine Digital Humanities 19/11/2013
Udine Digital Humanities 19/11/2013
 
Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013Venezia Biblioteche e Digital Humanities 28/10/2013
Venezia Biblioteche e Digital Humanities 28/10/2013
 

Leipzig Functional Categorisation 11/12/2013

  • 1. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Functional Categorisation for Historical Place Types ! ! Giovanni Colavizza Leibniz-Institut für Europäische Geschichte (IEG), Mainz ! Colavizza@ieg-mainz.de 1
  • 2. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Section 1: introduction and motivations 2
  • 3. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza The topic ! Controlled vocabulary: “a pre-selected list of terms used for categorisation.” or “an organized arrangement of words and phrases used to index content and/ or to retrieve content through browsing or searching.” @Patricia Harpring ! Gazetteer: “a geographical dictionary or directory used in conjunction with a map or atlas.” @Wikipedia ! ! Focus for this talk: Controlled Vocabularies of concepts, not proper names. Historical Place Types. ! 3
  • 4. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Examples Terms—label – concept relations—are often not defined. Natural language is context and interpretation specific. ! ! @Dalia Varanka, A topographic feature taxonomy for a US national topographic mapping ontology, 2009. ! ! ! @Excerpt from LinkedGeoData ontology. 4
  • 5. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Motivations ! ! Quantitative analysis ↦ Classification ! Classification for quantitative analysis ↦ unambiguous, consistent, shared ! Controlled vocabularies for place types at the moment: • grow out of necessity, are project specific • have high degree of ambiguity • lack of explicit (formal) definitions of terms • are not designed for portability and reuse 5
  • 6. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Basic definitions - I ! ! Taxonomy: “a semantic network of concepts (referred to, or labeled, via a controlled vocabulary), linked by hierarchical relationships. A taxonomy is thus a limited thesaurus.” ! Thesaurus: “a semantic network of concepts (referred to, or labeled, via a controlled vocabulary), linked by equivalence, hierarchical and associative relationships.” ! Ontology: “formal and explicit specification of a shared conceptualisation.” @Studer, 1998 and Guarino, 2009 6
  • 7. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Basic definitions - II ! Taxonomy: contains categories organised hierarchically. Used to classify. E. g. “vehicles” ↦ “terrestrial vehicles” ↦ “car”. ! Thesaurus: contains concepts and labels for them, organised relationally. Used to index and search. E. g. “terrestrial vehicles” ↦ “car”@en (alternatives: “macchina”@it, “voiture”@fr, .. relates_to: “car park”, “highway”, ..) ! Ontology: contains classes, properties and logical rules. Eventually instances of classes. Used to instance and reason. E. g. “car” is_subclass_of “vehicle”. “has_horsepower” is a property between an instance of class “car” and an positive integer. “Audi RS Q3” is_a “car”. And so on.. @Thomas Francart 7
  • 8. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Getty’s TGN - I Getty’s Thesaurus of Geographic Names: “a database of places in context.” ! Target: professionals in the heritage sector. Always growing by design. ! Structure of a record: 8
  • 9. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Getty’s TGN - II Possible tensions from two directions: • The mix of physical features and administrative entities in the hierarchies, since “a geographic place is an administrative entity or a physical feature with a name”. • The account for both current and historical places, types and hierarchies. ! Good ideas: • Instances of administrative entities. E.g. Ancient Egypt (former nation) is predecessor of Egypt (modern nation). • Instances have time spans. 9
  • 10. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Getty’s TGN - III ! ! Place type: “a term that characterises a significant aspect of the place, including its role, function, political anatomy, size, or physical characteristics.” @TGNGuidelines, section 3.6.1.1 ! Foundation for the hierarchy of every TGN record via preferred type. Organised in flat general categories (Christian types, Physical features types, etc.). ! Most place types can be assigned to three macro-areas: physical features, administrative divisions (geopolitics and internal state structure) and functions (religious, economic, social, etc.). ! ! 10
  • 11. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Getty’s TGN - IV ! Terminological issues. Guideline: prefer the local terminology. USA has state and county, Italy region and province. Italian region is merged with region (generic administrative entity) and generic region (another more generic entity). ! Lead to Ontological issues. Place types are not themselves structured into a defined thesaurus, neither they are formally distinguished in different domains, with specific rules to disambiguate them. ! ! 11
  • 12. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Section 2: proposal 12
  • 13. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Desiderata ! • • • • • • Allow for comparison beyond single project (data integration) Interoperability and portability Scalability More accurate retrieval Reasoning… Essentially: make vocabularies more machine-actionable One possible solution: integrate a more strict knowledge model in the backend of controlled vocabularies. Express it via thesauri of concepts built abiding to ontologies. Standards already there: ISO 25964 (data model), SKOS (ontology) 13
  • 14. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza An example - I ! List of monasteries in France: Id Name Type … 1 Manlieu Abbey tgn:monastery … 2 Argentan Abbey tgn:monastery … … … … … Can we improve on the simple tag “monastery”? 14
  • 15. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza An example - II Thesaurus of concepts: skos:Concept rdf:about=labelling.org/function-concept-10 skos:prefLabel xml:lang=enworship/skos:prefLabel/skos:Concept skos:Concept rdf:about=labelling.org/function-concept-11 skos:prefLabel xml:lang=enestate administration/skos:prefLabel/ skos:Concept ! Controlled vocabulary of place types: skos:Concept rdf:about=labelling.org/voc7/label-33 skos:prefLabel xml:lang=enmonastery/skos:prefLabel skos:related rdf:resource=labelling.org/function-concept-10 skos:related rdf:resource=labelling.org/function-concept-11 /skos:Concept ! In the database: Id Name Type 1 Manlieu Abbey voc7/label-33:monastery 2 Argentan Abbey voc7/label-33:monastery … … … 15
  • 16. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Idea - concept An integrated approach: 1. develop back-end thesauri 2. vocabularies are built as needed, in natural language, associating tags with formally defined concepts (avoid late integration) ! n-m mapping between vocabularies and ontologies. Focus on what’s shared. Add details to the backend. Pareto principle: 80% effects (tags we need) come from 20% causes (concepts). 16
  • 17. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Historical place types - I Quite problematic: • Same nouns mean different things in space, time, culture • Generic tags for specific meanings lead to ambiguity • Layers of knowledge: historical agents, socio-political contexts, historians’ interpretations, etc. ! Example: “palazzo” in Medieval and Early Modern Venice. For contemporaries: Doge’s palace - Other nobles’ palaces had proper names, e.g. Ca’ Foscari means House Foscari For us: A category of (historical) buildings — usually former nobles’ residences OR a more generic category of somewhat big buildings 17
  • 18. Leipzig eHumanities Seminars 11/12/2013 Giovanni Colavizza Functional categorisation - I Historical knowledge is mostly about events and processes, which drive the production of evidence (sources) ! ! ! ! ! ! ! ! @Grossner, Representing Historical Knowledge in Geographic Information Systems, 2010. 18
  • 19. Functional categorisation - II
    We model representations more than real objects, and we study humans: purpose and function are the main concern.
    From nouns to verbs:
    • Most vocabularies of place types/features are already loosely classified by functionality (economic activity, leisure facility, place of culture, etc.)
    • There are fewer verbs than nouns (WordNet synsets: ~82k nouns, ~14k verbs)
    • Verbs act as bridges between concepts in natural language, linked data triples, etc.
    Not the only perspective (e.g. natural features, institutions), but a starting point.
  • 20. Example: Barber-shops in Venice
    @Filippo De Vivo, Patrizi, informatori, barbieri. Politica e comunicazione a Venezia nella prima età moderna. Milan: Feltrinelli, 2012. In English: id., Information and communication in Venice: Rethinking Early Modern Politics. Oxford: Oxford University Press, 2007.
  • 21. Historical place types - II
    Problems:
    1. Same nouns mean different things in space, time, culture
    2. Generic tags for specific meanings lead to ambiguity
    3. Layers of knowledge: historical agents, socio-political contexts, historians’ interpretations, etc.
    Expected outcomes:
    1. We can add specifications of space, time, culture to the concepts defining a term
    2. Generic tags can be linked to specific concepts
    3. The process of linking vocabulary terms to concepts helps historians clarify their reasoning and the layer of knowledge they are representing
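    Expected outcomes 1 and 2 can be sketched with the “palazzo” example from slide 17: one generic tag linked to several concepts, each optionally scoped by place, period and perspective. This is only one possible reading, not the model shown in the talk; the class ScopedConcept, its field names, the perspective labels and the function resolve are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedConcept:
    """A concept definition with optional space/time/culture scoping.

    A field left as None means the concept is not restricted on that axis.
    """
    definition: str
    place: Optional[str] = None
    period: Optional[str] = None
    perspective: Optional[str] = None


# Concepts linked to the generic tag "palazzo" (wording from slide 17).
PALAZZO = [
    ScopedConcept("Doge's palace", place="Venice",
                  period="Medieval and Early Modern",
                  perspective="contemporaries"),
    ScopedConcept("historical building, usually a former nobles' residence",
                  perspective="present-day"),
    ScopedConcept("somewhat big building", perspective="present-day"),
]


def matches(concept, place=None, period=None, perspective=None):
    """A concept matches unless one of its scopes contradicts the context."""
    wanted = {"place": place, "period": period, "perspective": perspective}
    for field, value in wanted.items():
        scope = getattr(concept, field)
        if value is not None and scope is not None and scope != value:
            return False
    return True


def resolve(concepts, **context):
    """Return the definitions of a tag that are valid in the given context."""
    return [c.definition for c in concepts if matches(c, **context)]


if __name__ == "__main__":
    print(resolve(PALAZZO, place="Venice", perspective="contemporaries"))
    print(resolve(PALAZZO, perspective="present-day"))
```

    With a context of Venice and contemporaries the tag narrows to a single concept, while the present-day perspective keeps both the specific and the generic reading available.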
  • 22. Historical place types - III
    Solving the “palazzo” problem:
  • 23. What is a place - conceptual model I
  • 24. What is a place - conceptual model II
  • 25. Section 3: implementation
  • 26. How?
    • Build thesauri of functions with a bottom-up approach, from sources
    • Build vocabularies when needed, reusing existing ones if possible
    • Develop software to integrate such thesauri and the creation/re-use of controlled vocabularies
    • Raise and foster a community of interest and work together
    Let’s break down each part…
  • 27. Building thesauri of functions - I
    Good starting point: Getty’s AAT function facets: http://www.getty.edu/vow/AATHierarchy?find=logic=ANDnote=subjectid=300054593
    They provide a general framework, i.e. functional domains and upper layers: economics, government, social, education, etc.
    Small teams of historians and ontologists start from sources and make explicit part of the knowledge entailed in them. A process of abstraction from detail and generalisation.
    Let’s see an example…
  • 28. Building thesauri of functions - II
  • 29. Building thesauri of functions - III
    Giovanni Bartolomeo da Gabiano, bookseller, publisher and entrepreneur in Venice.
    Business letters from which we can infer the activities going on at his shop at Rialto.
    “Data in mane de Messer Ioanne Bertolamie a la libraria da la Fontana in Venecia”
    “Given into the hands of Mr Giovanni Bartolomeo, at the bookshop at the Fountain [insigna] in Venice”
  • 30. Building thesauri of functions - IV
    This letter mentions new editions being made (apparently the market was good for medical treatises: Avicenna, Aliabate, etc.) and engravings ordered for them.
    Various activities, today usually considered separate:
    • book-selling, accounting, warehousing, etc.
    • publishing and sometimes printing
    • patronage and other social activities
    • …
  • 31. Building vocabularies - I
    Similar to building thesauri of functions, but without supervision.
    Essential to:
    • allow the same tags currently found in controlled vocabularies to be used, hence natural language and possibly no definitions
    • allow for intuitive linkage with thesauri, and suggest vocabulary tags already built and close in meaning
    • design for continuous growth: term merge or split, sub-categories, …
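    One way to “suggest vocabulary tags already built and close in meaning”, and to merge terms later, is to compare the sets of function concepts that tags are linked to. The sketch below uses Jaccard similarity as a stand-in for closeness in meaning; this measure, the threshold, the function names and every link except those of “monastery” are assumptions, not part of the talk.

```python
def jaccard(a, b):
    """Overlap of two sets of function concepts, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(new_functions, vocabulary, threshold=0.5):
    """List existing tags whose linked functions overlap with the new ones.

    Results are ordered by decreasing similarity, then alphabetically.
    """
    scored = [(jaccard(new_functions, functions), tag)
              for tag, functions in vocabulary.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [tag for score, tag in ranked if score >= threshold]


def merge_tags(vocabulary, keep, drop):
    """Return a copy of the vocabulary where `drop` is folded into `keep`."""
    merged = dict(vocabulary)
    merged[keep] = merged[keep] | merged.pop(drop)
    return merged


# Tag -> linked function concepts. Only "monastery" comes from slide 15;
# the other entries are hypothetical.
VOCABULARY = {
    "monastery": {"worship", "estate administration"},
    "abbey": {"worship", "estate administration", "lodging"},
    "bookshop": {"book-selling", "publishing"},
}

if __name__ == "__main__":
    print(suggest_tags({"worship", "estate administration"}, VOCABULARY))
    print(merge_tags(VOCABULARY, keep="monastery", drop="abbey"))
```

    Because the comparison runs on linked concepts rather than on the labels themselves, tags in different languages or spellings can still be suggested to each other.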
  • 32. Building vocabularies - II
    An example: Venetian fiscal declarations, 1514.
    Rented “flats” (lit. small houses: ‘chaseta’) for residence: ‘flat’ (in vocabulary).
    Possibly interesting functions according to the source: ‘renting’, ‘lodging’/‘dwelling’ under ‘economic functions’.
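    The ‘flat’ example can be sketched as a tag linked to functions that sit under a broader functional domain, so that a query on the domain also finds the tag. The dictionaries and function names below are hypothetical, and ‘dwelling’ is treated simply as an alternative wording of ‘lodging’.

```python
# Function -> broader function, as on the slide: both functions sit
# under 'economic functions'.
BROADER = {
    "renting": "economic functions",
    "lodging": "economic functions",
}

# Vocabulary tag -> linked functions.
TAG_FUNCTIONS = {
    "flat": {"renting", "lodging"},
}


def ancestors(function, broader):
    """Follow broader links upwards, guarding against accidental cycles."""
    chain, seen = [], {function}
    while function in broader:
        function = broader[function]
        if function in seen:
            break
        seen.add(function)
        chain.append(function)
    return chain


def tags_under(domain, tag_functions, broader):
    """Find tags with at least one function equal to or below `domain`."""
    return sorted(
        tag
        for tag, functions in tag_functions.items()
        if any(f == domain or domain in ancestors(f, broader)
               for f in functions)
    )


if __name__ == "__main__":
    print(ancestors("renting", BROADER))
    print(tags_under("economic functions", TAG_FUNCTIONS, BROADER))
```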
  • 33. Building vocabularies - III
    @Luzzatto, Sergio, Pedullà, Gabriele (editors), Atlante della letteratura italiana, vol. 1, Torino, Einaudi, 2010.
  • 34. The software: Labelling system - I
    Design requirements:
    • Building thesauri of concepts in the back-end.
    • Building controlled vocabularies, for users.
    • Querying the system for such contents (for every agent, openly).
    • Administering and linking all these tasks and users into a single system.
    • Provide transparent management of the most used data formats.
    • Reuse open source solutions whenever possible.
    • Be very intuitive and easy to use.
    Waiting for a possible grant on this…
  • 35. The software: Labelling system - II
  • 36. The software: Labelling system - III
    http://www.vocabularyserver.com/
  • 37. The software: Labelling system - IV
    Implementation is key:
    • we are struggling to have several people from different backgrounds work with standards such as SKOS and RDF
    • we need a common entry point, as transparent as possible
    • we need to differentiate concretely between vocabulary building and thesauri of functions
  • 38. The community - I
    Work on integration and alignment requires a community of interest and a long time: growth is slow.
    Experts’ workshop on Controlled Vocabularies, Mainz 10-11/10/2013:
    • gathered experts from different fields (history, IT, geography, …)
    • discussed place types and functional categorisation extensively
    • established a working group to start the process
    As of today:
    • circa 30 experts
    • wiki space and mailing list within DARIAH-DE
  • 39. The community - II
    Space on DARIAH-DE wiki. Already populated with references, vocabularies and first alignment projects, plus the RDF (with SKOS) version of the Getty’s AAT function facets.
    Send me an e-mail to join us :)
  • 40. Summary
    1. Motivation: controlled vocabularies are ambiguous and lack definitions
    2. Object: Historical place types
    3. Proposal: use functional categorisation to overcome limitations
    4. Implementation: community of interest, reuse of standards, ad-hoc software, bottom-up source-based approach
  • 41. Future directions
    Short term priority: development of the Labelling system
    Long term:
    • engage with more researchers and projects
    • test the method in different settings
    • steadily grow the vocabulary base
    • integrate existing vocabularies in the system
  • 42. Thanks!
    Giovanni Colavizza, Leibniz-Institut für Europäische Geschichte (IEG), Mainz
    Colavizza@ieg-mainz.de