Leipzig Functional Categorisation 11/12/2013
1. Leipzig eHumanities Seminars
11/12/2013
Giovanni Colavizza
Functional Categorisation for
Historical Place Types
Giovanni Colavizza
Leibniz-Institut für Europäische Geschichte (IEG), Mainz
Colavizza@ieg-mainz.de
The topic
Controlled vocabulary: “a pre-selected list of terms used for categorisation.”
or “an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching.” @Patricia Harpring
Gazetteer: “a geographical dictionary or directory used in conjunction with a map or atlas.” @Wikipedia
Focus of this talk:
Controlled Vocabularies of concepts, not proper names.
Historical Place Types.
Examples
Terms (i.e. label-concept relations) are often not defined.
Natural language is context- and interpretation-specific.
@Dalia Varanka, A topographic feature taxonomy for a US national topographic mapping ontology, 2009.
@Excerpt from the LinkedGeoData ontology.
Motivations
Quantitative analysis ↦ Classification
Classification for quantitative analysis ↦ unambiguous, consistent, shared
Controlled vocabularies for place types at the moment:
• grow out of necessity and are project-specific
• have a high degree of ambiguity
• lack explicit (formal) definitions of terms
• are not designed for portability and reuse
Basic definitions - I
Taxonomy: “a semantic network of concepts (referred to, or labeled, via a
controlled vocabulary), linked by hierarchical relationships. A taxonomy is thus a
limited thesaurus.”
Thesaurus: “a semantic network of concepts (referred to, or labeled, via a
controlled vocabulary), linked by equivalence, hierarchical and associative
relationships.”
Ontology: “formal and explicit specification of a shared conceptualisation.”
@Studer, 1998 and Guarino, 2009
Basic definitions - II
Taxonomy: contains categories organised hierarchically. Used to classify.
E.g. “vehicles” ↦ “terrestrial vehicles” ↦ “car”.
Thesaurus: contains concepts and labels for them, organised relationally. Used to index and search. E.g. “terrestrial vehicles” ↦ “car”@en (alternatives: “macchina”@it, “voiture”@fr, …; relates_to: “car park”, “highway”, …)
Ontology: contains classes, properties and logical rules, and optionally instances of classes. Used to instantiate and reason.
E.g. “car” is_subclass_of “vehicle”. “has_horsepower” is a property between an instance of class “car” and a positive integer. “Audi RS Q3” is_a “car”.
And so on.
@Thomas Francart
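The three structures can be contrasted in a short Python sketch built on the car example above. It is only an illustration under assumptions, not an implementation of any standard: a plain dictionary stands in for the taxonomy, a small record for a thesaurus concept, and ordinary classes with a checked property for the ontology. The horsepower value is a placeholder, not the real figure.

```python
from dataclasses import dataclass, field

# Taxonomy: categories linked only by hierarchical (narrower -> broader) relations.
TAXONOMY = {"car": "terrestrial vehicles", "terrestrial vehicles": "vehicles"}


def broader_chain(category):
    """Walk up the hierarchy: car -> terrestrial vehicles -> vehicles."""
    chain = []
    while category in TAXONOMY:
        category = TAXONOMY[category]
        chain.append(category)
    return chain


# Thesaurus: a concept with labels per language, a broader concept and
# associative ("relates_to") links, as in the example above.
@dataclass
class ThesaurusConcept:
    labels: dict
    broader: str
    related: list = field(default_factory=list)


car = ThesaurusConcept(
    labels={"en": "car", "it": "macchina", "fr": "voiture"},
    broader="terrestrial vehicles",
    related=["car park", "highway"],
)


# Ontology: classes, a subclass axiom and a property with a checkable rule.
class Vehicle:
    pass


class Car(Vehicle):  # "car" is_subclass_of "vehicle"
    def __init__(self, name, has_horsepower):
        # Rule: has_horsepower links a Car instance to a positive integer.
        if not isinstance(has_horsepower, int) or has_horsepower <= 0:
            raise ValueError("has_horsepower must be a positive integer")
        self.name = name
        self.has_horsepower = has_horsepower


# "Audi RS Q3" is_a "car"; 300 is a placeholder value.
audi = Car("Audi RS Q3", 300)
print(broader_chain("car"))  # → ['terrestrial vehicles', 'vehicles']
```

Only the ontology layer can reject data: creating a `Car` with a negative horsepower raises an error, whereas the taxonomy and the thesaurus merely record relations.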
Getty’s TGN - I
Getty’s Thesaurus of Geographic Names: “a database of places in context.”
Target: professionals in the heritage sector. Always growing by design.
Structure of a record:
Getty’s TGN - II
Possible tensions from two directions:
• The mix of physical features and administrative entities in the hierarchies, since “a geographic place is an administrative entity or a physical feature with a name”.
• Accounting for both current and historical places, types and hierarchies.
Good ideas:
• Instances of administrative entities. E.g. Ancient Egypt (former nation) is the predecessor of Egypt (modern nation).
• Instances have time spans.
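The two good ideas above combine naturally: each administrative-entity instance carries a time span and an optional link to its predecessor. The following is a minimal Python sketch of that combination. The class and field names are hypothetical and do not reflect TGN's actual schema, and the places and years are invented placeholders rather than TGN data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaceInstance:
    """An administrative-entity instance with a time span (hypothetical schema)."""
    name: str
    place_type: str
    start: int                  # first year of validity
    end: Optional[int] = None   # None: still current
    predecessor: Optional["PlaceInstance"] = None

    def valid_in(self, year):
        """True if the instance exists in the given year."""
        return self.start <= year and (self.end is None or year <= self.end)


def lineage(place):
    """Follow predecessor links back to the earliest recorded instance."""
    chain = [place.name]
    while place.predecessor is not None:
        place = place.predecessor
        chain.append(place.name)
    return chain


# Invented placeholder entities and years, not TGN data.
old = PlaceInstance("Old Duchy", "former duchy", 1100, 1500)
new = PlaceInstance("New Province", "province", 1501, predecessor=old)
print(lineage(new))                            # → ['New Province', 'Old Duchy']
print(old.valid_in(1300), new.valid_in(1300))  # → True False
```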
Getty’s TGN - III
Place type: “a term that characterises a significant aspect of the place, including its role, function, political anatomy, size, or physical characteristics.”
@TGNGuidelines, section 3.6.1.1
Place types, via each record’s preferred type, are the foundation for the hierarchy of every TGN record.
They are organised into flat, general categories (Christian types, Physical features types, etc.).
Most place types can be assigned to three macro-areas: physical features, administrative divisions (geopolitics and internal state structure) and functions (religious, economic, social, etc.).
Getty’s TGN - IV
Terminological issues.
Guideline: prefer the local terminology.
The USA has state and county, Italy has region and province.
Italian region is merged with region (a generic administrative entity) and with generic region (another, even more generic entity).
These lead to ontological issues.
Place types are not themselves structured into a defined thesaurus, nor are they formally distinguished into different domains with specific rules to disambiguate them.
Desiderata
• Allow for comparison beyond a single project (data integration)
• Interoperability and portability
• Scalability
• More accurate retrieval
• Reasoning…
• Essentially: make vocabularies more machine-actionable
One possible solution: integrate a stricter knowledge model into the back-end of controlled vocabularies, expressed via thesauri of concepts built according to ontologies.
Standards already exist: ISO 25964 (data model), SKOS (ontology).
An example - I
List of monasteries in France:
Id | Name | Type | …
1 | Manlieu Abbey | tgn:monastery | …
2 | Argentan Abbey | tgn:monastery | …
… | … | … | …
Can we improve on the simple tag “monastery”?
An example - II
Thesaurus of concepts:
<skos:Concept rdf:about="labelling.org/function-concept-10">
  <skos:prefLabel xml:lang="en">worship</skos:prefLabel>
</skos:Concept>
<skos:Concept rdf:about="labelling.org/function-concept-11">
  <skos:prefLabel xml:lang="en">estate administration</skos:prefLabel>
</skos:Concept>
Controlled vocabulary of place types:
<skos:Concept rdf:about="labelling.org/voc7/label-33">
  <skos:prefLabel xml:lang="en">monastery</skos:prefLabel>
  <skos:related rdf:resource="labelling.org/function-concept-10"/>
  <skos:related rdf:resource="labelling.org/function-concept-11"/>
</skos:Concept>
In the database:
Id | Name | Type
1 | Manlieu Abbey | voc7/label-33:monastery
2 | Argentan Abbey | voc7/label-33:monastery
… | … | …
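Read as RDF/XML, the snippets on this slide let a program go from the tag stored in the database to the functions behind it. The sketch below uses only Python's standard `xml.etree.ElementTree`. The `rdf:RDF` wrapper and the namespace declarations are added so the fragment parses, the URIs are kept exactly as written on the slide (relative, not resolved), and the `labelling.org/` prefix used to turn a database type id into a concept URI is an assumption. A real system would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

# Slide snippets wrapped in rdf:RDF with the standard RDF and SKOS namespaces.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="labelling.org/function-concept-10">
    <skos:prefLabel xml:lang="en">worship</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="labelling.org/function-concept-11">
    <skos:prefLabel xml:lang="en">estate administration</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="labelling.org/voc7/label-33">
    <skos:prefLabel xml:lang="en">monastery</skos:prefLabel>
    <skos:related rdf:resource="labelling.org/function-concept-10"/>
    <skos:related rdf:resource="labelling.org/function-concept-11"/>
  </skos:Concept>
</rdf:RDF>"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_concepts(xml_text, lang="en"):
    """Map each concept URI to (prefLabel in `lang`, list of related URIs)."""
    concepts = {}
    for concept in ET.fromstring(xml_text).iter(SKOS + "Concept"):
        label = next((el.text for el in concept.findall(SKOS + "prefLabel")
                      if el.get(XML_LANG) == lang), None)
        related = [el.get(RDF + "resource")
                   for el in concept.findall(SKOS + "related")]
        concepts[concept.get(RDF + "about")] = (label, related)
    return concepts


def functions_of(type_id, concepts, base="labelling.org/"):
    """Resolve a database type id such as 'voc7/label-33' to function labels."""
    _, related = concepts[base + type_id]
    return [concepts[uri][0] for uri in related]


concepts = load_concepts(SKOS_XML)
# Rows of the database table; only the id part of the Type column is kept here.
for row_id, name, type_id in [(1, "Manlieu Abbey", "voc7/label-33"),
                              (2, "Argentan Abbey", "voc7/label-33")]:
    print(row_id, name, functions_of(type_id, concepts))
```

Each abbey row now carries more than the bare tag “monastery”: through `skos:related` it reaches the function concepts worship and estate administration.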
Idea - concept
An integrated approach:
1. develop back-end thesauri;
2. build vocabularies as needed, in natural language, associating tags with formally defined concepts (avoiding late integration).
n:m (many-to-many) mapping between vocabularies and ontologies.
Focus on what’s shared. Add details to the back-end.
Pareto principle: 80% of the effects (the tags we need) come from 20% of the causes (concepts).
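The n:m mapping on this slide can be sketched with plain Python sets. In the sketch below, two hypothetical project vocabularies (their names, tags and links are invented for illustration) link tags to concepts of one shared back-end thesaurus. Because the concepts are shared, tags from different projects can be compared even when their natural-language wording differs.

```python
# Two hypothetical project vocabularies. Each tag links to one or more concepts
# of a shared back-end thesaurus, and each concept may back tags in several
# vocabularies: an n:m mapping.
VOCABULARIES = {
    "project_a": {"monastery": {"worship", "estate administration"},
                  "parish church": {"worship"}},
    "project_b": {"abbey": {"worship", "estate administration"},
                  "farm": {"estate administration", "agriculture"}},
}


def tags_for(concept):
    """Every (vocabulary, tag) pair linked to a back-end concept."""
    return sorted((voc, tag)
                  for voc, tags in VOCABULARIES.items()
                  for tag, concepts in tags.items()
                  if concept in concepts)


def shared_concepts(voc1, tag1, voc2, tag2):
    """Tags from different projects can be compared on the concepts they share."""
    return VOCABULARIES[voc1][tag1] & VOCABULARIES[voc2][tag2]


print(tags_for("worship"))
print(shared_concepts("project_a", "monastery", "project_b", "farm"))
```

Keeping the shared part in the concepts and the project-specific wording in the tags is one way to read “focus on what’s shared, add details to the back-end”.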
Historical place types - I
Quite problematic:
• Same nouns mean different things in space, time, culture
• Generic tags for specific meanings lead to ambiguity
• Layers of knowledge: historical agents, socio-political contexts, historians’ interpretations, etc.
Example: “palazzo” in Medieval and Early Modern Venice.
For contemporaries:
the Doge’s palace. Other nobles’ palaces had proper names, e.g. Ca’ Foscari, meaning House Foscari.
For us:
a category of (historical) buildings, usually former nobles’ residences,
OR a more generic category of fairly large buildings.
Functional categorisation - I
Historical knowledge is mostly about events and processes, which drive the
production of evidence (sources)
@Grossner, Representing Historical Knowledge in Geographic Information Systems, 2010.
Functional categorisation - II
We model representations more than real objects, and we study humans: purpose and function are the main concern.
From nouns to verbs:
• Most vocabularies of place types/features are already loosely classified by functionality (economic activity, leisure facility, place of culture, etc.)
• There are fewer verbs than nouns (WordNet synsets: ~82k nouns, ~14k verbs)
• Verbs act as bridges between concepts in natural language, linked data triples, etc.
Not the only perspective (e.g. natural features, institutions), but a starting point.
Example: Barber-shops in Venice
@Filippo De Vivo, Patrizi, informatori, barbieri. Politica e comunicazione a Venezia nella prima età moderna. Milan: Feltrinelli, 2012. In English: id., Information and communication in Venice: Rethinking Early Modern Politics. Oxford: Oxford University Press, 2007.
Historical place types - II
Problems:
1. Same nouns mean different things in space, time, culture
2. Generic tags for specific meanings lead to ambiguity
3. Layers of knowledge: historical agents, socio-political contexts, historians’ interpretations, etc.
Expected outcomes:
1. We can add specifications of space, time and culture to the concepts defining a term
2. Generic tags can be linked to specific concepts
3. The process of linking vocabulary terms to concepts helps historians clarify their reasoning and the layer of knowledge they are representing
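Outcomes 1 and 3 can be pictured with the “palazzo” example: one term points to several concepts, each qualified by place, period and the layer of knowledge it belongs to. The Python sketch below is one possible reading under assumptions. The field names, the `"any"` wildcard and the filtering rule are hypothetical, and the glosses only paraphrase the earlier slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualConcept:
    """A concept behind a term, qualified by space, time and knowledge layer."""
    gloss: str
    place: str
    period: str
    layer: str  # whose knowledge is represented


# Glosses paraphrase the "palazzo" slide; field names and values are illustrative.
PALAZZO = [
    ContextualConcept("the Doge's palace", "Venice",
                      "Medieval and Early Modern", "historical agents"),
    ContextualConcept("(historical) building, usually a former nobles' residence",
                      "Venice", "Medieval and Early Modern", "historians"),
    ContextualConcept("generic category of fairly large buildings",
                      "any", "any", "historians"),
]


def senses(concepts, place, period, layer):
    """Concepts of a term that apply to a place, a period and a knowledge layer."""
    return [c.gloss for c in concepts
            if c.place in (place, "any")
            and c.period in (period, "any")
            and c.layer == layer]


print(senses(PALAZZO, "Venice", "Medieval and Early Modern", "historical agents"))
```

Having to choose the layer explicitly when linking a term is what makes the historian’s reading visible.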
How?
• Build thesauri of functions with a bottom-up approach, from sources
• Build vocabularies when needed, reusing existing ones if possible
• Develop software to integrate such thesauri with the creation and reuse of controlled vocabularies
• Raise and foster a community of interest, and work together
Let’s break down each part…
Building thesauri of functions - I
Good starting point: Getty’s AAT function facets:
http://www.getty.edu/vow/AATHierarchy?find=logic=ANDnote=subjectid=300054593
They provide a general framework, i.e. functional domains and upper layers: economics, government, social, education, etc.
Small teams of historians and ontologists start from sources and make explicit part of the knowledge they entail.
A process of abstraction from detail and generalisation.
Let’s see an example…
Building thesauri of functions - III
Giovanni Bartolomeo da Gabiano, bookseller, publisher and entrepreneur in Venice.
Business letters from which we can infer the activities going on at his shop at Rialto.
“Data in mane de Messer Ioanne Bertolamie a la libraria da la Fontana in Venecia”
“Given into the hands of Mr Giovanni Bartolomeo, at the bookshop at the Fountain [insigna] in Venice”
Building thesauri of functions - IV
This letter mentions new editions in preparation (apparently the market was good for medical treatises: Avicenna, Aliabate, etc.) and engravings ordered for them.
Various activities, today usually considered separate:
• book-selling, accounting, warehousing, etc.
• publishing and sometimes printing
• patronage and other social activities
• …
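A thesaurus of functions lets a single place hold all of these activities at once instead of forcing it under one noun such as “bookshop”. The following Python sketch shows the idea. The assignment of each function to a broader domain is an assumption made for illustration, loosely echoing the upper layers (economics, social, …) mentioned for the AAT facets, and the function list simply repeats the bullets above.

```python
# Hypothetical fragment of a thesaurus of functions: each function points to a
# broader functional domain. The domain assignments are assumptions.
BROADER = {
    "book-selling": "economic", "accounting": "economic",
    "warehousing": "economic", "publishing": "economic",
    "printing": "economic", "patronage": "social",
}

# One place, many functions, as inferred from Gabiano's letters.
SHOP = {
    "name": "libraria da la Fontana (Rialto)",
    "functions": ["book-selling", "accounting", "warehousing",
                  "publishing", "printing", "patronage"],
}


def by_domain(place):
    """Group a place's functions under their broader domains."""
    grouped = {}
    for function in place["functions"]:
        grouped.setdefault(BROADER[function], []).append(function)
    return grouped


print(by_domain(SHOP))
```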
Building vocabularies - I
Similar to building thesauri of functions, but without supervision.
Essential to:
• permit the use of the same tags we currently find in controlled vocabularies, i.e. natural language and possibly no definitions
• allow for intuitive linkage with thesauri, and suggest existing vocabulary tags that are close in meaning
• design for continuous growth: merging or splitting terms, sub-categories, …
Building vocabularies - II
An example: Venetian fiscal declarations, 1514.
Rented “flats” (lit. small houses: ‘chaseta’) for residence: ‘flat’ (in the vocabulary).
Possibly interesting functions according to the source: ‘renting’, ‘lodging’/‘dwelling’, under ‘economic functions’.
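Suggesting existing tags that are close in meaning, one of the requirements for vocabulary building, could work by comparing the functions linked to a new tag with those of tags already in the system. The sketch below uses Jaccard similarity on function sets. The similarity measure, the threshold and every existing tag are hypothetical choices. Only the new tag ‘flat’ and its functions come from the 1514 example.

```python
# Existing vocabulary tags and the functions they are linked to (hypothetical).
EXISTING = {
    "inn": {"lodging", "renting"},
    "house": {"dwelling"},
    "shop": {"selling", "renting"},
}


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(functions, existing=EXISTING, threshold=0.3):
    """Rank existing tags by overlap with the functions of a new tag."""
    scored = sorted(((jaccard(functions, fs), tag) for tag, fs in existing.items()),
                    reverse=True)
    return [tag for score, tag in scored if score >= threshold]


# New tag 'flat' from the 1514 fiscal declarations.
flat = {"renting", "lodging", "dwelling"}
print(suggest(flat))  # → ['inn', 'house']
```

Here ‘inn’ scores 2/3, ‘house’ 1/3 and ‘shop’ 1/4, so only the first two pass the threshold. The user can then reuse one of them or keep ‘flat’ as a new tag linked to the same functions.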
Building vocabularies - III
@Luzzatto, Sergio, Pedullà, Gabriele (editors), Atlante della letteratura italiana, vol. 1, Torino, Einaudi, 2010.
The software: Labelling system - I
Design requirements:
• Building thesauri of concepts in the back-end.
• Building controlled vocabularies, for users.
• Querying the system for such contents (for every agent, openly).
• Administering and linking all these tasks and users in a single system.
• Providing transparent management of the most used data formats.
• Reusing open-source solutions whenever possible.
• Being very intuitive and easy to use.
Waiting for a possible grant on this…
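As an illustration of the data-format requirement, a concept record kept in a simple internal form could be exported to a standard RDF syntax on demand. The Python sketch below writes SKOS triples in N-Triples syntax. The record layout, the function names and the `http://example.org/` base IRI are hypothetical (N-Triples requires absolute IRIs, whereas the earlier slide uses `labelling.org/…` identifiers), and this is not the design of the Labelling system.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base IRI; N-Triples needs absolute IRIs


def literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(concept_id, labels, related=()):
    """Serialise one concept record as SKOS triples in N-Triples syntax."""
    subject = f"<{BASE}{concept_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, text in sorted(labels.items()):  # at most one prefLabel per language
        lines.append(f'{subject} <{SKOS}prefLabel> "{literal(text)}"@{lang} .')
    for target in related:
        lines.append(f"{subject} <{SKOS}related> <{BASE}{target}> .")
    return "\n".join(lines)


print(to_ntriples("voc7/label-33", {"en": "monastery"},
                  ["function-concept-10", "function-concept-11"]))
```

Users would edit tags and links through the interface, while exports like this one stay in the background.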
The software: Labelling system - IV
Implementation is key:
• we are struggling to get several people from different backgrounds to work with standards such as SKOS and RDF
• we need a common entry point, as transparent as possible
• we need to keep vocabulary building and thesauri of functions concretely distinct
The community - I
Work on integration and alignment requires a community of interest and a long time: slow growth.
Experts’ workshop on Controlled Vocabularies, Mainz 10-11/10/2013:
• gathered experts from different fields (history, IT, geography, …)
• discussed place types and the functional categorisation extensively
• established a working group to start the process
As of today:
• circa 30 experts
• wiki space and mailing list within DARIAH-DE
The community - II
Space on the DARIAH-DE wiki.
Already populated with references, vocabularies and first alignment projects, plus the RDF (with SKOS) version of Getty’s AAT function facets.
Send me an e-mail to join us :)
Summary
1. Motivation: controlled vocabularies are ambiguous and lack definitions
2. Object: historical place types
3. Proposal: use functional categorisation to overcome these limitations
4. Implementation: community of interest, reuse of standards, ad-hoc software, bottom-up source-based approach
Future directions
Short-term priority:
development of the Labelling system
Long-term:
• engage with more researchers and projects
• test the method in different settings
• steadily grow the vocabulary base
• integrate existing vocabularies in the system