This document summarizes a talk on experiences developing geographical ontologies and linked data. The talk discusses why geographical ontologies were developed, including issues with integrating diverse geographical data sources. It describes the NeOn methodology used in developing ontologies like Hydrontology for the hydrology domain and PhenomenOntology based on feature catalogs. Guidelines for developing linked data are also covered. The overall structure of the talk is outlined.
Experiences in the Development of Geographical Ontologies and Linked Data
1. Experiences in the
Development of
Geographical Ontologies
and Linked Data
OntoGeo Workshop, Toulouse, 18 November 2010
Oscar Corcho, Luis Manuel Vilches Blázquez, José Angel Ramos Gargantilla
{ocorcho,lmvilches,jramos}@fi.upm.es
Ontology Engineering Group, Departamento de Inteligencia Artificial,
Facultad de Informática, Universidad Politécnica de Madrid
Credits: Asunción Gómez-Pérez, María del Carmen Suárez de Figueroa, Boris Villazón,
Alex de León, Víctor Saquicela, Miguel Angel García, Juan Sequeda and many others
Work distributed under the license Creative Commons Attribution-
Noncommercial-Share Alike 3.0
2. • Why did we start developing Geographical
Ontologies?
• Methodological guidelines for ontology development
• The NeOn Methodology
• The development process for Hydrontology
• The development process for PhenomenOntology
• Why did we start developing Geographical Linked
Data?
• Methodological guidelines for Linked Data generation
• Ontology and Linked Data usage in
http://geo.linkeddata.es/
Structure of my Talk
5. Why ontologies? Geographical Information Context
• Great variety of sources
• Nearly 20 different producers in Spain (national and local
cartographic institutions with different interests)
• Various degrees of quality and structuring of
information
• Natural language ambiguity
• Synonymy, polysemy and hyperonymy
• Scale factor
7. Why ontologies? Geographical Information Context
• Great variety of sources
• Various degrees of quality and structuring of
information
• ICC has 49 types of features in total
• IGN has (only in the hydrographic domain) 40 types of
features
• Natural language ambiguity
• Synonymy, polysemy and hyperonymy
• Scale factor
9. Why ontologies? Geographical Information Context
• Great variety of sources
• Various degrees of quality and structuring of
information
• Natural language ambiguity
• Synonymy: Different words with the same meaning
» riverside, river bank
• Polysemy: The same word with different meanings, e.g. "bank"
» Bank: Financial institution
» Bank: Rely upon (trust)
• Hyperonymy: The meaning of one word includes that of another
» Bank and Morgan Bank
• Scale factor
10. Why ontologies? Geographical Information Context
• Great variety of sources
• Various degrees of quality and structuring of
information
• Natural language ambiguity
• Synonymy, polysemy and hyperonymy
• Scale factor
• E.g., one village may be represented as a point X,Y or as an
area XN,YN
• This can act as a filter for geographical information
• Different scales normally present different features
• Generalisation processes are normally a problem, due to
the difficulties in finding “feature overlaps” in different
feature catalogues
12. [Figure: NeOn Scenarios. The diagram relates the core activities (O. Specification, O. Conceptualization, O. Formalization, O. Implementation in RDF(S), OWL or Flogic) to nine numbered scenarios: non-ontological resource reuse and reengineering (thesauri, dictionaries, glossaries, lexicons, taxonomies, classification schemas), ontological resource reuse and reengineering, O. Aligning and O. Merging, ontology design pattern reuse, ontology restructuring (pruning, extension, specialization, modularization) and O. Localization. Ontology support activities (knowledge acquisition, documentation, configuration management, evaluation, assessment) span all nine scenarios.]
13. NeOn Scenarios
1. Building ontology networks from scratch without reusing existing
resources.
2. Building ontology networks by reusing and reengineering non
ontological resources.
3. Building ontology networks by reusing ontologies or ontology
modules.
4. Building ontology networks by reusing and reengineering ontologies
or ontology modules.
5. Building ontology networks by reusing and merging ontologies or
ontology modules.
6. Building ontology networks by reusing, merging and reengineering
ontologies or ontology modules.
7. Building ontology networks by reusing ontology design patterns.
8. Building ontology networks by restructuring ontologies or ontology
modules.
9. Building ontology networks by localizing ontologies or ontology
modules.
14. NeOn Methodology
Process and activities covered:
Ontology Specification
Scheduling
Non Ontological Resource Reuse
Non Ontological Resource Reengineering
Reuse General Ontologies
Reuse Domain Ontologies
Reuse Ontology Statements
Reuse Ontology Design Patterns
All processes and activities are described with:
A filling card
A workflow
Examples
17. INSPIRE as a context for Hydrontology
• One of the aims of INSPIRE is to harmonise
geographical information sources in order to support
the formulation, implementation and evaluation of EU policies
(e.g., Environmental Management).
• Geographical Information Sources: Databases from
EU Member States at local, regional, national and
international levels.
20. • Glossary of hydrOntology terms.
• Feature Catalogues of the Numerical Cartographic Database
(1:25,000; 1:200,000; 1:1,000,000)
• Different feature catalogues from other local producers.
• EuroGlobalMap & EuroRegionalMap
• Water Framework Directive
• Alexandria Digital Library, Dewey
• Thesauri (UNESCO, GEMET, Getty Thesaurus of Geographic
Names, etc.)
• National Geographic Gazetteer
• Bibliography (Dictionary, Water, Law, etc.)
• This glossary contains more than 120 concepts
21. Criteria for structuring
• Abstract concepts from:
• Water Framework Directive
• Proposed by the EU Parliament and EU Council
• List of hydrographic phenomena definition
• Part of the model from:
• SDIGER Project
• INSPIRE pilot project
• Two river basins, two countries, two languages
• Several semantic criteria from:
• WordNet
• Encyclopaedia Britannica
• Diccionario de la Real Academia de la Lengua
• Wikipedia
• Several domain references
• Inheritance: From various actual catalogues
• Meetings with domain experts that belong to IGN-E
23. Modelling the hydrology domain
Upper level
Lower level
150+ classes, 47 object properties, 64 data properties and 256 axioms.
27. Knowledge Bases
• A gazetteer is a directory of instances of a class or classes of
features containing some information regarding position (ISO 19112)
• National Geographic Gazetteer has 14 item types and 460,000
toponyms (Spanish, Galician, Basque, Catalan, and Aranese).
• Conciso Gazetteer, which is agreed with the United Nations
Conferences' recommendations on geographic names normalization,
has 17 item types and 3667 toponyms.
28. Knowledge Bases
• A feature catalogue presents the abstraction of reality, represented in
one or more sets of geographic data, as a defined classification of
phenomena (ISO 19110)
• Numerical Cartographic Database (BCN25): designed as a product derived
from the National Topographic Map and built to obtain cartographic
information that complies with the data specifications required for
use inside GIS.
• Numerical Cartographic Database (BCN200): developed by digitising
analogue provincial maps.
• Information is structured in 8 topics (Administrative boundaries,
Relief, Hydrography, Vegetation and so on)
30. Bottom-up process: PhenomenOntology
• Automatic ontology building from
BCN25/BTN25
• Automatic checking of linguistic differences (linsearch): plurals,
punctuation marks, capital letters and Spanish diacritics
• Curation by IGN-E domain experts
31. Criteria for taxonomy creation
• Group (Road, Hydrographic...)
• Code column
• (Topic) - (030501)
• (Group) – (030501)
• (Subgroup) – (030501)
• Common lexical parts
• Highway with 2 lines
• Highway with 3 lines
• Highway under construction
• Highway (superclass)
• Lexical heterogeneity in
feature names (“Autovía”,
“AUTOVIA”, “Autovia”,
“Autovía-”)
Numerical Cartographic
Database (BCN25)
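The two lexical criteria above (heterogeneous spellings of one feature name, and a superclass taken from the common lexical part) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the normalisation rules (strip accents, punctuation and case) are a guess at what such a step involves, not the procedure used for PhenomenOntology.

```python
import string
import unicodedata


def normalise_name(name: str) -> str:
    """Lower-case a feature name and drop accents, punctuation and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = no_accents.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def common_lexical_part(names: list[str]) -> str:
    """Return the longest run of leading words shared by all names."""
    shared = []
    for words in zip(*(name.split() for name in names)):
        if len(set(words)) != 1:
            break
        shared.append(words[0])
    return " ".join(shared)


# Spelling variants listed on the slide collapse to a single key.
variants = ["Autovía", "AUTOVIA", "Autovia", "Autovía-"]
print({normalise_name(v) for v in variants})

# The shared leading words suggest a candidate superclass name.
subclasses = ["Highway with 2 lines", "Highway with 3 lines", "Highway under construction"]
print(common_lexical_part(subclasses))
```

A real pipeline would still need the expert curation step mentioned in the talk, since shared leading words do not always denote a meaningful superclass.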
43. Some conclusions in ontology development
• Generic ontology development methodologies can be
applied with some success
• Hydrontology took approximately 6 person-months (PM) in total
• Initially done by a domain expert after only introductory training
• Ontology debugging was extremely difficult and has provided
interesting results in this area
• Top-down vs bottom-up approaches
• A large curation process is still needed in bottom-up
approaches, which may argue against following them (research
ongoing on this)
• The bottom-up approach yields more lightweight ontologies,
although they are easier to relate to the underlying catalogues
• Next steps: relating them to upper-level ontologies
(e.g., DOLCE) and modularising them to improve
reusability
45. What is the Web of Linked Data?
• An extension of the current
Web…
• … where information and services
are given well-defined and explicitly
represented meaning, …
• … so that it can be shared and used
by humans and machines, ...
• ... better enabling them to work in
cooperation
• How?
• Promoting information exchange by
tagging web content with machine
processable descriptions of its
meaning.
• And technologies and infrastructure
to do this
• And clear principles on how to
publish data
46. What is Linked Data?
• Linked Data is a term used to describe a
recommended best practice for exposing, sharing,
and connecting pieces of data, information, and
knowledge on the Semantic Web using URIs and
RDF.
• Part of the Semantic Web
• Exposing, sharing and connecting data
• Technologies: URIs and RDF (although others are also
important)
47. The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information,
using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
• http://www.w3.org/DesignIssues/LinkedData.html
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.htm
51. How should we publish data?
• Formats in which data is published nowadays…
• XML
• HTML
• DBs
• APIs
• CSV
• XLS
• …
• However, main limitations from a Web of Data point
of view
• Difficult to integrate
• Data is not linked to each other, as it happens with Web
documents.
52. How do we publish Linked Data?
1. Exposing Relational Databases or other similar formats
into Linked Data
• D2R
• Triplify
• R2O
• NOR2O
• Virtuoso
• Ultrawrap
• …
2. Using native RDF triplestores
• Sesame
• Jena
• Owlim
• Talis platform
• …
3. Incorporating it in the form of RDFa in CMSs like Drupal
53. How do we consume Linked Data?
• Linked Data browsers
• To explore things and datasets and to navigate between them.
• Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE),
OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser
(Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE),
Fenfire (DERI, Ireland)
• Linked Data mashups
• Sites that mash up (that is, combine) Linked Data
• Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK),
DBPedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI,
Ireland)
• Search engines
• To search for Linked Data.
• Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch
(Yahoo, Spain), Watson (Open University, UK), SWSE (DERI,
Ireland), Swoogle (UMBC, USA)
Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig
54. One additional motivation: Open Government
• Government and state administration should be
opened at all levels to effective public scrutiny and
oversight
• Objectives:
• Transparency
• Participation
• Collaboration
• Inclusion
• Cost reduction
• Interoperability
• Reusability
• Leadership
• Market & Value
•Some Links:
• B. Obama –Transparency and Open
Government
• T. Berners-Lee - Raw data now!
• J. Manuel Alonso - ¿Qué es Open Data?
• Open Government Data
• 8 Principles of Open Government Data
56. Linked Data Mashup (data.gov)
• Clean Air Status and Trends (CASTNET)
• http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php
58. GeoLinkedData
• It is an open initiative whose aim is to enrich the Web
of Data with Spanish geospatial data.
• This initiative has started off by publishing diverse
information sources, such as National Geographic
Institute of Spain (IGN-E) and National Statistics
Institute (INE)
• http://geo.linkeddata.es
59. Motivation
The Web of Data is mainly for English speakers; Spanish has a poor presence
» 99.171 % English
» 0.019 % Spanish
Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/
Thanks to Aidan and Richard
61. Impact of geo.linkeddata.es
• Number of triples in Spanish (July 2010): 1,412,248
• Number of triples in Spanish (September 2010): 21,463,088
Share of triples per language (%):
Language   Before geo.linkeddata.es   After geo.linkeddata.es
en         99.1712875                 94.18744941
ja         0.463849377                0.440538697
fr         0.05447229                 0.051734793
de         0.034225134                0.032505155
pl         0.02532934                 0.024056418
it         0.021982542                0.020877812
es         0.019584648                5.044085342
62. Process for Publishing Linked Data on the Web
1. Identification of the data sources
2. Vocabulary development
3. Generation of the RDF data
4. Publication of the RDF data
5. Data cleansing
6. Linking the RDF data
7. Enable effective discovery
63. 1. Identification and selection of the data sources
Instituto Geográfico
Nacional
Basque
Catalan
Galician
Spanish
64. 1. Identification and selection of the data sources
Instituto Nacional
de Estadística
Province
Year
66. 2. Vocabulary development
• Features
• Lightweight:
• Taxonomies and a few properties
• Agreed (consensus) vocabularies
• To avoid mapping problems
• Multilingual
• Linked data are multilingual
• The NeOn methodology can help to
• Re-engineer non-ontological resources into ontologies
• Pros: use domain terminology already
agreed upon by domain experts
• Remove from heavyweight ontologies those features
that you don’t need
• Reuse existing vocabularies
67. Vocabulary development: Specification
• Content requirements: Identify the set of questions
that the ontology should answer
• Which are the provinces of Spain?
• Where are the beaches?
• Where are the reservoirs?
• Identify the production index in Madrid
• Which city has the highest production index?
• Give me the latitude and altitude of Madrid
• ….
• Non-content requirements
• The ontology must be in the four official Spanish languages
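A competency question such as "Which are the provinces of Spain?" eventually becomes a SPARQL query against the published data. The sketch below only builds the HTTP request for such a query and sends nothing. The endpoint address and the class URI are assumptions made for illustration (the talk gives neither), and the helper name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Both URIs below are assumptions for illustration; the talk does not state them.
ENDPOINT = "http://geo.linkeddata.es/sparql"
PROVINCE_CLASS = "http://geo.linkeddata.es/ontology/Provincia"

QUERY = f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?province ?label WHERE {{
  ?province a <{PROVINCE_CLASS}> ;
            rdfs:label ?label .
}}
ORDER BY ?label
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute the query, provided an endpoint really exists at the assumed address.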
69. 3. Generation of RDF
• From the data sources
• Geographic information (databases)
• Statistical information (spreadsheets)
• Geospatial information
• Different technologies for RDF generation
• Reengineering patterns
• R2O and ODEMapster
• Geometry generation
70. 3. Generation of the RDF Data
[Diagram: INE data is converted with NOR2O, IGN data with ODEMapster, and the IGN geospatial column with Geometry2RDF.]
71. 3. Generation of the RDF Data
• Preliminaries
• Select appropriate URIs
• Difficulties
• Cumbersome URIs in Spanish
• http://geo.linkeddata.es/ontology/Río
• RDF allows UTF-8 characters in URIs
• But Linked Data URIs have to be URLs as well
• So non-US-ASCII characters have to be percent-encoded (%code)
• http://geo.linkeddata.es/ontology/R%C3%ADo
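The percent-encoding step can be reproduced with the Python standard library. This is a minimal sketch: the function name `to_ascii_uri` is hypothetical, and only the path component is encoded, which is enough for the example URI from the slide.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_ascii_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters in the path of an identifier."""
    parts = urlsplit(iri)
    # "%" stays in the safe set so that an already encoded path is left unchanged.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


encoded = to_ascii_uri("http://geo.linkeddata.es/ontology/Río")
print(encoded)           # http://geo.linkeddata.es/ontology/R%C3%ADo
print(unquote(encoded))  # http://geo.linkeddata.es/ontology/Río
```

The "í" is written as its two UTF-8 bytes, C3 and AD, which is why one character turns into six.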
72. 3. Generation of the RDF Data / instances
• NOR2O is a software library that implements the transformations
proposed by the Patterns for Re-engineering Non-Ontological
Resources (PR-NOR). Currently we have 16 PR-NORs.
• PR-NORs define a procedure that transforms the components of a
Non-Ontological Resource (NOR) into ontology elements.
http://ontologydesignpatterns.org/
• NOR types handled by NOR2O: classification schemes, thesauri, lexicons
• Example: FAO Water classification (a classification scheme)
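To give a feel for what re-engineering a classification scheme involves, the sketch below turns a tiny scheme into N-Triples, mapping each category to a class and each "broader" relation to rdfs:subClassOf. It is not the NOR2O API and does not reproduce any specific PR-NOR. The namespace, the function name and the three categories are hypothetical, and the mapping is one plausible design choice among several.

```python
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
BASE = "http://example.org/ontology/"  # hypothetical namespace


def scheme_to_triples(scheme: dict[str, Optional[str]]) -> list[str]:
    """Map each category to an OWL class and each broader category to a superclass."""
    triples = []
    for item, broader in scheme.items():
        triples.append(f"<{BASE}{item}> <{RDF_TYPE}> <{OWL_CLASS}> .")
        if broader is not None:
            triples.append(f"<{BASE}{item}> <{RDFS_SUBCLASS_OF}> <{BASE}{broader}> .")
    return triples


# Hypothetical categories: each key is a category, each value its broader category.
water_scheme = {"WaterBody": None, "River": "WaterBody", "Lake": "WaterBody"}
triples = scheme_to_triples(water_scheme)
print("\n".join(triples))
```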
77. 3. Generation of the RDF Data – Geometry2RDF
Oracle SDO_UTIL package
-- Export the geometry of each feature labelled 'Arroyo' as GML 3.1.1
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry))
AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta='Arroyo'
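Once the database returns GML, the geometry still has to be expressed as RDF. The sketch below shows that step for the simplest case, a single gml:Point, using the W3C Basic Geo vocabulary (geo:lat, geo:long). It is an illustration and not the Geometry2RDF implementation. The function name, the sample point and the subject URI are hypothetical, and the "longitude latitude" coordinate order is an assumption that depends on the coordinate reference system in use.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def gml_point_to_ntriples(subject_uri: str, gml: str) -> list[str]:
    """Turn the first gml:pos element found into geo:lat and geo:long triples."""
    root = ET.fromstring(gml)
    pos = next(root.iter(f"{{{GML_NS}}}pos"))
    # Assumption: coordinates are given as "longitude latitude".
    longitude, latitude = (float(value) for value in pos.text.split())
    return [
        f'<{subject_uri}> <{GEO_NS}lat> "{latitude}"^^<{XSD_DOUBLE}> .',
        f'<{subject_uri}> <{GEO_NS}long> "{longitude}"^^<{XSD_DOUBLE}> .',
    ]


# Hypothetical sample geometry and subject URI.
sample = (
    '<gml:Point xmlns:gml="http://www.opengis.net/gml">'
    "<gml:pos>-3.7 40.4</gml:pos></gml:Point>"
)
point_triples = gml_point_to_ntriples("http://example.org/resource/sample", sample)
print("\n".join(point_triples))
```

Line features such as the rivers selected by the query above would need a representation for coordinate sequences, which this sketch does not cover.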
80. 3. Generation of the RDF data – RDF graphs
• Sources: IGN, INE
• So far
• 7 RDF Named Graphs
• BTN25: http://geo.linkeddata.es/dataset/IGN/BTN25
• BCN200: http://geo.linkeddata.es/dataset/IGN/BCN200
• IPI: http://geo.linkeddata.es/dataset/INE/IPI
• ….
81. 4. Publication of the RDF Data
[Diagram: Virtuoso 6.1.0 provides the SPARQL endpoint; Pubby 0.3, including provenance support, serves the data as Linked Data and HTML.]
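A Linked Data front end such as Pubby typically answers a request for a resource URI with a redirect to either an HTML page or an RDF document, depending on the Accept header. The sketch below illustrates that decision only. The /resource/ and /data/ path segments are assumptions modelled on Pubby's usual layout (the talk shows only a /page/ URI), the function name is hypothetical, and real content negotiation also weighs q-values, which are ignored here.

```python
RDF_MEDIA_TYPES = ("application/rdf+xml", "text/turtle")


def redirect_target(resource_uri: str, accept_header: str) -> str:
    """Pick the document a server could redirect to (HTTP 303) for a resource URI."""
    wants_rdf = any(media_type in accept_header for media_type in RDF_MEDIA_TYPES)
    document_kind = "data" if wants_rdf else "page"
    return resource_uri.replace("/resource/", f"/{document_kind}/", 1)


# Assumed resource URI; only the /page/ form appears in the talk.
granada = "http://geo.linkeddata.es/resource/Provincia/Granada"
print(redirect_target(granada, "text/html"))
print(redirect_target(granada, "application/rdf+xml"))
```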
83. 4. Publication of the RDF Data - License
• Data Licenses
• Official license as published in the Spanish official journal
(BOE - Boletín Oficial del Estado)
• Creative Commons options
• GNU Free Documentation License
• Each dataset has its own specific license
• IGN
• INE
84. 5. Data cleansing
• Lack of documentation of the IGN datasets
• Broken links: Spain, IGN resources
• Lack of documentation of the ontology
• Missing English and Spanish labels
• Building a Spanish ontology and importing
some concepts from another ontology (in
English). Options:
• Import the English ontology and add
annotations such as a Spanish label to its terms.
• Import the English ontology, create new
concepts and properties with a Spanish name
and map those to the English equivalents.
• Re-declare the terms of the English ontology
that we need (using the same URI as in the
English ontology) and add a Spanish label.
• Create your own classes and properties that
model the same things as the English
ontology.
85. 6. Linking of the RDF Data
• Silk - A Link Discovery Framework for
the Web of Data
• First set of links: Provinces of Spain
• 86% accuracy
[Diagram: GeoLinkedData linked with DBpedia and GeoNames]
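Link discovery of the kind Silk performs can be illustrated with a much simpler stand-in: compare labels with a string similarity measure and emit owl:sameAs links above a threshold. The sketch below uses Python's difflib instead of Silk's own metrics and configuration language. The function names, the threshold, the resource URIs and the labels are all illustrative values, not data from the talk.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source: dict[str, str], target: dict[str, str],
                   threshold: float = 0.9) -> list[str]:
    """Link each source resource to its best-matching target, if similar enough."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_label = max(
            target.items(), key=lambda item: similarity(s_label, item[1])
        )
        if similarity(s_label, best_label) >= threshold:
            links.append(f"<{s_uri}> <{OWL_SAME_AS}> <{best_uri}> .")
    return links


# Illustrative URIs and labels only.
local = {
    "http://geo.linkeddata.es/resource/Provincia/Granada": "Granada",
    "http://geo.linkeddata.es/resource/Provincia/Madrid": "Madrid",
}
remote = {
    "http://dbpedia.org/resource/Province_of_Granada": "Granada",
    "http://dbpedia.org/resource/Province_of_Girona": "Girona",
}
links = discover_links(local, remote)
print("\n".join(links))
```

Label matching alone produces false links between places that share a name, which is one reason a reported accuracy such as the 86% above stays below 100% and why a manual check of the generated links remains useful.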
86. 6. Linking of the RDF Data
• http://geo.linkeddata.es/page/Provincia/Granada
87. 7. Enable effective discovery
93. Future Work
• Generate more datasets from other domains, e.g.
universities in Spain.
• Identify more links to DBPedia and Geonames.
• Cover complex geometrical information, i.e. not only
Point and LineString-like data; we will also treat
information representation through polygons.
95. General conclusions
• Reusable ontologies available for the community
• Well-founded and well documented
• Now working on multilinguality/multiculturality issues
• Work continuing in understanding how to provide debugging
tools for domain experts.
• Reusable tools for geospatial Linked Data generation
• There is still a lack of understanding of how much
benefit we can get from Linked Geographical Data
• Benefits of linking seem to be clear
• But geo-processing is still unsolved in RDF, as well as
geometry representation