The slide set used for an introductory tutorial on DBpedia use cases, concepts and implementation aspects, held at the DBpedia community meeting in Dublin on 9 February 2015.
(Slide creators: M. Ackermann, M. Freudenberg; additional presenter: Ali Ismayilov)
1. DBpedia Tutorial 09.02.2015 http://dbpedia.org
Creating Knowledge out of Interlinked Data
Markus Ackermann, Markus Freudenberg
WG Agile Knowledge and Semantic Web
Universität Leipzig
DBpedia
Extraction of Knowledge
from Wikipedia
Wikipedia
Wikipedia coverage of the London bombings on July 7, 2005
–the first Wikipedia entry appeared within just 18 minutes
–2,500 users produced a 14-page article in only 12 hours
–far more detailed than any other news source
[Tapscott & Williams 2006]
Wikipedia
Wikipedia articles:
–4.7 million articles; 780 new articles per day
–are highly topical
–contain only few errors, which can easily be corrected
–often cover very specific content
→ Wikipedia is the knowledge compendium of humanity.
Semantic Web
–a "Web 3.0" technology
–a way of linking data between systems or entities
–allows for rich, self-describing interrelations of data available across the globe
–opens up the Web of data to artificial intelligence processes
–encourages companies, organisations and individuals to publish their data freely, in an open standard format
–encourages businesses to use data already available on the Web (data give and take)
Linked Data
Linked Data is the means of populating the Semantic Web (introduced by Tim Berners-Lee).
Four simple rules:
–Use URIs as names for things.
–Use HTTP URIs so that people can look up those names.
–When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
–Include links to other URIs, so that they can discover more things.
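Rules 2 and 3 can be illustrated with a minimal Python sketch (standard library only): an HTTP URI is looked up while asking for an RDF serialization through the `Accept` header. It assumes the server performs content negotiation and may redirect to a separate data document; the helper names `rdf_request` and `fetch` are hypothetical, and nothing is sent over the network unless `fetch` is called.

```python
from urllib.request import Request, urlopen

def rdf_request(resource_uri: str, media_type: str = "text/turtle") -> Request:
    """Build an HTTP GET for a Linked Data URI that asks for RDF rather than HTML."""
    return Request(resource_uri, headers={"Accept": media_type})

def fetch(request: Request, timeout: float = 30.0) -> tuple[str, str]:
    """Dereference the URI (needs network access); return final URL and body text.

    urlopen follows redirects, so the final URL may differ from the name looked up.
    Turtle is UTF-8 by definition, hence the fixed decoding.
    """
    with urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read().decode("utf-8")

req = rdf_request("http://dbpedia.org/resource/Monty_Python")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The same URI serves both people and programs; only the requested media type differs.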
Benefits of using Linked Data
Consumer View
- link data from any other place on the Web
- discover more related data while consuming data
- reuse parts of the data
- reuse existing tools and libraries
- combine data safely with other data
- query data across different repositories
Publisher View
- make your data discoverable
- increase the value of your data (by linking it)
- have fine-grained control over the data items and optimise access to them
- design data to fit your domain knowledge
What's DBpedia?
– DBpedia is a community effort to extract structured
information from Wikipedia and to make this information
available on the Web.
– DBpedia allows you to ask sophisticated queries against
Wikipedia, and to link other data sets on the Web to Wikipedia
data.
– Shares a common goal with Wikidata, but takes a different approach
What's DBpedia?
–the DBpedia project was started in 2006
–has been a key factor for the success of the Linked Open Data initiative
–serves as an interlinking hub for other data sets
–provides a testbed serving real data spanning various domains
–is available in more than 120 language editions
Where is Wikipedia information useful?
"Which films starred John Cleese without any other members of Monty Python?"
"What do Dublin and Leipzig have in common?"
"Which software products are developed by an organisation founded in California?"
"Which populated places in Germany are below sea level?"
Where is Wikipedia information useful?
● as a terminology and concept repository and fact source for entity linking and disambiguation:
The series follows the adventures of a space-faring crew on board
the starship USS Enterprise (NCC-1701-D), the fifth Federation
vessel to bear the name and registry and the seventh starship by
that name
The Enterprise is commanded by Captain Jean-Luc Picard and is
staffed by first officer Commander William Riker, operations
manager Data, security chief Tasha Yar, ship's counselor
Deanna Troi, chief medical officer Dr. Beverly Crusher, conn
officer Lieutenant Geordi La Forge, and junior officer Lieutenant
Worf.
⇒ "Enterprise": no company, no aircraft carrier, no satellite
⇒ correlate the mentions with the concept "starship"
⇒ Captain, Commander, Lieutenant: Star Trek ranks, not contemporary or past military or law enforcement ranks
Why search engines aren't always enough
"Which films starred John Cleese without any other members of Monty Python?"
What is needed to do better?
● ontological representation of entities and facts
"An ontology is an explicit specification of a conceptualization."
(Gruber, 1993)
⇒ formal description of concepts and relationships
● well-defined taxonomy of entity types
● assertions about entities and their relations
A British Comedy is a kind of Comedy. A Comedy is a kind of Film.
⇒ A British Comedy is a kind of Film.
Clockwise is a British Comedy. John Cleese stars in Clockwise.
⇒ John Cleese stars in a Film.
● thoroughly specified, machine-actionable, but flexible formalism for representation
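The two inferences above follow from the transitivity of "is a kind of". A minimal Python sketch, assuming a single-inheritance toy taxonomy and illustrative names (`SUBCLASS_OF`, `facts` and the helper functions are hypothetical, not DBpedia identifiers), derives them mechanically:

```python
# Toy taxonomy: class -> its direct parent class (single inheritance, as on the slide).
SUBCLASS_OF = {"BritishComedy": "Comedy", "Comedy": "Film"}

# Asserted facts as (subject, property, object) triples.
facts = {
    ("Clockwise", "type", "BritishComedy"),
    ("JohnCleese", "starsIn", "Clockwise"),
}

def superclasses(cls):
    """All classes that `cls` is (transitively) a kind of, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

def inferred_types(entity):
    """Asserted types of `entity` plus every superclass of them."""
    asserted = {o for (s, p, o) in facts if s == entity and p == "type"}
    types = set(asserted)
    for cls in asserted:
        types.update(superclasses(cls))
    return types

def stars_in_film(actor):
    """True if `actor` stars in something that is (inferred to be) a Film."""
    return any(p == "starsIn" and "Film" in inferred_types(o)
               for (s, p, o) in facts if s == actor)

print(superclasses("BritishComedy"))        # ['Comedy', 'Film']
print(sorted(inferred_types("Clockwise")))  # ['BritishComedy', 'Comedy', 'Film']
print(stars_in_film("JohnCleese"))          # True
```

Neither derived statement is stored; both are computed from the taxonomy plus the asserted facts, which is what a well-defined type hierarchy buys.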
A brief introduction to RDF
Resource Description Framework (W3C Standard)
● flexible language and data model for the representation of information
● based on (S, P, O) triples denoting simple assertions
S – subject, P – property, O – object
$(S, P, O) \in (I \cup B) \times I \times (I \cup B \cup L)$
I – URIs/IRIs; B – blank nodes; L – literals
● URIs/IRIs of named entities are:
  ● unambiguous, but non-unique identifiers of a resource
  ● often dereferenceable (in the Semantic Web)
● a set of triple assertions constitutes a directed graph with typed edges
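The position constraints and the graph view can be made concrete with a small Python sketch. The classes `IRI`, `BNode` and `Literal` are hypothetical stand-ins for what an RDF library would provide, and the two triples are illustrative, using the DBpedia naming pattern:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:            # I: IRIs of named resources
    value: str

@dataclass(frozen=True)
class BNode:          # B: blank nodes (anonymous resources)
    label: str

@dataclass(frozen=True)
class Literal:        # L: literal values with a datatype IRI
    lexical: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"

def is_valid_triple(s, p, o):
    """RDF position constraints: S in I ∪ B, P in I, O in I ∪ B ∪ L."""
    return (isinstance(s, (IRI, BNode))
            and isinstance(p, IRI)
            and isinstance(o, (IRI, BNode, Literal)))

def to_graph(triples):
    """Index triples as a directed graph: subject -> [(edge label, object)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        if not is_valid_triple(s, p, o):
            raise ValueError(f"not an RDF triple: {(s, p, o)}")
        graph[s].append((p, o))
    return graph

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
film = IRI(DBR + "Clockwise_(film)")

graph = to_graph([
    (film, IRI(DBO + "starring"), IRI(DBR + "John_Cleese")),  # edge to another node
    (film, RDFS_LABEL, Literal("Clockwise")),                # edge to a literal value
])
for predicate, obj in graph[film]:
    print(predicate.value, "->", obj)
```

The predicate IRI acts as the edge type; literals can only ever appear as edge targets.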
DBpedia - motivation and use cases
an RDF view of structured Wikipedia information enables:
● sophisticated queries
⇒ cross-referencing facts about entities
⇒ filtering of entities based on their types and fact assertions
● combining facts from Wikipedia with machine-actionable knowledge from other structured datasets (geodata, Yellow Pages, WordNet, ...)
Another take on Question Answering
"Which films starred John Cleese without any other members of Monty Python?"
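Once films, casts and group membership are explicit facts, the question reduces to a set condition: John Cleese is in the cast and no other member is. The Python sketch below evaluates that condition over a small hand-made sample (not DBpedia output); in SPARQL the same condition would be written as a FILTER NOT EXISTS pattern. The names `STARRING`, `MONTY_PYTHON` and `films_without_other_members` are hypothetical.

```python
# Hand-made sample, not extracted data: group members and film -> cast.
MONTY_PYTHON = {"Graham Chapman", "John Cleese", "Terry Gilliam",
                "Eric Idle", "Terry Jones", "Michael Palin"}
STARRING = {
    "Clockwise": {"John Cleese"},
    "A Fish Called Wanda": {"John Cleese", "Jamie Lee Curtis",
                            "Kevin Kline", "Michael Palin"},
    "Monty Python and the Holy Grail": set(MONTY_PYTHON),
}

def films_without_other_members(actor, group=MONTY_PYTHON, starring=STARRING):
    """Films whose cast contains `actor` but no other member of `group`."""
    others = group - {actor}
    return sorted(film for film, cast in starring.items()
                  if actor in cast and not cast & others)

print(films_without_other_members("John Cleese"))  # ['Clockwise']
```

A keyword search engine has no notion of "member of" or "not in the cast"; with structured facts the negation is a one-line filter.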
DBpedia - contents and datasets
● Wikipedia article ⇔ DBpedia resource
http://en.wikipedia.org/wiki/Monty_Python
⇔ http://dbpedia.org/resource/Monty_Python
● mapping-based types and facts governed by the DBpedia Ontology
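The article/resource correspondence shown above is a change of namespace that keeps the article title. A minimal Python sketch, assuming only English Wikipedia URLs and ignoring redirects and the percent-encoding rules the real extraction framework applies (`wikipedia_to_dbpedia` is a hypothetical helper):

```python
from urllib.parse import urlsplit

def wikipedia_to_dbpedia(article_url: str) -> str:
    """Map an English Wikipedia article URL to its DBpedia resource IRI."""
    parts = urlsplit(article_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError("expected an English Wikipedia article URL")
    title = parts.path[len("/wiki/"):].replace(" ", "_")  # titles use underscores
    return "http://dbpedia.org/resource/" + title

print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Monty_Python"))
# http://dbpedia.org/resource/Monty_Python
```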
DBpedia - contents and datasets
● 4.58 million entities and 583 million triples (English DBpedia 2014)
  131.2 million fact assertions (derived from infoboxes)
  168.5 million triples representing Wikipedia structure
  57.1 million links to external datasets
● DBpedia resources are categorised in several ways:
  ● by Wikipedia categories (represented in SKOS)
  ● by YAGO classification
  ● by links to WordNet synsets
  ● by assignment of classes from the DBpedia Ontology
● provenance metadata
⇒ From which part of which Wikipedia page was a triple derived?
Mappings Wiki
a community effort to:
–develop an ontology schema
–provide mappings from Wikipedia infobox properties to this ontology
→ creating an alignment between Wikipedia and DBpedia
→ eliminating name variations in properties and classes
→ a big boost in precision
DBpedia Ontology
cross-domain ontology
–maintained and extended by the community in the
DBpedia Mappings Wiki
–manually created based on the most commonly used
infoboxes
–currently covers 685 classes which form a subsumption
hierarchy and are described by 2,795 different
properties
–subsumption hierarchy with a maximal depth of 5
Wikipedia articles
– Wikipedia articles consist mostly of free text
– also comprise various types of structured
information
– including infobox templates, categorisation information, images, geo-coordinates, links to external web pages, disambiguation pages, redirects between pages and links to other language versions
[Figure: article outline annotated with title, abstract, infoboxes, geo-coordinates, categories, images and links (to other language versions, other Wikipedia pages and the Web; redirects; disambiguations)]
Structure in Wikipedia
Title
Abstract
Infoboxes
Geo-coordinates
Categories
Images
Links
– other language versions
– other Wikipedia pages
– To the Web
– Redirects
– Disambiguations
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
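@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbr:Busan dbp:title "Busan Metropolitan City" .
dbr:Busan dbp:hangul "부산광역시"@ko .
dbr:Busan dbp:area_km2 "763.46"^^xsd:float .
dbr:Busan dbp:pop "3635389"^^xsd:int .
dbr:Busan dbp:region dbr:Yeongnam .
dbr:Busan dbp:dialect dbr:Gyeongsang .
...
infobox encoding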
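The step from wiki markup to triples can be sketched in Python as a raw (unmapped) infobox extraction: each `| key = value` line becomes one triple, a value consisting of a single wiki link becomes a resource, and numbers become typed literals. This is a simplified illustration under those assumptions, not the DBpedia Information Extraction Framework's parser, which also handles nested templates, units and many more value formats; all function names are hypothetical. The datatypes mirror the slide (`xsd:int`, `xsd:float`).

```python
import re

DBR = "http://dbpedia.org/resource/"
DBP = "http://dbpedia.org/property/"
XSD = "http://www.w3.org/2001/XMLSchema#"

WIKILINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")  # [[Target]] or [[Target|label]]
INTEGER = re.compile(r"^-?\d+$")
DECIMAL = re.compile(r"^-?\d+\.\d+$")

def parse_value(raw: str):
    """Map one infobox value to an RDF object: a resource IRI or a typed literal."""
    value = raw.strip()
    link = WIKILINK.match(value)
    if link:  # the whole value is a single wiki link -> resource
        return ("iri", DBR + link.group(1).strip().replace(" ", "_"))
    if INTEGER.match(value):
        return ("literal", value, XSD + "int")
    if DECIMAL.match(value):
        return ("literal", value, XSD + "float")
    return ("literal", value, XSD + "string")  # anything else stays a plain string

def extract_infobox(subject: str, wikitext: str):
    """Turn every '| key = value' line of an infobox into a (s, p, o) triple."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skips '{{Infobox ...', '}}' and other lines
        key, value = line[1:].split("=", 1)
        if value.strip():
            triples.append((DBR + subject, DBP + key.strip(), parse_value(value)))
    return triples

sample = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""
for s, p, o in extract_infobox("Busan", sample):
    print(p.rsplit("/", 1)[1], o)
```

Because the property names are copied verbatim from the template, the same fact can end up under different properties on different pages, which is the problem the mappings address.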
Björk (Musician)
Occupation = Musician, Actor
Born = 21.11.1965, Reykjavík
Brown (Prime Minister)
office = Prime Minister of the UK
birth_date = 20.2.1951
birth_place = Giffnock
Romero (Actor)
occupation = Actor, Editor
birthdate = 4.2.1940
birthplace = New York
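The three infoboxes express the same facts under different keys (`Born`, `birth_date`, `birthdate`). A mapping resolves this by renaming keys to one ontology property and normalising values. The Python sketch below uses a hypothetical excerpt of such a mapping table and the slide's D.M.YYYY dates; only the Romero infobox is used, and Björk's combined `Born` field, which would need a mapping that splits the value, is left out.

```python
from datetime import date

# Hypothetical excerpt of a mapping: raw infobox key -> DBpedia ontology property.
PROPERTY_MAPPINGS = {
    "birth_date": "dbo:birthDate",
    "birthdate": "dbo:birthDate",
    "birth_place": "dbo:birthPlace",
    "birthplace": "dbo:birthPlace",
    "occupation": "dbo:occupation",
}

def to_iso_date(raw: str) -> str:
    """Parse the D.M.YYYY notation used on the slide into an xsd:date lexical form."""
    day, month, year = (int(part) for part in raw.split("."))
    return date(year, month, day).isoformat()

def apply_mappings(infobox):
    """Rename mapped keys to ontology properties and normalise dates; drop the rest."""
    mapped = []
    for key, value in infobox:
        prop = PROPERTY_MAPPINGS.get(key.strip().lower())
        if prop is None:
            continue  # unmapped keys are only covered by raw infobox extraction
        if prop == "dbo:birthDate":
            value = to_iso_date(value)
        mapped.append((prop, value))
    return mapped

romero = [("occupation", "Actor, Editor"), ("birthdate", "4.2.1940"),
          ("birthplace", "New York")]
print(apply_mappings(romero))
# [('dbo:occupation', 'Actor, Editor'), ('dbo:birthDate', '1940-02-04'), ('dbo:birthPlace', 'New York')]
```

Keys that no mapping covers are dropped here, which trades coverage for the precision gain described for the Mappings Wiki.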
DBpedia Extraction Framework
DIEF - DBpedia Information Extraction Framework
–extracts structured information from Wikipedia and
turns it into a rich knowledge base
–Mapping-Based Infobox Extraction, Raw Infobox
Extraction, Feature Extraction, Statistical Extraction
–Hosted on GitHub
–Written in Scala & Java
DBpedia Live
–Wikipedia articles are continuously revised at a very high rate
–in June 2013, the English Wikipedia had approximately 3.3 million edits per month (≈ 77 edits per minute)
–DBpedia Live was developed to keep DBpedia in sync with Wikipedia
–works on a continuous stream of updates from Wikipedia and processes that stream on the fly
Accessing DBpedia - Browsing
● official DBpedia instance http://dbpedia.org
⇒ runs on Virtuoso
⇒ point & click browsing via the DBpedia VAD
⇒ faceted search with Virtuoso Facets
Accessing DBpedia - SPARQL
● official SPARQL endpoint http://dbpedia.org/sparql
⇒ subject to a fair use policy (limited query runtime)
⇒ iSPARQL front end (interactive query building)
⇒ Snorql front end
⇒ query with any SPARQL-compliant tool or API
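As an example of "any SPARQL-compliant tool", a plain HTTP client suffices. The Python sketch below follows the SPARQL 1.1 Protocol's "query via GET" form: the query goes URL-encoded into the `query` parameter, and JSON results are requested through the `Accept` header. The example query assumes films are linked to actors via `dbo:starring`; the helper names are hypothetical, and `run_query` needs network access and is subject to the endpoint's fair use limits, so the query keeps a LIMIT.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE { ?film dbo:starring dbr:John_Cleese } LIMIT 10"""

def build_query_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """'Query via GET': URL-encoded query string, JSON result format requested."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def run_query(query: str) -> list:
    """Send the query and return the list of solution bindings."""
    with urlopen(build_query_request(query), timeout=60) as response:
        return json.load(response)["results"]["bindings"]

print(build_query_request(QUERY).full_url[:60])
```

Each binding in the JSON result maps a variable name such as `film` to an object carrying its `type` and `value`.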
Querying RDF with SPARQL
● SPARQL Protocol and RDF Query Language
⇒ graph patterns as sets of triples (with variables)
⇒ successful matches of graph patterns generate bindings in (sub-)query solutions
● different result types for queries
SELECT ⇒ bindings, ASK ⇒ true/false, CONSTRUCT ⇒ new graph
● combinators and modifiers for basic graph patterns
⇒ UNION, FILTER, MINUS, FILTER (NOT) EXISTS
● result set modifiers
LIMIT, OFFSET, DISTINCT, ORDER BY
● numerous operators and functions for resource and literal values
● many additions in the 1.1 revision:
grouping & aggregates, regular property path expressions, subqueries
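How a basic graph pattern produces bindings, and how the result set modifiers act on them, can be sketched with a naive nested-loop evaluator in Python. This is an illustration under simplifying assumptions (plain string terms, variables written `?name`, no FILTER or OPTIONAL); real triple stores use indexes and query planning, and the names `match`, `evaluate_bgp` and `select` are hypothetical.

```python
def is_variable(term):
    """Variables are written '?name', as in SPARQL."""
    return term.startswith("?")

def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None if impossible."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None      # constant term does not match
    return extended

def evaluate_bgp(patterns, graph):
    """All bindings under which every triple pattern matches some triple (a join)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

def select(variables, patterns, graph, distinct=False, order_by=None, limit=None, offset=0):
    """SELECT with ORDER BY, projection, DISTINCT, then OFFSET/LIMIT."""
    solutions = evaluate_bgp(patterns, graph)
    if order_by is not None:
        solutions.sort(key=lambda b: b[order_by])
    rows = [tuple(b[v] for v in variables) for b in solutions]
    if distinct:
        rows = list(dict.fromkeys(rows))  # drop duplicates, keep first occurrence
    end = None if limit is None else offset + limit
    return rows[offset:end]

graph = [
    ("Clockwise", "starring", "JohnCleese"),
    ("AFishCalledWanda", "starring", "JohnCleese"),
    ("AFishCalledWanda", "starring", "MichaelPalin"),
]
patterns = [("?film", "starring", "JohnCleese"), ("?film", "starring", "?actor")]
print(select(["?film"], patterns, graph, distinct=True, order_by="?film"))
# [('AFishCalledWanda',), ('Clockwise',)]
```

The second pattern yields two solutions for AFishCalledWanda (one per actor), which is exactly the duplication DISTINCT removes after projection.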
SPARQL Tooling
● Flint SPARQL Editor: JavaScript SPARQL editor
  ● syntax highlighting, code assistance
  ● auto-completion for properties and classes (for small datasets)
● Protégé: full-fledged ontology editor
  ● good for getting an overview of the ontologies behind datasets
  ● two SPARQL plug-ins (one supporting entailment)
● curl or your favourite simple REST client
  ● allows simple testing of queries from any text editor with SPARQL syntax support (e.g. Emacs, Vim, Sublime Text)
$ curl -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$(cat query.sparql)" http://dbpedia.org/sparql
DBpedia for Entity Linking and Disambiguation
● DBpedia Spotlight
  ● web service to detect mentions of DBpedia resources in input text, disambiguate them and link them
  ● uses two NLP datasets derived from DBpedia
⇒ topic signatures - tf-idf weighted term vectors
⇒ lexicalisations - alternative names for entities and concepts
● several other entity detection and linking services targeting DBpedia entities:
AlchemyAPI, Ontos Semantic API, OpenCalais, Zemanta
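The idea behind topic signatures can be sketched in Python: each candidate resource gets a tf-idf weighted term vector, and a mention is resolved to the candidate whose vector is most similar (cosine) to the words around it. This is a simplified stand-in, not DBpedia Spotlight's actual implementation; the candidate names and their short descriptions are hypothetical toy data, echoing the "Enterprise" example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def weigh(tokens, idf):
    """Raw term frequency times idf; terms unknown to the collection are ignored."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def build_signatures(descriptions):
    """tf-idf weighted term vectors ('topic signatures') for each candidate."""
    tokenized = {name: tokenize(text) for name, text in descriptions.items()}
    n = len(tokenized)
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    idf = {t: math.log(n / count) for t, count in df.items()}
    return {name: weigh(tokens, idf) for name, tokens in tokenized.items()}, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context, descriptions):
    """Return the candidate whose signature is most similar to the mention's context."""
    signatures, idf = build_signatures(descriptions)
    context_vector = weigh(tokenize(context), idf)
    return max(signatures, key=lambda name: cosine(context_vector, signatures[name]))

candidates = {
    "Enterprise (starship)": "fictional starship of the federation in star trek "
                             "commanded by captain picard",
    "Enterprise (aircraft carrier)": "nuclear powered aircraft carrier of the united states navy",
    "Enterprise (company)": "business organisation selling products to customers",
}
print(disambiguate("the starship Enterprise is commanded by Captain Picard", candidates))
# Enterprise (starship)
```

Terms shared by many candidates (such as "the") get a low idf and contribute little, so the decision rests on distinctive words like "starship" and "picard".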
Linking DBpedia
● community-curated links to various major and minor external datasets:
target dataset    predicate           out-link count
Freebase          owl:sameAs               3 600 000
YAGO2             rdf:type                18 100 000
UMBEL             rdf:type                   896 400
WordNet           dbp:wordnet_type           467 100
OpenCyc           owl:sameAs                  27 100
LinkedGeoData     owl:sameAs                 103 600
GeoNames          owl:sameAs                  86 500
● a Linked Data Web analysis with Sindice measured 3 960 212 in-links to DBpedia (lower bound)
statistics from (Lehmann et al. 2012)
Linking DBpedia - use cases for linked DBpedia data
● correlate the accumulated EU funding per year to member countries (from FTS) with the gross domestic product of these countries (DBpedia)
● correlate an above-average share of metropolitan area used for parks or other natural recreational areas with towns and cities led by environmentalists (LinkedGeoData & DBpedia)
● is there a town with no more than 15,000 inhabitants in the area around Leipzig that has a Catholic church, childcare, a primary school and a grammar school, and is not currently led by a politician from the conservative party?
DBpedia internationalised
● non-English versions of DBpedia offer
  ● coverage of more entities
  ● more detailed or up-to-date information for entities associated with the particular countries
● the international mapping community helps provide localized DBpedia datasets for 125 languages
⇒ own IRI recipe http://<langcode>.dbpedia.org/resource/<thing>
● 15 DBpedia chapters: autonomous management of mappings, organisation of the local community, hosting of datasets and services
● also canonicalized datasets: facts derived from localized Wikipedias, but only statements for resources also present in the English DBpedia
⇒ use of the default http://dbpedia.org/resource/ namespace
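The localized IRI recipe and the canonicalization step can be sketched in Python. The sketch assumes a lookup from localized resource IRIs to English article titles is available (hypothetical input, e.g. derived from inter-language links) and additionally drops a triple when a resource in object position has no English counterpart; that second rule is an assumption, not a documented DBpedia policy. `Beispielort` is an invented resource name.

```python
import re

CANONICAL = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
LOCALIZED = re.compile(r"^http://[a-z-]+\.dbpedia\.org/resource/")

def localized_iri(lang, thing):
    """IRI recipe from the slide: http://<langcode>.dbpedia.org/resource/<thing>."""
    return f"http://{lang}.dbpedia.org/resource/{thing}"

def canonicalize(triples, english_title):
    """Keep statements about resources that also exist in English DBpedia,
    rewritten into the default http://dbpedia.org/resource/ namespace.

    `english_title` maps a localized resource IRI to its English article title.
    """
    result = []
    for s, p, o in triples:
        if s not in english_title:
            continue                    # subject unknown to English DBpedia
        if LOCALIZED.match(o):
            if o not in english_title:
                continue                # assumption: also drop unmatched objects
            o = CANONICAL + english_title[o]
        result.append((CANONICAL + english_title[s], p, o))
    return result

leipzig_de = localized_iri("de", "Leipzig")
germany_de = localized_iri("de", "Deutschland")
place_de = localized_iri("de", "Beispielort")  # hypothetical, no English article
triples = [(leipzig_de, DBO + "country", germany_de),
           (place_de, DBO + "country", germany_de)]
titles = {leipzig_de: "Leipzig", germany_de: "Germany"}
print(canonicalize(triples, titles))
```

Only the Leipzig statement survives, now expressed entirely in the default namespace so it can be merged directly with English DBpedia data.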
Related Work: Freebase
–extracts structured data from Wikipedia
–makes it available in RDF
Similarities:
–provides dumps of the extracted data
–provides APIs and endpoints to access the data
Related Work: Freebase
Differences:
Freebase
- uses several sources → higher coverage
- can be edited directly by users
- mainly run by Google (being discontinued)
DBpedia
- RDF representation of Wikipedia
- hub on the Web of Data
- can only be edited indirectly, by modifying the content of Wikipedia
- ongoing community effort
Related Work: Wikidata
– initiated by Wikimedia Deutschland e.V. in 2012
– free knowledge base about the world that can be read and edited by humans and machines alike
– can offer a variety of statements from different sources and dates
– does not offer the truth about things:
• (-) Berlin has a population of 3.5 million
• (+) Wikidata contains the statement that Berlin's population was 3.5 million as of 2011, according to the German statistical office
– its aim is to provide a single point of truth for facts in Wikipedia across different language versions
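The difference between the (-) and (+) formulations is that a statement carries its context with it. A minimal Python sketch of that idea, using the Berlin example; the `Statement` class and its fields are hypothetical simplifications, as Wikidata's actual data model (ranks, arbitrary qualifiers, structured references) is considerably richer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Statement:
    """A claim together with the context that qualifies it (simplified sketch)."""
    item: str
    prop: str
    value: int
    as_of: Optional[int] = None    # qualifier: point in time (year only here)
    source: Optional[str] = None   # reference supporting the claim

    def describe(self) -> str:
        context = []
        if self.as_of is not None:
            context.append(f"as of {self.as_of}")
        if self.source is not None:
            context.append(f"according to {self.source}")
        suffix = f" ({', '.join(context)})" if context else ""
        return f"{self.item} {self.prop}: {self.value:,}{suffix}"

berlin = Statement("Berlin", "population", 3_500_000, as_of=2011,
                   source="the German statistical office")
print(berlin.describe())
# Berlin population: 3,500,000 (as of 2011, according to the German statistical office)
```

Several such statements with different dates or sources can coexist for the same item and property without contradicting each other.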
Current developments
● increased validation and curation processes (DBpedia+, RDFUnit)
● easier creation of local DBpedia SPARQL endpoints (Debian packaging, Docker images of the triple store and dataset selection, automatic import)
● novel, more intuitive and feature-rich browsing interfaces
⇒ add corrections in place in LD viewer interfaces (?)
How you can get involved
–set up new mirrors and endpoints of DBpedia
–revise mappings and/or write new ones
–help improve the ontology
–get involved with the Irish (Gaelic) chapter
bianca.pereira@insight-centre.org
caoilfhionn.lane@insight-centre.org
–edit Wikipedia
Further Reading: Website
landing page:
http://dbpedia.org/About
overview of datasets (including localized datasets):
http://wiki.dbpedia.org/Datasets
DBpedia data access overview:
http://wiki.dbpedia.org/OnlineAccess
Further Reading: Publications
2007
T: DBpedia: A Nucleus for a Web of Open Data
A: Auer, Bizer, Kobilarov, Lehmann, Cyganiak, Ives
http://www.cis.upenn.edu/~zives/research/dbpedia.pdf
2009
T: DBpedia - A Crystallization Point for the Web of Data
A: Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak, Hellmann
http://jens-lehmann.org/files/2009/dbpedia_jws.pdf
2012
T: DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia
A: Lehmann, Isele, Jakob, Jentzsch, Kontokostas, Mendes, Hellmann, Morsey, van Kleef, Auer, Bizer
http://www.semantic-web-journal.net/system/files/swj499.pdf