SlideShare a Scribd company logo
1 of 108
Download to read offline
1
From Hyperlinks to Semantic Web Properties 

using Open Knowledge Extraction
Valentina Presutti
Andrea Giovanni Nuzzolese, Sergio Consoli,
Diego Reforgiato Recupero,
Aldo Gangemi
STLab-ISTC, CNR, Italy
Extends “Uncovering the Semantics of Wikipedia pagelinks”
Accepted as combined conference/journal paper @ EKAW 2014
• Some background
• Motivation and goal
• Introducing Open Knowledge Extraction (OKE)
• A method for extracting semantic relations
from hyperlinks
• Legalo and Legalo-Wikipedia: implementations
• Evaluation results
• Conclusion and future work
2
Outline
3
The Web as envisioned in 1989
by Tim Berners-Lee
URIs and links
having types
relationships amongst
named objects
4
The current Web vs the Semantic Web
(Marja-Riitta Koivunen and Eric Miller 2001)
resources (web documents)
and links
Resources and links
that can have explicit types
To populate the web with machine
understandable information so that artificial
intelligences (AI) can use it as background
knowledge for assisting humans in performing a
significant number of their daily tasks
5
The Semantic Web vision
• Standardised knowledge representation
and query languages: OWL, RDF, SPARQL
• Web of Data: huge amount of distributed,
interlinked and formally represented data
(linked data) -> related to the “open
data” initiative
6
today…
7
Semantic Web standard knowledge
representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI
objects can also be literals
triples are connected when they share an entity
the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
“Paris”
dbpedia-owl:country
rdfs:label
*dbpedia: <http://dbpedia.org/>
7
Semantic Web standard knowledge
representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI
objects can also be literals
triples are connected when they share an entity
the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
“Paris”
dbpedia-owl:country
http://www.france.fr
foaf:homepage
rdfs:label
*dbpedia: <http://dbpedia.org/>
8
May 2007
8
from 2 million (2007) to 31 billion (2011) triples
• Most of the Web of Data derived from
structured data (typically databases) or semi-
structured data (e.g. Wikipedia infoboxes)
• Web content is mostly natural language text
(web sites, news, forums, reviews, etc.)
• Such content is highly valuable for the
Semantic Web (question answering, opinion
mining, knowledge summarisation, etc.)
9
Limits and motivation
To extract as much relevant knowledge as
possible from web textual content and
publish it in the form of
Semantic Web triples
10
Overall Goal
• Most attention to enrich the Web of Data
by learning standard relations: e.g.,
membership (rdf:type), class taxonomy
(rdfs:subClassOf), entity linking
(owl:sameAs)
• What about general factual relations?
• e.g. part, participation, causality,
location, friendship, etc.
11
Approaches to linked data and
ontology learning
• Distant supervision: maps textual sentences to linked data triples
• no support for n-ary relation
• bound to already defined relation types
• Open Information Extraction: creates resources of triples composed of text fragments
(subject, relational phrase, object)
• not directly reusable as linked data
Open Challenges:
-> new relation discovery (both facts and relation types)
-> methods for automatically formalising discovered knowledge
-> usable label generation for new discovered relations
12
Relation extraction
“Paris is the capital city of France” dbpedia:Paris dbpedia-owl:country dbpedia:France
“John Stigall received a Bachelor of arts” ?
“John Stigall received a Bachelor of arts” (John Stigall, received, a Bachelor of arts)
Open Knowledge Extraction:
• unsupervised -> no need of huge annotated
corpora for training
• open-domain -> not bound to specific domain
• abstractive -> model the text, use external
knowledge resources, use summarisation and
language generation techniques
• producing formalised output -> directly
reusable as linked data
13
Contribution
14
OIE vs OKE
Subject Predicate Object Approach
John Stigall received a Bachelor of arts extractive
John Stigall received from the State
University of New York
aat Cortland
extractive
dbpedia:John_Sti
gall
myprop:receives
AcademicDegree
dbpedia:Bachelor_of_
Arts
abstractive
dbpedia:John_Sti
gall
myprop:receives
AcademicDegree
From
dbpedia:State_Univers
ity_of_New_York
abstractive
“John Stigall received a Bachelor of arts from
the State University of New York at Cortland “
15
Hypothesis
Hyperlinks are pragmatic traces of semantic relations
between two entities
“McCarthy often commented on world
affairs on the Usenet forums”
Texts surrounding hyperlinks are linguistic traces of
semantic relations between two entities
Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
?
Linguistic trace: text surrounding a hyperlink
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp
(DBpedia)
dbpo:wikiPageWikiLink
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy myont:developed dbpedia:Lisp
(Legalo)
legalo:developProgrammingLanguage
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy myont:developed dbpedia:Lisp
(Legalo)
legalo:developProgrammingLanguage
rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
“McCarthy often commented on world
affairs on the Usenet forums”
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
“McCarthy often commented on world
affairs on the Usenet forums”
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
19
What can we derive from a hyperlink?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
“McCarthy often commented on world
affairs on the Usenet forums”
19
What can we derive from a hyperlink?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
“McCarthy often commented on world
affairs on the Usenet forums”
McCarthy and Usenet are
(probably) related
19
What can we derive from a hyperlink?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
19
What can we derive from a hyperlink?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
19
What can we derive from a hyperlink?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis
dul:associateWith
dbpedia:The_New_York_Times
19
What can we derive from a hyperlink?
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis
dul:associateWith
dbpedia:The_New_York_Times
“McCarthy often commented on world
affairs on the Usenet forums”
Linguistic traces provides us the meaning of relevant relations
20
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
s
{(esubj, eobj)i}
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
φs(esubj, eobj)i φs(Joey Foster Ellis, The New York Times)
Natural language
sentence
Set of entity
pairs
Relevant relations
evidenced by s
21
Relevant Relation Assessment
φs(esubj, eobj)i
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published on The New
York Times, and The Wall Street Journal.”
φs(The Wall Street Journal, The New York Times)
22
Usable predicate generation
λ ~ λi
, λi
∈ Λ
Λ ≡ {λ1, ..., λn}
labels for φs(esubj, eobj)i
defined for a LOD vocabulary
publish on
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published
on The New York Times, and
The Wall Street Journal.”
(has) published on
(has) published on journal
(is) author for
writes for
23
OWL/RDF formalisation
dbpedia:Joey_Foster_Ellis
legalo:publishOn dbpedia:The_New_York_Times .
legalo:publishOn a owl:ObjectProperty ;
rdfs:range … ;
rdfs:domain … ;
… .
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published
on The New York Times, and
The Wall Street Journal.”
A novel Open Knowledge Extraction (OKE) approach that performs
unsupervised, open domain, and abstractive knowledge extraction
from text for producing directly usable machine readable data
Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia
Their evaluation performed with the help of crowdsourcing and the
creation of a gold standard, all showing high values of precision,
recall, and accuracy
24
Contribution
http://wit.istc.cnr.it/stlab-tools/legalo
http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia
http://wit.istc.cnr.it/kore-dev/legalo
Developing version
http://wit.istc.cnr.it/stlab-tools/fred
http://wit.istc.cnr.it/kore-dev/fred
25
Method and implementation
25
Method and implementation
Frame-based formal
representation of s
25
Method and implementation
Frame-based formal
representation of s
Relation assessment
25
Method and implementation
Frame-based formal
representation of s
Label generation (extraction+abstraction)
Relation assessment
25
Method and implementation
Frame-based formal
representation of s
Label generation (extraction+abstraction)
Formalisation
Relation assessment
25
Method and implementation
Frame-based formal
representation of s
Label generation (extraction+abstraction)
Formalisation
Alignment to existing
LOD vocabularies
Relation assessment
25
Method and implementation
Frame-based formal
representation of s
Label generation (extraction+abstraction)
Formalisation
Alignment to existing
LOD vocabularies
Relation assessment
Linguistic traces of
Wikipedia pagelinks
subjects and objects are given by
DBpedia pagelinks
26
Let’s see how it works in detail…
27
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
27
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
29
Relation assessment
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
29
Relation assessment
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
G’=(V,E’) is the undirected version of G=(V,E)
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
G’=(V,E’) is the undirected version of G=(V,E)
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
vi ∈ V is the node in G representing the entity ei mentioned in s.
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
P(vsubj,vobj)=[v0,edge1,…,edgen,vn], Psetsubj,obj ≡ {edge1, v1…, vn-1, edgen}
G’=(V,E’) is the undirected version of G=(V,E)
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
vi ∈ V is the node in G representing the entity ei mentioned in s.
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
31
Necessary condition: φs(esubj , eobj ) ∃P(vsubj , vobj )
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
31
Necessary condition: φs(esubj , eobj ) ∃P(vsubj , vobj )
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
31
Necessary condition: φs(esubj , eobj ) ∃P(vsubj , vobj )
φs(Rico Lebrun , Michelangelo ) ∃P(fred:Rico_Lebrun, fred:Michelangelo)
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)
Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)
Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f
AgRole ≡ {ρm1 , ..., ρmm }, the set of VerbNet agentive roles
AgRole ⊆ Role
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)
Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f
AgRole ≡ {ρm1 , ..., ρmm }, the set of VerbNet agentive roles
AgRole ⊆ Role
Sufficient condition:
φs(esubj , eobj ) ⇐
∃P (vsubj , vobj ) and ∃f ∈ Psetsubj,obj such that dul:Event(f) and ρ(f,vsubj)∈AgRole
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject
Object
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject
Object
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject Object
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject Object
Relation assessment
34
Semantic Web triples and properties generation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
34
Semantic Web triples and properties generation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
teach
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
teach art
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
legalo:teachArtAt
teach art at
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
teach
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”

vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Subject
Object
legalo:teachAbout
teach about
37
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
38
“Rico Lebrun taught visual arts at the
Chouinard Art Institute and at the
Disney Studios. He was influenced by
Michelangelo and maintained a lifelong
affinity for Goya and Picasso.”
Semantic Web triples and properties generation
dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts .
s:teachAbout a owl:ObjectProperty ;
rdfs:subPropertyOf fred:Teach;
rdfs:domain wibi:Artist ;
rdfs:range wibi:Art ;
grounding:definedFromFormalRepresentation
fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ;
grounding:derivedFromLinguisticEvidence s:linguisticEvidence ;
owl:propertyChainAxiom([ owl:inverseOf s:AgentTeach ] s:TopicTeach) .
_:b2 a alignment:Cell ;
alignment:entity1 s:teachAbout ;
alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ;
alignment:measure "0.846"^xsd:float ;
alignment:relation "equivalence" .
domain, range, subsumption
linguistic and formal scope
alignment to existing LOD vocabularies
39
Evaluation
40
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list
corpus for relation extraction developed at Google Research
Crel−extraction*: 5 datasets each dedicated to a specific relation
{“pred”:
“/people/person/education./education/education/institution",
“sub":"/m/0g9zjv3",
“obj":"/m/0ckrjh",
“evidences”:
[{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
“snippet":
“Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He
started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in
many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually
creates scenic design and stage clothes for his plays.”}],
“judgments":
[{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"},
{"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"},
{"rater":"12022408018620867151","judgment":"yes"}]}
41
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair) from
Crel−extraction for “attending or graduating from an institution”
Legalo produced always an output, either p or “no relation”
Ceducation: random sample of 130 snippets (one entity pair) from
Crel−extraction for “obtaining a degree of education”
Legalo produced always an output, either p or “no relation”
Cgeneral: random sample of 60 snippets from all Crel−extraction datasets
then broken into 186 single sentences with at least one entity pair.
Legalo produced 867 results, 262 p and 605 “no relation”
42
Hypothesis 1:
Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists
which holds between two entities, according to the content of s:
∃φ.φs (esubj , eobj )
φs(esubj, eobj)i
43
Evaluation results for
Relevant Relation Assessment
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
44
Hypothesis 2:
Usable predicate generation
Legalo is able to generate a usable predicate λ
′
for a relevant relation φs between to
entities, expressed in a sentence ′ s: given λ, a label generated by Legalo for φs, and
λi a label generated by a human for φs the following holds:
λ ~ λi , λi ∈ Λ
λ ~ λi
, λi
∈ Λ
Λ ≡ {λ1, ..., λn}
labels for φs(esubj, eobj)i
defined for a LOD vocabulary
45
Evaluation results for
Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
• Similarity score between human created {λi} and Legalo
λ, for a φs(esubj, eobj)i
• Jaccard distance measure (string similarity)
• Semantic similarity measure based on the SimLibrary
framework*
46
Evaluation results for
Usable Predicate Generation
Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence
5 Any 0.63 0.80 0.59
* G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
47
Legalo-Wikipedia
(EKAW 2014)
3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
• Open Knowledge Extraction: unsupervised, open-
domain, abstractive, producing OWL/RDF output
• OKE for relation extraction and its
implementations
• An evaluation showing high performance for both
tasks: relevant relation assessment, usable
predicate generation
48
Conclusion
49
Where are we heading?
• Improving frequent-subgraph heuristics
• Strategies for alignment to existing LOD
vocabularies (or to ODPs?)
• Reconciliation of properties, out of sentence scope
• Using Legalo for abstractive summarisation tasks
• Using Legalo for language generation
• Identifying ways to directly compare to other
approaches
50
Thanks for your attention
http://wit.istc.cnr.it/stlab-tools/
51
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list
corpus for relation extraction developed at Google Research
Crel−extraction*: 5 datasets each dedicated to a specific relation
{“pred”:
“/people/person/education./education/education/institution",
“sub":"/m/0g9zjv3",
“obj":"/m/0ckrjh",
“evidences”:
[{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
“snippet":
“Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He
started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in
many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually
creates scenic design and stage clothes for his plays.”}],
“judgments":
[{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"},
{"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"},
{"rater":"12022408018620867151","judgment":"yes"}]}
52
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair) from
Crel−extraction for “attending or graduating from an institution”
Legalo produced always an output, either p or “no relation”
Ceducation: random sample of 130 snippets (one entity pair) from
Crel−extraction for “obtaining a degree of education”
Legalo produced always an output, either p or “no relation”
Cgeneral: random sample of 60 snippets from all Crel−extraction datasets
then broken into 186 single sentences with at least one entity pair.
Legalo produced 867 results, 262 p and 605 “no relation”
53
Hypothesis 1:
Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists
which holds between two entities, according to the content of s:
∃φ.φs (esubj , eobj )
φs(esubj, eobj)i
54
Crowdsourcing tasks for
Relevant Relation Assessment
Task 1:
Assessing if a sentence s is an evidence of a referenced relation (i.e.
either “institution” or “education”) between two entities, mentioned in s.
Based on data from Cinstitution and Ceducation, respectively
Task 2:
Assessing if a sentence s is an evidence of any relation between two given
entities mentioned in s.
Based on data from Cgeneral
At least 3 raters with t>0.70 Answer could be “yes” or “no”
55
Evaluation results for
Relevant Relation Assessment
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
56
Hypothesis 2:
Usable predicate generation
Legalo is able to generate a usable predicate λ
′
for a relevant relation φs between to
entities, expressed in a sentence ′ s: given λ, a label generated by Legalo for φs, and
λi a label generated by a human for φs the following holds:
λ ~ λi , λi ∈ Λ
λ ~ λi
, λi
∈ Λ
Λ ≡ {λ1, ..., λn}
labels for φs(esubj, eobj)i
defined for a LOD vocabulary
57
Crowdsourcing tasks for
Usable predicate generation
Task 3:
Judging if λ generated by a machine expresses is a good summarisation of a
specific relation (i.e. either “institution” or “education”) between two given
entities mentioned in s, according to the evidence provided by s.
Based on data from Cinstitution and Ceducation, respectively
Task 4:
judging if a predicate λ is a good summarisation of a relation expressed by
the content of s, between two given entities mentioned in s, according to the
evidence provided by s.
Based on data from Cgeneral
At least 3 raters with t>0.70 Answer could be Agree, Partly Agree, Disagree
58
Evaluation results for
Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
59
Crowdsourcing tasks for
Usable predicate generation
Task 5:
Creating a phrase λ that summarises the relation expressed by the content of
s, between two given entities mentioned in s, according to the evidence
provided by s.
Based on data from Cgeneral
At least 3 raters with t>0.60 Answer was open
• Similarity score between human created {λi} and Legalo
λ, for a φs(esubj, eobj)i
• Jaccard distance measure (string similarity)
• Semantic similarity measure based on the SimLibrary
framework*
60
Evaluation results for
Usable Predicate Generation
Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence
5 Any 0.63 0.80 0.59
* G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
61
Legalo-Wikipedia
(EKAW 2014)
3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
62
Confidence score
Given a task unit u,
a set of possible judgements {ji},
with i = 1, ...n,
{tk} a set of trust scores
each representing a rater,
with k=1,…,m
tsum=sumk=1,…m(tk)
is the sum of trust scores of raters
giving judgement on u
trust(ji)
is the sum of tk values of raters
that choose judgement ji
confidence(ji, u) for judgement ji
on the task unit u is
confidence(“yes”, 582275117) =
= (0.95+0.98)/2.82 = 0.68
confidence(“no”, 582275117) =
= (0.89/2.82 = 0.31

More Related Content

What's hot

DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
Herbert Van de Sompel
 
Generous Interfaces - rich websites for digital collections
Generous Interfaces - rich websites for digital collections Generous Interfaces - rich websites for digital collections
Generous Interfaces - rich websites for digital collections
Mitchell Whitelaw
 
Los Angeles R users group - Nov 17 2010 - Part 2
Los Angeles R users group - Nov 17 2010 - Part 2Los Angeles R users group - Nov 17 2010 - Part 2
Los Angeles R users group - Nov 17 2010 - Part 2
rusersla
 

What's hot (20)

Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Generous Interfaces - rich websites for digital collections
Generous Interfaces - rich websites for digital collections Generous Interfaces - rich websites for digital collections
Generous Interfaces - rich websites for digital collections
 
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
 
Web Data Management in the RDF Age
Web Data Management in the RDF AgeWeb Data Management in the RDF Age
Web Data Management in the RDF Age
 
Los Angeles R users group - Nov 17 2010 - Part 2
Los Angeles R users group - Nov 17 2010 - Part 2Los Angeles R users group - Nov 17 2010 - Part 2
Los Angeles R users group - Nov 17 2010 - Part 2
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
What you Can Make Out of Linked Data
What you Can Make Out of Linked DataWhat you Can Make Out of Linked Data
What you Can Make Out of Linked Data
 
Semantics-aware Graph-based Recommender Systems exploiting Linked Open Data
Semantics-aware Graph-based Recommender Systems exploiting Linked Open DataSemantics-aware Graph-based Recommender Systems exploiting Linked Open Data
Semantics-aware Graph-based Recommender Systems exploiting Linked Open Data
 
Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]
Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]
Solid: An Ecology of Digital Being [@SLA Europe October 28, 2020]
 
Ir1
Ir1Ir1
Ir1
 
Thinking of Linking: A random series of ideas, concepts, Platonic ideals, a y...
Thinking of Linking: A random series of ideas, concepts, Platonic ideals, a y...Thinking of Linking: A random series of ideas, concepts, Platonic ideals, a y...
Thinking of Linking: A random series of ideas, concepts, Platonic ideals, a y...
 
Web Data Management with RDF
Web Data Management with RDFWeb Data Management with RDF
Web Data Management with RDF
 
The aDORe Federation Architecture
The aDORe Federation ArchitectureThe aDORe Federation Architecture
The aDORe Federation Architecture
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 
Linked Open Data-enabled Strategies for Top-N Recommendations
Linked Open Data-enabled Strategies for Top-N RecommendationsLinked Open Data-enabled Strategies for Top-N Recommendations
Linked Open Data-enabled Strategies for Top-N Recommendations
 
Data management for researchers
Data management for researchersData management for researchers
Data management for researchers
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 

Similar to From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction

Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Olivier Grisel
 
Web 20 E Oltre 1202297800291589 3
Web 20 E Oltre 1202297800291589 3Web 20 E Oltre 1202297800291589 3
Web 20 E Oltre 1202297800291589 3
Universita' di Bari
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?
Anna Fensel
 

Similar to From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction (20)

Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic Web
 
The Future of Social Networks on the Internet: The Need for Semantics
The Future of Social Networks on the Internet: The Need for SemanticsThe Future of Social Networks on the Internet: The Need for Semantics
The Future of Social Networks on the Internet: The Need for Semantics
 
Exploring Article Networks on Wikipedia with NodeXL
Exploring Article Networks on Wikipedia with NodeXLExploring Article Networks on Wikipedia with NodeXL
Exploring Article Networks on Wikipedia with NodeXL
 
Enhancing the Web Experience
Enhancing the Web ExperienceEnhancing the Web Experience
Enhancing the Web Experience
 
Cf intro
Cf introCf intro
Cf intro
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
The Social Semantic Web: An Introduction
The Social Semantic Web: An IntroductionThe Social Semantic Web: An Introduction
The Social Semantic Web: An Introduction
 
Social Networking and Collaboration Tools for Enterprise 2.0
Social Networking and Collaboration Tools for Enterprise 2.0Social Networking and Collaboration Tools for Enterprise 2.0
Social Networking and Collaboration Tools for Enterprise 2.0
 
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
 
Semantic engagement
Semantic engagementSemantic engagement
Semantic engagement
 
Summary Day 2
Summary Day 2Summary Day 2
Summary Day 2
 
杭州讲座 石田英敬
杭州讲座 石田英敬杭州讲座 石田英敬
杭州讲座 石田英敬
 
Intro to Web Science (Fall 2013)
Intro to Web Science (Fall 2013)Intro to Web Science (Fall 2013)
Intro to Web Science (Fall 2013)
 
Web 20 E Oltre 1202297800291589 3
Web 20 E Oltre 1202297800291589 3Web 20 E Oltre 1202297800291589 3
Web 20 E Oltre 1202297800291589 3
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?
 
AHRC CDP Digital Humanities 101
AHRC CDP Digital Humanities 101  AHRC CDP Digital Humanities 101
AHRC CDP Digital Humanities 101
 
The wider environment of open scholarship – Jisc and CNI conference 10 July ...
The wider environment of open scholarship – Jisc and CNI conference 10 July ...The wider environment of open scholarship – Jisc and CNI conference 10 July ...
The wider environment of open scholarship – Jisc and CNI conference 10 July ...
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Elsevier Gran Challenge: The living document
Elsevier Gran Challenge: The living documentElsevier Gran Challenge: The living document
Elsevier Gran Challenge: The living document
 
09 semantic web & ontologies
09 semantic web & ontologies09 semantic web & ontologies
09 semantic web & ontologies
 

Recently uploaded

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
anilsa9823
 

Recently uploaded (20)

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 

From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction

  • 1. 1 From Hyperlinks to Semantic Web Properties 
 using Open Knowledge Extraction Valentina Presutti Andrea Giovanni Nuzzolese, Sergio Consoli, Diego Reforgiato Recupero, Aldo Gangemi STLab-ISTC, CNR, Italy Extends “Uncovering the Semantics of Wikipedia pagelinks” Accepted as combined conference/journal paper @ EKAW 2014
  • 2. • Some background • Motivation and goal • Introducing Open Knowledge Extraction (OKE) • A method for extracting semantic relations from hyperlinks • Legalo and Legalo-Wikipedia: implementations • Evaluation results • Conclusion and future work 2 Outline
  • 3. 3 The Web as envisioned in 1989 by Tim Berners-Lee URIs and links having types relationships amongst named objects
  • 4. 4 The current Web vs the Semantic Web (Marja-Riitta Koivunen and Eric Miller 2001) resources (web documents) and links Resources and links that can have explicit types
  • 5. To populate the web with machine understandable information so that artificial intelligences (AI) can use it as background knowledge for assisting humans in performing a significant number of their daily tasks 5 The Semantic Web vision
  • 6. • Standardised knowledge representation and query languages: OWL, RDF, SPARQL • Web of Data: huge amount of distributed, interlinked and formally represented data (linked data) -> related to the “open data” initiative 6 today…
  • 7. 7 Semantic Web standard knowledge representation languages The RDF data model is based on triples Each entity has a URI that identifies it in the global web space subjects and predicates always have a URI objects can also be literals triples are connected when they share an entity the result is an interconnected graph of triples, i.e. linked data dbpedia:Paris* dbpedia:France “Paris” dbpedia-owl:country rdfs:label *dbpedia: <http://dbpedia.org/>
  • 8. 7 Semantic Web standard knowledge representation languages The RDF data model is based on triples Each entity has a URI that identifies it in the global web space subjects and predicates always have a URI objects can also be literals triples are connected when they share an entity the result is an interconnected graph of triples, i.e. linked data dbpedia:Paris* dbpedia:France “Paris” dbpedia-owl:country http://www.france.fr foaf:homepage rdfs:label *dbpedia: <http://dbpedia.org/>
  • 10. 8 from 2 million (2007) to 31 billion (2011) triples
  • 11. • Most of the Web of Data derived from structured data (typically databases) or semi- structured data (e.g. Wikipedia infoboxes) • Web content is mostly natural language text (web sites, news, forums, reviews, etc.) • Such content is highly valuable for the Semantic Web (question answering, opinion mining, knowledge summarisation, etc.) 9 Limits and motivation
  • 12. To extract as much relevant knowledge as possible from web textual content and publish it in the form of Semantic Web triples 10 Overall Goal
  • 13. • Most attention to enrich the Web of Data by learning standard relations: e.g., membership (rdf:type), class taxonomy (rdfs:subClassOf), entity linking (owl:sameAs) • What about general factual relations? • e.g. part, participation, causality, location, friendship, etc. 11 Approaches to linked data and ontology learning
  • 14. • Distant supervision: maps textual sentences to linked data triples • no support for n-ary relation • bound to already defined relation types • Open Information Extraction: creates resources of triples composed of text fragments (subject, relational phrase, object) • not directly reusable as linked data Open Challenges: -> new relation discovery (both facts and relation types) -> methods for automatically formalising discovered knowledge -> usable label generation for new discovered relations 12 Relation extraction “Paris is the capital city of France” dbpedia:Paris dbpedia-owl:country dbpedia:France “John Stigall received a Bachelor of arts” ? “John Stigall received a Bachelor of arts” (John Stigall, received, a Bachelor of arts)
  • 15. Open Knowledge Extraction: • unsupervised -> no need of huge annotated corpora for training • open-domain -> not bound to specific domain • abstractive -> model the text, use external knowledge resources, use summarisation and language generation techniques • producing formalised output -> directly reusable as linked data 13 Contribution
  • 16. 14 OIE vs OKE Subject Predicate Object Approach John Stigall received a Bachelor of arts extractive John Stigall received from the State University of New York aat Cortland extractive dbpedia:John_Sti gall myprop:receives AcademicDegree dbpedia:Bachelor_of_ Arts abstractive dbpedia:John_Sti gall myprop:receives AcademicDegree From dbpedia:State_Univers ity_of_New_York abstractive “John Stigall received a Bachelor of arts from the State University of New York at Cortland “
  • 17. 15 Hypothesis Hyperlinks are pragmatic traces of semantic relations between two entities “McCarthy often commented on world affairs on the Usenet forums” Texts surrounding hyperlinks are linguistic traces of semantic relations between two entities
  • 18. Pragmatic trace: presence of a hyperlink in a web page 16 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
  • 19. Pragmatic trace: presence of a hyperlink in a web page 16 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI. ?
  • 20. Linguistic trace: text surrounding a hyperlink 17 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI. John MCCarthy ?
  • 21. Linguistic trace: text surrounding a hyperlink dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp (DBpedia) dbpo:wikiPageWikiLink 17 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI. John MCCarthy ?
  • 22. Linguistic trace: text surrounding a hyperlink dbpedia:John_McCarthy myont:developed dbpedia:Lisp (Legalo) legalo:developProgrammingLanguage 17 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI. John MCCarthy ?
  • 23. Linguistic trace: text surrounding a hyperlink dbpedia:John_McCarthy myont:developed dbpedia:Lisp (Legalo) legalo:developProgrammingLanguage rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops 17 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI. John MCCarthy ?
  • 24. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” 18 Hyperlinks Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools “McCarthy often commented on world affairs on the Usenet forums”
  • 25. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” 18 Hyperlinks Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools “McCarthy often commented on world affairs on the Usenet forums” “McCarthy often commented on world affairs on the Usenet forums”
  • 26. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” 18 Hyperlinks Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools “McCarthy often commented on world affairs on the Usenet forums” “McCarthy often commented on world affairs on the Usenet forums” “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”
  • 27. 19 What can we derive from a hyperlink? “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” “McCarthy often commented on world affairs on the Usenet forums”
  • 28. 19 What can we derive from a hyperlink? “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” “McCarthy often commented on world affairs on the Usenet forums” McCarthy and Usenet are (probably) related
  • 29. 19 What can we derive from a hyperlink? “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” “McCarthy often commented on world affairs on the Usenet forums” dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet McCarthy and Usenet are (probably) related
  • 30. 19 What can we derive from a hyperlink? “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” “McCarthy often commented on world affairs on the Usenet forums” dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet McCarthy and Usenet are (probably) related The following entity pairs could be related: Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times The New York Times, The Wall Street Journal
  • 31. 19 What can we derive from a hyperlink? “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” “McCarthy often commented on world affairs on the Usenet forums” dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet McCarthy and Usenet are (probably) related The following entity pairs could be related: Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times The New York Times, The Wall Street Journal dbpedia:Joey_Foster_Ellis dul:associateWith dbpedia:The_New_York_Times
  • 32. 19 What can we derive from a hyperlink? “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”“McCarthy often commented on world affairs on the Usenet forums” dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet McCarthy and Usenet are (probably) related The following entity pairs could be related: Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times The New York Times, The Wall Street Journal dbpedia:Joey_Foster_Ellis dul:associateWith dbpedia:The_New_York_Times “McCarthy often commented on world affairs on the Usenet forums” Linguistic traces provides us the meaning of relevant relations
  • 33. 20 “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” s {(esubj, eobj)i} Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times The New York Times, The Wall Street Journal φs(esubj, eobj)i φs(Joey Foster Ellis, The New York Times) Natural language sentence Set of entity pairs Relevant relations evidenced by s
  • 34. 21 Relevant Relation Assessment φs(esubj, eobj)i φs(Joey Foster Ellis, The New York Times) “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” φs(The Wall Street Journal, The New York Times)
  • 35. 22 Usable predicate generation λ ~ λi , λi ∈ Λ Λ ≡ {λ1, ..., λn} labels for φs(esubj, eobj)i defined for a LOD vocabulary publish on φs(Joey Foster Ellis, The New York Times) “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” (has) published on (has) published on journal (is) author for writes for
  • 36. 23 OWL/RDF formalisation dbpedia:Joey_Foster_Ellis legalo:publishOn dbpedia:The_New_York_Times . legalo:publishOn a owl:ObjectProperty ; rdfs:range … ; rdfs:domain … ; … . φs(Joey Foster Ellis, The New York Times) “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”
  • 37. A novel Open Knowledge Extraction (OKE) approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable data Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia Their evaluation performed with the help of crowdsourcing and the creation of a gold standard, all showing high values of precision, recall, and accuracy 24 Contribution http://wit.istc.cnr.it/stlab-tools/legalo http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia http://wit.istc.cnr.it/kore-dev/legalo Developing version http://wit.istc.cnr.it/stlab-tools/fred http://wit.istc.cnr.it/kore-dev/fred
  • 39. 25 Method and implementation Frame-based formal representation of s
  • 40. 25 Method and implementation Frame-based formal representation of s Relation assessment
  • 41. 25 Method and implementation Frame-based formal representation of s Label generation (extraction+abstraction) Relation assessment
  • 42. 25 Method and implementation Frame-based formal representation of s Label generation (extraction+abstraction) Formalisation Relation assessment
  • 43. 25 Method and implementation Frame-based formal representation of s Label generation (extraction+abstraction) Formalisation Alignment to existing LOD vocabularies Relation assessment
  • 44. 25 Method and implementation Frame-based formal representation of s Label generation (extraction+abstraction) Formalisation Alignment to existing LOD vocabularies Relation assessment Linguistic traces of Wikipedia pagelinks subjects and objects are given by DBpedia pagelinks
  • 45. 26 Let’s see how it works in detail…
  • 46. 27 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 47. 27 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 48. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 49. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 50. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 51. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 52. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 53. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 54. 29 Relation assessment “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 55. 29 Relation assessment “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 56. 30 Relation assessment “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 57. 30 Relation assessment G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen} “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 58. 30 Relation assessment G’=(V,E’) is the undirected version of G=(V,E) G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen} “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 59. 30 Relation assessment G’=(V,E’) is the undirected version of G=(V,E) G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen} vi ∈ V is the node in G representing the entity ei mentioned in s. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 60. 30 Relation assessment P(vsubj,vobj)=[v0,edge1,…,edgen,vn], Psetsubj,obj ≡ {edge1, v1…, vn-1, edgen} G’=(V,E’) is the undirected version of G=(V,E) G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen} vi ∈ V is the node in G representing the entity ei mentioned in s. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 61. 31 Necessary condition: φs(esubj , eobj ) ∃P(vsubj , vobj ) “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 62. 31 Necessary condition: φs(esubj , eobj ) ∃P(vsubj , vobj ) “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 63. 31 Necessary condition: φs(esubj , eobj ) ∃P(vsubj , vobj ) φs(Rico Lebrun , Michelangelo ) ∃P(fred:Rico_Lebrun, fred:Michelangelo) “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 64. 32 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 65. 32 Let f be a node of G such that dul:Event(f) “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 66. 32 Let f be a node of G such that dul:Event(f) Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 67. 32 Let f be a node of G such that dul:Event(f) Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f AgRole ≡ {ρm1 , ..., ρmm }, the set of VerbNet agentive roles AgRole ⊆ Role “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 68. 32 Let f be a node of G such that dul:Event(f) Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f AgRole ≡ {ρm1 , ..., ρmm }, the set of VerbNet agentive roles AgRole ⊆ Role Sufficient condition: φs(esubj , eobj ) ⇐ ∃P (vsubj , vobj ) and ∃f ∈ Psetsubj,obj such that dul:Event(f) and ρ(f,vsubj)∈AgRole “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 69. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 70. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Subject Object Relation assessment
  • 71. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Subject Object Relation assessment
  • 72. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Subject Object Relation assessment
  • 73. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Subject Object Relation assessment
  • 74. 34 Semantic Web triples and properties generation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 75. 34 Semantic Web triples and properties generation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 76. 35 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object
  • 77. 35 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object teach
  • 78. 35 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object teach art
  • 79. 35 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object legalo:teachArtAt teach art at
  • 80. 36 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object
  • 81. 36 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object teach
  • 82. 36 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation vn.role:Actor1 -> “with”
 vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from” Subject Object legalo:teachAbout teach about
  • 83. 37 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation
  • 84. 38 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts . s:teachAbout a owl:ObjectProperty ; rdfs:subPropertyOf fred:Teach; rdfs:domain wibi:Artist ; rdfs:range wibi:Art ; grounding:definedFromFormalRepresentation fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ; grounding:derivedFromLinguisticEvidence s:linguisticEvidence ; owl:propertyChainAxiom([ owl:inverseOf s:AgentTeach ] s:TopicTeach) . _:b2 a alignment:Cell ; alignment:entity1 s:teachAbout ; alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ; alignment:measure "0.846"^xsd:float ; alignment:relation "equivalence" . domain, range, subsumption linguistic and formal scope alignment to existing LOD vocabularies
  • 86. 40 Evaluation sample *https://code.google.com/p/relation-extraction-corpus/downloads/list corpus for relation extraction developed at Google Research Crel−extraction*: 5 datasets each dedicated to a specific relation {“pred”: “/people/person/education./education/education/institution", “sub":"/m/0g9zjv3", “obj":"/m/0ckrjh", “evidences”: [{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov", “snippet": “Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays.”}], “judgments": [{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"}, {"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"}, {"rater":"12022408018620867151","judgment":"yes"}]}
  • 87. 41 Evaluation sample Cinstitution: random sample of 130 snippets (one entity pair) from Crel−extraction for “attending or graduating from an institution” Legalo produced always an output, either p or “no relation” Ceducation: random sample of 130 snippets (one entity pair) from Crel−extraction for “obtaining a degree of education” Legalo produced always an output, either p or “no relation” Cgeneral: random sample of 60 snippets from all Crel−extraction datasets then broken into 186 single sentences with at least one entity pair. Legalo produced 867 results, 262 p and 605 “no relation”
  • 88. 42 Hypothesis 1: Relevant relation assessment Legalo is able to assess if, given a sentence s, a relevant relation exists which holds between two entities, according to the content of s: ∃φ.φs (esubj , eobj ) φs(esubj, eobj)i
  • 89. 43 Evaluation results for Relevant Relation Assessment The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
  • 90. 44 Hypothesis 2: Usable predicate generation Legalo is able to generate a usable predicate λ ′ for a relevant relation φs between to entities, expressed in a sentence ′ s: given λ, a label generated by Legalo for φs, and λi a label generated by a human for φs the following holds: λ ~ λi , λi ∈ Λ λ ~ λi , λi ∈ Λ Λ ≡ {λ1, ..., λn} labels for φs(esubj, eobj)i defined for a LOD vocabulary
  • 91. 45 Evaluation results for Usable Predicate Generation Agree = 1, Partly Agree = 0.5, Disagree = 0 The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
  • 92. • Similarity score between human created {λi} and Legalo λ, for a φs(esubj, eobj)i • Jaccard distance measure (string similarity) • Semantic similarity measure based on the SimLibrary framework* 46 Evaluation results for Usable Predicate Generation Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence 5 Any 0.63 0.80 0.59 * G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
  • 93. 47 Legalo-Wikipedia (EKAW 2014) 3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
  • 94. • Open Knowledge Extraction: unsupervised, open- domain, abstractive, producing OWL/RDF output • OKE for relation extraction and its implementations • An evaluation showing high performance for both tasks: relevant relation assessment, usable predicate generation 48 Conclusion
  • 95. 49 Where are we heading? • Improving frequent-subgraph heuristics • Strategies for alignment to existing LOD vocabularies (or to ODPs?) • Reconciliation of properties, out of sentence scope • Using Legalo for abstractive summarisation tasks • Using Legalo for language generation • Identifying ways to directly compare to other approaches
  • 96. 50 Thanks for your attention http://wit.istc.cnr.it/stlab-tools/
  • 97. 51 Evaluation sample *https://code.google.com/p/relation-extraction-corpus/downloads/list corpus for relation extraction developed at Google Research Crel−extraction*: 5 datasets each dedicated to a specific relation {“pred”: “/people/person/education./education/education/institution", “sub":"/m/0g9zjv3", “obj":"/m/0ckrjh", “evidences”: [{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov", “snippet": “Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays.”}], “judgments": [{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"}, {"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"}, {"rater":"12022408018620867151","judgment":"yes"}]}
  • 98. 52 Evaluation sample Cinstitution: random sample of 130 snippets (one entity pair) from Crel−extraction for “attending or graduating from an institution” Legalo produced always an output, either p or “no relation” Ceducation: random sample of 130 snippets (one entity pair) from Crel−extraction for “obtaining a degree of education” Legalo produced always an output, either p or “no relation” Cgeneral: random sample of 60 snippets from all Crel−extraction datasets then broken into 186 single sentences with at least one entity pair. Legalo produced 867 results, 262 p and 605 “no relation”
  • 99. 53 Hypothesis 1: Relevant relation assessment Legalo is able to assess if, given a sentence s, a relevant relation exists which holds between two entities, according to the content of s: ∃φ.φs (esubj , eobj ) φs(esubj, eobj)i
  • 100. 54 Crowdsourcing tasks for Relevant Relation Assessment Task 1: Assessing if a sentence s is an evidence of a referenced relation (i.e. either “institution” or “education”) between two entities, mentioned in s. Based on data from Cinstitution and Ceducation, respectively Task 2: Assessing if a sentence s is an evidence of any relation between two given entities mentioned in s. Based on data from Cgeneral At least 3 raters with t>0.70 Answer could be “yes” or “no”
  • 101. 55 Evaluation results for Relevant Relation Assessment The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
  • 102. 56 Hypothesis 2: Usable predicate generation Legalo is able to generate a usable predicate λ ′ for a relevant relation φs between to entities, expressed in a sentence ′ s: given λ, a label generated by Legalo for φs, and λi a label generated by a human for φs the following holds: λ ~ λi , λi ∈ Λ λ ~ λi , λi ∈ Λ Λ ≡ {λ1, ..., λn} labels for φs(esubj, eobj)i defined for a LOD vocabulary
  • 103. 57 Crowdsourcing tasks for Usable predicate generation Task 3: Judging if λ generated by a machine expresses is a good summarisation of a specific relation (i.e. either “institution” or “education”) between two given entities mentioned in s, according to the evidence provided by s. Based on data from Cinstitution and Ceducation, respectively Task 4: judging if a predicate λ is a good summarisation of a relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s. Based on data from Cgeneral At least 3 raters with t>0.70 Answer could be Agree, Partly Agree, Disagree
  • 104. 58 Evaluation results for Usable Predicate Generation Agree = 1, Partly Agree = 0.5, Disagree = 0 The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
  • 105. 59 Crowdsourcing tasks for Usable predicate generation Task 5: Creating a phrase λ that summarises the relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s. Based on data from Cgeneral At least 3 raters with t>0.60 Answer was open
  • 106. • Similarity score between human created {λi} and Legalo λ, for a φs(esubj, eobj)i • Jaccard distance measure (string similarity) • Semantic similarity measure based on the SimLibrary framework* 60 Evaluation results for Usable Predicate Generation Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence 5 Any 0.63 0.80 0.59 * G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
  • 107. 61 Legalo-Wikipedia (EKAW 2014) 3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
  • 108. 62 Confidence score Given a task unit u, a set of possible judgements {ji}, with i = 1, ...n, {tk} a set of trust scores each representing a rater, with k=1,…,m tsum=sumk=1,…m(tk) is the sum of trust scores of raters giving judgement on u trust(ji) is the sum of tk values of raters that choose judgement ji confidence(ji, u) for judgement ji on the task unit u is confidence(“yes”, 582275117) = = (0.95+0.98)/2.82 = 0.68 confidence(“no”, 582275117) = = (0.89/2.82 = 0.31