The document discusses using usage analysis to improve ontology engineering. It describes analyzing query logs over datasets like DBpedia to identify frequently queried triples and patterns. This can reveal missing or inconsistent data and suggest new links between entities. The analysis helps increase data quality and acquire new knowledge that benefits both the dataset and Web of Data as a whole. While complete automation may not be needed, supporting usage analysis and endpoint access allows publishers to play a role in maintaining datasets and the Web of Data.
3. A Usage-dependent Life Cycle
Request to put away the “train”
• toy train made of plastic
• SELECT * WHERE { ?t a:madeOf a:Plastic }
• toy train made of wood
• SELECT * WHERE { ?t b:madeOf b:Wood }
Negotiate understanding
Enter the room
USAGE
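The toy-train example can be read as a vocabulary mismatch: one party describes the train with `a:` terms, the other asks with `b:` terms. The following is a minimal sketch, not the method behind these slides, of how usage could hint at where understanding needs to be negotiated. It assumes prefixed names of the form `prefix:local` and proposes a mapping only when the local names are identical; the function names and the two term sets are hypothetical.

```python
def local_name(term):
    """Return the part after the prefix, e.g. 'madeOf' for 'a:madeOf'."""
    return term.split(":", 1)[-1]


def suggest_mappings(queried_terms, dataset_terms):
    """Propose dataset terms for queried terms the dataset does not use.

    Assumption: two terms are mapping candidates when their local names
    are identical. This is a deliberately naive heuristic; a person still
    has to confirm or reject every suggestion.
    """
    by_local = {}
    for term in dataset_terms:
        by_local.setdefault(local_name(term), []).append(term)

    suggestions = {}
    for term in queried_terms:
        if term in dataset_terms:
            continue  # the dataset already uses this term
        matches = by_local.get(local_name(term), [])
        if matches:
            suggestions[term] = sorted(matches)
    return suggestions


# Hypothetical vocabularies taken from the two queries on the slide.
DATASET_TERMS = {"a:madeOf", "a:Plastic"}
QUERIED_TERMS = {"a:madeOf", "b:madeOf", "b:Wood"}

suggestions = suggest_mappings(QUERIED_TERMS, DATASET_TERMS)
```

Here `b:madeOf` is paired with `a:madeOf`, while `b:Wood` gets no suggestion because the dataset has no term with that local name, which is itself a hint that something may be missing.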
4. Yet another…
OTK, METHONTOLOGY, DILIGENT, NeOn, …
The Maintenance Black Box
Make it less a methodology but support the people to get their “Things” done!
5. Who is hurt by that?
• rather small/simple ontologies
– min. effort for OE
– “under-engineered”
• unknown user requirements
6. Hey “LOD people”, do you think that ontology engineering matters?
Usage-based ontology engineering
7. Survey covering approx. 25% of all cloud datasets
• size
• complexity
• engineering methodology
• …
Publishers of 75% of the datasets do not feel 99% responsible for their data?
Survey ran in October 2010
9. Usage?
Request to put away the “train”
• SELECT * WHERE { ?t a:madeOf a:Plastic }
• SELECT * WHERE { ?t b:madeOf b:Wood }
USAGE
Yes*! But beyond?
• What about the future of SPARQL endpoints on the WoD?
* W.r.t. an architecture proposed by a famous “Web-Extremist”
10. You should have a query endpoint!
• You get something valuable out of it, which helps you to play your role on the WoD!
[Figure: Effort Distribution between Publisher and Consumer (pay-as-you-go): the publisher provides links as hints; the consumer generates/data mines links. Source: Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)]
11. Usage Analysis
• queries
• patterns
• triples
• primitives
visualize heat maps
zoom in and see details
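The levels listed above (queries, patterns, triples, primitives) can be illustrated by breaking logged queries into triple patterns and counting how often each one occurs; such counts are the kind of input a heat map needs. This is a minimal sketch under strong assumptions: the log entries are hypothetical, and the parser only handles a flat WHERE group with patterns separated by ` . `. A real analysis would use a proper SPARQL parser.

```python
import re
from collections import Counter

# Hypothetical log entries; a real log would come from the endpoint itself.
QUERY_LOG = [
    "SELECT * WHERE { ?t a:madeOf a:Plastic }",
    "SELECT * WHERE { ?t b:madeOf b:Wood }",
    "SELECT * WHERE { ?t a:madeOf a:Plastic . ?t a:color ?c }",
]


def triple_patterns(query):
    """Return the (subject, predicate, object) patterns of a simple query.

    Assumption: the WHERE clause is one flat group whose patterns are
    separated by ' . ' and whose terms contain no whitespace. OPTIONAL,
    FILTER, nested groups and literals with spaces are not handled.
    """
    match = re.search(r"\{(.*)\}", query, re.DOTALL)
    if match is None:
        return []
    patterns = []
    for part in match.group(1).split(" . "):
        terms = part.split()
        if len(terms) == 3:
            patterns.append(tuple(terms))
    return patterns


def usage_counts(log):
    """Count how often each triple pattern and each predicate is queried."""
    pattern_counts = Counter()
    predicate_counts = Counter()
    for query in log:
        for subject, predicate, obj in triple_patterns(query):
            pattern_counts[(subject, predicate, obj)] += 1
            predicate_counts[predicate] += 1
    return pattern_counts, predicate_counts


pattern_counts, predicate_counts = usage_counts(QUERY_LOG)
```

`predicate_counts.most_common()` then gives a simple ranking of the most requested properties, which could be rendered as heat-map intensities.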
12. Some Results (DBpedia Analysis)
inconsistent data
• ns:Band ns:instrument ?x
• ns:Band ns:genre ?y
• ns:Band ns:associatedBand ?z
missing facts
• ns:Band ns:knownFor ?x
• ns:Band ns:nationality ?y
Complete analysis can be found at http://page.mi.fu-berlin.de/mluczak/pub/visual-analysis-of-web-of-data-usage-dbpedia33/
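One plausible way to arrive at a shortlist like “missing facts” is to look for patterns that are asked repeatedly but never return anything. The sketch below illustrates that idea only; it is not the procedure used in the DBpedia analysis. The observations (pattern plus result count), the function names, and the `min_asked` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (triple pattern, number of results returned).
OBSERVATIONS = [
    (("ns:Band", "ns:knownFor", "?x"), 0),
    (("ns:Band", "ns:knownFor", "?x"), 0),
    (("ns:Band", "ns:genre", "?y"), 12),
    (("ns:Band", "ns:genre", "?y"), 0),
    (("ns:Band", "ns:nationality", "?y"), 0),
]


def summarize(observations):
    """Per pattern, count how often it was asked and how often it came back empty."""
    stats = defaultdict(lambda: {"asked": 0, "empty": 0})
    for pattern, n_results in observations:
        stats[pattern]["asked"] += 1
        if n_results == 0:
            stats[pattern]["empty"] += 1
    return dict(stats)


def candidates_for_missing_facts(stats, min_asked=2):
    """Patterns asked at least min_asked times that never returned a result.

    Assumption: repeated demand without any answer suggests a gap in the
    data. The threshold is arbitrary, and a person should review the list.
    """
    return sorted(
        pattern
        for pattern, s in stats.items()
        if s["asked"] >= min_asked and s["empty"] == s["asked"]
    )


stats = summarize(OBSERVATIONS)
candidates = candidates_for_missing_facts(stats)
```

With these made-up numbers only the `ns:knownFor` pattern qualifies: `ns:genre` sometimes returns results, and `ns:nationality` was asked only once.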
13. Some Thoughts about Benefit
• usage analysis helps to acquire new knowledge
– links between data
– external schema
• lightweight approach
helps to increase the quality of data on the Web
helps to bootstrap linked data
It is not necessary to automate everything if the result has enough (business) value in a problem domain anyway.
14. Take Away: make it less a methodology
• LOD vocabularies are specific ontologies
• and need specific life cycle support
• provide (query) access to your data to play your role on the WoD
• endpoint and usage analysis can help to maintain them (and implicitly the data)
• automation: it is not necessary to automate things that enable the publisher
this is a benefit for the dataset and the Web of data as a whole
Hey “LOD people”, do you think that dataset maintenance matters?
Markus Luczak-Rösch (luczak@inf.fu-berlin.de)
Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)
15. Actual Addition
• “15.500.000 people in Germany are not willing to use the internet”
– emphasis on the ESWC discussion: bridging the gap (directly or indirectly) between these people and the internet/Web has a high potential to influence societal transformation (they are not going to use a browser or an iPhone and they do not care for semantics)
Source: ARD-Morgenmagazin, 08-07-2011