Yes, this is a small workshop...
2 paper presentations out of 4 papers submitted
but
A lot of interest from both communities (KD/ML and Linked Data).
LD4KD is more than a set of paper presentations.
Working on existing opportunities and challenges and the way
they can be better supported/addressed.
Program
10:00 – 10:15 Welcome
10:15 – 10:45 Linked Data for Knowledge Discovery: the story so far
10:45 – 11:15 Mehwish Alam and Amedeo Napoli, Navigating and Exploring RDF Data using Formal Concept Analysis
11:15 – 11:30 Coffee Break
11:30 – 12:00 Denis Krompaß and Volker Tresp, Ensemble Solutions for Link-Prediction in Knowledge Graphs
12:00 – 12:45 Demo session
12:45 – 13:00 Wrap-up and conclusions
LD for KD - KD with LD
A set of techniques and methods to extract meaningful information patterns from raw data
A set of principles and technologies for sharing and integrating data through the architecture of the Web
Needs a more systematic understanding...
Of the way the properties of the process of KD and of the information source of
LD create new opportunities and challenges for both communities.
[Diagram: Knowledge Discovery and Linked Data]
Talking about
communities: ?
Started in LD4KD 2014
http://events.kmi.open.ac.uk/ld4kd2014/
http://goo.gl/NEu1d7
A collaborative document to share information about issues, challenges, tools and methods at the intersection of Linked Data and Knowledge Discovery
Opportunities
Linked Data as Input
Large, global, accessible, convenient, multilingual - Separation of data and
process - Easily extended, integrated, enriched.
Link Discovery
Using DM/ML techniques to find connections across disparate datasets
Exploiting links across datasets for richer data, and richer patterns
Can this be done “on the fly”, i.e. within DM/ML process?
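To make the link discovery idea concrete, here is a minimal sketch that proposes links between two datasets by comparing entity labels. It swaps in plain string similarity from Python's standard library for a real DM/ML matcher; the function name `discover_links`, the identifiers, the labels and the 0.8 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def discover_links(left, right, threshold=0.8):
    """Propose (left_id, right_id, score) links between two label dictionaries.

    Assumption: each dataset is a dict mapping an identifier to a label.
    For each left entity, keep the most similar right entity if its
    similarity reaches the threshold.
    """
    links = []
    for left_id, left_label in sorted(left.items()):
        best_id, best_score = None, 0.0
        for right_id, right_label in sorted(right.items()):
            score = SequenceMatcher(
                None, left_label.lower(), right_label.lower()
            ).ratio()
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 2)))
    return links


# Hypothetical labels standing in for two disparate datasets.
DATASET_A = {"a:1": "Knowledge Media Institute", "a:2": "Open University"}
DATASET_B = {"b:7": "Knowledge Media Inst.", "b:9": "The Open University"}

LINKS = discover_links(DATASET_A, DATASET_B)
for left_id, right_id, score in LINKS:
    print(left_id, "<->", right_id, score)
```

Running the comparison inside the mining loop, rather than as a separate preprocessing step, is one possible reading of doing link discovery "on the fly".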
Background knowledge to enrich the KDD process
Can Linked Data be part of the bottom arrow in the KDD diagram? A global,
universally accessible knowledge base of almost everything?
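One simple way to picture Linked Data as background knowledge is feature enrichment: rows of a local table that carry an entity identifier get extra columns taken from triples about that entity. The sketch below assumes the triples are already available in memory; the function `enrich`, the property names and the data are hypothetical.

```python
def enrich(rows, triples, key, properties):
    """Add one column per requested property to each row.

    Assumptions: `rows` is a list of dicts, `triples` is a list of
    (subject, predicate, object) tuples, and row[key] holds the subject
    identifier. Missing values become None; if a property has several
    values, the last one seen wins.
    """
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {})[predicate] = obj

    enriched = []
    for row in rows:
        known = facts.get(row[key], {})
        new_row = dict(row)  # leave the input rows untouched
        for prop in properties:
            new_row[prop] = known.get(prop)
        enriched.append(new_row)
    return enriched


# Hypothetical local data and background triples.
ROWS = [
    {"city": "ex:Paris", "sales": 120},
    {"city": "ex:Lyon", "sales": 45},
]
TRIPLES = [
    ("ex:Paris", "ex:country", "ex:France"),
    ("ex:Paris", "ex:isCapital", True),
    ("ex:Lyon", "ex:country", "ex:France"),
]

ENRICHED = enrich(ROWS, TRIPLES, key="city", properties=["ex:country", "ex:isCapital"])
```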
Is it too hard?
“RDF and SPARQL - are they really complicated?”
“Not really, but SPARQL is not what ML researchers want to
worry about. Most of us don't even like SQL. Just a CSV file
is the easiest format. It's messy, but we really don't care.”
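If a CSV file is what ML researchers want, the gap may be smaller than it looks. The sketch below flattens a result document in the standard SPARQL 1.1 JSON results layout (`head.vars` plus `results.bindings`) into CSV text. The sample document is made up, and fetching it from an actual endpoint is left out on purpose.

```python
import csv
import io


def bindings_to_csv(result):
    """Flatten a SPARQL JSON result document into CSV text.

    One column per query variable; unbound variables become empty cells.
    Datatypes and language tags are dropped, which is lossy by design.
    """
    variables = result["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(variables)
    for binding in result["results"]["bindings"]:
        writer.writerow(
            [binding.get(name, {}).get("value", "") for name in variables]
        )
    return buffer.getvalue()


# Hypothetical result document, shaped like a SPARQL endpoint's JSON output.
SAMPLE = {
    "head": {"vars": ["city", "population"]},
    "results": {
        "bindings": [
            {
                "city": {"type": "uri", "value": "http://example.org/Paris"},
                "population": {"type": "literal", "value": "2100000"},
            },
            {"city": {"type": "uri", "value": "http://example.org/Nowhere"}},
        ]
    },
}

CSV_TEXT = bindings_to_csv(SAMPLE)
print(CSV_TEXT)
```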
Challenges
Linked data is a graph
but is this really an issue?
Linked data is a distributed, collaborative graph
accessibility issues, link explosion, termination
need to build the graph on the fly, not all data is known at the start
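A common way to keep on-the-fly graph building from exploding is to bound the expansion. The sketch below does a breadth-first expansion with a depth limit and a node budget, so it terminates even on cyclic data. The `fetch_links` callback is a hypothetical stand-in for dereferencing a URI; here it reads from a toy in-memory dictionary.

```python
from collections import deque


def expand(seed, fetch_links, max_depth=2, max_nodes=100):
    """Breadth-first expansion from `seed` with two stopping rules.

    `fetch_links(node)` returns the nodes that `node` links to (an
    assumption standing in for a remote lookup). Nodes at `max_depth`
    are not expanded, and no new node is admitted once `max_nodes`
    nodes are known. Edges to non-admitted nodes are still recorded.
    """
    seen = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: keep the node, do not expand it
        for target in fetch_links(node):
            edges.append((node, target))
            if target not in seen and len(seen) < max_nodes:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen, edges


# Toy graph with a cycle (a -> b -> a) and a long chain through d and e.
TOY = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": ["e"], "e": ["a"]}

NODES, EDGES = expand("a", lambda n: TOY.get(n, []), max_depth=2)
SMALL_NODES, _ = expand("a", lambda n: TOY.get(n, []), max_depth=2, max_nodes=2)
```

The limits are blunt instruments: they guarantee termination but say nothing about which part of the graph is most useful to the mining task.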
Linked data is incomplete and biased
And we don’t know the bias - how to evaluate KDD that uses Linked Data?
Linked data is redundant, unbalanced and unreliable
noisy, bad formatting (no control), lack of documentation, which ID/source to choose
and what is the impact?
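For the "which ID to choose" part, one possible sketch is to group identifiers that are declared equivalent and pick one representative per group with a union-find structure. The equivalence pairs below are invented, and choosing the lexicographically smallest identifier as canonical is an arbitrary assumption; it says nothing about which source is more reliable.

```python
def canonical_ids(equivalent_pairs):
    """Map every identifier to one canonical identifier per equivalence group.

    `equivalent_pairs` is a list of (id, id) pairs, for example taken from
    sameAs-style statements (assumption). The canonical identifier is the
    smallest string in each group.
    """
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for first, second in equivalent_pairs:
        root_a, root_b = find(first), find(second)
        if root_a != root_b:
            # Always keep the smaller identifier as the root of the merged group.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {item: find(item) for item in list(parent)}


# Hypothetical identifiers from three sources describing two cities.
PAIRS = [
    ("srcA:Paris", "srcB:paris-fr"),
    ("srcC:city/75", "srcB:paris-fr"),
    ("srcB:lyon-fr", "srcA:Lyon"),
]

CANONICAL = canonical_ids(PAIRS)
```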
And we haven’t even started talking about...
Mr. Slow
and
Mr. Nosey
Continuing this year...
Start with the practical aspects:
What tools and applications exist, through which we can explore the use of linked
data in KD, and from which we can learn how to solve some of the challenges?