Presentation of my work "Using Linked Data Traversal to Label Academic Communities" at the SAVE-SD workshop, co-located with the 24th International World Wide Web Conference in Florence, Italy.
Using Linked Data Traversal to Label Academic Communities - SAVE-SD2015
1. Using Linked Data Traversal to Label Academic Communities
Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
Knowledge Media Institute, The Open University
2. Motivation
We
• Explain data patterns automatically
• Using Linked Data background knowledge
Scholarly data
• Growing interest and techniques
• Mine and visualise data
• Reveal hidden knowledge
• Forecast
Data interpretation still manual
3. Use-case: Community Detection
Aim
• Detecting communities of research topics
• The Open University papers (ORO¹)
Usual text-mining methods
• Groups of similar documents
• Probabilistically extracted topics
• Based on word co-occurrence
¹ http://oro.open.ac.uk/
5. Linked Data can help!
• Scholarly data: a large portion of Linked Data
• RDF structure (machine understandable)
• Linked datasets
• Across disciplines
• Easier discovery of unrevealed knowledge
• Easier result interpretation
6. Proposition
• Automatic topic detection (labels)
• With Linked Data background knowledge
• Machine Learning approach
• A* search over the Linked Data graph
• Link traversal (vs. literature based on SPARQL)
7. Approach
Document clustering
• text pre-processing (normalise, stem, filter)
• Latent Semantic Analysis space of word vectors
• clustering according to LSA distance
• community: a group of similar words
Communities networking
• connecting each cluster’s centroid to the closest one
• network graph of communities
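The networking step above can be sketched as follows. This is only an illustration: the slides do not say which distance is used in the LSA space, so cosine distance is an assumption, and the 2-dimensional vectors, the cluster names and the helper names (`centroid`, `cosine_distance`, `link_communities`) are invented for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity (assumed distance in the LSA space)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def link_communities(clusters):
    """Map each cluster to the cluster whose centroid is closest to its own."""
    centroids = {name: centroid(vecs) for name, vecs in clusters.items()}
    edges = {}
    for name, c in centroids.items():
        candidates = [
            (cosine_distance(c, other_c), other)
            for other, other_c in centroids.items()
            if other != name
        ]
        edges[name] = min(candidates)[1]
    return edges


# Toy word vectors, invented for illustration only.
clusters = {
    "A": [[1.0, 0.1], [0.9, 0.2]],
    "B": [[0.8, 0.4], [0.7, 0.5]],
    "C": [[0.1, 1.0], [0.2, 0.9]],
}
edges = link_communities(clusters)
print(edges)
```

The resulting edges form the network graph of communities: one outgoing link per cluster.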
8. Initial dataset
• Word URIs
• connected to DBpedia
Machine Learning/Logic Programming approach
• Given
• Positive examples E+ : Cluster (words) to label
• Negative examples E-: Words not in E+
• Background Knowledge from Linked Data
• Derive
• Explanations of the grouping for E+ (topic)
Approach
9. Explanation
• RDF property chains
• Leading to the same entity
• Shared by a subset of initial words
Linked Data Background Knowledge
Topic: many words of the cluster that share the same explanation
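A minimal sketch of what "a property chain leading to the same entity" means in practice. The graph below is a hand-written toy, not real DBpedia data; the `w:` prefix for word URIs and the helper names `follow` and `words_sharing` are assumptions made for the example.

```python
def follow(graph, start, chain):
    """Return the set of nodes reached from `start` by following `chain` in order."""
    frontier = {start}
    for prop in chain:
        frontier = {obj for node in frontier for obj in graph.get((node, prop), [])}
    return frontier


def words_sharing(graph, words, chain, entity):
    """Words for which the property chain ends at `entity`."""
    return {w for w in words if entity in follow(graph, w, chain)}


# Toy triples keyed by (subject, property); invented for illustration.
graph = {
    ("w:painting", "skos:relatedMatch"): ["db:Painting"],
    ("w:sculpture", "skos:relatedMatch"): ["db:Sculpture"],
    ("w:enzyme", "skos:relatedMatch"): ["db:Enzyme"],
    ("db:Painting", "dc:subject"): ["db:Category:Visual_arts"],
    ("db:Sculpture", "dc:subject"): ["db:Category:Visual_arts"],
    ("db:Enzyme", "dc:subject"): ["db:Category:Proteins"],
    ("db:Category:Visual_arts", "skos:broader"): ["db:Creativity"],
    ("db:Category:Proteins", "skos:broader"): ["db:Biochemistry"],
}

chain = ["skos:relatedMatch", "dc:subject", "skos:broader"]
words = ["w:painting", "w:sculpture", "w:enzyme"]
shared = words_sharing(graph, words, chain, "db:Creativity")
print(sorted(shared))
```

Here the explanation (chain plus `db:Creativity`) is shared by two of the three words, which is the kind of subset the slide refers to.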
10. Aim: find the explanation shared by the largest number of words in the cluster
Linked Data Traversal
e.g. <skos:relatedMatch-dc:subject-skos:broader.db:Creativity>
11. How: A* search to iteratively explore new parts of the graph and improve the explanation
Linked Data Traversal
<skos:relatedMatch-dc:subject-skos:broader-skos:broader.db:Aesthetics>
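The iterative exploration can be illustrated with a small best-first traversal. This is a hedged sketch, not the authors' implementation: the slides do not give the A* cost or heuristic, so the example simply expands the chain whose best explanation has the highest F-Measure so far. The toy graph, the `w:` word URIs and the names `f_measure`, `expand` and `best_explanation` are all assumptions.

```python
import heapq
from itertools import count


def f_measure(covered, positives):
    """Harmonic mean of precision and recall of `covered` against `positives`."""
    hits = len(covered & positives)
    if hits == 0:
        return 0.0
    precision = hits / len(covered)
    recall = hits / len(positives)
    return 2 * precision * recall / (precision + recall)


def expand(graph, reach):
    """Extend every word's frontier by one hop, grouped by the property followed."""
    by_prop = {}
    for word, nodes in reach.items():
        for node in nodes:
            for prop, obj in graph.get(node, []):
                by_prop.setdefault(prop, {}).setdefault(word, set()).add(obj)
    return by_prop


def best_explanation(graph, positives, negatives, max_expansions=20):
    """Greedy best-first search over property chains; returns (score, chain, entity)."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    start = {w: {w} for w in positives | negatives}
    heap = [(0.0, next(tie), (), start)]
    best = (0.0, None, None)
    for _ in range(max_expansions):
        if not heap:
            break
        _, _, chain, reach = heapq.heappop(heap)
        for prop, new_reach in expand(graph, reach).items():
            new_chain = chain + (prop,)
            top = 0.0
            for entity in sorted(set().union(*new_reach.values())):
                covered = {w for w, nodes in new_reach.items() if entity in nodes}
                score = f_measure(covered, positives)
                top = max(top, score)
                if score > best[0]:
                    best = (score, new_chain, entity)
            # Most promising chains (highest score) are popped first.
            heapq.heappush(heap, (-top, next(tie), new_chain, new_reach))
    return best


# Toy graph: node -> list of (property, object); invented for illustration.
graph = {
    "w:painting": [("skos:relatedMatch", "db:Painting")],
    "w:sculpture": [("skos:relatedMatch", "db:Sculpture")],
    "w:enzyme": [("skos:relatedMatch", "db:Enzyme")],
    "db:Painting": [("dc:subject", "db:Category:Visual_arts")],
    "db:Sculpture": [("dc:subject", "db:Category:Visual_arts")],
    "db:Enzyme": [("dc:subject", "db:Category:Proteins")],
}
score, chain, entity = best_explanation(
    graph, {"w:painting", "w:sculpture"}, {"w:enzyme"}
)
print(score, chain, entity)
```

In a real setting `graph.get(node)` would be replaced by dereferencing the node's URI, which is what distinguishes link traversal from querying a SPARQL endpoint.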
12. Ranking explanations according to F-Measure
Take the best explanation and label the cluster
Explanation Evaluation
[Figure: the cluster (E+), the words sharing the explanation, and a word outside E+]
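The ranking can be sketched with the standard precision, recall and F-Measure definitions, treating the cluster E+ as the positive set. The word sets, the candidate entities and the function name `evaluate` are made up for the example.

```python
def evaluate(covered, cluster):
    """Precision, recall and F-Measure of an explanation against the cluster E+.

    `covered` is the set of words (inside or outside E+) sharing the explanation.
    """
    hits = len(covered & cluster)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(covered)   # share of covered words that are in E+
    recall = hits / len(cluster)      # share of E+ that the explanation covers
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example data.
cluster = {"art", "design", "music", "poetry"}  # E+
explanations = {
    "db:Creativity": {"art", "design", "music", "enzyme"},
    "db:Aesthetics": {"art", "design", "music", "poetry"},
    "db:Culture": {"art", "music", "poetry", "enzyme", "protein", "cell"},
}

ranked = sorted(
    explanations,
    key=lambda name: evaluate(explanations[name], cluster)[2],
    reverse=True,
)
label = ranked[0]  # the best explanation becomes the cluster label
print(ranked)
```

An explanation that also covers many words outside E+ loses precision, so it ranks lower even if it covers most of the cluster.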
13. Community Labeling
Examples of topics:
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Geology>
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Chemistry>
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Mathematics>
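The topic strings above follow a visible pattern: properties joined by `-`, then a `.`, then the final entity. A small parser for that notation might look like this; it assumes that property names contain no `-` and that the entity contains no `.`, and `parse_explanation` is a hypothetical helper name.

```python
def parse_explanation(text):
    """Split '<p1-p2-...-pn.entity>' into (list of properties, entity)."""
    body = text.strip().strip("<>")
    chain, entity = body.rsplit(".", 1)
    return chain.split("-"), entity


chain, entity = parse_explanation(
    "<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Geology>"
)
print(len(chain), entity)
```

Read this way, each of the three example labels is a five-step chain: one `skos:relatedMatch`, one `dc:subject`, then three `skos:broader` hops up the category hierarchy.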
14. Conclusion and future work
Facilitating data interpretation by combining
• scholarly data
• Machine Learning
• Linked Data graph search
Future work
• improve the graph exploration to discover more knowledge
• focus on the definition of “explanation”
With this in mind: explanations can be many… / not all of the words are in C+, and some are outside the cluster
Second problem: what if better explanations, which better represent the cluster, are found afterwards?
F-Measure: the number of items in the cluster to which the explanation applies, as well as the number of items outside the cluster; the former drives recall and the latter precision