1. Finding Co-solvers on Twitter, with a
Little Help from Linked Data
Milan Stankovic, Hypios, Université Paris-Sorbonne, France*
Matthew Rowe, KMi, Open University, UK
Philippe Laublet, Université Paris-Sorbonne, France
2. Outline
• Context
• Problem
• Our Approach
• Evaluation
• Example of use
• Conclusion and questions
3. Context: Innovation on the Web
[Diagram: Innovation Seekers post problems on the Web; Solvers come from industry, academia, research, etc.]
5. Problem: Find Collaborators
• How to find collaborators that complement the solver's competence with regards to the problem
• How to find collaborators that are compatible with him in terms of teamwork
[Diagram: an Innovation Seeker's problem, a solver, and unknown potential collaborators]
6. Problem: Find Collaborators
• Complementary Competence
• Problem Interest Similarity
• Social Similarity
Inspired by social studies on team composition and the factors that influence good teamwork.
7. Our Approach
profiling >> profile extension >> calculation of similarities >> ranking
Implementation and tests performed using data from Twitter
10. Our Approach: Profiling
• Conceptual Profiles
– users: Zemanta is used to extract DBpedia concepts from textual elements that the user created on Twitter (tweets, bio, etc.). Profiles contain concepts and the frequency of their occurrence.
– problem: the text of the innovation problem is treated with Zemanta to extract concepts.
• Social Profiles
– contain all the contacts of a given user on Twitter
• Both types of profiles are in vector form.
• Deliberately simple: meant to capture most topics, not to specialize on the topics of highest expertise.
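The concept-frequency profile described above can be sketched as follows. This is a minimal illustration, assuming the annotator (e.g. Zemanta) has already returned a list of DBpedia concept names per tweet; the function name and data shapes are illustrative, not the authors' implementation.

```python
from collections import Counter

def conceptual_profile(annotated_texts):
    """Build a concept-frequency vector from per-text concept lists.

    annotated_texts: one list of extracted DBpedia concept names per
    textual element (tweet, bio, etc.). The resulting profile maps each
    concept to the frequency of its occurrence, as on the slide.
    """
    profile = Counter()
    for concepts in annotated_texts:
        profile.update(concepts)
    return dict(profile)

tweets = [["Semantic_Web", "RDF"], ["RDF", "Linked_Data"]]
print(conceptual_profile(tweets))
# {'Semantic_Web': 1, 'RDF': 2, 'Linked_Data': 1}
```

The same structure serves for both user and problem profiles; a social profile is the analogous vector over contacts instead of concepts.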
12. Our Approach: Profile Extension
• Why extend profiles:
– imperfection of the source data (tweets)
– incompleteness of coverage (due to differences in vocabulary, some concepts may go unnoticed)
– to perform broader/lateral matching
13. Our Approach: Profile Extension
• How
– HPSR (hyProximity): a graph-based measure using Linked Data (tested on DBpedia)
– DMSR: a distributional measure inspired by Normalized Google Distance
– PRF: Pseudo Relevance Feedback
14. Our Approach: Profile Extension
• HPSR (hyProximity)

HPSR(c_1, c_2) = \sum_{K_i \in K(c_1, c_2)} ic(K_i) + \sum_{p \in P} link(p, c_1, c_2) \cdot pond(p, c_1)

[Diagram: concepts linked to shared DBpedia categories via dct:subject, with categories related through skos:broader]
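The first term of HPSR (information content summed over the categories shared by the two concepts) can be illustrated on a toy DBpedia fragment. Everything here is an illustrative assumption: the `categories` and `category_sizes` data, the `ic` definition, and the flat `links`/`pond` stand-in for the property-link term, which would need the full Linked Data graph.

```python
import math

# Toy fragment of the DBpedia category graph (dct:subject edges).
categories = {
    "RDF": {"Semantic_Web", "Metadata"},
    "OWL": {"Semantic_Web", "Knowledge_representation"},
}
# Assumed number of concepts per category, out of an assumed total.
category_sizes = {"Semantic_Web": 50, "Metadata": 200,
                  "Knowledge_representation": 80}
TOTAL = 10000

def ic(category):
    # Information content: rarer categories are more informative.
    return -math.log(category_sizes[category] / TOTAL)

def hpsr(c1, c2, links=0, pond=1.0):
    """Sketch of HPSR: sum ic over the shared categories K(c1, c2),
    plus a placeholder for the weighted direct-link term."""
    shared = categories[c1] & categories[c2]
    return sum(ic(k) for k in shared) + links * pond

print(round(hpsr("RDF", "OWL"), 3))
```

Here RDF and OWL share only the (assumed) category Semantic_Web, so the score is its information content alone.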
15. Our Approach: Profile Extension
• DMSR – Distributional Measure of Semantic Relatedness

DMSR_\tau(c_1, c_2) = \frac{occurrence(c_1, c_2)}{occurrence(c_1) + occurrence(c_2)}

[Example: c1 co-occurs with c2 in more contexts than with c3, so c1 and c2 are more related than c1 and c3]
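The DMSR formula above is a straightforward ratio of co-occurrence to individual occurrence counts. A minimal sketch, assuming the counts have already been gathered (the dict-based data shapes are illustrative):

```python
def dmsr(cooc, occ, c1, c2):
    """Distributional relatedness, as on the slide:
    co-occurrences of c1 and c2 divided by the sum of their
    individual occurrence counts."""
    together = cooc.get(frozenset((c1, c2)), 0)
    return together / (occ[c1] + occ[c2])

# Toy counts: c1 appears with c2 far more often than with c3.
occ = {"c1": 10, "c2": 8, "c3": 12}
cooc = {frozenset(("c1", "c2")): 6, frozenset(("c1", "c3")): 1}

print(dmsr(cooc, occ, "c1", "c2"))  # c1 is more related to c2...
print(dmsr(cooc, occ, "c1", "c3"))  # ...than to c3
```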
16. Our Approach: Profile Extension
• PRF: Pseudo Relevance Feedback
– a distributional measure based on the profiles appearing in the n best-ranked solutions
– the same co-occurrence measure as DMSR, applied to the set of the first 10 suggestions
– this method can be applied with any ranking technique
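The pseudo-relevance-feedback step can be sketched as follows: treat the top-n candidate profiles as if they were relevant and pull their most frequent concepts back into the problem profile. The function and parameter names (`n`, `top_concepts`) are illustrative assumptions, not the authors' exact procedure.

```python
from collections import Counter

def prf_expand(problem_profile, ranked_profiles, n=10, top_concepts=5):
    """Pseudo relevance feedback over an initial ranking.

    problem_profile: concept -> frequency dict for the problem.
    ranked_profiles: candidate profiles, best-ranked first.
    Pools the concepts of the n best-ranked profiles and adds the most
    frequent ones (with their pooled counts) to the problem profile.
    """
    pooled = Counter()
    for profile in ranked_profiles[:n]:
        pooled.update(profile)
    expanded = dict(problem_profile)
    for concept, freq in pooled.most_common(top_concepts):
        expanded.setdefault(concept, freq)  # keep original frequencies
    return expanded

expanded = prf_expand({"Semantic_Web": 3},
                      [{"RDF": 2}, {"RDF": 1, "OWL": 1}],
                      n=2, top_concepts=2)
print(expanded)
```

Because the expansion only depends on an existing ranked list, it works with any of the ranking functions discussed later, as the slide notes.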
18. Our Approach: Similarities
• Complementarity (similarity with difference topics)
• Conceptual Similarity (similarity of conceptual profiles)
• Social Similarity (similarity of social profiles)
20. Our Approach: Ranking
• By one similarity measure
– complementarity
– conceptual similarity
– social similarity
• By a linear combination of measures
a*Comp+b*ConcSim+c*SocSim
• By a product of measures
Comp*ConcSim*SocSim
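The two composite ranking functions from the slide can be sketched directly; the candidate data and weights are illustrative. The product form rewards candidates who score on all three measures at once, while the linear form lets one strong measure compensate for the others:

```python
def linear_score(comp, conc, soc, a=1.0, b=1.0, c=1.0):
    """Linear combination a*Comp + b*ConcSim + c*SocSim."""
    return a * comp + b * conc + c * soc

def product_score(comp, conc, soc):
    """Product Comp*ConcSim*SocSim: any near-zero measure sinks the score."""
    return comp * conc * soc

# Toy candidates: (complementarity, conceptual sim., social sim.)
candidates = {"alice": (0.9, 0.2, 0.5), "bob": (0.6, 0.6, 0.6)}
ranking = sorted(candidates,
                 key=lambda u: product_score(*candidates[u]),
                 reverse=True)
print(ranking)  # bob's balanced scores win under the product
```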
21. Evaluation
• Evaluation 1
– recommending a collaborator to a group of solvers
– a group of 3 solvers (experts in the Semantic Web) is trying to solve 3 cross-disciplinary problems
– problems inspired by real challenges (workshops, calls for papers, etc.)
• Evaluation 2
– recommending collaborators to individual solvers
– 12 Twitter users, experts in the Semantic Web, look for collaborators for the same 3 problems
22. Evaluation: Metrics
• Discounted Cumulative Gain
– what is the value of considering the first 10 suggestions, and what is the quality of their ordering

DCG = rating_1 + \sum_{i=2}^{10} \frac{rating_i}{\log_2 i}
• Average Precision
– what is the cumulative benefit of considering each
next suggestion in a particular ranking
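The DCG formula above translates directly into code. A minimal sketch, taking graded ratings of the suggestions in ranked order (the sample ratings are made up for illustration):

```python
import math

def dcg(ratings):
    """Discounted Cumulative Gain over the first 10 suggestions:
    rating_1 + sum_{i=2..10} rating_i / log2(i)."""
    ratings = ratings[:10]
    if not ratings:
        return 0.0
    return ratings[0] + sum(r / math.log2(i)
                            for i, r in enumerate(ratings[1:], start=2))

# Higher-rated suggestions early in the list contribute more.
print(dcg([3, 2, 3, 0, 1]))
print(dcg([0, 1, 3, 2, 3]))  # same ratings, worse ordering
```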
25. Evaluation 2
• Composite Ranking Functions: Product
– Comp*ConcSim*SocSim
– PRF(Comp*ConcSim*SocSim): PRF problem profile expansion with
composite similarity.
– HPSR(Comp)*ConcSim*SocSim: HPSR expansion performed on difference
topics prior to calculating the complementarity (similarity with difference
topics)
– Comp*DMSR(ConcSim)*SocSim: DMSR expansion performed over the
seed user profile prior to calculating interest similarity.
– HPSR(Comp)*DMSR(ConcSim)*SocSim: composite function in which HPSR
is used to expand profile topics and DMSR to expand the seed user topic
profile prior to calculating the similarities.
28. Conclusions
• The Linked Data based concept expansion technique
(hyProximity) gives best results when expanding topics for
Compatibility measures. A distributional one works slightly
better for Conceptual Similarity measures.
• In a composite ranking function, expanding profiles with
hyProximity is beneficial if applied only to Compatibility.
Expansion in both Compatibility and Conceptual Similarity has
negative effects.
• All profile expansion techniques, applied individually, have
positive effects in comparison to direct similarity calculation
with no expansion.
29. Take Away
• For expanding the Compatibility (problem) side: hyProximity, a Linked Data-based measure
• For expanding the Conceptual Similarity side: DMSR, a distributional measure
30. Example
Problem: Semantic Web representation of start-up history for start-up performance indicators
User: Milan Stankovic (@milstan)
Suggestions:
– davidsrose (angel investor specialized in technology startups)
– fundingpost
– ECVentureCapita
– BVCA (investors and entrepreneurs, information technology)
– vc20
– AndySack
– CVCACanada
– Austin_Startups
– tgmtgm
– davidblerner (entrepreneur, social networks (KLOUT), metrics)