This paper describes an information retrieval approach that takes into account the social signals associated with Web resources to estimate their relevance to a query. We show how these data, which take the form of actions within social activities (e.g. like, tweet), can be exploited to quantify social properties such as popularity and reputation. We propose a model that combines the social relevance, estimated from these properties, with conventional textual relevance. We evaluated the effectiveness of our approach on an IMDb dataset containing 32706 resources and their social characteristics, collected from several social networks. We also used the selected criteria to learn models and to determine their effectiveness in information retrieval. Our experimental results are promising and show the benefit of integrating social signals into a retrieval model to enhance search.
3. 1.1 Emergence of the Social Web
[Figure: charts titled "Number of active users 2013" and "Number of Internet users" (2011 to 2014; values shown: 1.2, 1.4, 1.7, 2.4).]
Social content per minute on Facebook: 41000 publications, 1.8 million likes, ~350 GB of data.
Source: blogdumoderateur.com, quantcast.com, semiocast.com
1. Introduction 2. Related Work
5. Conclusion
3. Approach of SIR
4. Experimental Results
4. Fig 1. Global presentation of our work
[Diagram: users interact with Web resources (video, photo, Web page) on social networks through bookmark, comment, share/recommend, motion/vote and like/+1 actions. These social signals (source of evidence) feed the extraction and quantification of social properties (popularity, reputation, freshness), which are integrated into the information retrieval model (ranking) that returns results for a query.]
5. 1.2 Example of Social Signals
6. 1.3 Research Issues
1. What happens when a user clicks a like or dislike button, or posts a comment on a resource such as a Web page, photo or video?
2. Can these social data help search systems guide users to better-quality or more relevant content?
3. How effective is each individual social signal for ranking resources for a given query? What ranking correlations do these social data produce?
4. How can these social data be combined in the form of social properties? Which of them are the most useful to take into account in a search model?
7. 2.1 Related Work
Sources of evidence (social features) | Properties | Models | Authors
--- | --- | --- | ---
Number of clicks, votes, records and recommendations | Popularity, Importance | Linear combination | (Karweg et al., 2011)
Number of likes, dislikes and comments on YouTube; playcount (number of times a user listens to a track on lastfm) | Importance | Machine learning and linear combination | (Chelaru et al., 2012); (Khodaei et al., 2012)
Presence of a URL in a tweet; number of retweets; number of annotations (tags) | Popularity | Machine learning | (Alonso et al., 2010); (Yang et al., 2012); (Hong et al., 2011); (Pantel et al., 2012)
8. 3.1 A Modular Approach for Social IR
• Our IR approach exploits various heterogeneous social signals from different social networks to define social properties that are taken into account in the retrieval model. We associate with each Web resource an a priori relevance based on these social properties. This relevance is then combined with a classical topical relevance.
9. 3.2 Social Signals and Social Properties
• We assume that a resource $r$ can be represented both by a set of textual keywords $r_w = \{w_1, w_2, \ldots, w_n\}$ and by a set of social actions (signals) performed on this resource, $r_a = \{a_1, a_2, \ldots, a_m\}$.
• We consider a set $X = \{Popularity, Reputation, Freshness\}$ of 3 social properties that characterize a resource $r$. Each property is quantified by a specific group of actions. These properties are considered as a priori knowledge about a resource.
[Diagram: a Web resource is described by textual keywords and by social signals (like, +1, share, comment, dates of actions); the social signals are mapped to reputation, popularity and freshness.]
10. 3.3 Estimation of Popularity and Reputation
• Popularity: The popularity of a resource can be estimated from the rate at which it is shared on social networks.
• Reputation: The reputation of a resource can be estimated from social activities that carry a positive meaning, such as a Facebook like. Indeed, the reputation of a resource depends on the degree of appreciation users express on social networks.
The general formula is the following:
$$f_x(r, G) = \sum_{i=1,\; a_i^x \in A}^{m} \mathrm{Count}(a_i^x, r, G) \qquad (1)$$
The score is then normalized:
$$f_x(r, G)_{Norm} = \frac{f_x(r, G) - \mathrm{MIN}(f_x(r, G))}{\mathrm{MAX}(f_x(r, G)) - \mathrm{MIN}(f_x(r, G))} \qquad (2)$$
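Equations (1) and (2) can be sketched as follows. This is a minimal illustration, assuming that each resource's signals are available as a dictionary of raw counts and that MIN and MAX are taken over all resources in the collection. The signal names, their grouping into properties, and the sample counts are hypothetical and do not come from the study.

```python
from typing import Dict, List

# Hypothetical grouping of signals (actions) by social property x.
PROPERTY_SIGNALS: Dict[str, List[str]] = {
    "popularity": ["share", "comment", "tweet"],
    "reputation": ["like", "plus_one", "bookmark"],
}


def property_score(counts: Dict[str, int], prop: str) -> int:
    """Equation (1): sum the counts of all actions tied to one property."""
    return sum(counts.get(signal, 0) for signal in PROPERTY_SIGNALS[prop])


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Equation (2): rescale scores over the collection to [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:  # every resource has the same score; avoid dividing by zero
        return {rid: 0.0 for rid in scores}
    return {rid: (s - low) / (high - low) for rid, s in scores.items()}


# Made-up counts for three resources.
resources = {
    "r1": {"share": 10, "comment": 5, "tweet": 5, "like": 40},
    "r2": {"share": 2, "comment": 0, "tweet": 0, "like": 100, "bookmark": 20},
    "r3": {"share": 60, "comment": 10, "tweet": 10, "like": 0},
}
popularity = {rid: property_score(c, "popularity") for rid, c in resources.items()}
popularity_norm = min_max_normalize(popularity)
```

With these counts, the least shared resource receives a normalized popularity of 0 and the most shared one a normalized popularity of 1.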
11. 3.4 Estimation of Freshness
• Let $T_{a_i} = \{t_{1,a_i}, t_{2,a_i}, \ldots, t_{k,a_i}\}$ be the set of $k$ moments (dates) at which action $a_i$ was produced. A moment $t$ represents the datetime of each action $a$ of the same type.
• Freshness: We assume that a resource is fresh if recent social signals are associated with it. For that purpose, we define freshness as follows:
"The date of each social action (e.g., date of comment, date of share) performed on a resource on social networks can be exploited to measure the recency of these social actions, hence the freshness of information".
Its formula is the following:
$$f_F(r, G) = \frac{1}{\frac{1}{m} \sum_{i=1}^{m} \left( \frac{1}{k} \sum_{j=1}^{k} \mathrm{Time}(t_{j,a_i}, r, G) \right)} \qquad (3)$$
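Equation (3) can be sketched as follows. The slides do not define Time(t, r, G), so this sketch assumes it is the age of an action in days, measured at a reference time `now`. The function name, the handling of empty action types, and the sample dates are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def freshness(action_dates: Dict[str, List[datetime]], now: datetime) -> float:
    """Equation (3): inverse of the mean, over action types, of the mean action age.

    Assumption: Time(t, r, G) is the age of an action in days at time `now`.
    """
    mean_ages = []
    for dates in action_dates.values():
        if not dates:
            continue  # assumption: action types without any date are ignored
        ages = [(now - t).total_seconds() / 86400.0 for t in dates]
        mean_ages.append(sum(ages) / len(ages))  # inner (1/k) * sum over j
    if not mean_ages:
        return 0.0  # no dated action at all: treated here as not fresh
    overall = sum(mean_ages) / len(mean_ages)  # outer (1/m) * sum over i
    return 1.0 / overall if overall > 0 else float("inf")


# Made-up dates: two comments (10 and 2 days old) and one share (4 days old).
now = datetime(2014, 1, 11)
score = freshness(
    {
        "comment": [datetime(2014, 1, 1), datetime(2014, 1, 9)],
        "share": [datetime(2014, 1, 7)],
    },
    now,
)
```

Under this reading, the more recent the actions on a resource, the smaller the average age and the larger the freshness score.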
12. 3.5 First Method: Linear Combination
• The combination of topical relevance with social relevance is given by the following formula, in which $Rel_T$ is the score of topical relevance and $Rel_S$ is the score of social relevance:
$$Rel(q, r, G) = \alpha \cdot Rel_T(q, r) + (1 - \alpha) \cdot Rel_S(q, r, G) \qquad (4)$$
• Social Score: The social score $Rel_S(q, r, G)$ takes into account the social properties, in the form of three normalized factors (freshness $f_F$, popularity $f_P$ and reputation $f_R$) that are combined linearly by the following formula:
$$Rel_S(q, r, G) = \beta \cdot f_F(r, G) + \lambda \cdot f_P(r, G) + \delta \cdot f_R(r, G) \qquad (5)$$
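Equations (4) and (5) can be sketched as two small functions. This is an illustration only: the weight values and input scores are made up, the restriction of alpha to [0, 1] is an assumption, and the topical score is assumed to be on a scale comparable to the normalized social factors.

```python
def social_relevance(f_fresh: float, f_pop: float, f_rep: float,
                     beta: float, lam: float, delta: float) -> float:
    """Equation (5): linear combination of the three normalized properties."""
    return beta * f_fresh + lam * f_pop + delta * f_rep


def combined_relevance(rel_topical: float, rel_social: float, alpha: float) -> float:
    """Equation (4): trade-off between topical and social relevance."""
    if not 0.0 <= alpha <= 1.0:  # assumption: alpha is a weight in [0, 1]
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rel_topical + (1.0 - alpha) * rel_social


# Made-up scores and weights for one (query, resource) pair.
rel_s = social_relevance(f_fresh=0.2, f_pop=0.5, f_rep=0.8,
                         beta=0.2, lam=0.3, delta=0.5)
rel = combined_relevance(rel_topical=0.7, rel_social=rel_s, alpha=0.6)
```

Setting alpha to 1 reduces the model to the topical baseline, which makes the contribution of the social score easy to isolate in an experiment.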
13. 3.6 Second Method: Machine Learning Models
Fig 2. Machine Learning Process
[Diagram: the topical model results for all topics form the original dataset, from which a training dataset is drawn. Attribute selection algorithms are applied, followed by learning algorithms and cross-fold evaluation. The process is repeated 5 times for 5-fold cross-validation. The numbers in parentheses below reproduce the superscript markers of the figure.]
Attribute Selection Algorithms
- WrapperSubsetEval (1)
- CfsSubsetEval (1)
- ReliefFAttributeEval (2)
- SVMAttributeEval (3)
Learning Algorithms
- Naïve Bayes (1)
- J48 (2)
- SVM (3)
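The resampling scheme named in Fig 2 (5-fold cross-validation repeated 5 times) can be sketched in plain Python. This only illustrates how the 25 train/test splits are produced; the attribute selection and learning steps are left out, and the function name, seed and sample count are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def repeated_k_fold(n_samples: int, k: int = 5, repeats: int = 5,
                    seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
        for held_out in range(k):
            test = folds[held_out]
            train = [idx for f, fold in enumerate(folds)
                     if f != held_out for idx in fold]
            yield train, test


# 20 hypothetical samples give 5 repetitions x 5 folds = 25 splits.
splits = list(repeated_k_fold(n_samples=20))
```

Each sample is held out exactly once per repetition, so averaging a metric over all splits reduces the dependence on any single partition.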
14. 4.1 Experimental Evaluation
• Objectives
1. Study the impact of integrating each social signal individually on the performance of the retrieval process.
2. Study the impact of combining these social signals as social properties.
3. Study the ranking correlation between social signals and relevance.
• Evaluation challenges
1. Absence of a standard evaluation framework for social IR.
2. Collecting social signals from 5 social networks and setting up the experiments.
15. 4.2 Description of DataSet
• Textual Content: 32706 film documents in English extracted from IMDb.
• Social Content: 8 social data from 5 social networks.
Field | ID | Title | Year | Released | Runtime | Genre | Director | Writer | Actors | Plot | Poster | url
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Indexed | - | indexed | indexed | indexed | indexed | indexed | indexed | indexed | indexed | indexed | - | -

Social network | Social signals
--- | ---
Facebook | Like, Share, Comment, Date of last action
Twitter | Tweet
Google+ | +1
LinkedIn | Share
Delicious | Bookmark
16. 4.3 Quantification of Social Properties
• Each social property is quantified from social signals according to their nature and meaning.
Social Properties | Social Signals | Criterion | Social Networks
--- | --- | --- | ---
Popularity P | Number of "Comment" | C1 | Facebook
Popularity P | Number of "Tweet" | C2 | Twitter
Popularity P | Number of "Share" | C3 | LinkedIn
Popularity P | Number of "Share" | C4 | Facebook
Reputation R | Number of "Like" | C5 | Facebook
Reputation R | Number of "+1" | C6 | Google+
Reputation R | Number of "Bookmark" | C7 | Delicious
Freshness F | Date of last action | C8 | Facebook
17. 4.4 Results: Linear Combination
[Figure: three charts reporting P@10, P@20, nDCG@10 and nDCG@20. Panel "Baselines (Topical Models)": BM25, Lucene Model. Panel "Individual Integration of Social Signals": Like, Share, Comment (Facebook signals), Tweet, Mention +1, Share (LIn), Bookmark. Panel "Different Combinations of Social Signals (Social Properties)": Freshness F, Reputation R, Popularity P, R+F, P+F, P+R, All Properties.]
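The two measures reported in these results, P@k and nDCG@k, can be sketched as follows. The slides do not state which nDCG formulation was used; this sketch assumes a linear gain with a log2 rank discount, and it computes the ideal ranking from the judgments of the ranked list itself. The relevance grades are made up.

```python
import math
from typing import List


def precision_at_k(relevances: List[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (grade > 0)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: List[int], k: int) -> float:
    """Discounted cumulative gain: grade / log2(rank + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: List[int], k: int) -> float:
    """DCG divided by the DCG of the same grades sorted in decreasing order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Made-up relevance grades of the first 10 results returned for one query.
ranked = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
p10 = precision_at_k(ranked, 10)
ndcg10 = ndcg_at_k(ranked, 10)
```

P@k ignores the order of results inside the top k, whereas nDCG@k rewards rankings that place relevant resources earlier.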
18. 4.5 Results: Machine Learning
Table 1. Selected Social Signals With Attribute Selection Algorithms
++ : Highly selected
+ : Moderately selected
20. 4.6 Results: Ranking Correlation Analysis
Fig 3. Spearman correlation between social signals and relevance
Fig 4. Spearman correlation between social properties and relevance
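The Spearman correlation named in Fig 3 and Fig 4 can be sketched in plain Python as the Pearson correlation of ranks, with tied values sharing their average rank. The function names and the sample data (like counts against relevance grades) are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable carries no ranking information
    return cov / (sx * sy)


# Made-up data: like counts and relevance grades for four resources.
likes = [5, 50, 500, 20]
relevance = [0, 1, 2, 1]
rho = spearman(likes, relevance)
```

A value of rho close to 1 means that resources with more of a given signal tend to be ranked as more relevant.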
21. 5. Conclusion
• Social Information Retrieval Model
- Topical relevance (retrieval model based on content only).
- Social relevance (retrieval model based on content and social features).
- Attribute selection algorithms and machine learning.
• Experimental Evaluation
- Superiority of the proposed approach over textual models (baselines).
- Positive ranking correlation between social signals and relevance.
• Perspectives
- Integration of other social features.
- Further study of the impact of the temporal property.
- Comparison of the proposed models with other social models.
- Experimental evaluation on a larger dataset.