Making Your Interests Follow You on Twitter
Fabrizio Silvestri, ISTI - CNR, Pisa, Italy
Joint Work with:

Marco Pennacchiotti, eBay Inc., San Jose, USA
Hossein Vahabi, IMT, Lucca, Italy
Rossano Venturini, Dept. of Computer Science, University of Pisa, Italy

CIKM 2012 - Maui, HI
Tuesday, October 30, 2012
Twitter Recommendations: Why?
• Social media are popular:
• In January 2012, Twitter was visited 2.5 billion times, more than double the figure of six months earlier.
• More than 3,000 tweets per second: Information Overload?
• Information Hiding

M. Pennacchiotti, F. Silvestri, H. Vahabi, R. Venturini

CIKM 2012 - Maui, HI
Tuesday, October 30, 2012
What is Twitter, A Social Network or a News Media?
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media?. In Proceedings of WWW '10.

• “We [...] classify the trending topics based on the
active period and the tweets and show that the
majority (over 85%) of topics are headline or
persistent news in nature.”
• Twitter users want to be notified of news that interests them as soon as possible.
• What if I do not follow the person tweeting (or retweeting) a piece of news that interests me?

Related Work
• At SIGIR 2012 (acceptance notifications had been sent, but the accepted papers were not yet available at the time of CIKM submission):
• K. Chen, T. Chen, G. Zheng, O. Jin, E. Yao, and Y. Yu. Collaborative personalized tweet
recommendation. In Proceedings of SIGIR '12.

• In their words: “The goal of personalized tweet recommendation is to estimate the value of a tweet for each user.”
• They make the following assumptions:
• “Users’ retweeting actions reflect their personal judgement of
informativeness and usefulness.”
• “Users who have retweeted similar statuses in the past are likely to retweet
similar statuses in the future.”

Related Work
• Their solution is a collaborative tweet ranking model integrating topic-level features, social relations, and explicit features. Stochastic gradient descent is used for parameter estimation.
• How do they evaluate effectiveness?
• If a user retweets a tweet, it is relevant (1); otherwise it is not (0).
• P@n and MAP.
Pros
• Within their evaluation schema, MAP reaches 0.7627.
Cons
• A fairly large set of features must be extracted for each tweet (expensive to compute).
• It is more a retweet-prediction method than a tweet-recommendation method.
• It can suggest the “same” tweet over and over again.

Problem Setting
• Let T be a stream of tweets t1,t2,...
• Let u be a Twitter user; we “assume” we know the interestingness of a tweet t_i for u, denoted I_u(t_i).
• We define two problems whose goal is to select S⊆T such that the overall interestingness, I_u(S), is maximized.

TweetRec Problem
• Given a user u and a positive integer k, we aim at finding a set S of k tweets in T maximizing the overall “interestingness”. More formally, we would like to find

$$S^* = \arg\max_{S \subseteq T,\ |S| = k} \sum_{t \in S} I_u(t)$$

TweetRec Problem
• Pros
• The optimal solution to TweetRec can be found in O(|T| log k) time.
• Cons
• Independence: “The interestingness of a tweet does not impact the interestingness of other tweets.”
• Example:
t1 = “What’s new in Linux 3.2? #linux”
t2 = “New features in Linux 3.2. #linux”
Would u be happy to see both of them in the “tweet to follow” area?
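Because of the independence assumption, TweetRec amounts to keeping the k tweets with the highest individual interestingness, which a bounded min-heap does in O(|T| log k). The following is a minimal sketch, assuming the scores I_u(t) are already computed and arrive as (tweet id, score) pairs; `tweet_rec` is a hypothetical name, not the authors' code.

```python
import heapq
from typing import Iterable, List, Tuple


def tweet_rec(stream: Iterable[Tuple[str, float]], k: int) -> List[str]:
    """Return the ids of the k tweets with the highest interestingness.

    `stream` yields (tweet_id, score) pairs, where score stands for I_u(t)
    (assumed precomputed). A min-heap holding at most k entries keeps the
    best tweets seen so far, so each update costs O(log k).
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []
    for tweet_id, score in stream:
        if len(heap) < k:
            heapq.heappush(heap, (score, tweet_id))
        elif score > heap[0][0]:
            # The new tweet beats the weakest tweet in the current top k.
            heapq.heapreplace(heap, (score, tweet_id))
    # Order the selected tweets from most to least interesting.
    return [tweet_id for _, tweet_id in sorted(heap, reverse=True)]


if __name__ == "__main__":
    scores = [("linux_1", 0.9), ("linux_2", 0.85), ("weather", 0.2), ("sports", 0.5)]
    print(tweet_rec(scores, k=2))  # ['linux_1', 'linux_2']
```

If both Linux tweets above score highly, this selection returns both of them, which is exactly the redundancy listed under Cons.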

Interest Spanning TweetRec Problem
• Given a user u and a positive integer k, we aim at finding the k tweets t in T maximizing the overall interestingness. More formally, we would like to identify the set S, with |S| = k, that maximizes F(S), which is built from the interestingness, for user u, of the informative content shared among the tweets t∈S.
Hardness Results
• Interest Spanning TweetRec is NP-hard.
• Reduction from Independent Set (IndSet).
• F(S) is non-negative, monotone, and submodular.
• Theorem [Fisher, Nemhauser, and Wolsey, ’78]
For a non-negative, monotone submodular function F, let S be
a set of size k obtained by selecting elements one at a time,
each time choosing an element that provides the largest
marginal increase in the function value. Let S* be a set that
maximizes the value of F over all k-element sets. Then F(S) ≥
(1 − 1/e)F(S*).
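The greedy procedure in the theorem can be sketched as follows. The objective used here is a hypothetical stand-in for F(S), not the one from the paper: the number of distinct terms covered by the selected tweets (coverage is a classic non-negative, monotone, submodular function), so a term repeated across tweets counts only once. `greedy_max` and `coverage` are illustrative names.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set


def greedy_max(candidates: List[str], k: int,
               f: Callable[[Set[str]], float]) -> List[str]:
    """Greedily pick up to k candidates, each time taking the largest marginal gain.

    For a non-negative, monotone, submodular f this is the procedure covered by
    the (1 - 1/e) guarantee. Ties are broken by candidate order.
    """
    chosen: List[str] = []
    current = f(set())
    for _ in range(min(k, len(candidates))):
        best: Optional[str] = None
        best_gain = float("-inf")
        for c in candidates:
            if c in chosen:
                continue
            gain = f(set(chosen) | {c}) - current
            if gain > best_gain:
                best, best_gain = c, gain
        chosen.append(best)
        current += best_gain
    return chosen


def coverage(tweets: Dict[str, FrozenSet[str]]) -> Callable[[Set[str]], float]:
    """Hypothetical objective: number of distinct terms covered by a set of tweets."""
    def f(selected: Set[str]) -> float:
        return float(len(set().union(*(tweets[t] for t in selected))))
    return f


if __name__ == "__main__":
    tweets = {
        "t1": frozenset({"new", "linux", "3.2", "#linux"}),
        "t2": frozenset({"new", "features", "linux", "3.2", "#linux"}),
        "t3": frozenset({"nobel", "prize", "economics"}),
    }
    print(greedy_max(list(tweets), 2, coverage(tweets)))  # ['t2', 't3']
```

Every term of t1 also appears in t2, so once t2 is picked the marginal gain of t1 drops to zero and the greedy step prefers t3.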
Estimating Interestingness
• So far we have assumed that I(S) is known for each subset S of T.
• Assumption 1: “The interests of a user u are implicitly expressed in his/her tweets.”
• It is therefore legitimate to compute I(t), i.e., the interestingness of a tweet t, as a linear combination of two text-based similarity scores: item-wise (single terms) and pair-wise (pairs of terms).
• Assumption 2: “The set of tweets written by a passive user u
can be enriched with other carefully chosen tweets that have
been posted by users that are highly authoritative and
connected to u.”
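As an illustration only, the linear combination can be sketched under explicit assumptions that the slides do not confirm: both scores are cosine similarities between the tweet and the user's profile (the user's own tweets), the item-wise score uses bags of single terms, the pair-wise score uses bags of unordered term pairs co-occurring in a tweet, and λ weights the pair-wise score, so that λ = 0.9 favors pairs. Tokenization is a naive lowercase whitespace split, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def bag_of_terms(tweets: Iterable[str]) -> Counter:
    """Bag of lowercase whitespace-separated terms over a set of tweets."""
    bag: Counter = Counter()
    for text in tweets:
        bag.update(text.lower().split())
    return bag


def bag_of_pairs(tweets: Iterable[str]) -> Counter:
    """Bag of unordered term pairs co-occurring in the same tweet."""
    bag: Counter = Counter()
    for text in tweets:
        terms = text.lower().split()
        bag.update(tuple(sorted(pair)) for pair in combinations(terms, 2))
    return bag


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def int_lambda(tweet: str, profile: List[str], lam: float = 0.9) -> float:
    """Hypothetical int_lambda: lam * pair-wise + (1 - lam) * item-wise similarity."""
    item_wise = cosine(bag_of_terms([tweet]), bag_of_terms(profile))
    pair_wise = cosine(bag_of_pairs([tweet]), bag_of_pairs(profile))
    return lam * pair_wise + (1 - lam) * item_wise


if __name__ == "__main__":
    profile = ["new linux kernel release", "linux kernel features explained"]
    for candidate in ["linux kernel release notes", "nobel prize in economics"]:
        print(candidate, round(int_lambda(candidate, profile), 3))
```

With a large λ, a tweet scores well only when it reproduces term co-occurrences from the profile, not just isolated terms.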

Estimation Procedure
[Figure: estimation procedure for passive and active users, based on terms and pairs of terms. Example terms: {the, math, behind, this, year, nobel, prize, in, economics, gently, and, beautifully, explained, mathchat}; example pairs of terms: {<the, math>, <the, behind>, <the, this>, ...}]

Experiment Settings
• Corpus of 182,000 tweets posted between Oct 30 and Nov 4, 2011
• A large set of more than 14 million tweets was downloaded from Twitter using the Spritzer API, which provides access to a 1% random sample of all tweets. This set was then pruned to obtain our final corpus of informative, non-junk English tweets, on which we ran our experiments.

• In detail, the pruning process was as follows:
• We discarded all tweets shorter than 30 characters and having fewer than 8 tokens (i.e., terms, hashtags, and usernames).
• We removed tweets containing fewer than 3 English nouns and more than 5 English stop-words (we used the NLTK toolkit).
• Finally, we discarded directed tweets (i.e., tweets starting with the @ symbol), which are usually personal in nature and therefore not interesting for recommendation purposes.
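The three pruning rules can be sketched as a single predicate. This is a hypothetical re-implementation, not the authors' pipeline: to stay dependency-free, the noun set and stop-word list that NLTK provided are passed in as plain sets, and tokens are whitespace-separated. The slide joins each pair of conditions with "and"; the sketch reads it literally as a conjunction, so if either condition alone was meant to trigger removal, replace `and` with `or`.

```python
from typing import Set


def keep_tweet(text: str, nouns: Set[str], stopwords: Set[str]) -> bool:
    """Return True if a tweet survives the three pruning rules on the slide.

    `nouns` and `stopwords` stand in for NLTK's tagger and stop-word list
    (an assumption made to keep the sketch self-contained).
    """
    tokens = text.split()  # terms, hashtags, and usernames alike
    lowered = [tok.lower() for tok in tokens]
    # Rule 1: short tweets (the slide's "and" read literally).
    if len(text) < 30 and len(tokens) < 8:
        return False
    # Rule 2: too few nouns and too many stop-words.
    n_nouns = sum(tok in nouns for tok in lowered)
    n_stop = sum(tok in stopwords for tok in lowered)
    if n_nouns < 3 and n_stop > 5:
        return False
    # Rule 3: directed tweets (starting with @) are usually personal.
    if text.startswith("@"):
        return False
    return True
```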

Evaluating intλ
• We evaluate the following metrics (averaged over the 250 users):
• P@k. Precision at rank k is the fraction of correct tweets among the top-k tweets ranked by the method.
• S@k. Success at rank k is the probability of finding at least one correct tweet among the top-k ranked ones.
• MRR. Mean Reciprocal Rank: the mean over users of the reciprocal rank, i.e., the inverse of the position of the first correct tweet in the ranking produced by the method.
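The three metrics can be computed per user and then averaged, as in the following sketch; averaging the per-user 0/1 success values gives the success probability. Function names and the input layout (one ranked list of tweet ids and one set of correct ids per user) are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top-k ranked tweets that are correct."""
    return sum(t in relevant for t in ranking[:k]) / k


def success_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """S@k for one user: 1.0 if a correct tweet is in the top k, else 0.0."""
    return 1.0 if any(t in relevant for t in ranking[:k]) else 0.0


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Inverse of the 1-based position of the first correct tweet (0.0 if none)."""
    for position, t in enumerate(ranking, start=1):
        if t in relevant:
            return 1.0 / position
    return 0.0


def evaluate(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]],
             k: int) -> Dict[str, float]:
    """Average P@k, S@k, and reciprocal rank (giving MRR) over all users."""
    users = list(rankings)
    n = len(users)
    return {
        f"P@{k}": sum(precision_at_k(rankings[u], relevant[u], k) for u in users) / n,
        f"S@{k}": sum(success_at_k(rankings[u], relevant[u], k) for u in users) / n,
        "MRR": sum(reciprocal_rank(rankings[u], relevant[u]) for u in users) / n,
    }
```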

Evaluating intλ
• Assumption: “A user is likely to find his own tweets
more interesting than random tweets from other
users.”
• We consider a collection of 250 users. From each user u’s timeline, we extract 90% of her tweets and use them to train u’s user model.
• The remaining 10% of every user’s tweets is pooled and used to test the model. For each user u, the tweets in her own 10% are considered “correct/relevant”.
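A sketch of this 90/10 protocol, assuming a random per-user split with a fixed seed (the slides do not say whether the split is random or chronological); `split_timelines` is a hypothetical helper. The held-out tweets of all users form one pool, and for user u the relevant ones are those she authored.

```python
import random
from typing import Dict, List, Tuple


def split_timelines(timelines: Dict[str, List[str]], train_frac: float = 0.9,
                    seed: int = 0) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Split each user's tweets into a training profile and a shared test pool.

    Returns (profiles, test_pool), where test_pool holds (author, tweet) pairs
    from all users; for user u, the relevant test tweets are those authored by u.
    """
    rng = random.Random(seed)
    profiles: Dict[str, List[str]] = {}
    test_pool: List[Tuple[str, str]] = []
    for user, tweets in timelines.items():
        shuffled = tweets[:]  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        cut = int(round(train_frac * len(shuffled)))
        profiles[user] = shuffled[:cut]
        test_pool.extend((user, t) for t in shuffled[cut:])
    return profiles, test_pool
```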

Comparing intλ with baselines
• Cosine: cosine measure between tweets and the user’s profile.
• HashTags: cosine between tweets’ hashtags and the user’s profile.
• int0.9: our intλ metric with λ=0.9 (pairs are very important). λ=0.9 has been optimized on the training set.

User Study
• The assessment is conducted by a group of 7 professional assessors that regularly post tweets
(Active users, in our terminology), and a group of 5 professional assessors that are using
Twitter mainly for reading tweets (Passive users).
• For each assessor and each method, we generate the top-20 tweet suggestions. Each
assessor is provided with a random combination of tweets selected by the different methods,
and is asked to state his/her personal interest in each tweet, using the following scale:
• Excellent: if the tweet is “very interesting/very informative/very funny with respect to his/her
interests”;
• Good: if the tweet is “interesting/informative/funny with respect to his/her interests”;
• Fair: if the tweet is “somewhat interesting, but nothing bad if he/she had skipped it”;
• Bad: if the tweet is “not interesting, and he/she would have preferred not to have it in his/her
timeline.”

User Study

(Useful tweets are those judged as E, G, or F)

List-based Evaluation of Interest Spanning TweetRec
• We proceed by randomly selecting 15 sets of 20 lists each.
• Assumption: “each list represents a specific user interest.”
• For each list we download 800 tweets.
• We then create 15 virtual users, one per set, each with 8,000 (= 20 × 400) tweets obtained by selecting 400 tweets per list.
• We use the remaining 400 tweets per list to build a single virtual stream of tweets. The resulting dataset is a set of 15 virtual users, each spanning 20 different interests and having produced 8,000 tweets.
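The construction can be sketched as follows, under assumptions the slides leave open: the 15 sets of lists are disjoint, the first 400 tweets of each list go to the virtual user and the next 400 to the shared stream, and the stream is shuffled. `build_virtual_users` is a hypothetical helper; each list maps a list name to its downloaded tweets.

```python
import random
from typing import Dict, List, Tuple


def build_virtual_users(lists: Dict[str, List[str]], n_sets: int = 15,
                        lists_per_set: int = 20, per_side: int = 400,
                        seed: int = 0) -> Tuple[List[List[str]], List[str]]:
    """Build virtual users from lists (one list = one interest).

    Randomly draws n_sets * lists_per_set distinct lists and groups them into
    disjoint sets. For each list, the first `per_side` tweets go to the
    virtual user's profile and the next `per_side` go to one shared stream.
    """
    rng = random.Random(seed)
    names = rng.sample(sorted(lists), n_sets * lists_per_set)
    users: List[List[str]] = []
    stream: List[str] = []
    for s in range(n_sets):
        profile: List[str] = []
        for name in names[s * lists_per_set:(s + 1) * lists_per_set]:
            tweets = lists[name]
            profile.extend(tweets[:per_side])
            stream.extend(tweets[per_side:2 * per_side])
        users.append(profile)
    rng.shuffle(stream)  # interleave the interests in the virtual stream
    return users, stream
```

With the defaults (the slide's numbers), each of the 15 virtual users receives 20 × 400 = 8,000 tweets and, under the disjoint-sets assumption, the shared stream receives 15 × 20 × 400 = 120,000 tweets.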

List-based Evaluation of Interest Spanning TweetRec

Interest Spanning TweetRec vs. TweetRec

Conclusion
• We presented two novel recommendation
methods based on two problems:
• TweetRec and Interest Spanning TweetRec
• We beat the baselines by a large margin on all the metrics we tested.
• The Interest Spanning formulation has a positive impact on more than 30% of users.

Question Time
(Some) More Notation
• Given S⊆T, we want to maximize the overall interestingness F(S) of the content of the tweets t∈S for the user u.

• Within this definition, “overall” means that information duplicated across the tweets t∈S does not increase the value of the objective function.
• We also define the interestingness, for user u, of the informative content shared among all tweets t∈S.

Interest Spanning TweetRec is NP-hard
• Reduction from Independent Set (IndSet).
• We want to decide whether a graph G=(V,E) has an independent set of size k.
• In our setting we have:
• V=T, |V|=|T|=n
• I(v)=1/n, for each v in V
• For any set S⊆T, we define I(S) = |S|/n iff for each pair of vertices u,v in S, (u,v) is not in E; otherwise we define I(S) < |S|/n.
• Given k, the set S identified by a solution of Interest Spanning TweetRec has interestingness k/n if and only if G has an independent set of size k. In this case, the nodes in S are exactly the nodes of one of these independent sets.

Submodularity of F(S)
• We have to show that, given A⊆B⊆T and c∈T, it holds that
F({c}∪A) - F(A) ≥ F({c}∪B) - F(B)
• By definition of F, we expand the two marginal gains F({c}∪A) - F(A) and F({c}∪B) - F(B).
• The thesis follows by comparing the two expansions, since A⊆B⊆T.
• Together with non-negativity and monotonicity, this lets us apply the theorem of Fisher, Nemhauser, and Wolsey (’78): the greedy selection S satisfies F(S) ≥ (1 − 1/e)F(S*).
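The diminishing-returns inequality above can be checked by brute force on toy examples. This sketch assumes a hypothetical stand-in for F (number of distinct terms covered by the selected tweets), since the slides' F is not spelled out here; it checks only c ∉ B, the usual form of the definition, which is equivalent for monotone functions because the right-hand side is 0 when c ∈ B. The cost is exponential, so it is meant only for a handful of tweets.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterator, Set


def subsets(items: Set[str]) -> Iterator[FrozenSet[str]]:
    """All subsets of `items`, from the empty set to `items` itself."""
    ordered = sorted(items)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1)))


def is_submodular(ground: Set[str], f: Callable[[FrozenSet[str]], float],
                  eps: float = 1e-12) -> bool:
    """Check F({c}∪A) - F(A) >= F({c}∪B) - F(B) for all A⊆B⊆ground, c∉B."""
    for b in subsets(ground):
        for a in subsets(b):
            for c in ground - b:
                if f(a | {c}) - f(a) < f(b | {c}) - f(b) - eps:
                    return False
    return True


if __name__ == "__main__":
    tweets = {
        "t1": {"linux", "3.2", "#linux"},
        "t2": {"linux", "features", "#linux"},
        "t3": {"nobel", "economics"},
    }

    def covered(s: FrozenSet[str]) -> float:
        # Hypothetical F: distinct terms covered by the tweets in s.
        return float(len(set().union(*(tweets[t] for t in s))))

    print(is_submodular(set(tweets), covered))                       # True
    print(is_submodular(set(tweets), lambda s: float(len(s) ** 2)))  # False
```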

Item-Wise Similarity

Pair-Wise Similarity

Set Similarity
• Computed from two bags: the bag-of-words of the tweets in X and the bag-of-pairs of terms for the tweets in X.
Enhancing tscore for Passive Users

Estimating authoritativeness
After tuning:
• A = 0.5
• α = 2
• β = 2000
Optimizing λ
• λ = 0.9: pairs are far more valuable than single terms.

Manual Selection of
Pairs of Users
• We manually selected 20 pairs of users with very similar interests, using the Twitter user recommender system.
• For each pair of users, we add all the tweets of the second user to the stream; the task consists of re-retrieving them using the tweets of the first user as the user profile.

User Study - III

Future Work
• First, we are interested in improving the precision of our methods by
devising automatic strategies to filter out tweets that are uninteresting
‘status updates’.
• We also plan to improve the selection power of Interest Spanning TweetRec by deduplicating tweets at higher semantic levels (e.g., adopting Textual Entailment Recognition techniques).
• Our recommendation system could work in several application scenarios, e.g., providing users with a more interesting timeline than the one currently available on Twitter.
• Finally, we plan to carry out simulation trials to test the computational cost
of our methods when facing a real-time load of thousands of tweets per
second.

M. Pennacchiotti, F. Silvestri, H. Vahabi, R. Venturini

CIKM 2012 - Maui, HI
Tuesday, October
30, 2012

Weitere ähnliche Inhalte

Ähnlich wie Making Your Interests Follow You on Twitter

Altmetrics: An Overview
Altmetrics: An OverviewAltmetrics: An Overview
Altmetrics: An OverviewPallab Pradhan
 
What’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media WorkshopWhat’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media WorkshopWiLS
 
Influencing the MOOC agenda - analysis of #MOOC Twitter Data
Influencing the MOOC agenda - analysis of #MOOC Twitter Data  Influencing the MOOC agenda - analysis of #MOOC Twitter Data
Influencing the MOOC agenda - analysis of #MOOC Twitter Data Mairéad Nic Giolla Mhichíl
 
Sentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logicSentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logicijcseit
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...ijcseit
 
Sentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logicSentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logicVinay Sawant
 
SENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGICSENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGICijcseit
 
Smss boston robert_bochnak
Smss boston  robert_bochnakSmss boston  robert_bochnak
Smss boston robert_bochnakJillian Petrie
 
Twitter: Ego boosting echo chamber or learning tool?
Twitter: Ego boosting echo chamber or learning tool?Twitter: Ego boosting echo chamber or learning tool?
Twitter: Ego boosting echo chamber or learning tool?Dr Muireann O'Keeffe
 
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...Yandex
 
Defining new metrics for library success
Defining new metrics for library successDefining new metrics for library success
Defining new metrics for library successStephen Abram
 
Motivating Participation in Peer Learning Applications
Motivating Participation in Peer Learning ApplicationsMotivating Participation in Peer Learning Applications
Motivating Participation in Peer Learning ApplicationsJulita Vassileva
 
2013 Johns Hopkins School of Public Health Lecture
2013 Johns Hopkins School of Public Health Lecture2013 Johns Hopkins School of Public Health Lecture
2013 Johns Hopkins School of Public Health LectureDouglas Joubert
 

Ähnlich wie Making Your Interests Follow You on Twitter (20)

Altmetrics: An Overview
Altmetrics: An OverviewAltmetrics: An Overview
Altmetrics: An Overview
 
What’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media WorkshopWhat’s on your mind? A Social Media Workshop
What’s on your mind? A Social Media Workshop
 
Influencing the MOOC agenda - analysis of #MOOC Twitter Data
Influencing the MOOC agenda - analysis of #MOOC Twitter Data  Influencing the MOOC agenda - analysis of #MOOC Twitter Data
Influencing the MOOC agenda - analysis of #MOOC Twitter Data
 
Sentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logicSentiment analysis by using fuzzy logic
Sentiment analysis by using fuzzy logic
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
Sentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logicSentiment Analysis using Fuzzy logic
Sentiment Analysis using Fuzzy logic
 
SENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGICSENTIMENT ANALYSIS BY USING FUZZY LOGIC
SENTIMENT ANALYSIS BY USING FUZZY LOGIC
 
Weed, Freberg, Kinsky, & Hutchins (2018) Building a Social Learning Flock: Us...
Weed, Freberg, Kinsky, & Hutchins (2018) Building a Social Learning Flock: Us...Weed, Freberg, Kinsky, & Hutchins (2018) Building a Social Learning Flock: Us...
Weed, Freberg, Kinsky, & Hutchins (2018) Building a Social Learning Flock: Us...
 
Social marketing
Social marketingSocial marketing
Social marketing
 
Smss boston robert_bochnak
Smss boston  robert_bochnakSmss boston  robert_bochnak
Smss boston robert_bochnak
 
Give me kudos for taking responsibility for self-marketing my scientific publ...
Give me kudos for taking responsibility for self-marketing my scientific publ...Give me kudos for taking responsibility for self-marketing my scientific publ...
Give me kudos for taking responsibility for self-marketing my scientific publ...
 
Twitter: Ego boosting echo chamber or learning tool?
Twitter: Ego boosting echo chamber or learning tool?Twitter: Ego boosting echo chamber or learning tool?
Twitter: Ego boosting echo chamber or learning tool?
 
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
 
Defining new metrics for library success
Defining new metrics for library successDefining new metrics for library success
Defining new metrics for library success
 
Motivating Participation in Peer Learning Applications
Motivating Participation in Peer Learning ApplicationsMotivating Participation in Peer Learning Applications
Motivating Participation in Peer Learning Applications
 
Who gives a tweet
Who gives a tweetWho gives a tweet
Who gives a tweet
 
2013 Johns Hopkins School of Public Health Lecture
2013 Johns Hopkins School of Public Health Lecture2013 Johns Hopkins School of Public Health Lecture
2013 Johns Hopkins School of Public Health Lecture
 
32 99-1-pb
32 99-1-pb32 99-1-pb
32 99-1-pb
 
Chapter 6
Chapter 6Chapter 6
Chapter 6
 
Research questions and research objectives
Research questions and research objectivesResearch questions and research objectives
Research questions and research objectives
 

Kürzlich hochgeladen

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Kürzlich hochgeladen (20)

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Making Your Interests Follow You on Twitter

  • 1. Making Your Interests Follow You on Twitter Fabrizio Silvestri, ISTI - CNR, Pisa, Italy Joint Work with: Marco Pennacchiotti, eBay Inc., San Jose, USA Hossein Vahabi, IMT, Lucca, Italy Rossano Venturini, Dept. of Computer Science, University of Pisa, Italy CIKM 2012 - Maui, HI Tuesday, October 30, 2012
  • 2. Twitter Recommendations: Why? • Social media are popular: • In January 2012 Twitter was visited 2.5 billion times, more than double the number of visits 6 months earlier. • More than 3,000 tweets per second: Information Overload? • Information Hiding M. Pennacchiotti, F. Silvestri, H. Vahabi, R. Venturini CIKM 2012 - Maui, HI Tuesday, October 30, 2012
  • 3. What is Twitter, a Social Network or a News Media? H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of WWW '10. • “We [...] classify the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline or persistent news in nature.” • Twitter users want to be notified of news that is interesting to them as soon as possible. • What if I do not follow a person who retweets (or tweets) a piece of news that is interesting to me?
  • 4. Related Work • At SIGIR 2012 (acceptance notifications had been sent, but the accepted papers were not yet available at the time of the CIKM submission): • K. Chen, T. Chen, G. Zheng, O. Jin, E. Yao, and Y. Yu. Collaborative personalized tweet recommendation. In Proceedings of SIGIR '12. • For them: “The goal of personalized tweet recommendation is to estimate the value of a tweet for each user.” • They make the following assumptions: • “Users’ retweeting actions reflect their personal judgement of informativeness and usefulness.” • “Users who have retweeted similar statuses in the past are likely to retweet similar statuses in the future.”
  • 5. Related Work • Their solution is a collaborative tweet ranking model integrating topic-level features, social relations, and explicit features; stochastic gradient descent is used for parameter estimation. Pros • Within their evaluation scheme, MAP reaches 0.7627. • How do they evaluate effectiveness? • If a user retweets a tweet, it is relevant (1); otherwise it is not (0). • Metrics: P@n and MAP. Cons • For each tweet, a fairly large set of features must be extracted (expensive to compute). • It is more a retweet prediction method than a tweet recommendation method. • It can suggest the “same” tweet over and over again.
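The retweet-based evaluation scheme above (a retweeted tweet counts as relevant, 1, and anything else as not relevant, 0) can be made concrete with a short Python sketch of average precision and MAP. The function names are our own. The sketch normalizes each user's AP by the number of relevant tweets that appear in that user's ranking, which coincides with the usual definition when all relevant tweets are ranked.

```python
def average_precision(ranked_relevance: list[int]) -> float:
    """AP of one ranked list of binary labels (1 = retweeted, 0 = not)."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: list[list[int]]) -> float:
    """MAP: the mean of the per-user average precisions."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two users: AP = (1/1 + 2/3) / 2 = 5/6 and AP = (1/2) / 1 = 1/2, so MAP = 2/3.
print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
```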
  • 7. Problem Setting • Let $T$ be a stream of tweets $t_1, t_2, \ldots$ • $u$ is a Twitter user, and we “assume” we know the interestingness of a tweet $t_i$ for $u$: $I_u(t_i)$. • We define two problems whose goal is to select $S \subseteq T$ such that the overall interestingness, $I_u(S)$, is maximized.
  • 8. TweetRec Problem • Given a user u and a positive integer k, we aim at finding a set S of k tweets in T maximizing the overall “interestingness”. More formally, we would like to find $S^* = \arg\max_{S \subseteq T,\ |S| = k} \sum_{t \in S} I_u(t)$.
  • 9. TweetRec Problem • Pros • The optimal solution to TweetRec can be found in O(|T| log k). • Cons • Independence: “The interestingness of a tweet does not impact the interestingness of other tweets.” • $t_1$ = “What’s new in Linux 3.2? #linux”, $t_2$ = “New features in Linux 3.2. #linux”: would u be happy to see both of them in the “tweet to follow” area?
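The O(|T| log k) bound on the slide is what a bounded min-heap gives when interestingness scores are independent. The Python sketch below assumes the scores $I_u(t)$ have already been computed and arrive as `(tweet_id, score)` pairs. It illustrates the standard technique and is not necessarily the authors' implementation.

```python
import heapq
from typing import Iterable


def tweetrec_top_k(stream: Iterable[tuple[str, float]], k: int) -> list[tuple[str, float]]:
    """Return the k tweets with the highest (precomputed) interestingness.

    A min-heap of size k holds the best k tweets seen so far. Each of the |T|
    tweets costs at most O(log k), so the whole stream costs O(|T| log k).
    """
    if k <= 0:
        return []
    heap: list[tuple[float, int, str]] = []  # (score, arrival index, tweet id)
    for idx, (tweet_id, score) in enumerate(stream):
        item = (score, idx, tweet_id)  # the index breaks ties without comparing ids
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif score > heap[0][0]:  # better than the worst of the current top-k
            heapq.heapreplace(heap, item)
    # Present the survivors by decreasing score.
    return [(tid, s) for s, _, tid in sorted(heap, reverse=True)]


print(tweetrec_top_k([("t1", 0.2), ("t2", 0.9), ("t3", 0.5), ("t4", 0.7)], 2))
# [('t2', 0.9), ('t4', 0.7)]
```

This also shows the independence drawback noted on the slide. If the two Linux 3.2 tweets both score highly, both are returned, because nothing in the selection looks at content they share.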
  • 10. Interesting-Spanning TweetRec Problem • Given a user u and a positive integer k, we aim at finding the k tweets t in T maximizing the overall interestingness. More formally, we would like to identify the set S ⊆ T with |S| = k that maximizes F(S). F is defined in terms of the interestingness, for user u, of the informative content shared among the tweets t∈S, so that content repeated across tweets is not counted twice.
  • 11. Hardness Results • Interest Spanning TweetRec is NP-hard: • Reduction from Independent Set (IndSet). • F(S) is non-negative, monotone, and submodular, so the following theorem applies: • Theorem [Fisher, Nemhauser, and Wolsey, ’78] For a non-negative, monotone submodular function F, let S be a set of size k obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the function value. Let S* be a set that maximizes the value of F over all k-element sets. Then F(S) ≥ (1 − 1/e)F(S*).
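The greedy procedure in the theorem can be sketched in Python as follows. The slides do not give F explicitly, so the sketch swaps in a hypothetical stand-in: weighted coverage of the distinct terms of the selected tweets. This function is also non-negative, monotone, and submodular, and it counts shared content only once. The term sets and weights below are made up to mimic the Linux 3.2 example.

```python
from typing import Callable


def weighted_coverage(tweets: dict[str, set[str]],
                      weights: dict[str, float]) -> Callable[[frozenset], float]:
    """Hypothetical stand-in for F(S): total weight of the distinct terms
    covered by the tweets in S. A term shared by several tweets counts once."""
    def F(S: frozenset) -> float:
        covered = set().union(*(tweets[t] for t in S))
        return sum((weights.get(term, 0.0) for term in covered), 0.0)
    return F


def greedy_select(candidates: list[str], F: Callable[[frozenset], float],
                  k: int) -> list[str]:
    """Add, k times, the candidate with the largest marginal gain F(S ∪ {c}) - F(S)."""
    chosen: list[str] = []
    for _ in range(min(k, len(candidates))):
        current = frozenset(chosen)
        base = F(current)
        best = max((c for c in candidates if c not in current),
                   key=lambda c: F(current | {c}) - base)
        chosen.append(best)
    return chosen


# Hypothetical data: t1 and t2 are the two near-duplicate Linux 3.2 tweets.
tweets = {"t1": {"linux", "3.2", "new"},
          "t2": {"linux", "3.2", "features"},
          "t3": {"nobel", "economics", "math"}}
weights = {"linux": 2.0, "3.2": 2.0, "new": 0.6, "features": 0.5,
           "nobel": 1.0, "economics": 1.0, "math": 1.0}
print(greedy_select(["t1", "t2", "t3"], weighted_coverage(tweets, weights), 2))
# ['t1', 't3']
```

Ranking the tweets independently would select t1 and t2, which score 4.6 and 4.5. The greedy pass instead picks t3 second, because t2 adds only the weight of "features" once t1 is in the set.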
  • 12. Estimating Interestingness • So far we have assumed that I(S) is known for each subset S of T. • Assumption 1: “The interests of a user u are implicitly expressed in his/her tweets.” • It is therefore reasonable to compute I(t), i.e., the interestingness of a tweet t, as a linear combination of two text-based similarity scores: item-wise (single terms) and pair-wise (pairs of terms). • Assumption 2: “The set of tweets written by a passive user u can be enriched with other carefully chosen tweets that have been posted by users that are highly authoritative and connected to u.”
  • 13. Estimation Procedure (Active and Passive Users) • Terms: {the, math, behind, this, year, nobel, prize, in, economics, gently, and, beautifully, explained, mathchat} • Pairs of Terms: {<the, math>, <the, behind>, <the, this>, ...}
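Here is a minimal Python sketch of how the two representations in the example might be derived from an already tokenized tweet. It assumes that the pairs are all unordered pairs of distinct terms, taken in order of appearance, and not only adjacent words. That assumption is consistent with the listed pairs <the, math>, <the, behind>, <the, this>. The function name is hypothetical.

```python
from itertools import combinations


def terms_and_pairs(tokens: list[str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Distinct terms (in order of first appearance) and all unordered pairs of them."""
    terms = list(dict.fromkeys(tokens))      # deduplicate while keeping order
    pairs = list(combinations(terms, 2))     # <t_i, t_j> for every i < j
    return terms, pairs


tokens = ["the", "math", "behind", "this", "year", "nobel", "prize", "in",
          "economics", "gently", "and", "beautifully", "explained", "mathchat"]
terms, pairs = terms_and_pairs(tokens)
print(len(terms), len(pairs))  # 14 91
print(pairs[:3])               # [('the', 'math'), ('the', 'behind'), ('the', 'this')]
```

With 14 terms there are 14 × 13 / 2 = 91 pairs, so the pair representation grows quadratically with tweet length. Tweets are short, so this stays cheap.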
  • 14. Experiment Settings • Corpus of 182,000 tweets posted between Oct 30 and Nov 4, 2011 • A large set of more than 14 million tweets was downloaded from Twitter using the Spritzer API, which provides access to a 1% random sample of all tweets. This set was then pruned to obtain our final corpus of informative, non-junk English tweets on which we ran our experiments. • In detail, the pruning process was as follows: • We discarded all the tweets shorter than 30 characters and having fewer than 8 tokens (i.e., terms, hashtags, and usernames). • We removed tweets containing fewer than 3 English nouns and more than 5 English stop-words (we used the NLTK toolkit). • Finally, we discarded directed tweets (i.e., tweets starting with the @ symbol), which are usually personal in nature and therefore not interesting for recommendation purposes.
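The pruning rules can be sketched as a single Python filter. The thresholds come from the slide. The slide's "and" wording is ambiguous, so the sketch assumes each condition is an independent reason to discard a tweet. The tokenizer regex, the stop-word set, and the `count_nouns` callback are hypothetical stand-ins. In practice NLTK's English stop-word list and a part-of-speech tagger run over the whole token sequence would supply them.

```python
import re
from typing import Callable

# Hypothetical tokenizer: plain words, #hashtags and @usernames all count as tokens.
TOKEN_RE = re.compile(r"[#@]?\w+")


def keep_tweet(text: str, stopwords: set[str],
               count_nouns: Callable[[list[str]], int]) -> bool:
    """Return True if the tweet survives every pruning rule (assumed reading)."""
    if text.startswith("@"):                     # directed tweet: usually personal
        return False
    tokens = TOKEN_RE.findall(text)
    if len(text) < 30 or len(tokens) < 8:        # too short
        return False
    words = [t.lower() for t in tokens if t[0] not in "#@"]
    n_stop = sum(w in stopwords for w in words)
    return count_nouns(words) >= 3 and n_stop <= 5
```

For example, with toy stop-word and noun sets, a tweet announcing a new Linux kernel release passes, while the same text prefixed with `@bob` is dropped as a directed tweet.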
  • 15. Evaluating intλ • We evaluate (averaged over the 250 users): • P@k. Precision at rank k is the fraction of correct tweets among the top-k tweets ranked by the method. • S@k. Success at rank k is the probability of finding at least one correct tweet among the top-k ranked ones. • MRR. Mean Reciprocal Rank: the mean over users of the reciprocal rank, i.e., the inverse of the position of the first correct tweet in the ranking produced by the method.
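The three measures can be sketched in Python for binary relevance labels. Each user's ranking is a list of 0/1 flags (1 = correct tweet). Every metric is averaged over users, so the mean of S@k estimates the probability of at least one hit in the top k. The function names are our own.

```python
def precision_at_k(rels: list[int], k: int) -> float:
    """Fraction of correct tweets among the top-k."""
    return sum(rels[:k]) / k


def success_at_k(rels: list[int], k: int) -> float:
    """1 if at least one correct tweet appears in the top-k, else 0."""
    return 1.0 if any(rels[:k]) else 0.0


def reciprocal_rank(rels: list[int]) -> float:
    """Inverse position of the first correct tweet (0 if there is none)."""
    for pos, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / pos
    return 0.0


def evaluate(rankings: dict[str, list[int]], k: int) -> dict[str, float]:
    """Average P@k, S@k and the reciprocal rank (giving MRR) over all users."""
    n = len(rankings)
    return {
        f"P@{k}": sum(precision_at_k(r, k) for r in rankings.values()) / n,
        f"S@{k}": sum(success_at_k(r, k) for r in rankings.values()) / n,
        "MRR": sum(reciprocal_rank(r) for r in rankings.values()) / n,
    }
```

For three users with rankings `[0, 1, 0, 1]`, `[1, 0, 0, 0]` and `[0, 0, 0, 0]`, `evaluate(..., 2)` gives P@2 = 1/3, S@2 = 2/3 and MRR = (1/2 + 1 + 0) / 3 = 0.5.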
  • 16. Evaluating intλ • Assumption: “A user is likely to find his own tweets more interesting than random tweets from other users.” • We consider a collection of 250 users. From the timeline of each user u, we extract 90% of her tweets and use them to train u’s user model. • The remaining 10% of the tweets of all users is used to test the model. For each user u, we consider “correct/relevant” the tweets from her own 10%.
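The split can be sketched in Python as follows. The slide does not say whether the 90/10 split is random or chronological, so the sketch assumes a seeded random split. The function name and the `(author, tweet)` stream format are hypothetical. The held-out tweets of all users are pooled into one test stream, and a tweet in that stream is relevant for u exactly when u wrote it.

```python
import random


def split_user_tweets(timelines: dict[str, list[str]], train_percent: int = 90,
                      seed: int = 0):
    """Per-user train/test split plus a pooled test stream of (author, tweet)."""
    rng = random.Random(seed)
    train: dict[str, list[str]] = {}
    test: dict[str, list[str]] = {}
    for user, tweets in timelines.items():
        shuffled = list(tweets)
        rng.shuffle(shuffled)                  # assumption: random, not chronological
        cut = len(shuffled) * train_percent // 100
        train[user], test[user] = shuffled[:cut], shuffled[cut:]
    # Pool everyone's held-out tweets; relevance for u = "authored by u".
    stream = [(author, tweet) for author, held in test.items() for tweet in held]
    return train, test, stream
```

Integer arithmetic for the cut point avoids floating-point surprises: a 10-tweet timeline always yields 9 training tweets and 1 test tweet.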
  • 17. Comparing intλ with baselines • Cosine: cosine measure between tweets and the user’s profile. • HashTags: cosine between the tweets’ hashtags and the user’s profile. • int0.9: our intλ metric with λ=0.9 (λ=0.9 has been optimized on the training set; pairs are very important).
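Here is a Python sketch of the three scores being compared. The exact item-wise and pair-wise similarity formulas are not recoverable from the slides, so the sketch assumes cosine similarity over bags of terms and bags of unordered term pairs. It also assumes λ weights the pair-wise part and 1 − λ the item-wise part, which matches the observation that with λ = 0.9 pairs dominate. The tokenizer and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def _tokens(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase words, keeping a leading '#' on hashtags.
    return re.findall(r"#?\w+", text.lower())


def bag_of_terms(texts: list[str]) -> Counter:
    return Counter(tok.lstrip("#") for t in texts for tok in _tokens(t))


def bag_of_pairs(texts: list[str]) -> Counter:
    bag: Counter = Counter()
    for t in texts:
        terms = sorted({tok.lstrip("#") for tok in _tokens(t)})
        bag.update(combinations(terms, 2))  # unordered pairs in a canonical order
    return bag


def bag_of_hashtags(texts: list[str]) -> Counter:
    return Counter(tok for t in texts for tok in _tokens(t) if tok.startswith("#"))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def int_lambda(tweet: str, profile: list[str], lam: float = 0.9) -> float:
    """Assumed form: lam * pair-wise cosine + (1 - lam) * item-wise cosine."""
    item = cosine(bag_of_terms([tweet]), bag_of_terms(profile))
    pair = cosine(bag_of_pairs([tweet]), bag_of_pairs(profile))
    return lam * pair + (1 - lam) * item
```

Under these assumptions the Cosine baseline is `cosine(bag_of_terms([tweet]), bag_of_terms(profile))`, the HashTags baseline is `cosine(bag_of_hashtags([tweet]), bag_of_hashtags(profile))`, and setting `lam=0.0` reduces `int_lambda` to the Cosine baseline.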
  • 18. User Study • The assessment is conducted by a group of 7 professional assessors who regularly post tweets (Active users, in our terminology), and a group of 5 professional assessors who use Twitter mainly for reading tweets (Passive users). • For each assessor and each method, we generate the top-20 tweet suggestions. Each assessor is provided with a random combination of the tweets selected by the different methods, and is asked to state his/her personal interest in each tweet, using the following scale: • Excellent: if the tweet is “very interesting/very informative/very funny with respect to his/her interests”; • Good: if the tweet is “interesting/informative/funny with respect to his/her interests”; • Fair: if the tweet is “somehow interesting, but nothing bad if he/she would have skipped it”; • Bad: if the tweet is “not interesting, and he/she would have preferred not to have it in his/her timeline.”
  • 19. User Study (Useful tweets are those judged as E, G, or F)
  • 20. List-based Evaluation of Interest Spanning TweetRec • We proceed by randomly selecting 15 sets of 20 lists each. • Assumption: “each list represents a specific user interest.” • For each list we download 800 tweets. • We then create 15 virtual users, one per set, each with 8,000 (= 20 × 400) tweets, by selecting 400 tweets per list. • We use the remaining 400 tweets per list to build a single virtual stream of tweets. The resulting dataset consists of 15 virtual users, each spanning 20 different interests and having produced 8,000 tweets.
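Here is a Python sketch of how the virtual users and the shared stream might be assembled. The slide does not say how the 400 profile tweets are chosen from each list, so the sketch assumes a seeded random split. Stream entries are tagged with their owner, which is a hypothetical convenience for evaluation. With 20 lists and 400 profile tweets per list, each virtual profile holds 20 × 400 = 8,000 tweets, as on the slide.

```python
import random


def build_virtual_users(list_sets: list[list[list[str]]], profile_per_list: int = 400,
                        seed: int = 0):
    """list_sets[u] holds the lists (each a sequence of tweets) assigned to
    virtual user u. Returns one profile per user and a single shared stream of
    (owner, tweet) pairs built from the leftover tweets."""
    rng = random.Random(seed)
    profiles: list[list[str]] = []
    stream: list[tuple[int, str]] = []
    for owner, lists in enumerate(list_sets):
        profile: list[str] = []
        for tweets in lists:
            shuffled = list(tweets)
            rng.shuffle(shuffled)                     # assumption: random split
            profile.extend(shuffled[:profile_per_list])
            stream.extend((owner, t) for t in shuffled[profile_per_list:])
        profiles.append(profile)
    rng.shuffle(stream)                               # interleave into one stream
    return profiles, stream
```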
  • 21. List-based Evaluation of Interest Spanning TweetRec
  • 22. Interest Spanning TweetRec vs. TweetRec
  • 23. Conclusion • We presented two novel recommendation methods based on two problems: • TweetRec and Interest Spanning TweetRec • We beat the baselines by a large margin on all the metrics we tested. • The Interest Spanning formulation has a positive impact on more than 30% of users.
  • 25. (Some) More Notation • Given S⊆T, we want to maximize the overall interestingness F(S) of the content of the tweets t∈S for the user u. • In this definition, “overall” means that information duplicated across the tweets in S does not increase the value of the objective function. • We also define the interestingness, for user u, of the informative content shared among all tweets t∈S.
  • 26. Interesting-Spanning TweetRec is NP-Hard • Reduction from IndSet. • We want to decide whether a graph $G = (V, E)$ has an independent set of size k. • In our setting we have: • $V = T$, $|V| = |T| = n$ • $I(v) = 1/n$ for each $v \in V$ • For any set $S \subseteq T$, we define $I(S) = |S|/n$ iff $(u, v) \notin E$ for each pair of vertices $u, v \in S$; otherwise we define $I(S) < |S|/n$. • Given k, the set S identified by a solution of Interesting-Spanning TweetRec has interestingness $I(S) = k/n$ if and only if G has an independent set of size k. In this case, the nodes in S are exactly the nodes of one of these independent sets.
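The reduction can be checked on toy graphs with an exponential brute-force sketch in Python. The slide only requires $I(S) < |S|/n$ for sets that contain an edge. Choosing $(|S| - 1)/n$ for those sets is our own illustrative assumption. Exact fractions avoid any rounding when testing for $I(S) = k/n$.

```python
from fractions import Fraction
from itertools import combinations


def interestingness(S: tuple[int, ...], edges: set[tuple[int, int]], n: int) -> Fraction:
    """I(S) as in the reduction: |S|/n if S is independent in G, otherwise
    something strictly smaller ((|S| - 1)/n is an illustrative choice)."""
    independent = all((u, v) not in edges and (v, u) not in edges
                      for u, v in combinations(S, 2))
    return Fraction(len(S), n) if independent else Fraction(len(S) - 1, n)


def has_independent_set(vertices: list[int], edges: set[tuple[int, int]], k: int) -> bool:
    """Solve the TweetRec instance by brute force: the best k-set reaches k/n
    exactly when G has an independent set of size k."""
    n = len(vertices)
    best = max(interestingness(S, edges, n) for S in combinations(vertices, k))
    return best == Fraction(k, n)


path = {(0, 1), (1, 2), (2, 3)}          # path 0-1-2-3
print(has_independent_set([0, 1, 2, 3], path, 2))  # True  ({0, 2})
print(has_independent_set([0, 1, 2, 3], path, 3))  # False
```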
  • 27. Submodularity of F(S) • We have to show that, given $A \subseteq B \subseteq T$ and $c \in T$, it holds that $F(\{c\} \cup A) - F(A) \ge F(\{c\} \cup B) - F(B)$. • The proof expands both marginal gains using the definition of F; the thesis follows since $A \subseteq B \subseteq T$. • Theorem [Fisher, Nemhauser, and Wolsey, ’78] For a non-negative, monotone submodular function F, let S be a set of size k obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the function value. Let S* be a set that maximizes the value of F over all k-element sets. Then F(S) ≥ (1 − 1/e)F(S*).
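The diminishing-returns inequality can be verified exhaustively on tiny ground sets with the Python sketch below. The proof on the slide concerns the authors' F, whose formula is not recoverable here, so the usage example applies the checker to a hypothetical stand-in: unweighted coverage of tweet terms. A supermodular function such as $|S|^2$ is included as a counterexample.

```python
from itertools import combinations
from typing import Callable, Iterable


def subsets(items: Iterable) -> Iterable[frozenset]:
    """All subsets of `items`, as frozensets."""
    xs = list(items)
    return (frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r))


def is_submodular(F: Callable[[frozenset], float], ground: list,
                  eps: float = 1e-12) -> bool:
    """Check F({c} ∪ A) - F(A) >= F({c} ∪ B) - F(B) for all A ⊆ B ⊆ ground and
    c in ground. Exponential in len(ground): for tiny examples only."""
    for B in subsets(ground):
        for A in subsets(B):
            for c in ground:
                if F(A | {c}) - F(A) < F(B | {c}) - F(B) - eps:
                    return False
    return True


# Hypothetical stand-in for F: number of distinct terms covered by the tweets.
tweets = {"t1": {"linux", "3.2", "new"},
          "t2": {"linux", "3.2", "features"},
          "t3": {"nobel", "economics", "math"}}
coverage = lambda S: float(len(set().union(*(tweets[t] for t in S))))
print(is_submodular(coverage, list(tweets)))          # True
print(is_submodular(lambda S: len(S) ** 2, ["a", "b"]))  # False
```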
  • 28. Item-Wise Similarity
  • 29. Pair-Wise Similarity
  • 30. Set Similarity • For a set of tweets X, we use the bag-of-words of the tweets in X and the bag-of-pairs of terms of the tweets in X.
  • 31. Enhancing tscore for Passive Users
  • 32. Estimating authoritativeness • After tuning: A = 0.5, α = 2, β = 2000
  • 33. Optimizing λ • The best value is λ = 0.9: pairs are far more valuable than single terms.
  • 34. Manual Selection of Pairs of Users • We manually selected 20 pairs of users having very similar interests. The selection was done using the Twitter user recommender system. • For each pair of users, we add all the tweets of the second user to the stream; the task consists of re-retrieving them using the set of tweets of the first user as the user profile.
  • 35. User Study - III
  • 36. Future Work • First, we are interested in improving the precision of our methods by devising automatic strategies to filter out tweets that are uninteresting ‘status updates’. • We also plan to improve the selection power of Interests-Spanning TweetRec by deduplicating tweets at higher levels of semantics (e.g., by adopting Textual Entailment Recognition techniques). • Our recommendation system could work in several application scenarios, e.g., providing users with a more interesting timeline than the one currently available in Twitter. • Finally, we plan to carry out simulation trials to test the computational cost of our methods when facing a real-time load of thousands of tweets per second.