Aris Gionis: "Methods for analyzing user behavior and their application to web search and content recommendation"

usage mining techniques
with applications to web search
and content recommendation
Aristides Gionis
Yahoo! Research, Barcelona

yandex aug 31, 2012

yahoo! research, barcelona

web mining
social media and multimedia
large-scale distributed systems
user engagement
semantic web


web mining in yahoo! research

themes
usage mining and query-log mining
social network analysis and graph mining
influence propagation
other data mining problems
data sources
- query logs (search) and toolbar (browsing)
- social networks (flickr, messenger, email, ...)
- question-answering (answers)
- micro-blogging (twitter)

overview of the talk

query-log mining
query graphs
query recommendations
yahoo! tips
news recommendations using real-time web


query-log mining

search engines collect a large amount of query logs
lots of interesting information
analyzing users’ behavior
creating user profiles and personalization
creating knowledge bases and folksonomies
finding similar concepts
building systems for query recommendations
using statistics for improving systems’ performance
...

the click graph

[Craswell and Szummer, 2007]

applications of the click graph

[Craswell and Szummer, 2007]
query-to-document search
query-to-query suggestion
document-to-query annotation
document-to-document relevance feedback


the query-flow graph

[Boldi et al., 2008]
take into account temporal information
captures the “flow” of how users submit queries
definition:
nodes V = Q ∪ {s, t}: the set of distinct queries Q, plus
a starting state s and a terminal state t
edges E ⊆ V × V
weights w(q, q′) representing the probability
that q and q′ are part of the same chain

building the query-flow graph

an edge (q, q′) if q and q′ are consecutive in
at least one session
weights w(q, q′) learned by machine learning
features used
textual features: cosine similarity, Jaccard coefficient,
size of intersection, etc.
session features: the number of sessions, the average
session length, the average number of clicks in the
sessions, the average position of the queries in the
sessions, etc. and
time-related features: average time difference, etc.
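
A minimal sketch of the construction step, assuming sessions are available as ordered lists of query strings. The slides learn the weights w(q, q′) with a machine-learned model over the features above; here normalized transition counts stand in for that model, and the name build_query_flow_graph is illustrative only.

```python
from collections import Counter, defaultdict

def build_query_flow_graph(sessions):
    """Return {q: {q_next: weight}} with start node 's' and terminal node 't'."""
    counts = defaultdict(Counter)
    for session in sessions:
        path = ["s"] + list(session) + ["t"]
        # One edge for every pair of consecutive queries in the session.
        for q, q_next in zip(path, path[1:]):
            counts[q][q_next] += 1
    graph = {}
    for q, successors in counts.items():
        total = sum(successors.values())
        # Assumption: normalized counts replace the learned chain probability.
        graph[q] = {q_next: c / total for q_next, c in successors.items()}
    return graph

sessions = [
    ["barcelona", "barcelona hotels", "cheap barcelona hotels"],
    ["barcelona", "barcelona weather"],
    ["barcelona hotels", "cheap barcelona hotels"],
]
qfg = build_query_flow_graph(sessions)
```

With these toy sessions, half of the transitions out of "barcelona" go to "barcelona hotels" and half to "barcelona weather".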


query-flow graph

[figure: fragment of the query-flow graph around "barcelona" queries. Nodes: barcelona, barcelona fc, barcelona fc website, barcelona fc fixtures, real madrid, barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, barcelona weather, barcelona weather online, and the terminal node <T>. Edges are labeled with transition weights between 0.011 and 0.523.]

query-flow graph

[figure: fragment of a query-flow graph with start node ^ and end node $. Nodes: cat, funny cat, picture of a cat, dog, funny dog, picture of a dog, picture of a funny cat and dog, dog for sale, breed of dog.]

query recommendations

the general theme:
given an input query q
identify similar queries q′
rank them and present them to the user
most query graphs can be used for both tasks:
similarity and ranking

recommendations using the query-flow graph

[Boldi et al., 2008]
perform a random walk on the query-flow graph
teleportation to the submitted query
teleportation to previous queries to take into account
the user history
normalize PageRank scores to remove the bias
toward very popular queries
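
The random walk with teleportation can be sketched as a power iteration. This is an illustration under stated assumptions, not the implementation of [Boldi et al., 2008]: the damping value, the handling of dangling nodes, and the toy edge weights are made up, and the popularity normalization step is left out.

```python
def random_walk_with_restart(graph, restart, damping=0.85, iterations=50):
    """graph: {q: {q_next: weight}} with rows summing to 1.
    restart: teleportation distribution over queries (sums to 1)."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ} | set(restart)
    score = {v: restart.get(v, 0.0) for v in nodes}
    for _ in range(iterations):
        # Teleportation mass goes to the restart distribution.
        nxt = {v: (1 - damping) * restart.get(v, 0.0) for v in nodes}
        for u in nodes:
            successors = graph.get(u)
            if not successors:
                # Assumption: a dangling node returns its mass to the restart set.
                for v, p in restart.items():
                    nxt[v] += damping * score[u] * p
                continue
            for v, w in successors.items():
                nxt[v] += damping * score[u] * w
        score = nxt
    return score

toy_graph = {
    "apple": {"apple ipod": 0.6, "apple store": 0.4},
    "apple ipod": {"apple store": 1.0},
    "apple store": {"apple": 1.0},
}
scores = random_walk_with_restart(toy_graph, restart={"apple": 1.0})
ranked = sorted((q for q in scores if q != "apple"), key=scores.get, reverse=True)
```

To take the user history into account, the restart distribution can also put some mass on previous queries, for example {"apple": 0.7, "beatles": 0.3}.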


example: apple

Max. weight      s_q              ŝ_q              s̄_q
t                t                apple            apple
apple ipod       apple            apple fruit      apple ipod
apple store      apple ipod       apple ipod       apple trailers
apple trailers   apple store      apple belgium    apple store
amazon           apple trailers   eating apple     apple mac
apple mac        google           apple.nl         apple fruit
itunes           amazon           apple monitor    apple usa
pc world         argos            apple usa        apple ipod nano
argos            itunes           apple jobs       apple.com/ipod...

example: banana → apple

banana → apple               banana
banana                       banana
apple                        eating bugs
usb no                       banana holiday
banana cs                    opening a banana
giant chocolate bar          banana shoe
where is the seed in anut    fruit banana
banana shoe                  recipe 22 feb 08
fruit banana                 banana jules oliver
banana cloths                banana cs
eating bugs                  banana cloths

example: beatles → apple

beatles → apple              beatles
beatles                      beatles
apple                        scarring
apple ipod                   paul mcartney
scarring                     yarns from ireland
srg peppers artwork          statutory instrument A55
ill get you                  silver beatles tribute band
bashles                      beatles mp3
dundee folk songs            GHOST’S
the beatles love album       ill get you
place lyrics beatles         fugees triger finger remix

recommendations as shortcuts to qfg

[Anagnostopoulos et al., 2010]


the query-recommendation problem


the recommendation problem

model user behavior as a random walk on qfg
a user starts at query q0 and follows a path p of
reformulations on qfg before terminating
consider a reward function w (q) on the nodes of qfg
goal: “nudge” users in order to maximize their reward

objectives:
1. collect a large reward along the way
2. end the session at a high-reward node

applications: a general problem formulation for suggesting
shortcuts (web graph, social networks, etc.)


probabilistic model

we can only suggest, not order the user
we do not know how the user will act

random walk on qfg is modeled by stochastic matrix P
recommendations R modify P to P′ = P + R

utility functions

reward function w (q) on queries
- quality of search results, user satisfaction, dwell time,
monetization, etc.

utility function U(p) on paths p = q_0 ... q_{k−1} T

U(p) = Σ_{q ∈ p} w(q)          U(p) = w(q_{k−1})

(Cavafy)                       (Machiavelli)
“road to Ithaca”               “the end justifies the means”
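
The two utilities can be sketched in a few lines. The path is passed without the terminal state T, and the expected "road to Ithaca" utility under the random walk is computed by repeatedly applying v(q) = w(q) + Σ_{q′} P(q, q′) v(q′) with v(T) = 0. The function names, the toy matrix P, and the rewards are assumptions made for illustration.

```python
def cavafy_utility(path, w):
    """Sum of rewards collected along the path ("road to Ithaca")."""
    return sum(w[q] for q in path)

def machiavelli_utility(path, w):
    """Reward of the last query before termination."""
    return w[path[-1]]

def expected_cavafy_utility(P, w, iterations=200):
    """Expected summed reward of a random walk started at each query.
    P: {q: {next_state: probability}}, where 'T' is the terminal state."""
    v = {q: 0.0 for q in P}
    for _ in range(iterations):
        # The terminal state contributes no further reward (v.get returns 0.0).
        v = {q: w[q] + sum(p * v.get(nxt, 0.0) for nxt, p in P[q].items())
             for q in P}
    return v

P = {"a": {"b": 0.5, "T": 0.5}, "b": {"T": 1.0}}  # toy transition matrix
w = {"a": 1.0, "b": 2.0}                           # toy reward function
expected = expected_cavafy_utility(P, w)
```

In this toy chain a walk from "a" collects 1.0 immediately and, with probability 0.5, another 2.0 at "b", so its expected utility is 2.0.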


utility

[figure: sum of expected values (y-axis, 0.0 to 1.2) for the settings labeled w, ρ, ρw, 1-step heuristic]

qfg projections for diverse recommendations

[Bordino et al., 2010]


diverse recommendations

[Bordino et al., 2010]
we want not only relevant and high-quality
recommendations, but also a diverse set
we want recommendations that lead in different
“directions” in the qfg
need notions of distance of queries in the qfg
use spectral embeddings
project a graph in a low dimensional space, so that
embedding minimizes total edge distortion
finding diverse recommendations reduces to a geometric
problem

example: time

Spectral projection on 2-hop neighborhood

                        (1)      (2)      (3)      (4)      (5)      (6)      (7)
(1) time magazine        ·      0.9953   0.0162   0.1422   0.1049  -0.6071  -0.6056
(2) new york times     0.9953     ·     -0.0051   0.1248   0.0893  -0.6478  -0.6462
(3) time zone          0.0162  -0.0051     ·      0.9903   0.9891  -0.5234  -0.5254
(4) world time         0.1422   0.1248   0.9903     ·      0.9970  -0.6263  -0.6282
(5) what time is it    0.1049   0.0893   0.9891   0.9970     ·     -0.6244  -0.6263
(6) time warner       -0.6071  -0.6478  -0.5234  -0.6263  -0.6244     ·      0.9999
(7) time warner cable -0.6056  -0.6462  -0.5254  -0.6282  -0.6263   0.9999     ·
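
Given pairwise similarities like those in the table, one simple way to pick a diverse set is a greedy rule: repeatedly add the query whose largest similarity to the already chosen queries is smallest. This greedy rule is a stand-in chosen for illustration, not necessarily the selection method of [Bordino et al., 2010]; the diagonal value 1.0 is an assumption, since the table leaves the diagonal blank.

```python
QUERIES = ["time magazine", "new york times", "time zone", "world time",
           "what time is it", "time warner", "time warner cable"]

# Similarity values copied from the table; diagonal set to 1.0 by assumption.
SIM = [
    [1.0, 0.9953, 0.0162, 0.1422, 0.1049, -0.6071, -0.6056],
    [0.9953, 1.0, -0.0051, 0.1248, 0.0893, -0.6478, -0.6462],
    [0.0162, -0.0051, 1.0, 0.9903, 0.9891, -0.5234, -0.5254],
    [0.1422, 0.1248, 0.9903, 1.0, 0.9970, -0.6263, -0.6282],
    [0.1049, 0.0893, 0.9891, 0.9970, 1.0, -0.6244, -0.6263],
    [-0.6071, -0.6478, -0.5234, -0.6263, -0.6244, 1.0, 0.9999],
    [-0.6056, -0.6462, -0.5254, -0.6282, -0.6263, 0.9999, 1.0],
]

def pick_diverse(similarity, k, first=0):
    """Greedily choose k indices, each as dissimilar as possible to those chosen."""
    chosen = [first]
    while len(chosen) < k:
        candidates = [i for i in range(len(similarity)) if i not in chosen]
        best = min(candidates, key=lambda i: max(similarity[i][j] for j in chosen))
        chosen.append(best)
    return chosen

diverse = [QUERIES[i] for i in pick_diverse(SIM, k=3)]
```

Starting from "time magazine", the rule picks one query from each of the three visible clusters (publications, clock time, the cable company).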


improving recommendation
for long-tail queries via templates

[Szpektor et al., 2011]


motivation

goal: improve coverage of query-recommendation systems
observation: in a typical query log, 50% of the query volume
consists of unique queries [Baeza-Yates et al., 2007]
most query-recommendation systems are based on finding
queries that co-occur frequently
inherent limitation of using co-occurrences
need methods that can reason about rare,
and even previously unseen, queries

overview of the approach

1 generate candidate query-templates for each query
Paris hotels → <city> hotels
Paris hotels → <district> hotels
Moscow hotels → <city> hotels
2 infer transitions between templates
<city> hotels → <city> restaurants
3 infer recommendations for rare queries
Yancheng hotels → Yancheng restaurants
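
The three steps above can be sketched on a toy example. Everything here is a simplification under stated assumptions: ENTITY_TYPES is a hypothetical one-level taxonomy (the slides use a full hierarchy), transitions are scored by raw counts rather than the edge scores defined later, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical toy taxonomy: token -> entity types.
ENTITY_TYPES = {"paris": ["<city>"], "moscow": ["<city>"], "yancheng": ["<city>"]}

def candidate_templates(query):
    """Step 1: replace one known token by its entity type."""
    tokens = query.lower().split()
    found = []
    for i, token in enumerate(tokens):
        for entity_type in ENTITY_TYPES.get(token, []):
            template = " ".join(tokens[:i] + [entity_type] + tokens[i + 1:])
            found.append((template, entity_type, token))
    return found

def infer_transitions(query_pairs):
    """Step 2: count template transitions supported by observed query pairs."""
    transitions = Counter()
    for q1, q2 in query_pairs:
        for t1, type1, token1 in candidate_templates(q1):
            for t2, type2, token2 in candidate_templates(q2):
                # Both templates must generalize the same token in the same way.
                if (type1, token1) == (type2, token2):
                    transitions[(t1, t2)] += 1
    return transitions

def recommend(query, transitions):
    """Step 3: instantiate target templates with the query's own entity."""
    scores = Counter()
    for template, entity_type, token in candidate_templates(query):
        for (source, destination), count in transitions.items():
            if source == template:
                scores[destination.replace(entity_type, token)] += count
    return [q for q, _ in scores.most_common()]

log_pairs = [("paris hotels", "paris restaurants"),
             ("moscow hotels", "moscow restaurants")]
transitions = infer_transitions(log_pairs)
suggestions = recommend("yancheng hotels", transitions)
```

The query "yancheng hotels" never appears in the toy log, yet it receives a recommendation through the <city> template.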


query templates

defined over a hierarchy of entity types
define a global set of templates over the whole query log
do not restrict on specific domains
(such as, travel, weather, or movies)
examples:
jaguar spare parts → <car> spare parts
name for salt → name for <compound>
a thousand miles notes → <song> notes


candidate templates – example
[figure: fragment of the entity hierarchy with the types substance, food, drink, dessert, and instruction above the query tokens chocolate, cookie, chocolate cookie, and recipe]

query: chocolate cookie recipe
candidate templates:
<food> cookie recipe
<drink> cookie recipe
<food> recipe
<substance> recipe
chocolate cookie <instruction>
. . .

ranking candidate templates

ambiguity
Jaguar spare parts → <car> spare parts
Jaguar spare parts → <animal> spare parts
focus
name for salt → name for <compound>
name for salt → <description> for salt
right generalization level
Paris hotels → <capital> hotels
Paris hotels → <city> hotels
Paris hotels → <location> hotels


construction of query templates – details

hierarchy used: WordNet 3.0 hierarchy and Wikipedia
category hierarchy, connected via yago mapping
queries are tokenized, and n-grams are looked up and
mapped to entities in the hierarchy
enriched with heuristic generalizations for <email>,
<url>, numbers, and noun-phrases not in the taxonomy


query-to-template edges

mapping from a query q to its set of templates T (q)
viewed as query-to-template edges
associated edge scores

s_qt(q, t) = α^d

when t is obtained by generalizing q at distance d in H
parameter α set experimentally to 0.9
set s_qt(q, q′) = 1 if (q, q′) is an edge in the query-flow graph
normalize so that all s_qt(q, ·) sum to 1
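
A small worked example of the query-to-template score, assuming the generalization distances are already known. The distances below are hypothetical values for the "Paris hotels" templates, and the special case for query-flow graph edges is omitted.

```python
ALPHA = 0.9  # value reported on the slide

def query_to_template_scores(distances, alpha=ALPHA):
    """distances: {template: d}, with d the generalization distance in the hierarchy."""
    raw = {t: alpha ** d for t, d in distances.items()}
    total = sum(raw.values())
    # Normalize so that the scores of one query sum to 1.
    return {t: s / total for t, s in raw.items()}

# Hypothetical distances for the query "paris hotels".
qt_scores = query_to_template_scores(
    {"<capital> hotels": 1, "<city> hotels": 2, "<location> hotels": 3}
)
```

More specific templates (smaller d) receive a larger share of the score, which penalizes over-generalization.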


template-to-templates edges
reasoning about transitions between templates
<food> recipe → healthy <food> recipe
for templates (t1, t2) define the support set of query pairs
Sup(t1, t2) = {(q1, q2)}, s.t.
t1 ∈ T(q1) and t2 ∈ T(q2)
t1 and t2 substitute the same token in q1 and q2
(e.g., dosa recipe and healthy dosa recipe)
define template-to-template edge score as

s_tt(t1, t2) = Σ_{(q1, q2) ∈ Sup(t1, t2)} s_qq(q1, q2)

normalize so that all s_tt(t, ·) sum to 1
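
The template-to-template score can be illustrated as follows, assuming the support sets and the query-to-query scores s_qq are given. The query pairs and their scores are invented for the example.

```python
from collections import defaultdict

def template_to_template_scores(support, s_qq):
    """support: {(t1, t2): [(q1, q2), ...]}; s_qq: {(q1, q2): score}."""
    raw = defaultdict(dict)
    for (t1, t2), pairs in support.items():
        # Sum the query-to-query scores over the support set of (t1, t2).
        raw[t1][t2] = sum(s_qq[pair] for pair in pairs)
    # Normalize the outgoing scores of each template so that they sum to 1.
    return {t1: {t2: s / sum(row.values()) for t2, s in row.items()}
            for t1, row in raw.items()}

# Hypothetical query-flow graph scores.
s_qq = {
    ("dosa recipe", "healthy dosa recipe"): 0.3,
    ("pizza recipe", "healthy pizza recipe"): 0.1,
    ("pizza recipe", "easy pizza recipe"): 0.4,
}
support = {
    ("<food> recipe", "healthy <food> recipe"): [
        ("dosa recipe", "healthy dosa recipe"),
        ("pizza recipe", "healthy pizza recipe"),
    ],
    ("<food> recipe", "easy <food> recipe"): [("pizza recipe", "easy pizza recipe")],
}
tt_scores = template_to_template_scores(support, s_qq)
```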

example – ambiguity
consider query transition:
jaguar transmission → jaguar spare parts
template transition
<car> transmission → <car> spare parts
supported by
bmw transmission → bmw spare parts
audi transmission → audi spare parts
...
template transition
<animal> transmission → <animal> spare parts
will not be supported by
lion transmission → lion spare parts
tiger transmission → tiger spare parts
...

the query-template flow graph

extension of the query-flow graph
superposition of all the concepts we have seen so far:
set of nodes consists of queries and templates
set of edges consists of
query to query edges
query to template edges
template to template edges
associated weights

generating recommendations
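[figure: paths from the input query q to a recommended query q′ in the query-template flow graph, through an intermediate query and the templates t1, t2, t3, t4; edges carry the scores s1, ..., s7, and some edges are drawn dashed]

r(q, q′) = s1 s4 + s2 s5 + s3 s6 + s3 s7
interpretation: probability of a feasible path
dashed lines do not really exist, but are discovered on-the-fly
queries q and q′ may not have been seen before
transitions in the query-flow graph are ranked first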
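
The score r(q, q′) sums the products of edge scores over all feasible two-edge paths. The sketch below reproduces the four-term formula on a hypothetical graph; the edge layout is an assumption read off the formula (the figure itself did not survive extraction), and instantiates_to marks which end nodes count as reaching q′.

```python
def recommendation_score(edges, instantiates_to, q, target):
    """Sum of score products over two-edge paths from q that reach target."""
    total = 0.0
    for middle, first in edges.get(q, {}).items():
        for end, second in edges.get(middle, {}).items():
            # An end node counts if it is the target or a template yielding it.
            if instantiates_to.get(end) == target:
                total += first * second
    return total

# Hypothetical scores s1..s7 on an assumed edge layout.
edges = {
    "q": {"qa": 0.5, "t1": 0.3, "t2": 0.2},   # s1, s2, s3
    "qa": {"q'": 0.4},                        # s4
    "t1": {"t3": 0.5},                        # s5
    "t2": {"t3": 0.25, "t4": 0.75},           # s6, s7
}
instantiates_to = {"q'": "q'", "t3": "q'", "t4": "q'"}
r = recommendation_score(edges, instantiates_to, "q", "q'")
```

With these numbers r = 0.5·0.4 + 0.3·0.5 + 0.2·0.25 + 0.2·0.75 = 0.55.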

methodology

methods:
query-template flow graph
query-flow graph

evaluation:
inspection of a sample of the results
editorial evaluation
automated evaluation

training dataset

                  queries          templates
# nodes           95 279 132       5 382 051 983
# edges           83 513 590       4 345 497 267
avg degree        0.88             0.81
max out-degree    14 145           34 249
                  (craigslist)     (<album>)
max in-degree     14 317           133 874
                  (youtube)        (<institution>)

anecdotal evidence

{“guangzhou flights”, “guangzhou map”}
<capital> flights → <capital> map

{“a thousand miles notes”, “a thousand miles piano notes”}
<single> notes → <single> piano notes

{“8 week old weimaraner”, “8 week old weimaraner puppy”}
8 week old <breed> → 8 week old <breed> puppy

{“aaa office twin falls idaho”, “aaa twin falls idaho”}
aaa office <city> → aaa <city>

{“air force titles”, “air force ranks”}
<military service> titles → <military service> ranks

{“name for salt”, “chemical name for salt”}
name for <compound> → chemical name for <compound>


editorial evaluation
set-A: 300 pairs from each configuration,
recommendation in the top-10
set-B: 100 pairs, same queries in each configuration,
same position
set-C: 100 pairs for which query-flow graph has no
recommendation
editors labeled query-recommendation pairs as:
relevant, not relevant, cannot tell
two editors, 100 common queries, kappa-statistic 0.37
qfg qtfg
set-A 98.48% 97.84%
set-B 97.65% 98.86%
set-C — 94.38%

automated evaluation – guiding principle

extract query pairs {q_i, q_{i+1}} from a testing dataset, such
that the user submitted q_{i+1} after q_i in the same session
measure whether q_{i+1} is predicted by our methods, and in which
position
assumption: q_{i+1} should be relevant and useful for q_i
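
This evaluation protocol can be sketched as below. The metric definitions are assumptions: coverage is taken to be the fraction of test pairs whose second query appears anywhere in the ranked list, and with a single relevant query per pair the average precision reduces to the reciprocal rank. The recommender is a hypothetical lookup table.

```python
def evaluate(test_pairs, recommend, k=10):
    """test_pairs: [(q_i, q_next)]; recommend(q) returns a ranked list of queries."""
    covered = hits = 0
    reciprocal_sum = 0.0
    for q, q_next in test_pairs:
        ranked = recommend(q)
        if q_next in ranked:
            covered += 1
            position = ranked.index(q_next) + 1  # 1-based rank
            reciprocal_sum += 1.0 / position
            if position <= k:
                hits += 1
    n = len(test_pairs)
    return {"coverage": covered / n, "top_k": hits / n, "map": reciprocal_sum / n}

# Hypothetical recommender backed by a fixed table.
table = {"a": ["b", "c"], "x": ["y"]}
test_pairs = [("a", "c"), ("x", "z"), ("q", "r"), ("a", "b")]
metrics = evaluate(test_pairs, lambda q: table.get(q, []), k=1)
```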


results

                  qfg        qtfg       relative increase
pair occurrences
total pairs       3134388    3134388
coverage          22.65 %    28.17 %    24.37 %
# in top-100      16.97 %    25.49 %    50.23 %
# in top-10       9.49 %     20.74 %    118.49 %
# in top-1        2.86 %     10.01 %    249.5 %
MAP               0.050      0.137
avg. position     18.35      8.3

unique pairs
total pairs       2755922    2755922
coverage          13.28 %    19.38 %    45.87 %
# in top-100      12.06 %    17.25 %    42.96 %
# in top-10       8.41 %     13.52 %    60.68 %
# in top-1        2.86 %     6.5 %      127.32 %
MAP               0.047      0.089
avg. position     12.33      9.43

results

[figure: percentage of test pairs ranked in the top-10 (y-axis, 0 to 20) as a function of query length in words (x-axis, 2 to 16), for QFG and QTFG]

conclusions

improve coverage of query recommendation systems
recommendations for rare or previously unseen queries
well suited for tail queries
complements rather than replaces existing methods
future work: improve quality of extracted templates


yahoo! tips

[Weber et al., 2011]


motivation

provide answers, not links
identify “how to” queries and provide tips
tip: piece of advice that is
1 short
2 concrete
3 self-contained
4 non-obvious


extract tips from yahoo! answers

tip: To tell if your eggs are fresh : place eggs in a bowl/glass
of water.....if it floats it’s bad. if it sinks it’s good.

system diagram

tip collection:
rule-based extraction → 250k candidate tips
obtain quality labels for 20k candidate tips using CrowdFlower
machine learning → 22k high quality tips

query handling (example query: zest lime without zester):
does the query have how-to intent? if no, show normal search results
are there relevant high quality tips? if no, show normal search results
if yes, rank the matching tips and display the highest ranking one

TIP: To zest a lime if you don’t have a zester : use a cheese grater

mining tips from yahoo! answers

consider tips of a specific structure: “X : Y”
X : goal of the tip
Y : action of the tip
examples
To get the mildew smell out of your towels : try soaking
it in a salt water solution, then washing with soap and
cold water, that tends to get rid of smells
To style your hair without heat, gel or straighteners : try
coconut oil mark k


mining tips from yahoo! answers

english
only literal “how to” queries
answer should start with a verb
consider only best answers
replace I, my, me, myself, etc.
with you, your, you, yourself, etc.
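
The extraction rules above can be sketched as a small function that turns a question and its best answer into an “X : Y” tip. This is an illustration under assumptions: the verb test is a hypothetical word list (a real system would need a part-of-speech tagger), language detection is skipped, and the pronoun rewrite ignores contractions such as "I'm".

```python
import re

PRONOUNS = {"i": "you", "my": "your", "me": "you", "myself": "yourself"}
HOW_TO = re.compile(r"^how to\s+(.*?)\??$", re.IGNORECASE)
ACTION_VERBS = {"use", "try", "place", "soak"}  # hypothetical stand-in for a POS tagger

def to_second_person(text):
    """Replace first-person pronouns with second-person ones."""
    return re.sub(r"\b(i|my|me|myself)\b",
                  lambda m: PRONOUNS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)

def starts_with_verb(text):
    words = text.split()
    return bool(words) and words[0].lower().strip(",.:") in ACTION_VERBS

def make_tip(question, best_answer):
    """Return 'To <goal> : <action>' or None if the rules do not apply."""
    match = HOW_TO.match(question.strip())
    if not match or not starts_with_verb(best_answer):
        return None
    goal = to_second_person(match.group(1))
    action = to_second_person(best_answer.strip())
    return f"To {goal} : {action}"

tip = make_tip("How to zest a lime if I don't have a zester?", "Use a cheese grater")
```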


quality filtering

generated 249 675 tips
manually label 20 000 using CrowdFlower
classes: very good (25%), ok (48%), bad (27%)
algorithms
svm (rbf)
decision trees
k-nn (Euclidean, k = 21 . . . 50)
feature families:
18 handcrafted features: e.g., style (Flesch-Kincaid
reading level), sentiment, # urls, emoticons, etc.
content: SVD on the tip×term matrix


quality ﬁltering — machine learning results

        Method          handcrafted   content     both
                        features      features
Hard    SVM             0.63/0.13     0.60/0.09   0.63/0.16
Hard    Decision Tree   0.67/0.07     0.61/0.06   0.66/0.13
Hard    k-NN            0.62/0.23     0.56/0.11   0.63/0.11
Soft    SVM             0.95/0.11     0.93/0.05   0.95/0.08
Soft    Decision Tree   0.95/0.03     0.92/0.03   0.94/0.06
Soft    k-NN            0.94/0.11     0.91/0.05   0.94/0.05

quality ﬁltering — machine learning results
Category P,R VG size
Beauty & Style 0.53,0.08 0.16 0.08
Business & Finance 0.57,0.20 0.20 0.03
Cars & Transportation 0.64,0.12 0.23 0.03
Computers & Internet 0.69,0.33 0.45 0.15
Consumer Electronics 0.70,0.23 0.38 0.06
Entertainment & Music 0.60,0.39 0.15 0.05
Family & Relationships 0.35,0.05 0.06 0.14
Games & Recreation 0.61,0.31 0.24 0.04
Health 0.62,0.07 0.15 0.09
Home & Garden 0.43,0.06 0.27 0.04
Society & Culture 0.50,0.19 0.09 0.03
Sports 0.68,0.24 0.19 0.03
Yahoo! Products 0.73,0.43 0.45 0.07


detecting “how to” queries

how many? 2-3% of volume, 3-4% of distinct queries
start with “how to” “how do i” or “how can i”
how do you fix keys on a laptop
P: 96-99%, cover: 1.0%
queries start with an action verb
play my music on tool bar raido
P: 7-14%, cover: 3.2%
if exists “how to X” then “X”
craft ideas for boys
P: 87-94%, cover: 1.1%
incoming queries to “how to” web sites
fixing a wet cell phone
P: 61-75%, cover: 0.08%
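
Two of the detection rules above are easy to sketch: the literal prefix rule, and the rule that treats “X” as a how-to query when “how to X” exists in the log. The prefix list follows the slide; the function names and the toy log are assumptions, and the verb-based and web-site-based rules are not covered.

```python
HOW_TO_PREFIXES = ("how to ", "how do i ", "how can i ")

def normalize(query):
    """Lower-case the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())

def goals_from_log(queries):
    """Collect X for every logged query of the form 'how to X'."""
    goals = set()
    for query in queries:
        q = normalize(query)
        if q.startswith("how to "):
            goals.add(q[len("how to "):])
    return goals

def has_howto_intent(query, known_goals=frozenset()):
    q = normalize(query)
    return q.startswith(HOW_TO_PREFIXES) or q in known_goals

goals = goals_from_log(["How to zest a lime", "barcelona hotels"])
```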


matching queries to tips

precision–recall trade-off
index only the “goal” or also “action”
use AND or OR mode for query
require minimum “span” for the goal
ranking
rank by number of query tokens in goal, then tf·idf
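
A rough sketch of the matching step, indexing only the goal of each tip. Two points are assumptions rather than details from the slides: "span" is taken to be the fraction of distinct goal tokens covered by the query, and the span replaces tf·idf as the tie-breaker after the number of matching query tokens.

```python
def match_tips(query, tips, mode="AND", min_span=0.5):
    """tips: list of (goal, action) pairs. Returns matching tips, best first."""
    q_tokens = set(query.lower().split())
    results = []
    for goal, action in tips:
        goal_tokens = set(goal.lower().split())
        if not goal_tokens:
            continue
        overlap = q_tokens & goal_tokens
        if mode == "AND" and overlap != q_tokens:
            continue  # every query token must occur in the goal
        if mode == "OR" and not overlap:
            continue  # at least one query token must occur in the goal
        span = len(overlap) / len(goal_tokens)
        if span < min_span:
            continue
        results.append((len(overlap), span, goal, action))
    # Rank by number of query tokens in the goal, then by span (assumed tie-breaker).
    results.sort(key=lambda r: (-r[0], -r[1]))
    return [(goal, action) for _, _, goal, action in results]

tips = [
    ("zest a lime without a zester", "use a cheese grater"),
    ("tell if your eggs are fresh", "place eggs in a bowl of water"),
]
best = match_tips("zest lime without zester", tips)
```

Raising min_span or switching from OR to AND trades recall for precision, which is the trade-off the evaluation table explores.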


matching queries to tips — evaluation

mode min span vol. dist. P@1 median
AND .50 8.7% 2.7% .428/.680 1
AND .66 6.8% 1.8% .557/.770 1
AND 1.0 4.4% 0.8% .625/.835 1
OR .50 87.4% 88.4% .048/.110 18
OR .66 36.8% 36.3% .092/.200 2
OR 1.0 13.5% 10.3% .160/.300 1


future work

mine tips from other resources
twitter
wikitravel
improve quality of existing system
incorporating more features
improving rule extraction
classification

information dissemination in social networks


the information dissemination spectrum

news sites, content-provider sites: editorially curated; users browse; no specific info need
web search: url, images, music, ...; clear intent
social media (twitter, facebook)
recommendations (content- or context- or geo-aware)
user-generated content (blogs, images, q/a)

social media


the information overload problem


social media and user-generated content

paradigm shift from a broadcast one-to-many mechanism
to a many-to-many model
users in the role of information producers

benefits and opportunities

wealth of information of extreme volume and diversity
wisdom of crowd phenomena
accurate profiling and personalization
(toolbar, search, clicks)
content- and context- information available
social and geo information available

challenges

heterogeneous sources
high variability in quality
needle-in-the-haystack problems

we want to:
support users to seek, filter, and disseminate information
build efficient platforms that support social-media
functionalities

personalized news recommendations
by harnessing the real-time web

[De Francisci Morales et al., 2012]


overview

a news recommendation system based on real-time web,
e.g., twitter
suggest news articles to twitter users
infer user preferences from twitter activity


yahoo! news


sources characteristics

news stream
+ high coverage
− sparse and noisy data for user profiling
− latency on collecting user feedback
twitter stream
+ much more accurate personalization
+ news spread very fast

motivation

[figure: normalized number of occurrences over time (x-axis from May-01 h20 to May-03 h08) for news, twitter, and clicks]

motivation

[figure: news-click delay distribution; x-axis: minutes (log scale, 1 to 10000); y-axis: number of occurrences]

challenges

scale to large volumes of news and tweets
high dynamicity of news and tweets
news have short life-cycle
twitter users use jargon
find the right degree of personalization
cope with inactive twitter users

relate users, tweets, and news articles


T.rex architecture

[figure: T.Rex architecture. User tweets and followee tweets from twitter, together with news articles, feed the user model; T.Rex outputs a personalized ranked list of news articles.]

recommendation model

Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)

social model
Σ(i, j) social relevance of
news j to user i

content model
Γ(i, j) content relevance
of news j to user i

popularity model
Π(j) popularity model of
news article j
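
The linear combination of the three models can be sketched as follows, assuming the social, content, and popularity scores have already been computed. The dictionaries and their values are invented for illustration; in the talk the weights α, β, γ are learned from click data.

```python
def rank_news(user, articles, social, content, popularity, alpha, beta, gamma):
    """Rank articles by alpha * social + beta * content + gamma * popularity."""
    def score(article):
        return (alpha * social.get((user, article), 0.0)
                + beta * content.get((user, article), 0.0)
                + gamma * popularity.get(article, 0.0))
    return sorted(articles, key=score, reverse=True)

# Hypothetical component scores for one user and three articles.
social = {("u", "n1"): 0.9, ("u", "n2"): 0.1}
content = {("u", "n2"): 0.8}
popularity = {"n1": 0.2, "n2": 0.3, "n3": 0.9}
articles = ["n1", "n2", "n3"]

combined = rank_news("u", articles, social, content, popularity, 1 / 3, 1 / 3, 1 / 3)
social_only = rank_news("u", articles, social, content, popularity, 1.0, 0.0, 0.0)
```

Setting two of the three weights to zero recovers a single-model ranking, which is how single-component baselines can be obtained from the same function.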

popularity update rule

news become stale after two days
track mentions in news and tweets with exponential decay

Z_τ = λ Z_{τ−1} + w_T H_T + w_N H_N
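
A minimal sketch of an exponentially decayed popularity counter in the spirit of the update rule, assuming H_T and H_N are per-entity mention counts observed in tweets and news during the current time step. The decay value and the weights are placeholders, not values from the talk.

```python
def update_popularity(z_prev, tweet_mentions, news_mentions,
                      decay=0.5, w_tweets=1.0, w_news=1.0):
    """One step of Z = decay * Z_prev + w_tweets * H_T + w_news * H_N, per entity."""
    entities = set(z_prev) | set(tweet_mentions) | set(news_mentions)
    return {
        e: decay * z_prev.get(e, 0.0)
           + w_tweets * tweet_mentions.get(e, 0)
           + w_news * news_mentions.get(e, 0)
        for e in entities
    }

z1 = update_popularity({}, {"olympics": 4}, {"olympics": 2})
z2 = update_popularity(z1, {}, {})   # no new mentions: popularity decays
z3 = update_popularity(z2, {}, {})
```

Without fresh mentions the score halves at every step with decay=0.5, so older entities fade while recently mentioned ones dominate.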

model learning and evaluation

Yahoo! toolbar data
the recommendation model should rank high
news articles that users click
learn the model using SVM
use clicks and twitter profiles of 3K users
to train and test the system

systems evaluated

T.rex: basic model using only user profiles

T.rex+: additional features
entity hotness
news click count
news article age

results

Table 5.2: MRR, precision and coverage.

Algorithm     MRR     P@1     P@5     P@10    Coverage
Recency       0.020   0.002   0.018   0.036   1.000
ClickCount    0.059   0.024   0.086   0.135   1.000
Social        0.017   0.002   0.018   0.036   0.606
Content       0.107   0.029   0.171   0.286   0.158
Popularity    0.008   0.003   0.005   0.012   1.000
T.Rex         0.107   0.073   0.130   0.168   1.000
T.Rex+        0.109   0.062   0.146   0.189   1.000

Recency: it ranks news articles by time of publication (most recent first);
ClickCount: it ranks news articles by click count (highest count first);
Social: it ranks news articles by using T.Rex with β = γ = 0;
Content: it ranks news articles by using T.Rex with α = γ = 0;
Popularity: it ranks news articles by using T.Rex with α = β = 0.

results

[figure: average discounted cumulative gain (DCG) at ranks 1 to 20 for T.Rex+, T.Rex, Popularity, Content, Social, Recency, and Click count]

We report MRR, precision and coverage results in Table 5.6.3. The two
variants of our system, T.Rex and T.Rex+, have the best results overall.
T.Rex+ has the highest MRR of all the alternatives. This result means
that our model has a good overall performance across the dataset. Content
has also a very high MRR. Unfortunately, the coverage level achieved
by the Content strategy is very low. This issue is mainly caused by the
sparsity of the user profiles. It is well known that most of twitter users
belong to the “silent majority,” and do not tweet very much.
The Social strategy is affected by the same problem, albeit to a much ...

conclusions

real-time web information can be leveraged to deliver
relevant information

future directions

LSI analysis on entities
models for different user clusters
geographic information

summary

review concepts on query-log mining
answering queries directly with useful tips
challenges and opportunities in information dissemination
news recommendations using real-time web
many nice problems and research opportunities

thank you!

