usage mining techniques
         with applications to web search
           and content recommendation
                              Aristides Gionis
                   Yahoo! Research, Barcelona


yandex                              aug 31, 2012
yahoo! research, barcelona



       web mining
       social media and multimedia
       large-scale distributed systems
       user engagement
       semantic web




web mining in yahoo! research

  themes
      usage mining and query-log mining
      social network analysis and graph mining
      influence propagation
      other data mining problems
  data sources
    - query logs (search) and toolbar (browsing)
    - social networks (flickr, messenger, email, ...)
    - question-answering (answers)
    - micro-blogging (twitter)


overview of the talk



       query-log mining
           query graphs
           query recommendations
       yahoo! tips
       news recommendations using real-time web




query-log mining


       search engines collect a large amount of query logs
       lots of interesting information
           analyzing users’ behavior
           creating user profiles and personalization
           creating knowledge bases and folksonomies
           finding similar concepts
           building systems for query recommendations
           using statistics for improving systems’ performance
           ...




the click graph




  [Craswell and Szummer, 2007]
applications of the click graph



   [Craswell and Szummer, 2007]
        query-to-document search
        query-to-query suggestion
        document-to-query annotation
        document-to-document relevance feedback




the query-flow graph


  [Boldi et al., 2008]
       takes temporal information into account
       captures the “flow” of how users submit queries
       definition:
           nodes V = Q ∪ {s, t}: the set of distinct queries Q, plus
           a starting state s and a terminal state t
           edges E ⊆ V × V
           weights w(q, q′) representing the probability
           that q and q′ are part of the same chain




building the query-flow graph


       an edge (q, q′) if q and q′ are consecutive in
       at least one session
       weights w(q, q′) learned by machine learning
       features used
           textual features: cosine similarity, Jaccard coefficient,
           size of intersection, etc.
           session features: the number of sessions, the average
           session length, the average number of clicks in the
           sessions, the average position of the queries in the
           sessions, etc.
           time-related features: average time difference, etc.
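
The edge rule and one of the textual features can be illustrated with a minimal sketch. It assumes sessions are available as ordered lists of query strings; the function names and the toy sessions are hypothetical, and the learned weights are not modeled here, only the raw counts and the Jaccard coefficient that could feed a learner.

```python
from collections import Counter


def jaccard(q1: str, q2: str) -> float:
    """Jaccard coefficient between the word sets of two queries."""
    a, b = set(q1.split()), set(q2.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def build_query_flow_edges(sessions):
    """Return {(q, q'): features} for queries that are consecutive in a session."""
    counts = Counter()
    for session in sessions:
        for q, q_next in zip(session, session[1:]):
            counts[(q, q_next)] += 1
    return {
        (q, q_next): {"occurrences": n, "jaccard": jaccard(q, q_next)}
        for (q, q_next), n in counts.items()
    }


# Hypothetical toy sessions, not data from the talk.
sessions = [
    ["barcelona", "barcelona hotels", "cheap barcelona hotels"],
    ["barcelona", "barcelona weather"],
    ["barcelona hotels", "cheap barcelona hotels"],
]
edges = build_query_flow_edges(sessions)
```

Edges are directed, so ("barcelona hotels", "barcelona") does not appear unless some session contains the queries in that order.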



query-flow graph

  [figure: fragment of a query-flow graph around “barcelona”, with nodes
  barcelona, barcelona fc, barcelona fc website, barcelona fc fixtures,
  real madrid, barcelona hotels, cheap barcelona hotels, luxury barcelona
  hotels, barcelona weather, barcelona weather online, and the terminal
  node <T>; edges are labeled with weights between 0.011 and 0.523]
query-flow graph

  [figure: toy query-flow graph drawn between the symbols ^ (left) and $
  (right), with query nodes cat, dog, funny cat, funny dog, picture of a cat,
  picture of a dog, picture of a funny cat and dog, dog for sale, and
  breed of dog]
query recommendations



  the general theme:
       given an input query q
       identify similar queries q′
       rank them and present them to the user
       most query graphs can be used for both tasks:
       similarity and ranking




recommendations using the query-flow graph


  [Boldi et al., 2008]
       perform a random walk on the query-flow graph
       teleportation to the submitted query
       teleportation to previous queries to take into account
       the user history
       normalize the PageRank scores to remove the bias
       toward very popular queries
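
A minimal sketch of a random walk with teleportation to the submitted query, using plain power iteration. The toy graph, its transition probabilities, the damping value 0.85, and the function name are assumptions for illustration; the popularity normalization mentioned on the slide is not included, and mass that reaches a node without outgoing edges is returned to the teleportation distribution.

```python
def random_walk_with_restart(graph, restart, alpha=0.85, iterations=100):
    """Power iteration for a random walk that teleports to `restart`.

    graph:   {node: {successor: transition probability}}, rows sum to 1
    restart: {node: probability}, the teleportation distribution
    """
    nodes = set(graph) | {v for out in graph.values() for v in out} | set(restart)
    scores = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        # Mass sitting on nodes without successors goes back to the restart set.
        dangling = sum(scores[n] for n in nodes if not graph.get(n))
        nxt = {n: ((1 - alpha) + alpha * dangling) * restart.get(n, 0.0) for n in nodes}
        for u, out in graph.items():
            for v, p in out.items():
                nxt[v] += alpha * scores[u] * p
        scores = nxt
    return scores


# Hypothetical transition probabilities; "<T>" is the terminal state.
graph = {
    "apple": {"apple ipod": 0.5, "apple store": 0.3, "<T>": 0.2},
    "apple ipod": {"apple store": 0.4, "<T>": 0.6},
    "apple store": {"<T>": 1.0},
}
scores = random_walk_with_restart(graph, restart={"apple": 1.0})
ranked = sorted(
    (q for q in scores if q not in ("apple", "<T>")), key=scores.get, reverse=True
)
```

To account for user history, the `restart` dictionary could spread its probability over the current query and the previous queries of the session.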




example : apple


   Max. weight      s_q              ŝ_q             s̄_q
   t                t                apple           apple
   apple ipod       apple            apple fruit     apple ipod
   apple store      apple ipod       apple ipod      apple trailers
   apple trailers   apple store      apple belgium   apple store
   amazon           apple trailers   eating apple    apple mac
   apple mac        google           apple.nl        apple fruit
   itunes           amazon           apple monitor   apple usa
   pc world         argos            apple usa       apple ipod nano
   argos            itunes           apple jobs      apple.com/ipod...




example : banana → apple

           banana → apple               banana
           banana                       banana
           apple                        eating bugs
           usb no                       banana holiday
           banana cs                    opening a banana
           giant chocolate bar          banana shoe
           where is the seed in anut    fruit banana
           banana shoe                  recipe 22 feb 08
           fruit banana                 banana jules oliver
           banana cloths                banana cs
           eating bugs                  banana cloths

example : beatles → apple
           beatles → apple              beatles
           beatles                      beatles
           apple                        scarring
           apple ipod                   paul mcartney
           scarring                     yarns from ireland
           srg peppers artwork          statutory instrument A55
           ill get you                  silver beatles tribute band
           bashles                      beatles mp3
           dundee folk songs            GHOST’S
           the beatles love album       ill get you
           place lyrics beatles         fugees triger finger remix

recommendations as shortcuts to qfg

                    [Anagnostopoulos et al., 2010]




the query-recommendation problem




the recommendation problem

       model user behavior as a random walk on qfg
       a user starts at query q0 and follows a path p of
       reformulations on qfg before terminating
       consider a reward function w(q) on the nodes of qfg
       goal: “nudge” users in order to maximize their reward

       objectives:
    1. collect a large reward along the way
    2. end the session at a high-reward node

       applications: a general problem formulation for suggesting
       shortcuts (web graph, social networks, etc.)

probabilistic model



       we can only suggest, not order the user
       we do not know how the user will act

       random walk on qfg is modeled by stochastic matrix P
       recommendations R modify P to P′ = P + R




utility functions

        reward function w(q) on queries
      - quality of search results, user satisfaction, dwell time,
        monetization, etc.

        utility function U(p) on paths p = q_0 ... q_{k-1} T

            U(p) = \sum_{q \in p} w(q)       (Cavafy: “road to Ithaca”)

            U(p) = w(q_{k-1})                (Machiavelli: “the end justifies the means”)
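
A minimal sketch of how the two utilities could be evaluated in expectation for a random walk that ends in the terminal state T. The transition probabilities, rewards, and function name are hypothetical; the fixed-point iteration assumes every path reaches T with probability 1.

```python
def expected_utilities(P, w, terminal="T", iterations=200):
    """Expected utility of a random walk on a toy query-flow graph.

    P: {query: {next node: probability}}; each row sums to 1 and
       includes the probability of moving to `terminal`.
    w: {query: reward}.
    Returns (V, F): V[q] is the expected sum of rewards along a path
    started at q, F[q] is the expected reward of its last query.
    """
    V = {q: 0.0 for q in P}
    F = {q: 0.0 for q in P}
    for _ in range(iterations):
        V = {
            q: w[q] + sum(p * V[n] for n, p in P[q].items() if n != terminal)
            for q in P
        }
        F = {
            q: P[q].get(terminal, 0.0) * w[q]
            + sum(p * F[n] for n, p in P[q].items() if n != terminal)
            for q in P
        }
    return V, F


# Hypothetical two-query example: from "a" the user reformulates to "b"
# or stops, each with probability 0.5; from "b" the user always stops.
P = {"a": {"b": 0.5, "T": 0.5}, "b": {"T": 1.0}}
w = {"a": 1.0, "b": 3.0}
V, F = expected_utilities(P, w)
```

Here V["a"] = 1 + 0.5 · 3 = 2.5 for the sum utility and F["a"] = 0.5 · 1 + 0.5 · 3 = 2.0 for the last-query utility, so a recommendation that raises the probability of moving from "a" to "b" increases both.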



utility

  [figure: bar chart titled “Sum of expected values” (y-axis from 0.0 to 1.2)
  comparing w, ρ, ρw, and the 1-step heuristic]
qfg projections for diverse recommendations

                                   [Bordino et al., 2010]




diverse recommendations

  [Bordino et al., 2010]
      we want not only relevant and high-quality
      recommendations, but also a diverse set
      we want recommendations that lead in different
      “directions” in the qfg
      need a notion of distance between queries in the qfg
      use spectral embeddings
           project the graph into a low-dimensional space, so that
           the embedding minimizes total edge distortion
      finding diverse recommendations reduces to a geometric
      problem


example: time


       Spectral projection on 2-hop neighborhood

                      time magazine   new york times   time zone   world time   what time is it   time warner   time warner cable
  time magazine                       0.9953           0.0162      0.1422       0.1049            -0.6071       -0.6056
  new york times      0.9953                           -0.0051     0.1248       0.0893            -0.6478       -0.6462
  time zone           0.0162          -0.0051                      0.9903       0.9891            -0.5234       -0.5254
  world time          0.1422          0.1248           0.9903                   0.9970            -0.6263       -0.6282
  what time is it     0.1049          0.0893           0.9891      0.9970                         -0.6244       -0.6263
  time warner         -0.6071         -0.6478          -0.5234     -0.6263      -0.6244                         0.9999
  time warner cable   -0.6056         -0.6462          -0.5254     -0.6282      -0.6263           0.9999
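
One way to turn such a table into a diverse set is a greedy max-min selection, sketched below with the values from the table. This is an illustrative assumption, not necessarily the selection rule of [Bordino et al., 2010]; the values are treated as pairwise similarities, and the first candidate is assumed to be the top-ranked recommendation.

```python
QUERIES = [
    "time magazine", "new york times", "time zone", "world time",
    "what time is it", "time warner", "time warner cable",
]

# Upper triangle of the table above, row by row (diagonal omitted).
UPPER = [
    [0.9953, 0.0162, 0.1422, 0.1049, -0.6071, -0.6056],
    [-0.0051, 0.1248, 0.0893, -0.6478, -0.6462],
    [0.9903, 0.9891, -0.5234, -0.5254],
    [0.9970, -0.6263, -0.6282],
    [-0.6244, -0.6263],
    [0.9999],
]


def similarity(i: int, j: int) -> float:
    """Symmetric lookup into the upper-triangular table."""
    if i == j:
        return 1.0
    i, j = min(i, j), max(i, j)
    return UPPER[i][j - i - 1]


def pick_diverse(k: int, first: int = 0):
    """Greedily add the candidate least similar to everything chosen so far."""
    chosen = [first]
    while len(chosen) < k:
        rest = [c for c in range(len(QUERIES)) if c not in chosen]
        best = min(rest, key=lambda c: max(similarity(c, s) for s in chosen))
        chosen.append(best)
    return [QUERIES[i] for i in chosen]


diverse = pick_diverse(3)
```

With k = 3 the selection takes one query from each of the three visible groups (magazines and newspapers, time of day, Time Warner) instead of near-duplicates such as "time warner" and "time warner cable".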




improving recommendation
         for long-tail queries via templates

                        [Szpektor et al., 2011]



motivation


       goal: improve coverage of query-recommendation systems
       observation: in a typical query log, 50% of the query volume
       consists of unique queries [Baeza-Yates et al., 2007]
       most query-recommendation systems are based on finding
       queries that co-occur frequently
       inherent limitation of relying on co-occurrences
       need methods that can reason about rare,
       and even previously unseen, queries




overview of the approach

    1   generate candidate query-templates for each query
            Paris hotels → <city> hotels
            Paris hotels → <district> hotels
            Moscow hotels → <city> hotels
    2   infer transitions between templates
            <city> hotels → <city> restaurants
    3   infer recommendations for rare queries
            Yancheng hotels → Yancheng restaurants
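
The three steps can be sketched end to end on toy data. The flat entity lookup below is a hypothetical stand-in for the entity hierarchy, transition strength is approximated by simple counts, and all function names are illustrative.

```python
from collections import Counter

# Hypothetical entity lookup; a real system would consult a type hierarchy.
ENTITY_TYPES = {"paris": ["city"], "moscow": ["city"], "yancheng": ["city"]}


def templates(query):
    """Step 1: candidate templates as (template, slot, replaced token)."""
    tokens = query.lower().split()
    found = []
    for i, token in enumerate(tokens):
        for entity_type in ENTITY_TYPES.get(token, []):
            slot = f"<{entity_type}>"
            template = " ".join(tokens[:i] + [slot] + tokens[i + 1:])
            found.append((template, slot, token))
    return found


def learn_transitions(query_pairs):
    """Step 2: count template transitions supported by observed query pairs."""
    counts = Counter()
    for q1, q2 in query_pairs:
        for t1, slot1, e1 in templates(q1):
            for t2, slot2, e2 in templates(q2):
                if (slot1, e1) == (slot2, e2):  # same token generalized
                    counts[(t1, t2)] += 1
    return counts


def recommend(query, transitions):
    """Step 3: instantiate template transitions for a possibly unseen query."""
    scores = Counter()
    for t1, slot, entity in templates(query):
        for (src, dst), n in transitions.items():
            if src == t1:
                scores[dst.replace(slot, entity)] += n
    return [q for q, _ in scores.most_common()]


pairs = [
    ("Paris hotels", "Paris restaurants"),
    ("Moscow hotels", "Moscow restaurants"),
]
transitions = learn_transitions(pairs)
recommendations = recommend("Yancheng hotels", transitions)
```

Although "Yancheng hotels" never occurs in the toy log, the transition learned from the Paris and Moscow pairs produces "yancheng restaurants" for it.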



query templates


       defined over a hierarchy of entity types
       define a global set of templates over the whole query log
       not restricted to specific domains
       (such as travel, weather, or movies)
       examples:
           jaguar spare parts → <car> spare parts
           name for salt → name for <compound>
           a thousand miles notes → <song> notes




candidate templates – example
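       [figure: fragment of the entity hierarchy covering the query tokens
       chocolate, cookie, chocolate cookie, and recipe, with the types
       drink, food, dessert, substance, and instruction above them]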

       query: chocolate cookie recipe
       candidate templates: <food> cookie recipe
                            <drink> cookie recipe
                            <food> recipe
                            <substance> recipe
                            chocolate cookie <instruction> . . .

ranking candidate templates

       ambiguity
       Jaguar spare parts → <car> spare parts
       Jaguar spare parts → <animal> spare parts
       focus
       name for salt → name for <compound>
       name for salt → <description> for salt
       right generalization level
       Paris hotels → <capital> hotels
       Paris hotels → <city> hotels
       Paris hotels → <location> hotels


construction of query templates – details



       hierarchy used: WordNet 3.0 hierarchy and Wikipedia
       category hierarchy, connected via the YAGO mapping
       queries are tokenized, and n-grams are looked up and
       mapped to entities in the hierarchy
       enriched with heuristic generalizations for <email>,
       <url>, numbers, and noun-phrases not in the taxonomy




query-to-template edges


       mapping from a query q to its set of templates T(q)
       viewed as query-to-template edges
       associated edge scores

                             s_{qt}(q, t) = \alpha^d

       when t is obtained by generalizing q at distance d in the hierarchy H
       parameter α set experimentally to 0.9
       set s_{qt}(q, q′) = 1 if (q, q′) is an edge in the query-flow graph
       normalize so that all s_{qt}(q, ·) sum to 1
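
A short worked sketch of these scores for a single query. The value α = 0.9 is the one reported on the slide; the generalization distances and the query-flow successor are hypothetical, and the function name is illustrative.

```python
ALPHA = 0.9  # value reported on the slide


def outgoing_scores(template_distances, flow_successors=()):
    """Normalized outgoing edge scores of one query.

    template_distances: {template: generalization distance d in the hierarchy}
    flow_successors:    queries q' with an edge (q, q') in the query-flow graph
    """
    raw = {t: ALPHA ** d for t, d in template_distances.items()}
    raw.update({q: 1.0 for q in flow_successors})
    total = sum(raw.values())
    if total == 0:
        return {}
    return {target: score / total for target, score in raw.items()}


# Hypothetical distances for the query "paris hotels".
scores = outgoing_scores(
    {"<capital> hotels": 1, "<city> hotels": 2, "<location> hotels": 3},
    flow_successors=["paris restaurants"],
)
```

The raw scores are 0.9, 0.81, 0.729, and 1.0, so after dividing by their sum 3.439 the more specific templates keep a larger share than the more general ones.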



template-to-template edges
       reasoning about transitions between templates
       <food> recipe → healthy <food> recipe
       for templates (t_1, t_2) define the support set Sup(t_1, t_2)
       of query pairs {(q_1, q_2)}, s.t.
            t_1 ∈ T(q_1) and t_2 ∈ T(q_2)
            t_1 and t_2 substitute the same token in q_1 and q_2
       (e.g., dosa recipe and healthy dosa recipe)
       define template-to-template edge score as

            s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in Sup(t_1, t_2)} s_{qq}(q_1, q_2)

       normalize so that all s_{tt}(t, ·) sum to 1
example – ambiguity
       consider query transition:
       jaguar transmission → jaguar spare parts
       template transition
       <car> transmission → <car> spare parts
       supported by
       bmw transmission → bmw spare parts
       audi transmission → audi spare parts
       ...
       template transition
       <animal> transmission → <animal> spare parts
       will not be supported by
       lion transmission → lion spare parts
       tiger transmission → tiger spare parts
       ...
the query-template flow graph


       extension of the query-flow graph
       superposition of all the concepts we have seen so far:
       set of nodes consists of queries and templates
       set of edges consists of
           query to query edges
           query to template edges
           template to template edges
       associated weights




generating recommendations
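  [figure: paths from the input query q to candidate queries q′, either
  directly through query nodes or through template nodes t1 to t4; edges
  are labeled with scores s1 to s7, and some edges are dashed]

       r(q, q′) = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7
       interpretation: probability of a feasible path
       dashed edges are not stored in the graph; they are discovered on the fly
       queries q and q′ may not have been seen before
       transitions in the query-flow graph are ranked first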
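
The score is a sum over feasible paths of the product of the edge scores along each path. A minimal sketch with hypothetical values for s1 to s7 (the slide gives no numbers):

```python
import math


def recommendation_score(paths):
    """Sum over feasible paths of the product of their edge scores."""
    return sum(math.prod(edge_scores) for edge_scores in paths)


# Hypothetical edge scores; only the path structure comes from the slide.
s1, s2, s3, s4, s5, s6, s7 = 0.5, 0.3, 0.2, 0.4, 0.6, 0.5, 0.5
r = recommendation_score([(s1, s4), (s2, s5), (s3, s6), (s3, s7)])
```

With these values r = 0.20 + 0.18 + 0.10 + 0.10 = 0.58.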
methodology


  methods:
      query-template flow graph
      query-flow graph

  evaluation:
       inspection of a sample of the results
       editorial evaluation
       automated evaluation




training dataset


                           queries          templates
       # nodes             95 279 132       5 382 051 983
       # edges             83 513 590       4 345 497 267
       avg degree          0.88             0.81
       max out-degree      14 145           34 249
                           (craigslist)     (<album>)
       max in-degree       14 317           133 874
                           (youtube)        (<institution>)




anecdotal evidence

   {“guangzhou flights”, “guangzhou map”}
   <capital> flights → <capital> map

   {“a thousand miles notes”, “a thousand miles piano notes”}
   <single> notes → <single> piano notes

   {“8 week old weimaraner”, “8 week old weimaraner puppy”}
   8 week old <breed> → 8 week old <breed> puppy

   {“aaa office twin falls idaho”, “aaa twin falls idaho”}
   aaa office <city> → aaa <city>

   {“air force titles”, “air force ranks”}
   <military service> titles → <military service> ranks

   {“name for salt”, “chemical name for salt”}
   name for <compound> → chemical name for <compound>

editorial evaluation
        set-A: 300 pairs from each configuration,
        recommendation in the top-10
        set-B: 100 pairs, same queries in each configuration,
        same position
        set-C: 100 pairs for which query-flow graph has no
        recommendation
        editors labeled query-recommendation pairs as:
        relevant, not relevant, cannot tell
        two editors, 100 common queries, kappa-statistic 0.37
                            qfg    qtfg
                     set-A 98.48% 97.84%
                     set-B 97.65% 98.86%
                     set-C   —    94.38%
automated evaluation – guiding principle



       extract query pairs {q_i, q_{i+1}} from a testing dataset, such
       that the user submitted q_{i+1} after q_i in the same session
       measure whether q_{i+1} is predicted by our methods, and in which
       position
       assumption: q_{i+1} should be relevant and useful for q_i
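
A minimal sketch of such an evaluation loop. It assumes a non-empty list of test pairs and a `recommend` function that returns a ranked list; "coverage" is taken here to mean that the follow-up query appears anywhere in that list, which is an assumption about the definition used in the talk. All names and the toy data are hypothetical.

```python
def evaluate(test_pairs, recommend, ks=(1, 10, 100)):
    """Coverage, top-k hit rates, and average position over test pairs.

    test_pairs: list of (q_i, q_next), with q_next submitted after q_i
    recommend:  function mapping a query to a ranked list of queries
    """
    positions = []
    for q, q_next in test_pairs:
        ranked = recommend(q)
        if q_next in ranked:
            positions.append(ranked.index(q_next) + 1)  # 1-based rank
    n = len(test_pairs)
    report = {"coverage": len(positions) / n}
    for k in ks:
        report[f"top-{k}"] = sum(p <= k for p in positions) / n
    report["avg_position"] = sum(positions) / len(positions) if positions else None
    return report


# Hypothetical recommender backed by a lookup table.
table = {"a": ["x", "y", "z"], "b": ["y"]}
pairs = [("a", "y"), ("a", "w"), ("b", "y"), ("c", "x")]
report = evaluate(pairs, lambda q: table.get(q, []))
```

In this toy run two of the four pairs are predicted (at positions 2 and 1), giving a coverage of 0.5, a top-1 rate of 0.25, and an average position of 1.5.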




results
                              qfg        qtfg       relative increase
     pair occurrences
            total pairs       3134388    3134388
            coverage          22.65 %    28.17 %    24.37 %
            # in top-100      16.97 %    25.49 %    50.23 %
            # in top-10        9.49 %    20.74 %    118.49 %
            # in top-1         2.86 %    10.01 %    249.5 %
            MAP               0.050      0.137
            avg. position     18.35      8.3
     unique pairs
            total pairs       2755922    2755922
            coverage          13.28 %    19.38 %    45.87 %
            # in top-100      12.06 %    17.25 %    42.96 %
            # in top-10        8.41 %    13.52 %    60.68 %
            # in top-1         2.86 %    6.5 %      127.32 %
            MAP               0.047      0.089
            avg. position     12.33      9.43
results

  [figure: percentage of test pairs in the top-10 (y-axis, 0 to 20)
  against query length in words (x-axis, 2 to 16), for QFG and QTFG]
conclusions



       improve coverage of query recommendation systems
       recommendations for rare or previously unseen queries
       well suited for tail queries
       complements rather than replaces existing methods
       future work: improve quality of extracted templates




yahoo! tips

         [Weber et al., 2011]




motivation



       provide answers, not links
       identify “how to” queries and provide tips
       tip: piece of advice that is
           1   short
           2   concrete
           3   self-contained
           4   non-obvious




extract tips from yahoo! answers




   tip: To tell if your eggs are fresh : place eggs in a bowl/glass
         of water.....if it floats it’s bad. if it sinks it’s good.
system diagram

  tip mining pipeline:
       rule-based extraction
       250k candidate tips
       obtain quality labels for 20k candidate tips using CrowdFlower
       machine learning
       22k high quality tips
  query-time flow, for the example query “zest lime without zester”:
       does the query have how-to intent? if no, show normal search results
       are there relevant high quality tips? if no, show normal search results
       if yes, rank the matching tips and display the highest-ranking one
       TIP: To zest a lime if you don’t have a zester : use a cheese grater
mining tips from yahoo! answers


       consider tips of a specific structure: “X : Y ”
       X : goal of the tip
       Y : action of the tip
       examples
           To get the mildew smell out of your towels : try soaking
           it in a salt water solution, then washing with soap and
           cold water, that tends to get rid of smells
           To style your hair without heat, gel or straighteners : try
           coconut oil mark k




  yandex                                                   aug 31, 2012
mining tips from yahoo! answers



       english
       only literal “how to” queries
       answer should start with a verb
       consider only best answers
       replace I, my, me, myself, etc.
       with you, your, you, yourself, etc.
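
The extraction rules above can be sketched roughly as follows. This is a minimal illustration, not the system described in the talk: the function names and the tiny ACTION_VERBS list are assumptions made for the example (a real system would more likely use a part-of-speech tagger), and language detection and best-answer selection are assumed to have happened upstream.

```python
import re

# Pronoun swaps named on the slide: first person becomes second person.
PRONOUNS = {"i": "you", "my": "your", "me": "you", "myself": "yourself"}

# Hypothetical stand-in for "answer should start with a verb".
ACTION_VERBS = {"try", "use", "place", "soak", "put", "add", "mix"}


def to_second_person(text):
    """Replace I/my/me/myself with you/your/you/yourself."""
    def swap(match):
        word = match.group(0)
        repl = PRONOUNS[word.lower()]
        # Keep a leading capital for words like "My"; a bare "I" maps to "you".
        return repl.capitalize() if word[0].isupper() and word != "I" else repl
    return re.sub(r"\b(?:I|my|me|myself)\b", swap, text, flags=re.IGNORECASE)


def extract_tip(question, best_answer):
    """Return a tip of the form 'To X : Y', or None if a rule fails."""
    q = question.strip().rstrip("?").strip()
    m = re.match(r"how to\s+(.+)", q, flags=re.IGNORECASE)
    if m is None:
        return None  # only literal "how to" questions are considered
    words = best_answer.split()
    if not words or words[0].lower().strip(".,!") not in ACTION_VERBS:
        return None  # the answer must start with a verb
    goal = to_second_person(m.group(1))
    action = to_second_person(best_answer.strip())
    return "To " + goal + " : " + action


print(extract_tip("How to get the mildew smell out of my towels?",
                  "Try soaking them in a salt water solution"))
```

The goal X comes from the question and the action Y from the best answer, which matches the "X : Y" structure of the example tips.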




  yandex                                     aug 31, 2012
quality filtering

        generated 249 675 tips
        manually label 20 000 using CrowdFlower
        classes: very good (25%), ok (48%), bad (27%)
        algorithms
             svm (rbf)
             decision trees
             k-nn (Euclidean, k = 21 . . . 50)
        feature families:
             18 handcrafted features: e.g., style (Flesch-Kincaid
             reading level), sentiment, # urls, emoticons, etc.
             content: SVD on the tip×term matrix
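
As an illustration of one of the listed algorithms, here is a minimal k-NN sketch with Euclidean distance and majority voting. The three features (number of urls, number of emoticons, length in tens of words) and all training rows are invented for the example; the slide reports k between 21 and 50, while the toy data below only supports k = 3.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """train: list of (feature_vector, label).
    Returns the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy labelled tips: (num_urls, num_emoticons, length_in_tens_of_words).
train = [
    ((0, 0, 1.2), "very good"),
    ((0, 0, 1.5), "very good"),
    ((0, 1, 2.0), "ok"),
    ((1, 0, 2.5), "ok"),
    ((3, 2, 0.3), "bad"),
    ((2, 3, 0.4), "bad"),
]

print(knn_predict(train, (0, 0, 1.3), k=3))
```

In practice the features would need to be scaled to comparable ranges before computing distances, otherwise one feature can dominate the vote.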



   yandex                                                   aug 31, 2012
quality filtering — machine learning results


                 Method          handcrafted  content     both
                                 features     features
      Hard       SVM             0.63/0.13    0.60/0.09   0.63/0.16
      Hard       Decision Tree   0.67/0.07    0.61/0.06   0.66/0.13
      Hard       k-NN            0.62/0.23    0.56/0.11   0.63/0.11
      Soft       SVM             0.95/0.11    0.93/0.05   0.95/0.08
      Soft       Decision Tree   0.95/0.03    0.92/0.03   0.94/0.06
      Soft       k-NN            0.94/0.11    0.91/0.05   0.94/0.05




  yandex                                               aug 31, 2012
quality filtering — machine learning results
           Category                    P,R      VG     size
           Beauty & Style           0.53,0.08   0.16   0.08
           Business & Finance       0.57,0.20   0.20   0.03
           Cars & Transportation    0.64,0.12   0.23   0.03
           Computers & Internet     0.69,0.33   0.45   0.15
           Consumer Electronics     0.70,0.23   0.38   0.06
           Entertainment & Music    0.60,0.39   0.15   0.05
           Family & Relationships   0.35,0.05   0.06   0.14
           Games & Recreation       0.61,0.31   0.24   0.04
           Health                   0.62,0.07   0.15   0.09
           Home & Garden            0.43,0.06   0.27   0.04
           Society & Culture        0.50,0.19   0.09   0.03
           Sports                   0.68,0.24   0.19   0.03
           Yahoo! Products          0.73,0.43   0.45   0.07

  yandex                                                 aug 31, 2012
detecting “how to” queries

       how many? 2-3% of volume, 3-4% of distinct queries
       start with “how to”, “how do i”, or “how can i”
           how do you fix keys on a laptop
           P: 96-99%, cover: 1.0%
       queries start with an action verb
           play my music on tool bar raido
           P: 7-14%, cover: 3.2%
       if exists “how to X” then “X”
           craft ideas for boys
           P: 87-94%, cover: 1.1%
       incoming queries to “how to” web sites
           fixing a wet cell phone
           P: 61-75%, cover: 0.08%
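
Two of the rules above, the literal prefix rule and the "if exists 'how to X' then 'X'" rule, can be sketched as below. The function names and the tiny query log are assumptions made for illustration; the precision and coverage figures on the slide come from the talk and are not reproduced by this toy.

```python
HOW_TO_PREFIXES = ("how to ", "how do i ", "how can i ")


def has_how_to_prefix(query):
    """Rule 1: the query literally starts with a how-to phrase."""
    return query.lower().strip().startswith(HOW_TO_PREFIXES)


def build_implicit_set(query_log):
    """Rule 3: collect every X for which 'how to X' occurs in the log."""
    implicit = set()
    for q in query_log:
        q = q.lower().strip()
        if q.startswith("how to "):
            implicit.add(q[len("how to "):])
    return implicit


def is_how_to(query, implicit):
    """A query has how-to intent if either rule fires."""
    q = query.lower().strip()
    return has_how_to_prefix(q) or q in implicit


# Toy query log (made up for the example).
log = ["how to zest a lime", "weather barcelona"]
implicit = build_implicit_set(log)

for q in ["zest a lime", "How do I fix keys on a laptop", "weather barcelona"]:
    print(q, "->", is_how_to(q, implicit))
```

The action-verb rule is left out here because, per the slide, its precision is far lower than that of the other rules.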

  yandex                                            aug 31, 2012
matching queries to tips



       precision–recall trade-off
           index only the “goal” or also “action”
           use AND or OR mode for query
           require minimum “span” for the goal
       ranking
           rank by number of query tokens in goal, then tf·idf
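
A possible reading of these options is sketched below. Several points are assumptions rather than details from the talk: only the goal is indexed, "span" is taken to mean the fraction of distinct goal tokens that also occur in the query, and tf·idf is computed over the goals with a plain log(N/df) weight. The second and third tips are made-up toy data.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def match_tips(query, tips, mode="AND", min_span=0.5):
    """tips: list of (goal, action). Returns matching tips, best first."""
    q_tokens = set(tokenize(query))
    goals = [tokenize(goal) for goal, _ in tips]
    n = len(goals)
    # Document frequency of each token over all goals.
    df = Counter(tok for g in goals for tok in set(g))
    scored = []
    for tip, g in zip(tips, goals):
        if not g:
            continue
        g_set = set(g)
        hits = q_tokens & g_set
        if mode == "AND" and hits != q_tokens:
            continue  # AND: every query token must occur in the goal
        if mode == "OR" and not hits:
            continue  # OR: at least one query token must occur
        span = len(hits) / len(g_set)  # assumed definition of "span"
        if span < min_span:
            continue
        tfidf = sum(g.count(t) * math.log(n / df[t]) for t in hits)
        scored.append((len(hits), tfidf, tip))
    # Rank by number of query tokens in the goal, then by tf-idf.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [tip for _, _, tip in scored]


tips = [
    ("zest a lime", "use a cheese grater"),
    ("juice a lime", "roll it on the counter first"),
    ("clean a grater", "scrub it with a brush"),
]
print(match_tips("zest lime", tips, mode="AND", min_span=0.5))
print(match_tips("zest lime", tips, mode="OR", min_span=0.3))
```

Raising min_span or switching from OR to AND trades coverage for precision.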




  yandex                                               aug 31, 2012
matching queries to tips — evaluation



      mode   min span   vol.    dist.   P@1         median
      AND      .50       8.7%    2.7%   .428/.680     1
      AND      .66       6.8%    1.8%   .557/.770     1
      AND      1.0       4.4%    0.8%   .625/.835     1
       OR      .50      87.4%   88.4%   .048/.110    18
       OR      .66      36.8%   36.3%   .092/.200     2
       OR      1.0      13.5%   10.3%   .160/.300     1




  yandex                                     aug 31, 2012
future work



       mine tips from other resources
           twitter
           wikitravel
       improve quality of existing system
           incorporating more features
           improving rule extraction
           classification




  yandex                                    aug 31, 2012
information dissemination in social networks




yandex                                       aug 31, 2012
the information dissemination spectrum

   news sites, content-provider sites
       editorially curated
       users browse
       no specific info need
   in between
       social media (twitter, facebook)
       recommendations (content- or context- or geo-aware)
       user-generated content (blogs, images, q/a)
   web search
       url, images, music, ...
       clear intent



  yandex                                             aug 31, 2012
social media




  yandex       aug 31, 2012
the information overload problem




  yandex                           aug 31, 2012
social media and user-generated content




       paradigm shift from a broadcast one-to-many mechanism
       to a many-to-many model
       users in the role of information producers




  yandex                                           aug 31, 2012
benefits and opportunities



       wealth of information of extreme volume and diversity
       wisdom-of-the-crowd phenomena
       accurate profiling and personalization
       (toolbar, search, clicks)
       content and context information available
       social and geo information available




  yandex                                              aug 31, 2012
challenges


       heterogeneous sources
       high variability in quality
       needle-in-the-haystack problems

  we want to:
      help users seek, filter, and disseminate information
      build efficient platforms that support social-media
      functionalities




  yandex                                              aug 31, 2012
personalized news recommendations
             by harnessing the real-time web

                [De Francisci Morales et al., 2012]



yandex                                   aug 31, 2012
overview




       a news recommendation system based on the real-time web,
       e.g., twitter
       suggest news articles to twitter users
       infer user preferences from twitter activity




  yandex                                           aug 31, 2012
yahoo! news




  yandex      aug 31, 2012
source characteristics


   news stream
    + high coverage
    − sparse and noisy data for user profiling
    − latency in collecting user feedback
   twitter stream
     + much more accurate personalization
     + news spread very fast




   yandex                                       aug 31, 2012
motivation

   [figure: normalized number of occurrences over time (May-01 h20 to May-03 h08)
    for three series: news, twitter, clicks]

        yandex                                                                                                                  aug 31, 2012
motivation

   [figure: News-click delay distribution; number of occurrences (0 to 45)
    vs. delay in minutes (1 to 10000)]

     yandex                                                               aug 31, 2012
challenges



       scale to large volumes of news and tweets
       high dynamicity of news and tweets
       news has a short life cycle
       twitter users use jargon
       find the right degree of personalization
       cope with inactive twitter users




  yandex                                           aug 31, 2012
relate users, tweets, and news articles




   yandex                                 aug 31, 2012
T.Rex architecture

   [figure: tweets from the user and from the user's followees, together with
    the stream of news articles, feed the T.Rex user model, which outputs a
    personalized ranked list of news articles]

   T.Rex builds user profiles from twitter
        yandex                                                              aug 31, 2012
recommendation model


    R_τ(u, n) = α · Σ_τ(u, n) + β · Γ_τ(u, n) + γ · Π_τ(n)

  social model
      Σ_τ(u, n): social relevance of news article n to user u
  content model
      Γ_τ(u, n): content relevance of news article n to user u
  popularity model
      Π_τ(n): popularity of news article n
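
The linear combination above can be sketched as follows. The dictionaries stand for scores already computed for one time slice τ; how the social, content and popularity scores themselves are obtained is not shown, and all names and numbers are invented for the example.

```python
def recommend(user, articles, social, content, popularity,
              alpha, beta, gamma, top_k=3):
    """Rank articles for a user by
    alpha * social + beta * content + gamma * popularity.

    social, content: dicts keyed by (user, article)
    popularity: dict keyed by article
    Missing entries count as 0.0."""
    def score(article):
        return (alpha * social.get((user, article), 0.0)
                + beta * content.get((user, article), 0.0)
                + gamma * popularity.get(article, 0.0))
    return sorted(articles, key=score, reverse=True)[:top_k]


# Toy scores for a single time slice (made-up numbers).
social = {("u1", "a"): 0.9, ("u1", "b"): 0.1}
content = {("u1", "a"): 0.2, ("u1", "b"): 0.8, ("u1", "c"): 0.1}
popularity = {"a": 0.1, "b": 0.3, "c": 1.0}

print(recommend("u1", ["a", "b", "c"], social, content, popularity,
                alpha=0.5, beta=0.3, gamma=0.2))
print(recommend("u1", ["a", "b", "c"], social, content, popularity,
                alpha=0.0, beta=0.0, gamma=1.0))
```

Setting α = β = 0 reduces the ranking to pure popularity, which is one way to see how the weights control the degree of personalization.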
  yandex                                              aug 31, 2012
popularity update rule

   take into account recency
   news become stale after two days
   track mentions in news and
   [figures: News-click delay; normalized number of occurrences for news, twitter, clicks]

                                                                                                                                                                                                                  M

                                                                                                                                                                                                                            M




                                                                                                                                                                                                                                                                                     M


                                                                                                                                                                                                                                                                                                 M


                                                                                                                                                                                                                                                                                                            M


                                                                                                                                                                                                                                                                                                                         M


                                                                                                                                                                                                                                                                                                                                      M


                                                                                                                                                                                                                                                                                                                                                   M
   to investigate the effect of100non-

                                                                                                                                    ay

                                                                                                                                             ay

                                                                                                                                                       ay

                                                                                                                                                                 ay

                                                                                                                                                                           ay

                                                                                                                                                                                     ay

                                                                                                                                                                                               ay

                                                                                                                                                                                                         ay

                                                                                                                                                                                                                   ay

                                                                                                                                                                                                                             ay




                                                                                                                                                                                                                                                                                     ay


                                                                                                                                                                                                                                                                                                   ay


                                                                                                                                                                                                                                                                                                                ay


                                                                                                                                                                                                                                                                                                                             ay


                                                                                                                                                                                                                                                                                                                                          ay


                                                                                                                                                                                                                                                                                                                                                       a
               0
                                                                                                                                                 tweets with exponential
                                                                                                                                       -0

                                                                                                                                                 -0

                                                                                                                                                           -0

                                                                                                                                                                     -0

                                                                                                                                                                               -0

                                                                                                                                                                                         -0

                                                                                                                                                                                                   -0

                                                                                                                                                                                                             -0

                                                                                                                                                                                                                       -0

                                                                                                                                                                                                                                 -0




                                                                                                                                                                                                                                                                                         -2


                                                                                                                                                                                                                                                                                                     -2


                                                                                                                                                                                                                                                                                                                 -2


                                                                                                                                                                                                                                                                                                                              -2


                                                                                                                                                                                                                                                                                                                                           -2
                 1           10                                      1000     10000




                                                                                                                                         1

                                                                                                                                                  2

                                                                                                                                                            2

                                                                                                                                                                      2

                                                                                                                                                                                2

                                                                                                                                                                                          2

                                                                                                                                                                                                    2

                                                                                                                                                                                                              3

                                                                                                                                                                                                                        3

                                                                                                                                                                                                                                  3




                                                                                                                                                                                                                                                                                          2


                                                                                                                                                                                                                                                                                                        2


                                                                                                                                                                                                                                                                                                                     3


                                                                                                                                                                                                                                                                                                                                  3


                                                                                                                                                                                                                                                                                                                                               4
                                                                                                                                            h2

                                                                                                                                                      h0

                                                                                                                                                                h0

                                                                                                                                                                          h0

                                                                                                                                                                                    h1

                                                                                                                                                                                              h1

                                                                                                                                                                                                        h2

                                                                                                                                                                                                                  h0

                                                                                                                                                                                                                            h0

                                                                                                                                                                                                                                      h0




                                                                                                                                                                                                                                                                                              h0


                                                                                                                                                                                                                                                                                                          h1


                                                                                                                                                                                                                                                                                                                      h0


                                                                                                                                                                                                                                                                                                                                   h1


                                                                                                                                                                                                                                                                                                                                                h0
                                                 Minutes




                                                                                                                                             0

                                                                                                                                                       0

                                                                                                                                                                 4

                                                                                                                                                                           8

                                                                                                                                                                                     2

                                                                                                                                                                                               6

                                                                                                                                                                                                         0

                                                                                                                                                                                                                   0

                                                                                                                                                                                                                             4

                                                                                                                                                                                                                                       8




                                                                                                                                                                                                                                                                                               0


                                                                                                                                                                                                                                                                                                            2


                                                                                                                                                                                                                                                                                                                         0


                                                                                                                                                                                                                                                                                                                                      2


                                                                                                                                                                                                                                                                                                                                                   0
                                     R"?0V',('-%1",#E%1(09*(<89(+$                                                                                         9:;<;'=-1'>;?$1%9*"$10                                                                                                                                @ABC-1'!AD1;?A'9*"$10
 #'E%                                                                                                                                            decay
$1%
g Rτ (u, n)). Given the components
',"05     Why Twitter?%%P(:",($"00%#$1%)"*0+$#,(Q#9(+$5%R"?0%<"'+:"%09#,"%3"*E%>#09%#$1%0)*"#1%>#09"*%+$%9?(99"*5%P?(99"*%(0%#%4++1%)*"1
  news N and a stream of tweets T
mmendation score of a news article
 as                           τ
                                                        Method
                                                   Z = λZτ −1 + wT HT + wN HN
Model R
· Γτ (u, n) + γ · Πτ (n),                                                                                                                                                                                                   T.Rex                                                                                                               Alg
                                                                                                    Followee                                                                   User
                                                                                                                                       tweets                                            tweets
                                                                                                                                                                                                                        User                                                                                                                    R EC
                                                                                                                                                                                                                        Model                                                                                                                   C LI
e relative weight of the components.
del Γ                 Popularity Model Π                                                                                                                                                                                     "                                                           Personalized                                           S OC
                                                                                                                                                                                                                                                                                         ranked list of
0%9@"%'+$9"$9%  6'('7'*'8%?@"*"'6,/0%(0%9@"%                                          Followee
                                                                                                                                                                                                                                                                                         news articles
                                                                                                                                                                                                                                                                                                                                                C ON
 r system produces a set of news
*%80"*%2-5      )+)8,#*(9E%+>%$"?0%#*9(',"%1/5                                                                                    tweets
                                                                                                                                                                                                                             !                                                                                                                  P OP
                                                                                                                                                                                                                                                                                                                                                T.R
andidate yandex e.g., the most re-
         news,                                                                                                                                                             twitter
                                                                                                                                                                                                                             #                                      aug 31, 2012                                                                T.R
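The decay update on the slide, Z = λZτ−1 + wT HT + wN HN, can be sketched in a few lines. This is only an illustration: it assumes that HT and HN are the number of mentions of an entity in tweets and in news during one time interval, and the function names, default weights and decay value are hypothetical rather than taken from the talk.

```python
def update_hotness(z_prev, tweet_mentions, news_mentions,
                   decay=0.5, w_tweets=1.0, w_news=1.0):
    """One step of Z_t = decay * Z_{t-1} + w_T * H_T + w_N * H_N."""
    return decay * z_prev + w_tweets * tweet_mentions + w_news * news_mentions


def hotness_series(tweet_counts, news_counts, **params):
    """Apply the update interval by interval, starting from Z = 0."""
    z = 0.0
    series = []
    for h_t, h_n in zip(tweet_counts, news_counts):
        z = update_hotness(z, h_t, h_n, **params)
        series.append(z)
    return series


# A burst of mentions in the first interval, then silence:
# the score halves at every step when decay = 0.5.
print(hotness_series([4, 0, 0], [2, 0, 0], decay=0.5))
```

With a decay below 1 an entity that stops being mentioned fades out geometrically, which fits the observation that news become stale within a couple of days.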
model learning and evaluation



       Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)
       Yahoo! toolbar data
       the recommendation model should rank high
       news articles that users click
       learn the model using SVM
       use clicks and twitter profiles of 3K users
       to train and test the system
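As a rough sketch of how the weights α, β, γ could be learned from clicks, the code below trains a linear model on pairs (clicked article, non-clicked article) with a hinge loss and subgradient descent. It is a simplified stand-in for the SVM mentioned on the slide, not the actual training setup; the pair construction, the hyperparameters and the name `learn_weights` are assumptions made for illustration.

```python
import random


def learn_weights(pairs, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Learn (alpha, beta, gamma) from preference pairs.

    pairs: list of (clicked, skipped), where each element is a tuple
    (social, content, popularity) of component scores for one article.
    The clicked article should score higher than the skipped one.
    """
    rng = random.Random(seed)
    pairs = list(pairs)
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for clicked, skipped in pairs:
            diff = [c - s for c, s in zip(clicked, skipped)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # L2 regularization step (shrinks the weights slightly).
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge loss: update only when the pair violates the margin.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return tuple(w)


# Toy data in which only the content score separates clicks from skips.
toy_pairs = [((0.1, 0.9, 0.2), (0.1, 0.1, 0.2)),
             ((0.5, 0.8, 0.0), (0.5, 0.2, 0.0))]
alpha, beta, gamma = learn_weights(toy_pairs)
print(alpha, beta, gamma)
```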




  yandex                                                    aug 31, 2012
systems evaluated


  T.Rex: basic model using only user profiles
       Rτ (u, n) = α · Στ (u, n) + β · Γτ (u, n) + γ · Πτ (n)

  T.Rex+: additional features
       entity hotness
       news click count
       news article age
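The basic scoring formula can be illustrated as follows. This is a minimal sketch that assumes the three component scores (social, content, popularity) have already been computed for each candidate article; the article ids and score values are made up. Setting two of the weights to zero isolates a single component, which is how the single-component baselines in the results are defined.

```python
def score(social, content, popularity, alpha, beta, gamma):
    """R(u, n) = alpha * Sigma(u, n) + beta * Gamma(u, n) + gamma * Pi(n)."""
    return alpha * social + beta * content + gamma * popularity


def rank(candidates, alpha, beta, gamma):
    """Return article ids sorted by decreasing score.

    candidates: dict mapping article id -> (social, content, popularity).
    """
    return sorted(
        candidates,
        key=lambda article: score(*candidates[article], alpha, beta, gamma),
        reverse=True,
    )


# Hypothetical component scores for three candidate articles.
candidates = {
    "n1": (0.9, 0.1, 0.2),
    "n2": (0.1, 0.8, 0.3),
    "n3": (0.2, 0.2, 0.9),
}
print(rank(candidates, 1.0, 0.0, 0.0))  # social component only
print(rank(candidates, 0.0, 1.0, 0.0))  # content component only
print(rank(candidates, 0.0, 0.0, 1.0))  # popularity component only
```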




  yandex                                                    aug 31, 2012
results

       Table 5.2: MRR (mean reciprocal rank), precision and coverage.

          Algorithm        MRR     P@1      P@5     P@10       Coverage
          Recency          0.020   0.002    0.018   0.036        1.000
          ClickCount       0.059   0.024    0.086   0.135        1.000
          Social           0.017   0.002    0.018   0.036        0.606
          Content          0.107   0.029    0.171   0.286        0.158
          Popularity       0.008   0.003    0.005   0.012        1.000
          T.Rex            0.107   0.073    0.130   0.168        1.000
          T.Rex+           0.109   0.062    0.146   0.189        1.000

  yandex                                                    aug 31, 2012
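The metrics in the table can be computed along the following lines. The exact definitions used in the evaluation are not given on the slides, so this sketch makes three assumptions: each test case has a single clicked article; P@k counts the cases in which the clicked article appears in the top k (consistent with the values growing with k in the table); and coverage is the fraction of cases for which a system can produce any ranking. The function names are hypothetical.

```python
def reciprocal_rank(ranking, clicked):
    """1 / position of the clicked article, or 0 if it is not ranked."""
    for position, article in enumerate(ranking, start=1):
        if article == clicked:
            return 1.0 / position
    return 0.0


def evaluate(cases, k_values=(1, 5, 10)):
    """cases: list of (ranking, clicked); an empty ranking means the
    system could not produce recommendations for that case."""
    total = len(cases)
    answered = [(r, c) for r, c in cases if r]
    mrr = sum(reciprocal_rank(r, c) for r, c in answered) / total
    hits = {k: sum(1 for r, c in answered if c in r[:k]) / total
            for k in k_values}
    coverage = len(answered) / total
    return mrr, hits, coverage


cases = [
    (["a", "b", "c"], "a"),  # clicked article ranked first
    (["a", "b", "c"], "c"),  # clicked article ranked third
    ([], "a"),               # no ranking produced
    (["b", "a"], "a"),       # clicked article ranked second
]
print(evaluate(cases, k_values=(1, 2, 3)))
```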
results

       [figure: average discounted cumulative gain (DCG) by rank, ranks 1 to 20; curves: T.Rex+, T.Rex, Popularity, Content, Social, Recency, Click count]

  baselines
       Recency: ranks news articles by time of publication (most recent first)
       ClickCount: ranks news articles by click count (highest count first)
       Social: ranks news articles using T.Rex with β = γ = 0
       Content: ranks news articles using T.Rex with α = γ = 0
       Popularity: ranks news articles using T.Rex with α = β = 0

  We report MRR, precision and coverage results in Table 5.2. The two
  variants of our system, T.Rex and T.Rex+, have the best results overall.
  T.Rex+ has the highest MRR of all the alternatives. This result means
  that our model has a good overall performance across the dataset. Content
  also has a very high MRR. Unfortunately, the coverage level achieved by
  the Content strategy is very low. This issue is mainly caused by the
  sparsity of the user profiles. It is well known that most twitter users
  belong to the "silent majority," and do not tweet very much.
  The Social strategy is affected by the same problem, albeit to a much [...]

   yandex                                                     aug 31, 2012
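For reference, an average DCG curve like the one in the figure can be computed as sketched below. The sketch assumes binary relevance (an article is relevant if the user clicked it) and the common log2(position + 1) discount; the variant used in the actual evaluation is not stated on the slides, and the function names are hypothetical.

```python
import math


def dcg_at(ranking, relevant, k):
    """Discounted cumulative gain of the top-k items, binary relevance."""
    return sum(1.0 / math.log2(position + 1)
               for position, article in enumerate(ranking[:k], start=1)
               if article in relevant)


def average_dcg_curve(cases, max_rank=20):
    """Average DCG@k over all test cases, for k = 1 .. max_rank.

    cases: list of (ranking, set of relevant article ids).
    """
    return [sum(dcg_at(ranking, relevant, k) for ranking, relevant in cases)
            / len(cases)
            for k in range(1, max_rank + 1)]


cases = [(["a", "b", "c"], {"a"}), (["a", "b", "c"], {"b"})]
print(average_dcg_curve(cases, max_rank=3))
```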
conclusions



       real-time web information can be leveraged to deliver
       relevant information

  future directions

       LSI analysis on entities
       models for different user clusters
       geographic information




  yandex                                               aug 31, 2012
summary



       reviewed concepts in query-log mining
       answering queries directly with useful tips
       challenges and opportunities in information dissemination
       news recommendations using real-time web
       many nice problems and research opportunities




  yandex                                               aug 31, 2012
thank you!


yandex        aug 31, 2012
Aris Gionis, "Methods for analyzing user behavior and their application to web search and content recommendation"
Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации контента»
Арис Гионис «Методы анализа поведения пользователей и его применение в веб-поиске и рекомендации контента»

Aris Gionis, "Usage mining techniques with applications to web search and content recommendation"

  • 1. usage mining techniques with applications to web search and content recommendation Aristides Gionis Yahoo! Research, Barcelona yandex aug 31, 2012
  • 2. yahoo! research, barcelona: web mining; social media and multimedia; large-scale distributed systems; user engagement; semantic web
  • 3. web mining in yahoo! research. themes: usage mining and query-log mining; social network analysis and graph mining; influence propagation; other data mining problems. data sources: query logs (search) and toolbar (browsing); social networks (flickr, messenger, email, ...); question-answering (answers); micro-blogging (twitter)
  • 5. overview of the talk: query-log mining (query graphs, query recommendations); yahoo! tips; news recommendations using real-time web
  • 6. query-log mining
  • 7. query-log mining: search engines collect a large amount of query logs; lots of interesting information: analyzing users’ behavior; creating user profiles and personalization; creating knowledge bases and folksonomies; finding similar concepts; building systems for query recommendations; using statistics for improving systems’ performance; ...
  • 9. the click graph [Craswell and Szummer, 2007]
  • 10. applications of the click graph [Craswell and Szummer, 2007]: query-to-document search; query-to-query suggestion; document-to-query annotation; document-to-document relevance feedback
  • 11. the query-flow graph [Boldi et al., 2008]: takes into account temporal information; captures the “flow” of how users submit queries. definition: nodes V = Q ∪ {s, t}, the set of distinct queries Q plus a starting state s and a terminal state t; edges E ⊆ V × V; weights w(q, q′) representing the probability that q and q′ are part of the same chain
  • 12. building the query-flow graph: an edge (q, q′) if q and q′ are consecutive in at least one session; weights w(q, q′) learned by machine learning. features used: textual features (cosine similarity, Jaccard coefficient, size of intersection, etc.); session features (the number of sessions, the average session length, the average number of clicks in the sessions, the average position of the queries in the sessions, etc.); time-related features (average time difference, etc.)
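
The construction on slide 12 can be sketched in a few lines of Python. This is a minimal illustration, not the method of Boldi et al.: the slides say the weights are learned from textual, session, and time features, whereas the sketch substitutes a plain relative-frequency estimate. The labels `<s>` and `<t>`, the function name, and the toy sessions are all assumptions made for the example.

```python
from collections import Counter, defaultdict

START, END = "<s>", "<t>"  # hypothetical labels for the states s and t


def build_query_flow_graph(sessions):
    """Return {q: {q_next: weight}} from sessions (lists of query strings).

    An edge (q, q_next) exists if the two are consecutive in some session.
    The weight is a stand-in (assumption): the fraction of times q was
    followed by q_next, not a learned chaining probability.
    """
    pair_counts = Counter()
    out_counts = Counter()
    for session in sessions:
        path = [START, *session, END]
        for q, q_next in zip(path, path[1:]):
            pair_counts[(q, q_next)] += 1
            out_counts[q] += 1
    graph = defaultdict(dict)
    for (q, q_next), n in pair_counts.items():
        graph[q][q_next] = n / out_counts[q]
    return dict(graph)


# Toy sessions, invented for illustration only.
sessions = [
    ["barcelona", "barcelona hotels", "cheap barcelona hotels"],
    ["barcelona", "barcelona weather"],
    ["barcelona hotels", "cheap barcelona hotels"],
]
qfg = build_query_flow_graph(sessions)
```

A learned model would replace the single line that computes `n / out_counts[q]`; the rest of the structure (nodes, start and terminal states, edges from consecutive queries) follows the definition on slide 11.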
  • 13. query-flow graph (figure: neighborhood of the query "barcelona", with nodes such as barcelona fc, barcelona fc website, barcelona fc fixtures, real madrid, barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, barcelona weather, barcelona weather online, and the terminal node <T>; the assignment of edge weights to edges is not recoverable from the extraction)
  • 14. query-flow graph (figure: example over cat and dog queries, such as funny cat, funny dog, picture of a cat, picture of a dog, dog for sale, and breed of dog, with start node ^ and terminal node $)
  • 15. query recommendations, the general theme: given an input query q, identify similar queries q′, rank them, and present them to the user; most query graphs can be used for both tasks: similarity and ranking
  • 17. recommendations using the query-flow graph [Boldi et al., 2008]: perform a random walk on the query-flow graph; teleportation to the submitted query; teleportation to previous queries to take into account the user history; normalize the PageRank score to remove the bias toward very popular queries
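
A rough sketch of the random walk on slide 17, under stated assumptions: the damping factor 0.85, the iteration count, and the choice to normalize by dividing each personalized score by the score from a uniform-teleport walk are illustrative and may differ from the normalization used by Boldi et al. The toy graph and all function names are hypothetical.

```python
def random_walk_scores(graph, teleport, alpha=0.85, iters=100):
    """Power iteration: follow edges with probability alpha, otherwise
    jump according to the teleport distribution {query: probability}."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    score = {v: teleport.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - alpha) * teleport.get(v, 0.0) for v in nodes}
        for u in nodes:
            nbrs = graph.get(u, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: return its mass through teleportation.
                for v, t in teleport.items():
                    nxt[v] += alpha * score[u] * t
            else:
                for v, w in nbrs.items():
                    nxt[v] += alpha * score[u] * w / total
        score = nxt
    return score


def recommend_queries(graph, query, k=3):
    """Rank queries by personalized score divided by global score."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    uniform = {v: 1 / len(nodes) for v in nodes}
    personalized = random_walk_scores(graph, {query: 1.0})
    popularity = random_walk_scores(graph, uniform)
    ratio = {v: personalized[v] / popularity[v] for v in nodes if v != query}
    return sorted(ratio, key=ratio.get, reverse=True)[:k]


# Toy graph, invented for illustration; "google" plays the popular query.
toy_graph = {
    "apple": {"apple ipod": 0.5, "apple store": 0.3, "google": 0.2},
    "banana": {"google": 1.0},
    "beatles": {"google": 1.0},
    "google": {"apple": 0.2, "banana": 0.4, "beatles": 0.4},
    "apple ipod": {"apple store": 1.0},
    "apple store": {},
}
top = recommend_queries(toy_graph, "apple", k=2)
```

To account for user history as the slide describes, the teleport dictionary can spread its mass over the submitted query and earlier queries of the session instead of placing all of it on one query.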
  • 18. example: apple (flattened table of suggested queries, cell boundaries lost in extraction; header as extracted: "Max. weight sq sq ˆ sq ¯ t t"): apple apple apple ipod apple apple fruit apple ipod apple store apple ipod apple ipod apple trailers apple trailers apple store apple belgium apple store amazon apple trailers eating apple apple mac apple mac google apple.nl apple fruit itunes amazon apple monitor apple usa pc world argos apple usa apple ipod nano argos itunes apple jobs apple.com/ipod...
  • 19. example: banana → apple (flattened table of suggested queries, cell boundaries lost in extraction): banana → apple banana banana banana apple eating bugs usb no banana holiday banana cs opening a banana giant chocolate bar banana shoe where is the seed in fruit banana anut banana shoe recipe 22 feb 08 fruit banana banana jules oliver banana cloths banana cs eating bugs banana cloths
  • 20. example: beatles → apple (flattened table of suggested queries, cell boundaries lost in extraction): beatles → apple beatles beatles beatles apple scarring apple ipod paul mcartney scarring yarns from ireland srg peppers artwork statutory instrument A55 ill get you silver beatles tribute band bashles beatles mp3 dundee folk songs GHOST’S the beatles love album ill get you place lyrics beatles fugees triger finger remix
  • 21. recommendations as shortcuts to qfg [Anagnostopoulos et al., 2010]
  • 22. the query-recommendation problem
  • 26. the recommendation problem: model user behavior as a random walk on the qfg; a user starts at query q_0 and follows a path p of reformulations on the qfg before terminating; consider a reward function w(q) on the nodes of the qfg. goal: “nudge” users in order to maximize their reward. objectives: 1. collect a large reward along the way; 2. end the session at a high-reward node. applications: a general problem formulation for suggesting shortcuts (web graph, social networks, etc.)
  • 27. probabilistic model: we can only suggest, not order the user; we do not know how the user will act; the random walk on the qfg is modeled by a stochastic matrix P; recommendations R modify P to P′ = P + R
  • 28. utility functions: reward function w(q) on queries (quality of search results, user satisfaction, dwell time, monetization, etc.); utility function U(p) on paths p = ⟨q_0, ..., q_{k−1}, T⟩. two choices: U(p) = Σ_{q ∈ p} w(q) (Cavafy, “road to Ithaca”) and U(p) = w(q_{k−1}) (Machiavelli, “the end justifies the means”)
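
The two utility functions on slide 28 can be evaluated in expectation for a random walk that ends at the terminal state T. The sketch below is an illustration under assumptions: the fixed-point iteration, the function name, and the two-query toy walk are not from the slides, and every row of `P` is assumed to sum to 1 with a nonzero chance of eventually reaching T.

```python
def expected_utilities(P, w, terminal="T", iters=200):
    """P: {q: {next: prob}} over queries, with `terminal` as absorbing state.

    Returns two dicts keyed by start query:
      total[q] = expected sum of rewards along the path  (Cavafy utility)
      last[q]  = expected reward of the final query      (Machiavelli utility)
    """
    total = {q: 0.0 for q in P}
    last = {q: 0.0 for q in P}
    for _ in range(iters):
        # Reward collected now, plus expected reward from the next query on.
        total = {
            q: w[q] + sum(p * total[n] for n, p in P[q].items() if n != terminal)
            for q in P
        }
        # Either the session ends here (reward w[q]) or it continues.
        last = {
            q: P[q].get(terminal, 0.0) * w[q]
            + sum(p * last[n] for n, p in P[q].items() if n != terminal)
            for q in P
        }
    return total, last


# Toy walk, invented for illustration: from "a" the user either stops
# or reformulates to "b", which always ends the session.
P = {"a": {"b": 0.5, "T": 0.5}, "b": {"T": 1.0}}
w = {"a": 1.0, "b": 3.0}
total, last = expected_utilities(P, w)
```

Comparing these values before and after the transition matrix is modified by a set of recommendations gives one way to score that set under either objective.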
  • 29. utility (plot: sum of expected values, axis ticks from 0.0 to 1.2; series labels as extracted: w, ρ, ρw, 1-step, heuristic)
  • 30. qfg projections for diverse recommendations [Bordino et al., 2010]
  • 31. diverse recommendations [Bordino et al., 2010]: we want not only relevant and high-quality recommendations, but also a diverse set; we want recommendations that lead in different “directions” in the qfg; this needs a notion of distance between queries in the qfg; use spectral embeddings: project the graph into a low-dimensional space so that the embedding minimizes total edge distortion; finding diverse recommendations then reduces to a geometric problem
  • 32. example: time. spectral projection on the 2-hop neighborhood; pairwise values between queries (diagonal omitted, as on the slide):

                                    TM      NYT       TZ       WT      WTI       TW      TWC
    time magazine (TM)               -   0.9953   0.0162   0.1422   0.1049  -0.6071  -0.6056
    new york times (NYT)        0.9953        -  -0.0051   0.1248   0.0893  -0.6478  -0.6462
    time zone (TZ)              0.0162  -0.0051        -   0.9903   0.9891  -0.5234  -0.5254
    world time (WT)             0.1422   0.1248   0.9903        -   0.9970  -0.6263  -0.6282
    what time is it (WTI)       0.1049   0.0893   0.9891   0.9970        -  -0.6244  -0.6263
    time warner (TW)           -0.6071  -0.6478  -0.5234  -0.6263  -0.6244        -   0.9999
    time warner cable (TWC)    -0.6056  -0.6462  -0.5254  -0.6282  -0.6263   0.9999        -
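
Once queries have pairwise values like those on slide 32, picking a diverse set becomes a geometric selection problem. The sketch below uses the numbers from that slide and a simple greedy rule: repeatedly add the candidate whose most similar already-chosen query is least similar. That rule, the seed query, and the function name are assumptions for illustration, not the algorithm of Bordino et al.; the sketch also treats the slide values as similarities, which the slide does not state explicitly.

```python
names = ["time magazine", "new york times", "time zone", "world time",
         "what time is it", "time warner", "time warner cable"]

# Upper triangle of the slide-32 matrix, row by row (values from the slide).
upper = [
    [0.9953, 0.0162, 0.1422, 0.1049, -0.6071, -0.6056],
    [-0.0051, 0.1248, 0.0893, -0.6478, -0.6462],
    [0.9903, 0.9891, -0.5234, -0.5254],
    [0.9970, -0.6263, -0.6282],
    [-0.6244, -0.6263],
    [0.9999],
]

# Expand to a symmetric lookup table keyed by query pairs.
sim = {}
for i, row in enumerate(upper):
    for offset, value in enumerate(row, start=1):
        a, b = names[i], names[i + offset]
        sim[(a, b)] = sim[(b, a)] = value


def pick_diverse(seed, candidates, k):
    """Greedy selection: start from `seed`, then add the candidate whose
    closest already-chosen query is as dissimilar as possible."""
    chosen = [seed]
    while len(chosen) < k:
        rest = [c for c in candidates if c not in chosen]
        best = min(rest, key=lambda c: max(sim[(c, s)] for s in chosen))
        chosen.append(best)
    return chosen


picks = pick_diverse("time magazine", names, 3)
```

On this data the three picks fall into the three visible groups of the matrix (press, clock time, cable company), which is the behavior a diverse recommender is after.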
  • 33. improving recommendation for long-tail queries via templates [Szpektor et al., 2011]
  • 34. motivation. goal: improve the coverage of query-recommendation systems. observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]; most query-recommendation systems are based on finding queries that co-occur frequently, which is an inherent limitation of using co-occurrences; we need methods that can reason about rare, and even previously unseen, queries
  • 35. overview of the approach: 1. generate candidate query-templates for each query (Paris hotels → <city> hotels; Paris hotels → <district> hotels; Moscow hotels → <city> hotels); 2. infer transitions between templates (<city> hotels → <city> restaurants); 3. infer recommendations for rare queries (Yancheng hotels → Yancheng restaurants)
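
The three steps on slide 35 can be illustrated with a toy pipeline. Everything here is a simplified assumption rather than the method of Szpektor et al.: the entity dictionary is a hypothetical flat lookup of single-token entities (the slides describe a hierarchy of entity types), transitions are scored by raw counts, and the function names and the two-pair query log are invented for the example.

```python
from collections import Counter

# Hypothetical entity dictionary: token -> entity types it may belong to.
ENTITY_TYPES = {
    "paris": ["city", "district"],
    "moscow": ["city"],
    "yancheng": ["city"],
}


def templates(query):
    """Step 1: yield (template, entity, type) for each known entity token."""
    tokens = query.lower().split()
    for i, tok in enumerate(tokens):
        for etype in ENTITY_TYPES.get(tok, []):
            template = " ".join(tokens[:i] + [f"<{etype}>"] + tokens[i + 1:])
            yield template, tok, etype


def learn_transitions(query_pairs):
    """Step 2: count template-to-template transitions that share the
    same entity and entity type on both sides."""
    counts = Counter()
    for q, q_next in query_pairs:
        for t1, e1, ty1 in templates(q):
            for t2, e2, ty2 in templates(q_next):
                if e1 == e2 and ty1 == ty2:
                    counts[(t1, t2)] += 1
    return counts


def recommend_from_templates(query, transitions):
    """Step 3: instantiate target templates with the query's own entity."""
    scores = Counter()
    for t1, entity, etype in templates(query):
        for (src, dst), n in transitions.items():
            if src == t1:
                scores[dst.replace(f"<{etype}>", entity)] += n
    return [q for q, _ in scores.most_common()]


# Toy query log of consecutive query pairs, invented for illustration.
log = [("paris hotels", "paris restaurants"),
       ("moscow hotels", "moscow restaurants")]
transitions = learn_transitions(log)
suggestions = recommend_from_templates("yancheng hotels", transitions)
```

The query "yancheng hotels" never appears in the toy log, yet it still receives a suggestion because the transition was learned at the template level.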
  • 41. query templates
      - defined over a hierarchy of entity types
      - define a global set of templates over the whole query log
      - do not restrict to specific domains (such as travel, weather, or movies)
      - examples:
           jaguar spare parts → <car> spare parts
           name for salt → name for <compound>
           a thousand miles notes → <song> notes
  • 43. candidate templates: example
      - (figure: fragment of the entity hierarchy with the types substance, food, drink, dessert, instruction and the n-grams chocolate, cookie, chocolate cookie, recipe)
      - query: chocolate cookie recipe
      - candidate templates: <food> cookie recipe; <drink> cookie recipe; <food> recipe; <substance> recipe; chocolate cookie <instruction>; ...
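Candidate generation for the “chocolate cookie recipe” example can be sketched as follows. The hierarchy `PARENT` is a toy stand-in invented for illustration (the slide's figure does not survive in enough detail to copy it), and the function names are hypothetical; the idea is only that each n-gram of the query is replaced by each of its ancestor types, remembering the generalization distance.

```python
# Toy hierarchy (hypothetical): node -> list of parent types.
PARENT = {
    "chocolate": ["food", "drink"],
    "chocolate cookie": ["dessert"],
    "dessert": ["food"],
    "food": ["substance"],
    "drink": ["substance"],
    "recipe": ["instruction"],
}


def ancestors(node):
    """Yield (type, distance) for every ancestor of node, breadth first."""
    seen, frontier, d = set(), [node], 0
    while frontier:
        d += 1
        nxt = []
        for n in frontier:
            for p in PARENT.get(n, []):
                if p not in seen:
                    seen.add(p)
                    nxt.append(p)
                    yield p, d
        frontier = nxt


def candidate_templates(query):
    """Map each candidate template to the smallest distance at which it arises.

    One n-gram of the query is replaced by one of its ancestor types."""
    tokens = query.split()
    out = {}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            ngram = " ".join(tokens[i:j])
            for typ, d in ancestors(ngram):
                t = " ".join(tokens[:i] + [f"<{typ}>"] + tokens[j:])
                out[t] = min(d, out.get(t, d))
    return out


for template, dist in sorted(candidate_templates("chocolate cookie recipe").items()):
    print(dist, template)
```

Keeping the distance alongside each template is a design choice that makes it easy to score templates by how far they generalize the query.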
  • 46. ranking candidate templates
      - ambiguity:
           Jaguar spare parts → <car> spare parts
           Jaguar spare parts → <animal> spare parts
      - focus:
           name for salt → name for <compound>
           name for salt → <description> for salt
      - right generalization level:
           Paris hotels → <capital> hotels
           Paris hotels → <city> hotels
           Paris hotels → <location> hotels
  • 49. construction of query templates: details
      - hierarchy used: WordNet 3.0 hierarchy and Wikipedia category hierarchy, connected via the YAGO mapping
      - queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy
      - enriched with heuristic generalizations for <email>, <url>, numbers, and noun phrases not in the taxonomy
  • 50. query-to-template edges
      - the mapping from a query $q$ to its set of templates $T(q)$ is viewed as query-to-template edges
      - associated edge scores: $s_{qt}(q, t) = \alpha^{d}$ when $t$ is obtained by generalizing $q$ at distance $d$ in the hierarchy $H$
      - parameter $\alpha$ set experimentally to 0.9
      - set $s_{qt}(q, q') = 1$ if $(q, q')$ is an edge in the query-flow graph
      - normalize so that all $s_{qt}(q, \cdot)$ sum to 1
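The scoring rule on this slide, a score of α raised to the generalization distance followed by normalization, is small enough to sketch directly. The function name and the example distances are assumptions for illustration; the special case for query-flow-graph edges is left out.

```python
ALPHA = 0.9  # value reported on the slide


def query_to_template_scores(distances, alpha=ALPHA):
    """distances: template -> generalization distance d in the hierarchy.

    Returns scores proportional to alpha**d, normalized to sum to 1."""
    raw = {t: alpha ** d for t, d in distances.items()}
    total = sum(raw.values())
    return {t: s / total for t, s in raw.items()}


# Hypothetical distances for the query "Paris hotels".
scores = query_to_template_scores(
    {"<capital> hotels": 1, "<city> hotels": 2, "<location> hotels": 3}
)
for template, s in scores.items():
    print(f"{template}: {s:.3f}")
```

Templates closer to the query in the hierarchy receive a larger share of the probability mass, so overly general templates are penalized.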
  • 51. template-to-template edges
      - reasoning about transitions between templates, e.g., <food> recipe → healthy <food> recipe
      - for templates $(t_1, t_2)$ define the support set $\mathrm{Sup}(t_1, t_2)$ of query pairs $\{(q_1, q_2)\}$ such that $t_1 \in T(q_1)$ and $t_2 \in T(q_2)$, and $t_1$ and $t_2$ substitute the same token in $q_1$ and $q_2$ (e.g., dosa recipe and healthy dosa recipe)
      - define the template-to-template edge score as $s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in \mathrm{Sup}(t_1, t_2)} s_{qq}(q_1, q_2)$
      - normalize so that all $s_{tt}(t, \cdot)$ sum to 1
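A minimal sketch of the template-to-template score: sum the query-flow-graph scores over the support set, then normalize per source template. The function name, the data layout, and the numeric scores in the example are all made up for illustration.

```python
from collections import defaultdict


def template_to_template_scores(support, s_qq):
    """support: (t1, t2) -> list of supporting query pairs (q1, q2).
    s_qq: (q1, q2) -> query-flow-graph edge score (assumed positive).

    Returns s_tt, normalized so that the scores out of each t1 sum to 1."""
    raw = {pair: sum(s_qq[qp] for qp in qps) for pair, qps in support.items()}
    out_total = defaultdict(float)
    for (t1, _), s in raw.items():
        out_total[t1] += s
    return {(t1, t2): s / out_total[t1] for (t1, t2), s in raw.items()}


# Hypothetical support sets and scores.
support = {
    ("<food> recipe", "healthy <food> recipe"): [
        ("dosa recipe", "healthy dosa recipe"),
        ("pasta recipe", "healthy pasta recipe"),
    ],
    ("<food> recipe", "<food> calories"): [("dosa recipe", "dosa calories")],
}
s_qq = {
    ("dosa recipe", "healthy dosa recipe"): 0.2,
    ("pasta recipe", "healthy pasta recipe"): 0.1,
    ("dosa recipe", "dosa calories"): 0.1,
}
print(template_to_template_scores(support, s_qq))
```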
  • 52. example: ambiguity
      - consider the query transition: jaguar transmission → jaguar spare parts
      - the template transition <car> transmission → <car> spare parts is supported by
           bmw transmission → bmw spare parts
           audi transmission → audi spare parts
           ...
      - the template transition <animal> transmission → <animal> spare parts will not be supported by
           lion transmission → lion spare parts
           tiger transmission → tiger spare parts
           ...
  • 54. the query-template flow graph
      - extension of the query-flow graph
      - superposition of all the concepts we have seen so far
      - set of nodes consists of queries and templates
      - set of edges consists of query-to-query edges, query-to-template edges, and template-to-template edges, with associated weights
  • 55. generating recommendations
      - (figure: paths from query $q$ to query $q'$ through templates $t_1, \ldots, t_4$, with edge scores $s_1, \ldots, s_7$)
      - $r(q, q') = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7$
      - interpretation: probability of a feasible path
      - dashed lines do not really exist, but are discovered on-the-fly
      - queries $q$ and $q'$ may not have been seen before
      - transitions in the query-flow graph are ranked first
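The sum-of-products score can be sketched as below, under the assumption that each term multiplies a query-to-template score by a template-to-template score and that the target query is obtained by filling the template slot with the entity from the input query. The names, the slot syntax, and the numeric scores are illustrative; ranking direct query-flow-graph transitions first is not modeled.

```python
import re
from collections import defaultdict

SLOT = re.compile(r"<[^>]+>")  # matches a template slot such as <city>


def recommend(entity, s_qt, s_tt):
    """entity: the token that was generalized in the input query.
    s_qt: template t1 of the input query -> score s_qt(q, t1).
    s_tt: (t1, t2) -> template-to-template score.

    Returns (recommended query, score) pairs, best first."""
    scores = defaultdict(float)
    for (t1, t2), tt in s_tt.items():
        if t1 in s_qt:
            candidate = SLOT.sub(lambda m: entity, t2)  # instantiate the slot
            scores[candidate] += s_qt[t1] * tt
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Hypothetical scores for the unseen query "Yancheng hotels".
s_qt = {"<city> hotels": 0.6, "<location> hotels": 0.4}
s_tt = {
    ("<city> hotels", "<city> restaurants"): 0.5,
    ("<city> hotels", "<city> map"): 0.5,
    ("<location> hotels", "<location> restaurants"): 0.25,
}
print(recommend("Yancheng", s_qt, s_tt))
```

Two different template paths lead to “Yancheng restaurants”, so their contributions add up, which mirrors how the slide's formula sums over paths.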
  • 56. methodology
      - methods: query-template flow graph; query-flow graph
      - evaluation: inspection of a sample of the results; editorial evaluation; automated evaluation
  • 57. training dataset
                           queries          templates
      # nodes              95 279 132       5 382 051 983
      # edges              83 513 590       4 345 497 267
      avg degree           0.88             0.81
      max out-degree       14 145           34 249
                           (craigslist)     (<album>)
      max in-degree        14 317           133 874
                           (youtube)        (<institution>)
  • 58. anecdotal evidence
      - {“guangzhou flights”, “guangzhou map”}: <capital> flights → <capital> map
      - {“a thousand miles notes”, “a thousand miles piano notes”}: <single> notes → <single> piano notes
      - {“8 week old weimaraner”, “8 week old weimaraner puppy”}: 8 week old <breed> → 8 week old <breed> puppy
      - {“aaa office twin falls idaho”, “aaa twin falls idaho”}: aaa office <city> → aaa <city>
      - {“air force titles”, “air force ranks”}: <military service> titles → <military service> ranks
      - {“name for salt”, “chemical name for salt”}: name for <compound> → chemical name for <compound>
  • 59. editorial evaluation
      - set-A: 300 pairs from each configuration, recommendation in the top-10
      - set-B: 100 pairs, same queries in each configuration, same position
      - set-C: 100 pairs for which the query-flow graph has no recommendation
      - editors labeled query-recommendation pairs as: relevant, not relevant, cannot tell
      - two editors, 100 common queries, kappa statistic 0.37
                  qfg        qtfg
      set-A       98.48%     97.84%
      set-B       97.65%     98.86%
      set-C       -          94.38%
  • 60. automated evaluation: guiding principle
      - extract query pairs $\{q_i, q_{i+1}\}$ from a testing dataset, such that the user submitted $q_{i+1}$ after $q_i$ in the same session
      - measure whether $q_{i+1}$ is predicted by our methods, and in which position
      - assumption: $q_{i+1}$ should be relevant and useful for $q_i$
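A minimal sketch of this evaluation loop. It assumes that “coverage” means the next query appears anywhere in the recommendation list and that “top-k” means it appears at position k or better; the function name, its parameters, and the toy data are hypothetical.

```python
def evaluate(test_pairs, recommend, ks=(1, 10, 100)):
    """test_pairs: list of (q_i, q_next) taken from user sessions.
    recommend: function mapping a query to a ranked list of recommended queries.

    Returns the fraction of pairs covered and the fraction found within each top-k."""
    covered = 0
    hits = {k: 0 for k in ks}
    for q, q_next in test_pairs:
        ranked = recommend(q)
        if q_next in ranked:
            covered += 1
            pos = ranked.index(q_next) + 1  # 1-based position
            for k in ks:
                if pos <= k:
                    hits[k] += 1
    n = len(test_pairs)
    result = {"coverage": covered / n}
    result.update({f"top-{k}": hits[k] / n for k in ks})
    return result


# Toy recommender and test pairs, for illustration only.
toy = {"a": ["b", "c"], "x": ["y"]}
pairs = [("a", "c"), ("x", "z"), ("x", "y"), ("q", "r")]
print(evaluate(pairs, lambda q: toy.get(q, [])))
```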
  • 61. results
      pair occurrences     qfg         qtfg        relative increase
      total pairs          3134388     3134388
      coverage             22.65 %     28.17 %     24.37 %
      # in top-100         16.97 %     25.49 %     50.23 %
      # in top-10          9.49 %      20.74 %     118.49 %
      # in top-1           2.86 %      10.01 %     249.5 %
      MAP                  0.050       0.137
      avg. position        18.35       8.3
      unique pairs         qfg         qtfg        relative increase
      total pairs          2755922     2755922
      coverage             13.28 %     19.38 %     45.87 %
      # in top-100         12.06 %     17.25 %     42.96 %
      # in top-10          8.41 %      13.52 %     60.68 %
      # in top-1           2.86 %      6.5 %       127.32 %
      MAP                  0.047       0.089
      avg. position        12.33       9.43
  • 62. results (figure: # test-pairs at top-10 (%) versus query length (words), for QFG and QTFG)
  • 63. conclusions
      - improve coverage of query recommendation systems
      - recommendations for rare or previously unseen queries
      - well suited for tail queries
      - complements rather than replaces existing methods
      - future work: improve quality of extracted templates
  • 64. yahoo! tips [Weber et al., 2011]
  • 65. motivation
      - provide answers, not links
      - identify “how to” queries and provide tips
      - tip: a piece of advice that is (1) short, (2) concrete, (3) self-contained, (4) non-obvious
  • 66. yahoo! tips
  • 70. extract tips from yahoo! answers
      - tip: To tell if your eggs are fresh : place eggs in a bowl/glass of water.....if it floats it’s bad. if it sinks it’s good.
  • 71. system diagram
      - tip pipeline: rule-based extraction → 250k candidate tips → obtain quality labels for 20k candidate tips using CrowdFlower → machine learning → 22k high quality tips
      - query handling (example query: zest lime without zester): does the query have how-to intent? if no, show normal search results; if yes, are there relevant high quality tips? if no, show normal search results; if yes, rank the matching tips and display the highest ranking one
      - result: TIP: To zest a lime if you don’t have a zester : use a cheese grater
  • 72. mining tips from yahoo! answers
      - consider tips of a specific structure: “X : Y”
      - X : goal of the tip
      - Y : action of the tip
      - examples:
           To get the mildew smell out of your towels : try soaking it in a salt water solution, then washing with soap and cold water, that tends to get rid of smells
           To style your hair without heat, gel or straighteners : try coconut oil
  • 73. mining tips from yahoo! answers
      - english only
      - literal “how to” queries
      - answer should start with a verb
      - consider only best answers
      - replace I, my, me, myself, etc. with you, your, you, yourself, etc.
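The extraction rules listed on this slide could be sketched as below. This is not the rule set of the actual system: the verb list is a tiny hypothetical stand-in for a part-of-speech check, the pronoun map covers only the four pronouns named on the slide, and the function names are made up.

```python
import re

# Pronoun map taken from the slide; a real system would cover more forms.
PRONOUNS = {"i": "you", "my": "your", "me": "you", "myself": "yourself"}
# Tiny stand-in verb list (hypothetical).
VERBS = {"use", "try", "place", "put", "mix", "add"}

PRONOUN_RE = re.compile(r"\b(" + "|".join(PRONOUNS) + r")\b", re.IGNORECASE)


def second_person(text):
    """Replace first-person pronouns by second-person ones (whole words only)."""
    return PRONOUN_RE.sub(lambda m: PRONOUNS[m.group(0).lower()], text)


def extract_tip(question, best_answer):
    """Return a tip of the form 'To X : Y', or None when the rules do not apply."""
    q = question.strip().rstrip("?").strip()
    if not q.lower().startswith("how to "):
        return None  # only literal "how to" questions
    words = best_answer.strip().split()
    if not words or words[0].lower().strip(",.") not in VERBS:
        return None  # the answer should start with a verb
    goal = second_person(q[len("how to "):])
    action = second_person(best_answer.strip())
    return f"To {goal} : {action}"


print(extract_tip("How to zest a lime if I don't have a zester?",
                  "use a cheese grater"))
```

The word-boundary anchors keep the pronoun rewrite from touching substrings such as the “me” inside “lime”.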
  • 74. quality filtering
      - generated 249 675 tips
      - manually labeled 20 000 using CrowdFlower
      - classes: very good (25%), ok (48%), bad (27%)
      - algorithms: svm (rbf); decision trees; k-nn (Euclidean, k = 21 ... 50)
      - feature families: 18 handcrafted features, e.g., style (Flesch-Kincaid reading level), sentiment, # urls, emoticons, etc.; content: SVD on the tip×term matrix
  • 77. quality filtering: machine learning results
                               handcrafted features    content features    both
      Hard   SVM               0.63/0.13               0.60/0.09           0.63/0.16
             Decision Tree     0.67/0.07               0.61/0.06           0.66/0.13
             k-NN              0.62/0.23               0.56/0.11           0.63/0.11
      Soft   SVM               0.95/0.11               0.93/0.05           0.95/0.08
             Decision Tree     0.95/0.03               0.92/0.03           0.94/0.06
             k-NN              0.94/0.11               0.91/0.05           0.94/0.05
  • 78. quality filtering: machine learning results
      Category                    P,R          VG      size
      Beauty & Style              0.53,0.08    0.16    0.08
      Business & Finance          0.57,0.20    0.20    0.03
      Cars & Transportation       0.64,0.12    0.23    0.03
      Computers & Internet        0.69,0.33    0.45    0.15
      Consumer Electronics        0.70,0.23    0.38    0.06
      Entertainment & Music       0.60,0.39    0.15    0.05
      Family & Relationships      0.35,0.05    0.06    0.14
      Games & Recreation          0.61,0.31    0.24    0.04
      Health                      0.62,0.07    0.15    0.09
      Home & Garden               0.43,0.06    0.27    0.04
      Society & Culture           0.50,0.19    0.09    0.03
      Sports                      0.68,0.24    0.19    0.03
      Yahoo! Products             0.73,0.43    0.45    0.07
  • 79. detecting “how to” queries
      - how many? 2-3% of volume, 3-4% of distinct queries
      - queries that start with “how to”, “how do i” or “how can i” (example: how do you fix keys on a laptop): P: 96-99%, cover: 1.0%
      - queries that start with an action verb (example: play my music on tool bar raido): P: 7-14%, cover: 3.2%
      - if the query “how to X” exists, then also “X” (example: craft ideas for boys): P: 87-94%, cover: 1.1%
      - incoming queries to “how to” web sites (example: fixing a wet cell phone): P: 61-75%, cover: 0.08%
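Two of the detection rules, the literal prefixes and the “if how to X exists then X” rule, are easy to sketch. The function name and the `known_howto_goals` parameter are assumptions for illustration; the action-verb and incoming-traffic rules are left out because they need external resources.

```python
PREFIXES = ("how to ", "how do i ", "how can i ")


def is_howto(query, known_howto_goals=frozenset()):
    """known_howto_goals: strings X for which the query 'how to X' was seen.

    Returns True when the query matches a literal prefix, or when it is
    the X of some previously seen 'how to X' query."""
    q = " ".join(query.lower().split())  # lowercase and collapse whitespace
    if q.startswith(PREFIXES):
        return True
    return q in known_howto_goals


print(is_howto("How to zest a lime"))
print(is_howto("craft ideas for boys", {"craft ideas for boys"}))
```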
  • 84. matching queries to tips
      - precision–recall trade-off: index only the “goal” or also the “action”; use AND or OR mode for the query; require a minimum “span” for the goal
      - ranking: rank by number of query tokens in the goal, then by tf·idf
  • 85. matching queries to tips: evaluation
      mode    min span    vol.      dist.     P@1          median
      AND     .50         8.7%      2.7%      .428/.680    1
      AND     .66         6.8%      1.8%      .557/.770    1
      AND     1.0         4.4%      0.8%      .625/.835    1
      OR      .50         87.4%     88.4%     .048/.110    18
      OR      .66         36.8%     36.3%     .092/.200    2
      OR      1.0         13.5%     10.3%     .160/.300    1
  • 86. future work
      - mine tips from other resources: twitter, wikitravel
      - improve quality of the existing system: incorporating more features; improving rule extraction and classification
  • 87. information dissemination in social networks
  • 88. the information dissemination spectrum
      - (diagram labels) news sites; content-provider sites; web search; editorially curated; url, images, music, ...; users browse, no specific info need; clear intent
      - social media (twitter, facebook)
      - recommendations (content- or context- or geo-aware)
      - user-generated content (blogs, images, q/a)
  • 91. social media
  • 92. the information overload problem
  • 93. social media and user-generated content
      - paradigm shift from a broadcast one-to-many mechanism to a many-to-many model
      - users in the role of information producers
  • 94. benefits and opportunities
      - wealth of information of extreme volume and diversity
      - wisdom-of-the-crowd phenomena
      - accurate profiling and personalization (toolbar, search, clicks)
      - content and context information available
      - social and geo information available
  • 95. challenges
      - heterogeneous sources
      - high variability in quality
      - needle-in-the-haystack problems
      - we want to: support users to seek, filter, and disseminate information; build efficient platforms that support social-media functionalities
  • 97. personalized news recommendations by harnessing the real-time web [De Francisci Morales et al., 2012]
  • 98. overview
      - a news recommendation system based on the real-time web, e.g., twitter
      - suggest news articles to twitter users
      - infer user preferences from twitter activity
  • 99. yahoo! news
  • 102. sources characteristics
      - news stream: (+) high coverage; (−) sparse and noisy data for user profiling; (−) latency on collecting user feedback
      - twitter stream: (+) much more accurate personalization; (+) news spread very fast
  • 103. motivation (figure: normalized number of occurrences over time for news, twitter, and clicks)
  • 104. motivation (figure: news-click delay distribution; number of occurrences versus delay in minutes, from 1 to 10000)
  • 106. challenges
      - scale to large volumes of news and tweets
      - news and tweets are highly dynamic
      - news has a short life cycle
      - twitter users use jargon
      - find the right degree of personalization
      - cope with inactive twitter users
  • 107. relate users, tweets, and news articles
  • 108. T.rex architecture (diagram: user tweets and followee tweets from twitter, together with news articles, feed the T.Rex user model and the popularity model Π, which produce a personalized ranked list of news articles)
  • 109. recommendation model
      - $R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n)$
      - social model $\Sigma(i, j)$: social relevance of news $j$ to user $i$
      - content model $\Gamma(i, j)$: content relevance of news $j$ to user $i$
      - popularity model $\Pi(j)$: popularity of news article $j$
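The linear combination of the social, content, and popularity models can be sketched as a ranking function. The three component models are passed in as plain callables, and every name and number in the example is made up for illustration; how the components themselves are computed is not shown on the slide.

```python
def rank_news(user, articles, social, content, popularity, alpha, beta, gamma):
    """Rank articles by alpha*Sigma(u, n) + beta*Gamma(u, n) + gamma*Pi(n).

    social, content: functions (user, article) -> relevance score.
    popularity: function article -> popularity score."""
    def score(n):
        return (alpha * social(user, n)
                + beta * content(user, n)
                + gamma * popularity(n))
    return sorted(articles, key=score, reverse=True)


# Toy component scores (hypothetical).
SOC = {("u", "a"): 0.1, ("u", "b"): 0.9}
CON = {("u", "a"): 0.8, ("u", "b"): 0.1}
POP = {"a": 0.5, "b": 0.5}

full = rank_news("u", ["a", "b"], lambda u, n: SOC[(u, n)],
                 lambda u, n: CON[(u, n)], lambda n: POP[n], 1.0, 1.0, 1.0)
content_only = rank_news("u", ["a", "b"], lambda u, n: SOC[(u, n)],
                         lambda u, n: CON[(u, n)], lambda n: POP[n], 0.0, 1.0, 0.0)
print(full, content_only)
```

Setting two of the three weights to zero leaves a ranking driven by a single component model.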
  • 113. popularity update rule
      - take into account recency: news become stale after two days
      - track mentions in news and tweets with exponential decay
      - $Z_\tau = \lambda Z_{\tau-1} + w_T H_T + w_N H_N$
      - why twitter? timeliness and personalization: news become stale very fast and spread faster on twitter
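One step of the exponential-decay update can be sketched as follows, assuming that Z holds one popularity value per entity and that H_T and H_N are the mention counts observed in tweets and in news during the current time step. The function name, the data layout, and the parameter values are illustrative assumptions.

```python
def update_popularity(z_prev, tweet_mentions, news_mentions, lam, w_t, w_n):
    """Apply Z_tau = lam * Z_{tau-1} + w_T * H_T + w_N * H_N to every entity.

    z_prev: entity -> previous popularity.
    tweet_mentions, news_mentions: entity -> mentions in the current step."""
    entities = set(z_prev) | set(tweet_mentions) | set(news_mentions)
    return {
        e: lam * z_prev.get(e, 0.0)
           + w_t * tweet_mentions.get(e, 0)
           + w_n * news_mentions.get(e, 0)
        for e in entities
    }


z1 = update_popularity({"obama": 10.0}, {"obama": 2, "euro": 4}, {"euro": 1},
                       lam=0.5, w_t=1.0, w_n=2.0)
z2 = update_popularity(z1, {}, {}, lam=0.5, w_t=1.0, w_n=2.0)
print(z1, z2)
```

With a decay factor below 1, an entity that stops being mentioned loses a fixed fraction of its popularity at every step.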
  • 114. model learning and evaluation
      - $R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n)$
      - Yahoo! toolbar data
      - the recommendation model should rank high the news articles that users click
      - learn the model using SVM
      - use clicks and twitter profiles of 3K users to train and test the system
  • 115. systems evaluated
      - T.rex: basic model using only user profiles, $R_\tau(u, n) = \alpha \cdot \Sigma_\tau(u, n) + \beta \cdot \Gamma_\tau(u, n) + \gamma \cdot \Pi_\tau(n)$
      - T.rex+: additional features: entity hotness, news click count, news article age
  • 116. results
      Table 5.2: MRR (mean reciprocal rank), precision and coverage
      Algorithm      MRR      P@1      P@5      P@10     Coverage
      RECENCY        0.020    0.002    0.018    0.036    1.000
      CLICKCOUNT     0.059    0.024    0.086    0.135    1.000
      SOCIAL         0.017    0.002    0.018    0.036    0.606
      CONTENT        0.107    0.029    0.171    0.286    0.158
      POPULARITY     0.008    0.003    0.005    0.012    1.000
      T.REX          0.107    0.073    0.130    0.168    1.000
      T.REX+         0.109    0.062    0.146    0.189    1.000
  • 117. results
      - baselines: RECENCY ranks news articles by time of publication (most recent first); CLICKCOUNT ranks news articles by click count (highest count first); SOCIAL ranks news articles by using T.REX with β = γ = 0; CONTENT ranks news articles by using T.REX with α = γ = 0; POPULARITY ranks news articles by using T.REX with α = β = 0
      - (figure: average discounted cumulative gain (DCG) versus rank, 1 to 20, for T.Rex+, T.Rex, Popularity, Content, Social, Recency, and Click count)
      - We report MRR, precision and coverage results in Table 5.6.3. The two variants of our system, T.REX and T.REX+, have the best results overall. T.REX+ has the highest MRR of all the alternatives. This result means that our model has a good overall performance across the dataset. CONTENT also has a very high MRR. Unfortunately, the coverage level achieved by the CONTENT strategy is very low. This issue is mainly caused by the sparsity of user profiles. It is well known that most twitter users belong to the “silent majority,” and do not tweet very much. The SOCIAL strategy is affected by the same problem, albeit to a much ...
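The metrics named on these slides can be sketched with their textbook definitions. Two assumptions are made and should be checked against the paper: P@k is computed here as a hit indicator (whether a clicked article appears in the top k), which is suggested by the reported values growing with k, and DCG uses binary gains with the usual log2 discount.

```python
import math


def reciprocal_rank(ranked, clicked):
    """1 / position of the first clicked item, or 0.0 if none is ranked."""
    for pos, item in enumerate(ranked, start=1):
        if item in clicked:
            return 1.0 / pos
    return 0.0


def hit_at_k(ranked, clicked, k):
    """1.0 if any clicked item is within the top k, else 0.0 (assumed P@k)."""
    return 1.0 if any(item in clicked for item in ranked[:k]) else 0.0


def dcg_at_k(ranked, clicked, k):
    """Binary-gain discounted cumulative gain over the top k positions."""
    return sum(1.0 / math.log2(pos + 1)
               for pos, item in enumerate(ranked[:k], start=1)
               if item in clicked)


def mean(values):
    """Average of a non-empty sequence, e.g. reciprocal ranks over users."""
    values = list(values)
    return sum(values) / len(values)


ranked = ["a", "b", "c", "d"]
clicked = {"c"}
print(reciprocal_rank(ranked, clicked), hit_at_k(ranked, clicked, 5),
      dcg_at_k(ranked, clicked, 20))
```

Averaging `reciprocal_rank` over all test users with `mean` gives an MRR-style figure; averaging `dcg_at_k` for k from 1 to 20 gives a curve of the kind described for the figure.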
  • 118. conclusions
      - real-time web information can be leveraged to deliver relevant information
      - future directions: LSI analysis on entities; models for different user clusters; geographic information
  • 120. summary
      - review of concepts in query-log mining
      - answering queries directly with useful tips
      - challenges and opportunities in information dissemination
      - news recommendations using the real-time web
      - many nice problems and research opportunities
  • 121. thank you!