Machine Learned Relevance at a Large Scale Search Engine
Salford Analytics and Data Mining Conference 2012
Salford Data Mining – May 25, 2012
Presented by:
Dr. Eric Glover – eric@quixey.com
Dr. James Shanahan – james.shanahan@gmail.com
About the Authors
   James G. Shanahan - PhD in Machine Learning University of Bristol, UK
     – 20+ years in the fields of AI and information science
     – Principal and Founder, Boutique Data Consultancy
         • Clients include: Adobe, Digg, SearchMe, AT&T, Ancestry, SkyGrid, Telenav
     – Affiliated with University of California Santa Cruz (UCSC)
     – Adviser to Quixey
     – Previously
         • Chief Scientist, Turn Inc. (a CPX ad network, DSP)
         • Principal Scientist, Clairvoyance Corp (CMU spinoff)
         • Co-founder of Document Souls (task-centric info access system)
         • Research Scientist, Xerox Research (XRCE)
         • AI Research Engineer, Mitsubishi Group
About the Authors
   Eric Glover – PhD in CSE (AI) from the University of Michigan in 2001
     – Fellow at Quixey, where, among other things, he focuses on the architecture
        and processes related to applied machine learning for relevance and
        evaluation methodologies
     – More than a dozen years of search engine experience, including NEC Labs, Ask
        Jeeves, SearchMe, and his own startup
     – Multiple relevant publications ranging from classification to automatically
        discovering topical hierarchies
     – Dissertation studied personalizing web search through the incorporation of user
        preferences and machine learning
     – More than a dozen filed patents
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Google
Search Engine: SearchMe
Search engine: lets you see and hear what you're searching for
6 Steps to MLR in Practice
   1. Understand the domain and define problems
   2. Collect requirements and data
   3. Feature engineering
   4. Modeling: extract patterns/models
   5. Interpret and evaluate discovered knowledge
   6. Deploy system in the wild (and test)

   Systems modeling is inherently interactive and iterative.
How is ML for Search Unique?
 Many Machine Learning (ML) systems start with source data
   – Goal is to analyze, model, predict
   – Features are often pre-defined, in a well-studied area
 MLR for Search Engines differs from many other ML applications:
   – Does not start with labeled data
       • Need to pay judges to provide labels
   – Opportunity to invent new features (Feature Engineering)
   – Often requires real-time operation
       • Processing tens of billions of possible results, microseconds matter
   – Requires domain-specific metrics for evaluation
If we can’t measure “it”, then…

 ….we should think twice about doing “it”
 Measurement has enabled us to compare systems
  and also to machine learn them
 Search is about measurement, measurement and
  measurement
Improve in a Measured Way
From Information Needs to Queries
   The idea of using computers to search for relevant pieces of information was
    popularized in the article As We May Think by Vannevar Bush in 1945

   An information need is an individual or group's desire to locate and obtain
    information to satisfy a conscious or unconscious need.

   Within the context of web search information needs are expressed as textual
    queries (possibly with constraints)

   E.g., “Analytics Data Mining Conference” program

   Metric: “Relevance” as a measure of how well a system is performing
Relevance is a Huge Challenge
    Relevance typically denotes how well a retrieved object (document) or set of objects
     meets the information need of the user.
    Relevance is often viewed as multifaceted.
      – A core facet of relevance relates to topical relevance or aboutness,
            • i.e., to what extent the topic of a result matches the topic of the query or
               information need.
            • Another facet of relevance is based on user perception, and sometimes referred
               to as user relevance; it encompasses other concerns of the user such as
               timeliness, authority or novelty of the result
    In local search type queries, yet another facet of relevance that comes into play is
     geographical aboutness,
      – i.e., to what extent the location of a result, a business listing, matches the location of
          the query or information need
From Cranfield to TREC
                            Text REtrieval
                             Conference/Competition
                              – http://trec.nist.gov/
                              –   Run by NIST (National Institute of
                                  Standards & Technology)

                            Started in 1992

                             Collections: >6 gigabytes (5
                              CD-ROMs), >1.5 million docs
                              –   Newswire & full text news (AP,
                                  WSJ, Ziff, FT)
                              –   Government documents
                                  (federal register, Congressional
                                  Record)
                              –   Radio Transcripts (FBIS)
                              –   Web “subsets”
                              –   Tweets
The TREC Benchmark
   TREC: Text REtrieval Conference (http://trec.nist.gov/) Originated from the
    TIPSTER program sponsored by Defense Advanced Research Projects Agency
    (DARPA).

   Became an annual conference in 1992, co-sponsored by National Institute of
    Standards and Technology (NIST) and DARPA.

   Participants are given parts of a standard set of documents and TOPICS (from
    which queries have to be derived) in different stages for training and testing.

   Participants submit the P/R values for the final document and query corpus
    and present their results at the conference.



User’s
Information
Need                       Collections

                                            Pre-process
 text input


  Parse                 Query                  Index


                                    Match




  Query Reformulation
User’s
Information
Need                       Collections

                                          Pre-process
 text input


  Parse                 Query                   Index


                                Rank or Match


                                                        Evaluation
  Query Reformulation
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Difficulties in Evaluating IR Systems
 Effectiveness is related to the relevancy of the set of returned
  items.
 Relevancy is not typically binary but continuous.
 Even if relevancy is binary, it can be a difficult judgment to make.
 Relevancy, from a human standpoint, is:
    – Subjective: Depends upon a specific user’s judgment.
    – Situational: Relates to user’s current needs.
    – Cognitive: Depends on human perception and behavior.
    – Dynamic: Changes over time.
Relevance as a Measure
Relevance is everything!
 How relevant is the document retrieved
    – for the user’s information need.
 Subjective, but one assumes it’s measurable
 Measurable to some extent
    – How often do people agree a document is relevant to a query
        • More often than expected
 How well does it answer the question?
    – Complete answer? Partial?
    – Background Information?
    – Hints for further exploration?
What to Evaluate?
    What can be measured that reflects users’ ability to use the
    system? (Cleverdon ’66)
     –   Coverage of Information
     –   Form of Presentation
     –   Effort required/Ease of Use
     –   Time and Space Efficiency
     –   Effectiveness
  Recall
     – proportion of relevant material actually retrieved
  Precision
     – proportion of retrieved material actually relevant
   Typically a 5-point scale is used: 5=best, 1=worst
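Below is a minimal sketch (not from the slides) of set-based precision and recall given binary relevance judgments; the document IDs and counts are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """retrieved: ranked list of returned doc IDs; relevant: set of relevant doc IDs."""
    hits = [d for d in retrieved if d in relevant]
    precision = len(hits) / len(retrieved) if retrieved else 0.0  # fraction of retrieved that is relevant
    recall = len(hits) / len(relevant) if relevant else 0.0       # fraction of relevant that was retrieved
    return precision, recall

# Example: 3 of 5 retrieved documents are relevant; 6 relevant documents exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5", "d7", "d8", "d9"})
print(p, r)  # 0.6 0.5
```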
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Data Collection is a Challenge
   Most search engines do not start with labeled data (relevance judgments)
   Good labeled data is required to perform evaluations and perform learning
   Not practical to hand-label all possibilities for modern large-scale search engines
   Using 3rd party sources such as Mechanical Turk is often very noisy/inconsistent

   Data collection is non-trivial
     – A custom system (specific to the domain) is often required
     – Phrasing of the “questions”, options (including a skip option), UI design and
        judge training are critical to increase the chance of consistency
   Can leverage judgment collection to aid in feature engineering
     – Judges can provide reasons and observations
Relevance/Usefulness/Ranking
 Web Search: topical relevance or aboutness, trustability of source
 Local Search: topical relevance and geographical applicability
 Functional App Search:
   – Task relevance – User must be convinced app results can solve need
    – Finding the “best” apps that address the user’s task needs
   – Very domain and user specific
 Advertising
   – Performance measure – expected revenue P(click) * revenue(click)
   – Consistency with user-search (showing irrelevant ads hurts brand)
Commonly used Search Metrics
   Early search systems used binary judgments (relevant/not relevant) and evaluated
    based on precision and recall
     – Recall difficult to assess for large sets
   Modern search systems often use DCG or nDCG:
     – Easy to collect and compare large sets of “independent judgments”
           • Independent judgments map easily to MSE minimization learners
     – Relevance is not binary, and depends on the order of results
   Other measures exist
     – Subjective “how did I do”, but these are difficult to use for MLR or compare
      – Pairwise comparison – measure the number of out-of-order pairs
           • Lots of recent research on pairwise based MLR
           • Most companies use “independent judgments”
Metrics for Web Search
 Existing metrics such as Precision and Recall are limited
   –   Not always clear-cut binary decision: relevant vs. not relevant
   –   Not position sensitive:
       p: relevant, n: not relevant
          ranking 1: p n p n n
          ranking 2: n n n p p

 How do you measure recall over the whole web?
    –   How many of the potentially billions of results will get looked at? Which ones actually need to be good?

 Normalized Discounted Cumulated Gain (NDCG)
    –   K. Järvelin and J. Kekäläinen (TOIS 2002)
   –   Gain: relevance of a document is no longer binary
   –   Sensitive to the position of highest rated documents
         • Log-discounting of gains according to the positions
   –   Normalize the DCG with the “ideal set” DCG (NDCG)
Cumulative Gain
   With graded relevance judgments, we can compute the gain at each rank.
   Cumulative Gain at rank n: CG_n = sum_{i=1..n} rel_i
    (where rel_i is the graded relevance of the document at position i)

     n    doc #   relevance (gain)   CG_n
     1    588     1.0                1.0
     2    589     0.6                1.6
     3    576     0.0                1.6
     4    590     0.8                2.4
     5    986     0.0                2.4
     6    592     1.0                3.4
     7    984     0.0                3.4
     8    988     0.0                3.4
     9    578     0.0                3.4
     10   985     0.0                3.4
     11   103     0.0                3.4
     12   591     0.0                3.4
     13   772     0.2                3.6
     14   990     0.0                3.6
Discounting Based on Position
 Users care more about high-ranked documents, so we discount results by 1/log2(rank).
 Discounted Cumulative Gain: DCG_n = rel_1 + sum_{i=2..n} rel_i / log2(i)

     n    doc #   rel (gain)   CG_n   log2(n)   DCG_n
     1    588     1.0          1.0    -         1.00
     2    589     0.6          1.6    1.00      1.60
     3    576     0.0          1.6    1.58      1.60
     4    590     0.8          2.4    2.00      2.00
     5    986     0.0          2.4    2.32      2.00
     6    592     1.0          3.4    2.58      2.39
     7    984     0.0          3.4    2.81      2.39
     8    988     0.0          3.4    3.00      2.39
     9    578     0.0          3.4    3.17      2.39
     10   985     0.0          3.4    3.32      2.39
     11   103     0.0          3.4    3.46      2.39
     12   591     0.0          3.4    3.58      2.39
     13   772     0.2          3.6    3.70      2.44
     14   990     0.0          3.6    3.81      2.44
Normalized Discounted Cumulative Gain
(NDCG)
   To compare DCGs, normalize values so that an ideal ranking would have a
    Normalized DCG of 1.0
   Ideal ranking (results re-sorted by graded relevance):

    Actual ranking:
     n    doc #   rel (gain)   CG_n   log2(n)   DCG_n
     1    588     1.0          1.0    0.00      1.00
     2    589     0.6          1.6    1.00      1.60
     3    576     0.0          1.6    1.58      1.60
     4    590     0.8          2.4    2.00      2.00
     5    986     0.0          2.4    2.32      2.00
     6    592     1.0          3.4    2.58      2.39
     7    984     0.0          3.4    2.81      2.39
     8    988     0.0          3.4    3.00      2.39
     9    578     0.0          3.4    3.17      2.39
     10   985     0.0          3.4    3.32      2.39
     11   103     0.0          3.4    3.46      2.39
     12   591     0.0          3.4    3.58      2.39
     13   772     0.2          3.6    3.70      2.44
     14   990     0.0          3.6    3.81      2.44

    Ideal ranking:
     n    doc #   rel (gain)   CG_n   log2(n)   IDCG_n
     1    588     1.0          1.0    0.00      1.00
     2    592     1.0          2.0    1.00      2.00
     3    590     0.8          2.8    1.58      2.50
     4    589     0.6          3.4    2.00      2.80
     5    772     0.2          3.6    2.32      2.89
     6    576     0.0          3.6    2.58      2.89
     7    986     0.0          3.6    2.81      2.89
     8    984     0.0          3.6    3.00      2.89
     9    988     0.0          3.6    3.17      2.89
     10   578     0.0          3.6    3.32      2.89
     11   985     0.0          3.6    3.46      2.89
     12   103     0.0          3.6    3.58      2.89
     13   591     0.0          3.6    3.70      2.89
     14   990     0.0          3.6    3.81      2.89
Normalized Discounted Cumulative Gain
(NDCG)
 Normalize by DCG of the ideal ranking: NDCG_n = DCG_n / IDCG_n
 NDCG ≤ 1 at all ranks
 NDCG is comparable across different queries

     n    doc #   rel (gain)   DCG_n   IDCG_n   NDCG_n
     1    588     1.0          1.00    1.00     1.00
     2    589     0.6          1.60    2.00     0.80
     3    576     0.0          1.60    2.50     0.64
     4    590     0.8          2.00    2.80     0.71
     5    986     0.0          2.00    2.89     0.69
     6    592     1.0          2.39    2.89     0.83
     7    984     0.0          2.39    2.89     0.83
     8    988     0.0          2.39    2.89     0.83
     9    578     0.0          2.39    2.89     0.83
     10   985     0.0          2.39    2.89     0.83
     11   103     0.0          2.39    2.89     0.83
     12   591     0.0          2.39    2.89     0.83
     13   772     0.2          2.44    2.89     0.84
     14   990     0.0          2.44    2.89     0.84
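A small sketch that reproduces the table's numbers with the discounting used on these slides (no discount at rank 1, 1/log2(rank) afterwards):

```python
import math

def dcg(gains):
    """DCG with the slides' discount: full gain at rank 1, gain/log2(rank) afterwards."""
    return sum(g if i == 0 else g / math.log2(i + 1) for i, g in enumerate(gains))

def ndcg(gains):
    """Normalize by the DCG of the ideal (relevance-sorted) ordering."""
    return dcg(gains) / dcg(sorted(gains, reverse=True))

# Graded relevance of the 14 ranked results from the example table.
gains = [1.0, 0.6, 0.0, 0.8, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0]
print(round(dcg(gains), 2))   # 2.44
print(round(ndcg(gains), 2))  # 0.84  (= 2.44 / 2.89)
```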
Machine Learning Uses in Commercial SE

   Query parsing

   SPAM Classification
   Result Categorization
   Behavioral Categories

   Search engine results ranking
6 Steps to MLR in Practice
   1. Understand the domain and define problems
   2. Collect requirements and data
   3. Feature engineering
   4. Modeling: extract patterns/models
   5. Interpret and evaluate discovered knowledge
   6. Deploy system in the wild (and test)

   Systems modeling is inherently interactive and iterative.
In Practice
 QPS constraints; deploying the model
 Imbalanced data
 Relevance changes over time; non-stationary behavior
 Speed vs. accuracy trade-offs (SVMs, …)
 Practical: grid search, 8–16 nodes, 500 trees
 Millions of records; interactions
 Variable selection: 1000 → 100s of variables; add random variables
 ~6-week cycle
 Training time is days; lab evaluation is weeks; then live A/B testing
 Why TreeNet? Handles missing values and categorical variables
MLR – Typical Approach by Companies
1. Define goals and “specific problem”
2. Collect human judged training data:
   – Given a large set of <query, result> tuples
       • Judges rate “relevance” on a 1 to 5 scale (5=“perfect”,
         1=“worst”)
3. Generate training data from the provided <query, result> tuples
   – <q,r> → Features; input to the model is: <F, judgment>
4. Train a model, typically minimizing MSE (Mean Squared Error)
5. Lab evaluation using DCG-type metrics
6. Deploy model in a test system and evaluate
MLR Training Data
1.     Collect human judged training data:
       Given a large set of <query, result> tuples
            Judges rate “relevance” on a 1 to 5 scale (5=“perfect”, 1=“worst”)
2.     Featurize the training data from the provided <query, result> tuples
       <q,r> → Features; input to the model is: <F, judgment>

     Instance\Attr       x0    x1    x2    …     xn    Label
     <query1, Doc1>       1     3     0    ..     7      4
     <query1, Doc2>       1                              5
     …                    …     …     …    …      …      …
     <queryn, Docn>       1     0     4    ...    8      3
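As a toy illustration of the featurization step (the function and feature names here are hypothetical, not the production feature set), each judged <query, result> tuple becomes a feature vector plus its label:

```python
def featurize(query, result):
    """Map a <query, result> pair to a feature vector <x0, x1, ..., xn>."""
    return [1.0,                              # x0: constant/bias term, as in the table
            len(query.split()),               # query-only feature: number of terms
            result.get("popularity", 0.0),    # result-only feature
            result.get("title_match", 0.0)]   # query-result feature (precomputed here)

judged = [  # (query, result, judgment on the 1-5 scale)
    ("angry birds", {"popularity": 0.9, "title_match": 1.0}, 5),
    ("angry birds", {"popularity": 0.1, "title_match": 0.0}, 2),
]
training_rows = [(featurize(q, r), label) for q, r, label in judged]
```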
The Evaluation Disconnect
   Evaluation in a supervised learner tries to minimize MSE of the targets
      – for each tuple <Fi, xi> the learner predicts a target yi
            • Error is f(yi – xi) – typically (yi – xi)^2
            • The optimum is some function of the “errors” – i.e., try to minimize total error
    Evaluation of the deployed model is different from evaluation of the learner – typically
     DCG or nDCG
    Individual result error calculation is different from error based on result ordering
      – A small error in the predicted target for a result could have a substantial
         impact on result ordering – likewise, the “best result ordering” might not
         exactly match the predicted targets for any results
      – An affine transform of the targets produces no change to DCG, but a large
         change to the calculated MSE
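A small sketch of the disconnect (synthetic numbers): an affine transform of the predicted scores leaves the induced ordering, and hence DCG, untouched, while the MSE against the targets changes dramatically.

```python
import math

def dcg_of_ordering(scores, labels):
    """DCG of the ordering induced by predicted scores; gains are the judged labels."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(labels[i] if rank == 0 else labels[i] / math.log2(rank + 1)
               for rank, i in enumerate(order))

def mse(preds, labels):
    return sum((p - t) ** 2 for p, t in zip(preds, labels)) / len(labels)

labels = [1.0, 0.6, 0.0, 0.8]            # graded relevance targets
preds  = [0.9, 0.5, 0.1, 0.7]            # model A predictions
affine = [2 * p + 3 for p in preds]      # model B: affine transform of model A

assert dcg_of_ordering(preds, labels) == dcg_of_ordering(affine, labels)  # same ordering, same DCG
print(mse(preds, labels), mse(affine, labels))                            # very different MSE
```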
From Grep to Machine Learnt Ranking
[Chart: relative performance (e.g., DCG) rising over time. Pre-1990s: Boolean, VSM, TF-IDF. 1990s: graph features, language models. 2000s: machine learning, behavioral data. 2010s: personalization, social (??).]
Real World MLR Systems
   SearchMe was a visual/media search engine – about 3 Billion pages in index, and
    hundreds of unique features used to predict the score (and ultimately rank results).
    Results could be video, audio, images, or regular web pages.
     – The goal was for a given input query, return the best ordering of relevant results –
         in an immersive UI (mixing different results types simultaneously)
   Quixey – Functional App Search™ - over 1M apps, many sources of data for each app
    (multiple stores, reviews, blog sites, etc…) – goal is given a “functional query” i.e. “a
    good offline san diego map for iphone” or “kids games for android”– find the most
    relevant apps (ranked properly)
     – Dozens of sources of data for each app, many potential features used to:
           • Predict “quality”, “text relevance” and other meta-features
           • Calculate a meaningful score used to make decisions by partners
           • Rank-order and raw score matter (important to know “how good” an app is)
   Local Search (Telenav, YellowPages)
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Quixey: What is an App?
   An app is a piece of computer software designed to help a user perform specific
    tasks.
     – Contrast with systems software and middleware

   Apps were originally intended for productivity
     – (email, calendar and contact databases), but consumer and business demand
        has caused rapid expansion into other areas such as games, factory
        automation, GPS and location-based services, banking, order-tracking, and
        ticket purchases
   Run on various devices (phones, tablets, game consoles, cars)
My house is awash with platforms
My car...




            NPR programs such as Car Talk are available 24/7 on the NPR
            News app for Ford SYNC
My life...
Own “The Millionaires App" for $1,000
Law Students App
Apps for Pets
Pablo Picatso!
50 Best iPhone Apps 2011 [Time]
Games                On the Go         Lifestyle    Music &             Entertainment      Social
Angry Birds          Kayak             Amazon       Photography         Netflix            Facebook
Scrabble             Yelp              Epicurious   Mog                 IMDb               Twitter
Plants v. Zombies    Word Lens         Mixology     Pandora             ESPN Scorecenter   Google
Doodle Jump          Weather Channel   Paypal       SoundHound          Instapaper         AIM
Fruit Ninja          OpenTable         Shop Savvy   Bloom               Kindle             Skype
Cut the Rope         Wikipedia         Mint         Camera+             PulseNews          Foursquare
Pictureka            Hopstop           WebMD        Photoshop Express                      Bump
Wurdle               AroundMe          Lose It!     Hipstamatic
GeoDefense           Google Earth      Springpad    Instagram
Swarm                Zipcar                         ColorSplash




                    [http://www.time.com/time/specials/packages/completelist/0,29569,204
                    4480,00.html#ixzz1s1pAMNWM]
Examples of Functional Search™




App World: Integrating Multi-Data Sources
[Diagram: building a unified App Catalog by integrating multiple data sources – several app stores (each carrying overlapping sets of apps: A1–A3, A2–A5, A5–A8, …), developer homepages, blogs, and app review sites – and resolving which mentions (e.g., “blah blah Angry Birds”, “blah blah Learn Spanish”) refer to which app.]
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Search Architecture (Online)
[Diagram: online query flow – an incoming query goes through Query Processing, which issues data-storage queries (DBQ) against indexes and feature data built by offline processing. Simple scoring (a set reducer) produces a consideration set; feature generation, result scoring with ML models, and result sorting then produce the shown results.]
Architecture Details
 Online Flow:
 1. Given a “query” generate Query-specific features, Fq
 2. Using Fq generate appropriate “database queries”
 3. Cheaply pare down initial possible results
 4. Obtain result features Fr for remaining consideration set
 5. Generate query-result features Fqr for remaining consideration set
 6. Given all features score each result (assuming independent scoring)
 7. Present and organize the “best results” (not necessarily ordered strictly by score)
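The structural sketch below mirrors this flow; all names are illustrative stand-ins, not the actual APIs of the systems described.

```python
class ToyStorage:
    """Stand-in for the data-storage layer; holds a tiny in-memory 'index'."""
    def __init__(self, docs):
        self.docs = docs
    def fetch(self, terms):
        return [d for d in self.docs if any(t in d["title"].lower() for t in terms)]

class ToyModel:
    """Stand-in for the learned scoring model."""
    def score(self, fq, fr, fqr):
        return 0.7 * fqr["title_match"] + 0.3 * fr["popularity"]

def handle_query(query, storage, model, max_considered=1000):
    fq = {"num_terms": len(query.split())}                     # 1. query-specific features Fq
    db_queries = query.lower().split()                         # 2. data-storage queries
    candidates = storage.fetch(db_queries)                     # 3. cheap initial paring
    consideration = candidates[:max_considered]                #    (set reducer)
    scored = []
    for r in consideration:
        fr = {"popularity": r["popularity"]}                   # 4. result features Fr
        fqr = {"title_match": float(query.lower() in r["title"].lower())}  # 5. query-result features Fqr
        scored.append((model.score(fq, fr, fqr), r))           # 6. independent scoring
    scored.sort(key=lambda sr: -sr[0])
    return [r for _, r in scored]                              # 7. present the "best results"

storage = ToyStorage([{"title": "Angry Birds", "popularity": 0.9},
                      {"title": "Stupid Maze Game", "popularity": 0.1}])
print(handle_query("angry birds", storage, ToyModel()))
```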
Examples of Possible Features
   Query features:
     – popularity/frequency of query
     – number of words in query, individual POS tags per term/token
     – collection term-frequency information (per term/token)
     – Geo-location of user
   Result features
     – (web) – in-links/page-rank, anchortext match (might be processed with query)
     – (app) – download rate, app-popularity, platform(s), star-rating(s), review-text
     – (app) – ML-Quality score, etc..
   Query-result features
     – BM-25 (per text-block)
     – Frequency in specific sections – lexical similarity query to title
     – etc…
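As a hedged illustration of the query-result group (toy lexical features only, not the production feature set):

```python
import math

def query_result_features(query, result):
    """A few toy features: query length, lexical title similarity, description term frequency, popularity."""
    q_terms = query.lower().split()
    title_terms = set(result["title"].lower().split())
    desc_terms = result["description"].lower().split()
    overlap = len(set(q_terms) & title_terms)
    return {
        "query_len": len(q_terms),                                          # query feature
        "title_lexical_sim": overlap / max(len(set(q_terms) | title_terms), 1),
        "desc_query_term_freq": sum(desc_terms.count(t) for t in q_terms),
        "log_downloads": math.log1p(result.get("downloads", 0)),            # result feature
    }

print(query_result_features(
    "offline san diego map",
    {"title": "San Diego Offline Map", "description": "an offline map of san diego",
     "downloads": 120000}))
```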
Features Are Key
   Typically MLR systems use both textual and non-textual features:
          • What makes one app better than another?
          • Text-match alone insufficient
          • Popularity alone insufficient
   No single feature or simple combination is sufficient
   At both SearchMe and Quixey we built learned “meta-features” (next slide)

query: Games          Title Text Match   Non-title freq of "game"   App Popularity   How good for query
Angry Birds           low                high                       very high        very high
Sudoku (genina.com)   low                low                        high             high
PacMan                low                high                       high             high
Cave Shooter          low/medium         medium                     low              medium
Stupid Maze Game      very high          medium                     very low         low
Features Are Key: Learned Meta-Features
   Meta-features combine multiple simple features into fewer “super-features”

   SearchMe: SpamScore, SiteAuthority, Category-related
   Quixey: App-Quality, TextMatch (as distinct from overall relevance)

   SpamScore and App-Quality are complex learned meta-features
     – Potentially hundreds of “smaller features” feed into a simpler model
     – SpamScore considered average PageRank, number of ads, distinct
       concepts, and several language-related features
     – App-Quality is learned (TreeNet) – designed to be resistant to gaming
         • An app developer might pay people to give high ratings
         • Has a well-defined meaning
Idea of Metafeatures (Example)
   In this case, each metafeature is independently learned on different training data.

    Final model built directly over raw features F1–F10:
      – many data points (expensive)
      – many complex trees
      – judgments prone to human errors

    vs.

    Final model built over metafeatures MF1–MF3 (each learned from subsets of F1–F10):
      – explicit, human-decided metafeatures produce simpler, faster models
      – requires fewer total training points
      – humans can define metafeatures to minimize human errors, and possibly use different targets
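A minimal two-stage sketch of the idea, with gradient-boosted trees standing in for TreeNet and synthetic data standing in for the separately judged training sets:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stage 1: each metafeature is learned on its own (smaller) judged data set.
X_quality, y_quality = rng.normal(size=(200, 20)), rng.uniform(size=200)   # e.g., App-Quality
X_text, y_text       = rng.normal(size=(200, 15)), rng.uniform(size=200)   # e.g., TextMatch
quality_model = GradientBoostingRegressor().fit(X_quality, y_quality)
text_model    = GradientBoostingRegressor().fit(X_text, y_text)

# Stage 2: the final relevance model sees only the few metafeature outputs.
X_q_raw, X_t_raw = rng.normal(size=(500, 20)), rng.normal(size=(500, 15))
y_relevance = rng.uniform(size=500)                                        # judged targets in [0, 1]
metafeatures = np.column_stack([quality_model.predict(X_q_raw),
                                text_model.predict(X_t_raw)])
final_model = GradientBoostingRegressor().fit(metafeatures, y_relevance)
```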
Data and Feature Engineering are Key!
   Selection of “good” query/result pairs for labeling, and good metafeatures
     – Should cover various areas of the sub-space (e.g., popular and rare queries)
     – Be sure to only pick examples which “can be learned” and are representative
          • Misspellings are a bad choice if there is no spell-corrector
          • “Exceptions”, i.e., special cases (e.g., Spanish results) for an English engine,
             are bad and should be avoided unless features can capture this
     – Distribution is important
          • Bias the data to focus on business goals (see the sampling sketch below)
               – If the goal is to be the best for “long queries”, have more “long queries”
   Features are critical – must be able to capture the variations (good metafeatures)
   Feature engineering is probably the single most important (and most difficult)
    aspect of MLR
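A toy illustration of biasing the labeling pool toward a business goal (here "long queries"); the weighting scheme and the queries below are made up for illustration:

```python
import random

def sample_for_labeling(query_log, n, long_query_boost=3.0):
    """Sample queries for judging, over-weighting long queries (>= 4 terms)."""
    weights = [long_query_boost if len(q.split()) >= 4 else 1.0 for q in query_log]
    return random.choices(query_log, weights=weights, k=n)

pool = ["games", "free music", "a good offline san diego map for iphone",
        "kids games for android", "learn spanish with flashcards offline"]
print(sample_for_labeling(pool, 3))
```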
Applying TreeNet for MLR
    Starting with a set of <query, result> pairs, obtain human judgments [1–5] and features
      – 5=perfect, 1=worst (maps to a target in [0–1])
  Judged data (Query, Result, Judgment: q1,r1,2; q1,r2,5; q1,r3,2; q2,r1,4; …)
    → featurized data (Query, Result, Features: q1,r1, f1,1, f1,2, f1,3, …, f1,n; …)
    → TreeNet
    → candidate models M1, M2, M3, …

  For each candidate model M: run the test queries (q1, …, qn) through the search engine,
  collect the returned results (q1 → r1,1, r1,2, …; q2 → r2,1, r2,2, …), gather human
  judgments, and compute DCG.
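A compressed sketch of this loop, with scikit-learn's gradient boosting standing in for TreeNet; the queries, features, and judgments below are synthetic placeholders.

```python
import math
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def dcg(gains):
    return sum(g if i == 0 else g / math.log2(i + 1) for i, g in enumerate(gains))

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(1000, 10)), rng.uniform(size=1000)   # <Features, judgment>

# Train several candidate models (e.g., vary learn rate and number of trees).
candidates = {(lr, n): GradientBoostingRegressor(learning_rate=lr, n_estimators=n).fit(X_train, y_train)
              for lr in (0.01, 0.1) for n in (200, 500)}

# Evaluate each candidate by mean DCG over held-out test queries.
test_queries = [(rng.normal(size=(20, 10)), rng.uniform(size=20)) for _ in range(50)]

def mean_dcg(model):
    per_query = []
    for X_q, judgments in test_queries:
        order = np.argsort(-model.predict(X_q))          # rank results by predicted score
        per_query.append(dcg(list(judgments[order])))    # gain = human judgment
    return float(np.mean(per_query))

best_settings = max(candidates, key=lambda k: mean_dcg(candidates[k]))
print(best_settings, mean_dcg(candidates[best_settings]))
```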
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Choosing the Best Model - Disconnect
   TreeNet uses a mean-squared error minimization
     – The “best” model is the one with the lowest MSE where error is:
         • abs(target – predicted_score)
     – Each result is independent

   DCG evaluation penalizes rank-ordering errors
     – The ranking is query-dependent

   Might require evaluating several TreeNet models before a real DCG improvement
     – Try new features,
     – TreeNet options (learn rate, max-trees), change splits of data
     – Collect more/better data (clean errors), consider active learning
Assumptions Made (Are There Choices?)
   MSE is used because the input data is independent judgment pairs
   Assumptions of consistency over time and between users (stationarity of
    judgments)
     – Is Angry Birds v1 a perfect score for “popular game” in 10 years?
     – Directions need to be very clear to ensure user consistency
          • Independent model assumes all users are consistent with each other
   Collect judgments in a different form:
     – Pairwise comparisons <q1,r1> is better than <q1,r2>, etc…
     – Evaluate a “set” of results
     – Use a different scale for judgments which is more granular
     – Full-ordering (lists)
Other Ways to do MLR
   Changing data collection:
     – Use inferred as opposed to direct data
          • Click/user behavior to infer relevance targets
     – From independent judgments to pairwise or listwise
   Pairwise SVM:
     –   R. Herbrich, T. Graepel, K. Obermayer. “Support Vector Learning for Ordinal Regression.” In
         Proceedings of ICANN 1999.
     –   T. Joachims, “A Support Vector Method for Multivariate Performance Measures.” In Proceedings of
         ICML 2005. (http://www.cs.cornell.edu/People/tj/svm_light/svm_perf.html)
   Listwise learning
      – LambdaRank, Chris Burges et al., 2007
     – LambdaMART, Qiang Wu, Chris J.C. Burges, Krysta M. Svore and Jianfeng Gao,
         2008
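A minimal sketch of the pairwise reformulation (in the spirit of the ranking-SVM work cited above): within each query, every pair of results with different judgments becomes a feature-difference vector with a +1/-1 target, which any margin-based or linear learner can then fit.

```python
from itertools import combinations
import numpy as np

def to_pairwise(features_by_query, labels_by_query):
    """Turn per-query judged results into pairwise training examples."""
    X_pairs, y_pairs = [], []
    for q, X in features_by_query.items():
        y = labels_by_query[q]
        for i, j in combinations(range(len(y)), 2):
            if y[i] == y[j]:
                continue                          # ties carry no ordering information
            X_pairs.append(X[i] - X[j])           # feature-difference vector
            y_pairs.append(1 if y[i] > y[j] else -1)
    return np.array(X_pairs), np.array(y_pairs)

X = {"q1": np.array([[1.0, 0.2], [0.3, 0.9], [0.5, 0.5]])}
y = {"q1": [5, 2, 2]}
X_pairs, y_pairs = to_pairwise(X, y)   # two informative pairs: (r1, r2) and (r1, r3)
print(X_pairs, y_pairs)
```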
Talk Outline
   Introduction: Search and Machine Learned Ranking
   Relevance and evaluation methodologies
   Data collection and metrics
   Quixey – Functional Application Search™
   System Architecture, features, and model training
   Alternative approaches
   Conclusion
Conclusion
   Machine Learning is very important to Search
     – Metafeatures reduce model complexity and lower costs
          • Divide and conquer (parallel development)
     – MLR – is real, and is just one part of ML in search
   Major challenges include data collection and feature engineering
      – Must pay for data – non-trivial, but you have a say in what you collect
     – Features must be reasonable for given problem (domain specific)
   Evaluation is critical
     – How to evaluate effectively is important to ensure improvement
     – MSE vs DCG disconnect
    TreeNet can be, and is, an effective tool for Machine Learning in Search
Quixey is hiring



   If you want a cool internship, or a great job, contact us afterwards or e-mail:

   jobs@quixey.com and mention this presentation
Questions
James_DOT_Shanahan_AT_gmail_DOT_com

       Eric_AT_Quixey_DOT_com
3250 Ash St.
Palo Alto, CA 94306

888.707.4441
www.quixey.com

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Kürzlich hochgeladen (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Machine Learned Relevance at A Large Scale Search Engine

  • 1. Machine Learned Relevance at a Large Scale Search Engine Salford Analytics and Data Mining Conference 2012
  • 2. Machine Learned Relevance at a Large Scale Search Engine Salford Data Mining – May 25, 2012 Presented by: Dr. Eric Glover – eric@quixey.com Dr. James Shanahan – james.shanahan@gmail.com
  • 3. About the Authors  James G. Shanahan - PhD in Machine Learning University of Bristol, UK – 20+ years in the field AI and information science – Principal and Founder, Boutique Data Consultancy • Clients include: Adobe, Digg, SearchMe, AT&T, Ancestry SkyGrid, Telenav – Affiliated with University of California Santa Cruz (UCSC) – Adviser to Quixey – Previously • Chief Scientist, Turn Inc. (A CPX ad network, DSP) • Principal Scientist, Clairvoyance Corp (CMU spinoff) • Co-founder of Document Souls (task centric info access system) • Research Scientist, Xerox Research (XRCE) • AI Research Engineer, Mitsubishi Group
  • 4. About the Authors  Eric Glover - PhD CSE (AI) From U of M in 2001 – Fellow at Quixey, where among other things, he focuses on the architecture and processes related to applied machine learning for relevance and evaluation methodologies – More than a dozen years of Search Engine experience including: NEC Labs, Ask Jeeves, SearchMe, and own startup. – Multiple relevant publications ranging from classification to automatically discovering topical hierarchies – Dissertation studied Personalizing Web Search through incorporation of user- preferences and machine learning – More than a dozen filed patents
  • 5. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 7. Search Engine: SearchMe Search engine: lets you see and hear what you're searching for
  • 8. 6 Steps to MLR in Practice: 1. Understand the domain and define problems; 2. Collect requirements and data; 3. Feature engineering; 4. Modeling: extract patterns/models; 5. Interpret and evaluate discovered knowledge; 6. Deploy system in the wild (and test). Systems modeling is inherently interactive and iterative.
  • 9. How is ML for Search Unique  Many Machine Learning (ML) systems start with source data – Goal is to analyze, model, predict – Features are often pre-defined, in a well-studied area  MLR for Search Engines is different from many other ML applications: – Does not start with labeled data • Need to pay judges to provide labels – Opportunity to invent new features (Feature Engineering) – Often require real-time operation • Processing tens of billions of possible results, microseconds matter – Require domain-specific metrics for evaluation
  • 10. If we can’t measure “it”, then…  ….we should think twice about doing “it”  Measurement has enabled us to compare systems and also to machine learn them  Search is about measurement, measurement and measurement
  • 11. Improve in a Measured Way
  • 12. From Information Needs to Queries  The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945  An information need is an individual or group's desire to locate and obtain information to satisfy a conscious or unconscious need  Within the context of web search, information needs are expressed as textual queries (possibly with constraints)  E.g., “Analytics Data Mining Conference” program  Metric: “Relevance” as a measure of how well a system is performing
  • 13. Relevance is a Huge Challenge  Relevance typically denotes how well a retrieved object (document) or set of objects meets the information need of the user.  Relevance is often viewed as multifaceted. – A core facet of relevance relates to topical relevance or aboutness, • i.e., to what extent the topic of a result matches the topic of the query or information need. • Another facet of relevance is based on user perception, and sometimes referred to as user relevance; it encompasses other concerns of the user such as timeliness, authority or novelty of the result  In local search type queries, yet another facet of relevance that comes into play is geographical aboutness, – i.e., to what extent the location of a result, a business listing, matches the location of the query or information need
  • 14. From Cranfield to TREC  Text REtrieval Conference/Competition – http://trec.nist.gov/ – Run by NIST (National Institute of Standards & Technology)  Started in 1992  Collections: > 6 Gigabytes (5 CD-ROMs), >1.5 Million Docs – Newswire & full text news (AP, WSJ, Ziff, FT) – Government documents (federal register, Congressional Record) – Radio Transcripts (FBIS) – Web “subsets” – Tweets
  • 15. The TREC Benchmark  TREC: Text REtrieval Conference (http://trec.nist.gov/) Originated from the TIPSTER program sponsored by Defense Advanced Research Projects Agency (DARPA).  Became an annual conference in 1992, co-sponsored by National Institute of Standards and Technology (NIST) and DARPA.  Participants are given parts of a standard set of documents and TOPICS (from which queries have to be derived) in different stages for training and testing.  Participants submit the P/R values for the final document and query corpus and present their results at the conference. 15
  • 16. [Diagram: classic IR pipeline — the user's information need is expressed as a query and parsed; document collections are pre-processed and indexed; the parsed query is matched against the index; the query may then be reformulated]
  • 17. [Diagram: the same IR pipeline with ranking and evaluation added — the parsed query is ranked or matched against the index, results are evaluated, and evaluation feeds back into query reformulation]
  • 18. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 19. Difficulties in Evaluating IR Systems  Effectiveness is related to the relevancy of the set of returned items.  Relevancy is not typically binary but continuous.  Even if relevancy is binary, it can be a difficult judgment to make.  Relevancy, from a human standpoint, is: – Subjective: Depends upon a specific user’s judgment. – Situational: Relates to user’s current needs. – Cognitive: Depends on human perception and behavior. – Dynamic: Changes over time.
  • 20. Relevance as a Measure Relevance is everything!  How relevant is the document retrieved – for the user’s information need.  Subjective, but one assumes it’s measurable  Measurable to some extent – How often do people agree a document is relevant to a query • More often than expected  How well does it answer the question? – Complete answer? Partial? – Background Information? – Hints for further exploration?
  • 21. What to Evaluate? What can be measured that reflects users’ ability to use system? (Cleverdon 66) – Coverage of Information – Form of Presentation – Effort required/Ease of Use – Time and Space Efficiency – Effectiveness  Recall – proportion of relevant material actually retrieved  Precision – proportion of retrieved material actually relevant  Typically a 5-point scale is used 5=best, 1=worst
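To make the precision and recall definitions above concrete, here is a minimal, self-contained sketch (not from the talk) that computes both for one query; the document ids and judgments are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query.

    retrieved: list of doc ids returned by the engine
    relevant:  set of doc ids judged relevant for this query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 5 retrieved docs are relevant; 4 relevant docs exist in total.
p, r = precision_recall(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5", "d9"})
print(p, r)  # 0.6 0.75
```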
  • 22. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 23. Data Collection is a Challenge  Most search engines do not start with labeled data (relevance judgments)  Good labeled data is required to perform evaluations and perform learning  Not practical to hand-label all possibilities for modern large-scale search engines  Using 3rd party sources such as Mechanical Turk is often very noisy/inconsistent  Data collection is non-trivial – A custom system (specific to the domain) is often required – Phrasing of the “questions”, options (including a skip option), UI design and judge training are critical to increase the chance of consistency  Can leverage judgment collection to aid in feature engineering – Judges can provide reasons and observations
  • 24.
  • 25. Relevance/Usefulness/Ranking  Web Search: topical relevance or aboutness, trustability of source  Local Search: topical relevance and geographical applicability  Functional App Search: – Task relevance – User must be convinced app results can solve need – Finding the “best” apps that address the users task needs – Very domain and user specific  Advertising – Performance measure – expected revenue P(click) * revenue(click) – Consistency with user-search (showing irrelevant ads hurts brand)
  • 26. Commonly used Search Metrics  Early search systems used binary judgments (relevant/not relevant) and evaluated based on precision and recall – Recall difficult to assess for large sets  Modern search systems often use DCG or nDCG: – Easy to collect and compare large sets of “independent judgments” • Independent judgments map easily to MSE minimization learners – Relevance is not binary, and depends on the order of results  Other measures exist – Subjective “how did I do”, but these are difficult to use for MLR or compare – Pairwise comparison – measure number of out-of order pairs • Lots of recent research on pairwise based MLR • Most companies use “independent judgments”
  • 27. Metrics for Web Search  Existing metrics such as Precision and Recall are limited – Not always a clear-cut binary decision: relevant vs. not relevant – Not position sensitive: p: relevant, n: not relevant ranking 1: p n p n n ranking 2: n n n p p  How do you measure recall over the whole web? – How many of the potentially billions of results will get looked at? Which ones actually need to be good?  Normalized Discounted Cumulated Gain (NDCG) – K. Jaervelin and J. Kekaelaeinen (TOIS 2002) – Gain: relevance of a document is no longer binary – Sensitive to the position of highest rated documents • Log-discounting of gains according to the positions – Normalize the DCG with the “ideal set” DCG (NDCG)
  • 28. Cumulative Gain  With graded relevance judgments, we can compute the gain at each rank.  Cumulative Gain at rank n: CG_n = Σ_{i=1..n} rel_i, where rel_i is the graded relevance of the document at position i.
        n   doc #   rel (gain)   CG_n
        1   588     1.0          1.0
        2   589     0.6          1.6
        3   576     0.0          1.6
        4   590     0.8          2.4
        5   986     0.0          2.4
        6   592     1.0          3.4
        7   984     0.0          3.4
        8   988     0.0          3.4
        9   578     0.0          3.4
        10  985     0.0          3.4
        11  103     0.0          3.4
        12  591     0.0          3.4
        13  772     0.2          3.6
        14  990     0.0          3.6
  • 29. Discounting Based on Position  Users care more about high-ranked documents, so we discount results by 1/log2(rank).  Discounted Cumulative Gain: DCG_n = rel_1 + Σ_{i=2..n} rel_i / log2(i).
        n   doc #   rel (gain)   CG_n   log2(n)   DCG_n
        1   588     1.0          1.0    -         1.00
        2   589     0.6          1.6    1.00      1.60
        3   576     0.0          1.6    1.58      1.60
        4   590     0.8          2.4    2.00      2.00
        5   986     0.0          2.4    2.32      2.00
        6   592     1.0          3.4    2.58      2.39
        7   984     0.0          3.4    2.81      2.39
        8   988     0.0          3.4    3.00      2.39
        9   578     0.0          3.4    3.17      2.39
        10  985     0.0          3.4    3.32      2.39
        11  103     0.0          3.4    3.46      2.39
        12  591     0.0          3.4    3.58      2.39
        13  772     0.2          3.6    3.70      2.44
        14  990     0.0          3.6    3.81      2.44
  • 30. Normalized Discounted Cumulative Gain (NDCG)  To compare DCGs, normalize values so that an ideal ranking would have a normalized DCG of 1.0.  Ideal ranking (the same judged documents sorted by decreasing gain); its DCG at each rank is the ideal DCG (IDCG), to be compared with the actual ranking's DCG from the previous slide (DCG_14 = 2.44 vs. IDCG_14 = 2.89):
        n   doc #   rel (gain)   CG_n   log2(n)   IDCG_n
        1   588     1.0          1.0    0.00      1.00
        2   592     1.0          2.0    1.00      2.00
        3   590     0.8          2.8    1.58      2.50
        4   589     0.6          3.4    2.00      2.80
        5   772     0.2          3.6    2.32      2.89
        6   576     0.0          3.6    2.58      2.89
        7   986     0.0          3.6    2.81      2.89
        8   984     0.0          3.6    3.00      2.89
        9   988     0.0          3.6    3.17      2.89
        10  578     0.0          3.6    3.32      2.89
        11  985     0.0          3.6    3.46      2.89
        12  103     0.0          3.6    3.58      2.89
        13  591     0.0          3.6    3.70      2.89
        14  990     0.0          3.6    3.81      2.89
  • 31. Normalized Discounted Cumulative Gain (NDCG)  Normalize by the DCG of the ideal ranking: NDCG_n = DCG_n / IDCG_n.  NDCG ≤ 1 at all ranks, and NDCG is comparable across different queries.
        n   doc #   rel (gain)   DCG_n   IDCG_n   NDCG_n
        1   588     1.0          1.00    1.00     1.00
        2   589     0.6          1.60    2.00     0.80
        3   576     0.0          1.60    2.50     0.64
        4   590     0.8          2.00    2.80     0.71
        5   986     0.0          2.00    2.89     0.69
        6   592     1.0          2.39    2.89     0.83
        7   984     0.0          2.39    2.89     0.83
        8   988     0.0          2.39    2.89     0.83
        9   578     0.0          2.39    2.89     0.83
        10  985     0.0          2.39    2.89     0.83
        11  103     0.0          2.39    2.89     0.83
        12  591     0.0          2.39    2.89     0.83
        13  772     0.2          2.44    2.89     0.84
        14  990     0.0          2.44    2.89     0.84
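The CG, DCG, and NDCG tables above can be reproduced in a few lines. This is a minimal sketch of the formulas exactly as presented on these slides (rank 1 undiscounted, later ranks discounted by 1/log2(rank)); it is an illustration, not the production implementation used by the presenters.

```python
import math

def dcg(gains):
    """DCG with the slide's convention: rank 1 undiscounted, rank i > 1 divided by log2(i)."""
    return sum(g if i == 1 else g / math.log2(i) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """Normalize by the DCG of the ideal (descending-gain) ordering of the same judgments."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Graded relevance of the ranked list from the example tables (slides 28-31).
gains = [1.0, 0.6, 0.0, 0.8, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0]
print(round(dcg(gains), 2))   # 2.44, matching DCG_14 in the table
print(round(ndcg(gains), 2))  # 0.84, matching NDCG_14 in the table
```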
  • 32. Machine Learning Uses in Commercial SE  Query parsing  SPAM Classification  Result Categorization  Behavioral Categories  Search engine results ranking
  • 33. 6 Steps to MLR in Practice: 1. Understand the domain and define problems; 2. Collect requirements and data; 3. Feature engineering; 4. Modeling: extract patterns/models; 5. Interpret and evaluate discovered knowledge; 6. Deploy system in the wild (and test). Systems modeling is inherently interactive and iterative.
  • 34. In Practice  QPS, deploy model  Imbalanced data  Relevance changes over time; non-stationary behavior  Speed, accuracy (SVMs, …)  Practical: grid search, 8–16 nodes, 500 trees  Millions of records, interactions  Variable selection: 1000 → 100s of variables, add random variables  ~6-week cycle  Training time is days; lab evaluation is weeks; live A/B testing  Why TreeNet? No missing values, categorical variables
  • 35. MLR – Typical Approach by Companies 1. Define goals and “specific problem” 2. Collect human judged training data: – Given a large set of <query, result> tuples • Judges rate “relevance” on a 1 to 5 scale (5=“perfect”, 1=“worst”) 3. Generate training data from the provided <query, result> tuples – <q,r>  Features, Input to model is: <F,judgment> 4. Train model typically minimize MSE (Mean Squared Error) 5. Lab evaluation using DCG-type metrics 6. Deploy model in a test system and evaluate
  • 36. MLR Training Data 1. Collect human judged training data: given a large set of <query, result> tuples, judges rate “relevance” on a 1 to 5 scale (5=“perfect”, 1=“worst”) 2. Featurize the training data from the provided <query, result> tuples: <q,r> → Features; input to model is <F, judgment>
        Instance            x0   x1   x2   …    xn   Label
        <query1, Doc1>      1    3    0    ..   7    4
        <query1, Doc2>      1    5    …    …    …    …
        …                   …    …    …    …    …    …
        <queryn, Docn>      1    0    4    ...  8    3
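A minimal sketch of the featurization step above, turning judged <query, result> tuples into model-ready rows; the three features and the [0, 1] rescaling of judgments are illustrative stand-ins, not the actual feature set used at SearchMe or Quixey.

```python
def featurize(query, doc):
    """Turn one (query, result) pair into a fixed-length feature vector (placeholder features)."""
    q_terms = query.lower().split()
    title_terms = doc["title"].lower().split()
    return [
        1.0,                                                     # x0: bias term
        float(len(q_terms)),                                     # x1: query length
        sum(t in title_terms for t in q_terms) / len(q_terms),   # x2: query-term coverage of title
    ]

def build_training_set(judged_tuples):
    """judged_tuples: iterable of (query, doc, judgment) with judgment in 1..5.
    Returns (X, y) with the 1-5 judgment rescaled to a [0, 1] target."""
    X, y = [], []
    for query, doc, judgment in judged_tuples:
        X.append(featurize(query, doc))
        y.append((judgment - 1) / 4.0)
    return X, y
```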
  • 37. The Evaluation Disconnect  Evaluation in a supervised learner tries to minimize MSE of the targets – for each tuple (F_i, x_i) the learner predicts a target y_i • Error is f(y_i – x_i) – typically (y_i – x_i)^2 • Optimum is some function of the “errors” – i.e. try to minimize total error  Evaluation of the deployed model is different from evaluation of the learner – typically DCG or nDCG  Individual result error calculation is different from error based on result ordering – A small error in the predicted target for a result could have a substantial impact on result ordering – Likewise, the “best result ordering” might not exactly match the predicted targets for any results – An affine transform of the targets produces no change to DCG, but a large change to calculated MSE
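The last bullet is easy to verify numerically: an affine transform of the predicted scores changes the MSE but leaves the induced ordering, and therefore DCG, untouched. A tiny sketch with made-up numbers:

```python
# Judged gains for three results, and two score vectors that induce the same ranking.
gains      = [1.0, 0.6, 0.0]
scores     = [0.9, 0.5, 0.1]
scores_aff = [2 * s + 10 for s in scores]   # affine transform: ordering is unchanged

mse = lambda pred, target: sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
order = lambda s: sorted(range(len(s)), key=lambda i: -s[i])

print(mse(scores, gains), mse(scores_aff, gains))  # very different MSE
print(order(scores) == order(scores_aff))          # True: same ranking, hence same DCG
```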
  • 38. From Grep to Machine Learnt Ranking  [Timeline: relative performance (e.g., DCG) improving by era — Pre-1990s: Boolean, VSM, TF-IDF; 1990s: Language Models; 2000s: Machine Learning, Graph Features, Behavioral Data; 2010s: Personalization, Social; next: ??]
  • 39. Real World MLR Systems  SearchMe was a visual/media search engine – about 3 Billion pages in index, and hundreds of unique features used to predict the score (and ultimately rank results). Results could be video, audio, images, or regular web pages. – The goal was for a given input query, return the best ordering of relevant results – in an immersive UI (mixing different results types simultaneously)  Quixey – Functional App Search™ - over 1M apps, many sources of data for each app (multiple stores, reviews, blog sites, etc…) – goal is given a “functional query” i.e. “a good offline san diego map for iphone” or “kids games for android”– find the most relevant apps (ranked properly) – Dozens of sources of data for each app, many potential features used to: • Predict “quality”, “text relevance” and other meta-features • Calculate a meaningful score used to make decisions by partners • Rank-order and raw score matter (important to know “how good” an app is)  Local Search (Telenav, YellowPages)
  • 40. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 41. Quixey: What is an App?  An app is a piece of computer software designed to help a user perform specific tasks. – Contrast with systems software and middleware  Apps were originally intended for productivity – (email, calendar and contact databases), but consumer and business demand has caused rapid expansion into other areas such as games, factory automation, GPS and location-based services, banking, order-tracking, and ticket purchases  Run on various devices (phones, tablets, game consoles, cars)
  • 42. My house is awash with platforms
  • 43. My car... NPR programs such as Car Talk are available 24/7 on the NPR News app for Ford SYNC
  • 45.
  • 46. ©
  • 47. Own “The Millionaire's App” for $1,000
  • 51. 50 Best iPhone Apps 2011 [Time]  [Table of Time's 50 best iPhone apps of 2011, grouped by category — Games, On the Go, Lifestyle, Music & Entertainment, Photography, Social — including: Angry Birds, Kayak, Amazon, Netflix, Facebook, Scrabble, Yelp, Epicurious, Mog, IMDb, Twitter, Plants v. Zombies, Word Lens, Mixology, Pandora, ESPN Scorecenter, Google, Doodle Jump, Weather Channel, Paypal, SoundHound, Instapaper, AIM, Fruit Ninja, OpenTable, Shop Savvy, Bloom, Kindle, Skype, Cut the Rope, Wikipedia, Mint, Camera+, PulseNews, Foursquare, Pictureka, Hopstop, WebMD, Photoshop Express, Bump, Wurdle, AroundMe, Lose It!, Hipstamatic, GeoDefense, Google Earth, Springpad, Instagram, Swarm, Zipcar, ColorSplash] [http://www.time.com/time/specials/packages/completelist/0,29569,2044480,00.html#ixzz1s1pAMNWM]
  • 52. ©
  • 53. Examples of Functional Search™ ©
  • 54.
  • 55.
  • 56. App World: Integrating Multi-Data Sources  [Diagram: building a unified app catalog by integrating many sources — multiple app stores, blogs, app review sites, and developer homepages each mention apps (A1, A2, A5, A7, …; e.g., Angry Birds, Learn Spanish), and their data must be matched and merged into single catalog entries]
  • 57. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 58. Search Architecture (Online)  [Diagram: query processing turns the query into data-storage queries (DBQ); offline processing handles data and feature building, ML, and index/data-storage construction; online, simple models act as a set reducer over the indexes, followed by consideration-set generation, feature scoring, result scoring, and result sorting to produce the shown results]
  • 59. Architecture Details Online Flow: 1. Given a “query” generate Query-specific features, Fq 2. Using Fq generate appropriate “database queries” 3. Cheaply pare down initial possible results 4. Obtain result features Fr for remaining consideration set 5. Generate query-result features Fqr for remaining consideration set 6. Given all features score each result (assuming independent scoring) 7. Present and organize the “best results” (not necessarily linearized by score)
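The seven steps can be sketched end to end as a toy, self-contained pipeline. Everything below is a hypothetical stand-in (the term-overlap candidate filter and the hand-weighted linear score play the role of the simple set reducer and the learned model); none of the names, weights, or data come from the actual system.

```python
def extract_query_features(query):
    # Step 1: hypothetical query features -- lowercase terms and query length.
    terms = query.lower().split()
    return {"terms": terms, "num_terms": len(terms)}

def candidate_set(fq, catalog):
    # Steps 2-3: cheap set reduction -- keep results sharing at least one query term.
    return [r for r in catalog if set(fq["terms"]) & set(r["title"].lower().split())]

def score(fq, result):
    # Steps 4-6: combine a query-result feature with a result feature into one score.
    title_terms = set(result["title"].lower().split())
    overlap = len(set(fq["terms"]) & title_terms) / fq["num_terms"]
    return 0.7 * overlap + 0.3 * result["popularity"]

def handle_query(query, catalog):
    fq = extract_query_features(query)
    candidates = candidate_set(fq, catalog)
    # Step 7: here results are simply sorted by score; a real system may organize them differently.
    return sorted(candidates, key=lambda r: score(fq, r), reverse=True)

catalog = [{"title": "Angry Birds", "popularity": 0.95},
           {"title": "Offline San Diego Map", "popularity": 0.40}]
print(handle_query("san diego map", catalog)[0]["title"])  # Offline San Diego Map
```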
  • 60. Examples of Possible Features  Query features: – popularity/frequency of query – number of words in query, individual POS tags per term/token – collection term-frequency information (per term/token) – Geo-location of user  Result features – (web) – in-links/page-rank, anchortext match (might be processed with query) – (app) – download rate, app-popularity, platform(s), star-rating(s), review-text – (app) – ML-Quality score, etc..  Query-result features – BM-25 (per text-block) – Frequency in specific sections – lexical similarity query to title – etc…
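As one concrete query-result feature from the list above, here is a textbook Okapi BM25 scorer (k1=1.2, b=0.75). The talk does not specify which BM25 variant or parameters were used, so treat this as a generic illustration rather than the systems' actual feature code.

```python
import math
from collections import Counter

def bm25(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 for one (query, document) pair.

    doc_freqs: dict mapping term -> number of documents containing that term
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        denom = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len)
        if denom > 0:
            score += idf * tf[term] * (k1 + 1) / denom
    return score
```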
  • 61. Features Are Key  Typically MLR systems use both textual and non-textual features: • What makes one app better than another? • Text-match alone insufficient • Popularity alone insufficient  No single feature or simple combination is sufficient  At both SearchMe and Quixey we built learned “meta-features” (next slide)
        Example (query: “games”):
        App                     non-title freq of query   Title text match “game”   App popularity   How good for query
        Angry Birds             low                       high                      very high        very high
        Sudoku (genina.com)     low                       low                       high             high
        PacMan                  low                       high                      high             high
        Cave Shooter            low/medium                medium                    low              medium
        Stupid Maze Game        very high                 medium                    very low         low
  • 62. Features Are Key: Learned Meta-Features  Meta-features can capture multiple simple features into fewer “super-features”  SearchMe: SpamScore, SiteAuthority, Category-related  Quixey: App-Quality, TextMatch (as distinct from overall-relevance)  SpamScore and App-Quality are complex learned meta-features – Potentially hundreds of “smaller features” feed into simpler model – SpamScore considered – average-pageRank, num-ads, distinct concepts, several language-related features – App-Quality is learned (TreeNet) – designed to be resistant to gaming • An app developer might pay people to give high-ratings • Has a well defined meaning
  • 63. Idea of Metafeatures (Example)  In this case each metafeature is independently solved on different training data.  [Diagram: a single final model trained directly on raw features F1–F10 needs many data points (expensive), many complex trees, and judgments prone to human errors — versus explicit, human-decided metafeatures MF1–MF3 built from F1–F10 feeding the final model, which produces simpler, faster models, requires fewer total training points, and lets humans define metafeatures to minimize human errors and possibly use different targets]
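A minimal sketch of the two-stage idea on synthetic data: each metafeature is its own small model trained on its own labels, and the final ranker consumes only the metafeature outputs. Using scikit-learn's GradientBoostingRegressor is an assumption for illustration; the presenters' systems used TreeNet.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 judged examples, 6 raw "quality" features, 4 raw "text" features.
X_quality_raw = rng.normal(size=(200, 6))
X_text_raw = rng.normal(size=(200, 4))
y_quality = X_quality_raw[:, 0] + 0.1 * rng.normal(size=200)    # separate quality labels
y_textmatch = X_text_raw[:, 1] + 0.1 * rng.normal(size=200)     # separate text-match labels
y_relevance = 0.6 * y_textmatch + 0.4 * y_quality               # overall relevance labels

# Stage 1: each metafeature is its own model over its own raw features and targets.
quality_model = GradientBoostingRegressor().fit(X_quality_raw, y_quality)
textmatch_model = GradientBoostingRegressor().fit(X_text_raw, y_textmatch)

# Stage 2: the final ranker sees only the compact metafeature vector.
meta = np.column_stack([quality_model.predict(X_quality_raw),
                        textmatch_model.predict(X_text_raw)])
final_model = GradientBoostingRegressor(n_estimators=50).fit(meta, y_relevance)
```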
  • 64. Data and Feature Engineering are Key!  Selection of “good” query/result pairs for labeling, and good metafeatures – Should cover various areas of the sub-space (i.e. popular and rare queries) – Be sure to only pick examples which “can be learned” and are representative • Misspellings are a bad choice if no spell-corrector • “exceptions” - i.e. special cases (i.e. Spanish results) for an English engine are bad and should be avoided unless features can capture this – Distribution is important • Bias the data to focus on business goals – If the goal is be the best for “long queries” have more “long queries”  Features are critical – must be able to capture the variations (good metafeatures)  Feature engineering is probably the single most important (and most difficult) aspect of MLR
  • 65. Applying TreeNet for MLR  Starting with a set of (query, result) pairs, obtain human judgments [1–5] and features – 5=perfect, 1=worst (maps to target [0–1])  [Diagram: judged tuples (query, result, judgment) are featurized into (query, result, f_1 … f_n) rows and fed to TreeNet, producing candidate models M1, M2, M3, …; each candidate model M is then run in a test search engine over test queries q1 … qn, the returned results are human-judged, and DCG is calculated to compare models]
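A compact sketch of that loop: train several gradient-boosted candidates (scikit-learn's GradientBoostingRegressor standing in for TreeNet), then pick the winner by mean NDCG over held-out queries rather than by training MSE. The data is synthetic and the hyperparameter grid is arbitrary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def dcg(gains):
    return sum(g if i == 1 else g / np.log2(i) for i, g in enumerate(gains, 1))

def ndcg_for_query(scores, gains):
    ranked = [g for _, g in sorted(zip(scores, gains), key=lambda p: -p[0])]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0

rng = np.random.default_rng(1)
n_queries, per_query, n_feat = 100, 10, 8
X = rng.normal(size=(n_queries * per_query, n_feat))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=len(X))).clip(0, None)  # graded targets
qid = np.repeat(np.arange(n_queries), per_query)
train = qid < 80                                   # simple query-level train/test split

candidates = [GradientBoostingRegressor(n_estimators=n, learning_rate=lr)
              for n in (100, 300) for lr in (0.05, 0.1)]

best_model, best_ndcg = None, -1.0
for model in candidates:
    model.fit(X[train], y[train])
    scores = model.predict(X[~train])
    ndcgs = [ndcg_for_query(scores[qid[~train] == q], y[~train][qid[~train] == q])
             for q in np.unique(qid[~train])]
    if np.mean(ndcgs) > best_ndcg:
        best_model, best_ndcg = model, float(np.mean(ndcgs))
print(best_ndcg)
```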
  • 66.
  • 67. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 68. Choosing the Best Model - Disconnect  TreeNet uses a mean-squared error minimization – The “best” model is the one with the lowest MSE where error is: • abs(target – predicted_score) – Each result is independent  DCG minimizes rank-ordering error – The ranking is query-dependent  Might require evaluating several TreeNet models before a real DCG improvement – Try new features, – TreeNet options (learn rate, max-trees), change splits of data – Collect more/better data (clean errors), consider active learning
  • 69. Assumptions Made (Are there choices)  MSE is used because the input data is independent judgment pairs  Assumptions of consistency over time and between users (stationarity of judgments) – Is Angry Birds v1 a perfect score for “popular game” in 10 years? – Directions need to be very clear to ensure user consistency • Independent model assumes all users are consistent with each other  Collect judgments in a different form: – Pairwise comparisons <q1,r1> is better than <q1,r2>, etc… – Evaluate a “set” of results – Use a different scale for judgments which is more granular – Full-ordering (lists)
  • 70. Other Ways to do MLR  Changing data collection: – Use inferred as opposed to direct data • Click/user behavior to infer relevance targets – From independent judgments to pairwise or listwise  Pairwise SVM: – R. Herbrich, T. Graepel, K. Obermayer. “Support Vector Learning for Ordinal Regression.” In Proceedings of ICANN 1999. – T. Joachims, “A Support Vector Method for Multivariate Performance Measures.” In Proceedings of ICML 2005. (http://www.cs.cornell.edu/People/tj/svm_light/svm_perf.html)  Listwise learning – LambdaRank, Chris Burges et al., 2007 – LambdaMART, Qiang Wu, Chris J.C. Burges, Krysta M. Svore and Jianfeng Gao, 2008
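To contrast the pairwise formulation with the independent-judgment setup above, here is a small sketch that converts per-query graded judgments into preference-pair difference vectors of the kind a RankSVM-style learner consumes; it is illustrative only and not the cited authors' implementations.

```python
from itertools import combinations

def to_preference_pairs(judged):
    """judged: list of (query_id, feature_vector, grade).
    Returns (x_better - x_worse) difference vectors, the positive examples a
    pairwise (RankSVM-style) learner is trained on."""
    by_query = {}
    for qid, x, grade in judged:
        by_query.setdefault(qid, []).append((x, grade))
    pairs = []
    for items in by_query.values():
        for (xa, ga), (xb, gb) in combinations(items, 2):
            if ga == gb:
                continue                      # no preference between equally graded results
            better, worse = (xa, xb) if ga > gb else (xb, xa)
            pairs.append([b - w for b, w in zip(better, worse)])
    return pairs

judged = [("q1", [0.9, 0.1], 5), ("q1", [0.2, 0.7], 2), ("q2", [0.4, 0.4], 3), ("q2", [0.5, 0.1], 3)]
print(to_preference_pairs(judged))  # one pair from q1; none from q2 (equal grades)
```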
  • 71. Talk Outline  Introduction: Search and Machine Learned Ranking  Relevance and evaluation methodologies  Data collection and metrics  Quixey – Functional Application Search™  System Architecture, features, and model training  Alternative approaches  Conclusion
  • 72. Conclusion  Machine Learning is very important to Search – Metafeatures reduce model complexity and lower costs • Divide and conquer (parallel development) – MLR – is real, and is just one part of ML in search  Major challenges include data collection and feature engineering – Must pay for data – non-trivial, but have a say in what you collect – Features must be reasonable for given problem (domain specific)  Evaluation is critical – How to evaluate effectively is important to ensure improvement – MSE vs DCG disconnect  TreeNet can and is an effective tool for Machine Learning in Search
  • 73. Quixey is hiring  If you want a cool internship, or a great job, contact us afterwards or e-mail:  jobs@quixey.com and mention this presentation
  • 75. 3250 Ash St. Palo Alto, CA 94306 888.707.4441 www.quixey.com

Editor's notes

  1. Document Souls: a document-centric way of doing anticipatory information access. Built a personality based upon the type of information sources and services that a user would leverage to accomplish a task. The document would run and forage the selected information sources and then mark itself with found results. Personalities could be built by hand or automatically, based on bookmarks, forms, and query logs. There are many parallels between the problems we were trying to tackle in DS and task-centric retrieval. Ask me at the break about what happened to DS.
  2. Putting all this together, we end up with federated search engines and an integrated media experience: content plus user experience. Searchme is probably the most featured visual search engine. Under the words "It lets you see what you're searching for", this new search engine uses a dynamic flash interface to manage and integrate all kinds of media results. Initially, without typing any query, relevant sites on news, sports or other featured categories are displayed to explore, just to show the flexibility of the application. The same horizontal scroll recently introduced by Apple is used to navigate through different websites. These results are presented in good resolution to provide a detailed overview of the content. Moreover, the integrated media UI allows music or videos to play in the SERP, as well as zooming into images – in real time, optimized for each type of content. Once a query is entered, the content is quickly displayed. Once the page is loaded, the words in the query are highlighted on the result image previews, providing a fast visual overview. Furthermore, subcategories appear next to the search field to filter the results. For instance, we can see how music, colleges & universities, news, fitness or tickets appear to refine the query, adding the intention of the user to the original search. Finally, the user is allowed to store all results in customised subcategories called stacks. The ease of creating customised folders turns searchme into a good application to store or share any interesting content – of a variety of media types found throughout the web.
  3. QPS, deploy model. Imbalanced data. Relevance changes over time; non-stationary behavior. Speed, accuracy (SVMs). Practical: grid search, 8–16 nodes, 500 trees. Millions of records, interactions. Variable selection: 1000 → 100s of variables, add random variables. ~6-week cycle. Training time is days; lab evaluation is weeks; live A/B testing. Why TreeNet? No missing values, categorical variables.
  4. What are we trying to accomplish in search? Empirical sport.
  5. Regular web search vs. local search: typical local search queries include not only information about "what" the site visitor is searching for (such as keywords, a business category, or the name of a consumer product) but also "where" information, such as a street address, city name, postal code, or geographic coordinates like latitude and longitude.
  6. How to label? One by one, pairwise, or listwise? One-by-one or top-N based labeling? What questions/guidelines?
  7. QPS, deploy model. Imbalanced data. Relevance changes over time; non-stationary behavior. Speed, accuracy (SVMs). Practical: grid search, 8–16 nodes, 500 trees. Millions of records, interactions. Variable selection: 1000 → 100s of variables, add random variables. ~6-week cycle; training time is days; lab evaluation is weeks; live A/B testing. Why TreeNet? No missing values, categorical variables.
  8. Thanks to public evaluations such as TREC and internal corporate tracking
  9. Contrasted with system software and middleware, which manage and integrate a computer's capabilities but typically do not directly apply them in the performance of tasks that benefit the user. These app platforms enable a self-serve model where anybody can purchase an app (crowdsourced reviews); no salesforce. Cheap/free.
  10. Interesting partnership – Ford and Spotify's first integrated in-car entertainment system (Lizzie Donnachie, 20 Sep 2011): I tweeted about this a few days ago and thought it warranted a little more than 140 characters allows... A recent hypothetical partnership saw Spotify integrated into Ford's in-car SYNC platform at San Francisco's Hackathon event. With the advent of smartphone app growth, this partnership concisely demonstrates the current opportunities for in-car app innovation, enabling users to safely use their apps while driving, in addition to showcasing Ford's innovative voice-activated controls. Both parties obtain a mutual benefit from the collaboration; the strengths of the two systems are equally highlighted, which reinforces the competitive edge they have in their respective industries. Here, Spotify is shown to be versatile, with the ability to be adapted to more and more applications, while Ford showcases their pioneering in-car app technology with voice-activated controls, illustrated with a popular user-centric example. Ford is continually searching for new partners by forming their SYNC developer community – the Mobile Application Developer Network – which opens the platform for developers to create SYNC-enabled applications.
  11. The digital self, the quantified self (Gary Wolf). Fitbit: Fitbit is a small device to track your physical activity or sleep. You can wear the device all day because it easily clips in your pocket, pants, shirt, bra, or to your wrist when you are sleeping. The data collected is automatically synched online when the device is near the base station. After uploading, you can explore visualizations of your physical activity and sleep quality on the web site. You can also view your data using their new mobile web site. You can also track what you eat, other exercises that you do, and your weight. Digifit: the Digifit ecosystem is a full suite of Apple apps that records heart rate, pace, speed, cadence, and power of your running, cycling and other athletic endeavors. Data can be uploaded to the well-established training sites Training Peaks and New Leaf. The ecosystem is split up into the Digifit™, iCardio™, iRunner™, iBiker™, iSpinner™ and iPower™ apps. To utilize the full functionality of the app you must purchase the Digifit Connect ANT+ dongle and an advanced-functionality app.
  12. Free too. In a world where smartphone users cringe at the thought of paying more than 99 cents for the latest apps, can you imagine paying $1,000 for an iPhone app that, say, helps ease your stuttering? How about paying that much for an app that helps you prepare for the state bar exam? Those are just a sample of the mobile apps that are part of an elite list of software for your iPhone or iPad — the most expensive apps on the iTunes App Store.
  13. In a world where smartphone users cringe at the thought of paying more than 99 cents for the latest apps, can you imagine paying $1,000 for an iPhone app that, say, helps ease your stuttering? How about paying that much for an app that helps you prepare for the state bar exam?Those are just a sample of the mobile apps that are part of an elite list of software for your iPhone or iPad — the most expensive apps on the iTunes App Store.
  14. Members certify they are High Net Worth Individuals with assets and/or income in excess of £1 million. Members get access to everything you'd expect – and more – from the world's first luxury lifestyle app, including: (how many people have installed this app?) – Be treated like a VIP across our partner venues – Benefit from unique VIP privilege rates with many of our partner services – Receive complimentary room upgrades at luxurious hotels – Take advantage of priority booking at premium restaurants – Receive complimentary amenities at various partner venues – Get priority access to unique events and experiences – Access a concierge directly through the app – Book private yachts, private jets, private islands, and more, directly through the application – Receive invitations to exclusive VIP evenings. Description: Exclusivity, Luxury, Privilege. VIP Black, 'The Millionaire's App', is the first and only premium lifestyle application for the iPhone. Members receive 'VIP Treatment' – personalised attention and heightened experiences across the range of luxury partners. VIP treatment allows members to geo-locate partner venues and receive extra-special experiences through surprise gifts, welcome packages, complimentary room upgrades, exclusive rates, priority access, and other unique privileges. For premium members VIP treatment is available across our global range of luxury partners. Partners include Gordon Ramsay Restaurants, Virgin Limited Edition (Necker Island, The Lodge Verbier, and more), Firmdale Hotels, and many other premium brands, venues and services around the world. Membership covers all aspects of the luxury lifestyle including butlers, theatres, personal trainers, private jets, a concierge, casinos, personal styling... and much more. Please note: Black is the premium version, the first 'Millionaire's App'. Upon download, prospective members will be required to certify they are High Net Worth Individuals with assets and/or income in excess of £1 million. Upon completion each approved member will be eligible for a personal consultation to explore how iVIP Ltd. can manage their VIP lifestyle.
  15. $1,000
  16. Nate Murray (a friend of Jimi’s)
  17. Current search engines do not sufficiently tackle this problem: the inadequacy of current search systems to serve the increasing user need of discovering apps.
  18. Quixey was built from the ground up to provide functional search capability…
  19. Not dealing with webpages but with apps that are available for different platforms/ different versions
  20. App title words not aligned with query. Eric takes over here.