Bay Area Search


                   Wednesday
                  July 27, 2011
Agenda

•  6:30 Eat & Greet - Free Food & Beer
•  7:00 Speaker #1 – Brian Johnson
•  7:45 Speaker #2 – Ravi Jammalamadaka

•  Plan on 2 fabulous 45-minute presentations by excellent local search experts.
    Please suggest speakers or topics you would like to hear.
•  Great speakers, good food, fine beer, and everyone's favorite search term - Free,
    Free, Free :-)

•  Event will be held at the eBay campus just off 17/880 @ Hamilton in the main
    Community building. Look for lobby/flagpole.
•  4th Wednesday of every month
•  http://www.meetup.com/Bay-Area-Search/
How Can I Help?




•  Speakers
•  Feedback
•  Organizers
•  Videographers
Brian Johnson

•  Brian is the Director of Engineering for Query Services at eBay. He has held this
    role since January 2011. Prior to that he managed the engineering teams for
    Query Understanding (metrics and crowdsourced human judgment), classification,
    data publishing, and browsing. Brian has been at eBay since 2002.
•  Prior to eBay, Brian was at (http://www.linkedin.com/in/brianscottjohnson):
       –  Handspring - Managed the team working on email/IM/web browsing for one of
          the first smartphones (Treo)
       –  Excite@Home - Director of Engineering for the Excite homepage
       –  Synopsys - Engineer for chip design visualization
       –  AT&T Bell Labs - Data visualization research
•  Brian received his PhD in Computer Science from the University of Maryland in
    1993. His papers on visualizing hierarchical and categorical data with
    Treemaps have been cited hundreds of times.
•  Brian is a pleasure to listen to and I'm sure you'll appreciate his insights from the
    trenches regarding search query rewrite research and practice at eBay.
Ravi Jammalamadaka


•  Ravi works in the query services team at eBay, looking at ways to rewrite
    user queries to improve both precision and recall.
•  Received his PhD from the University of California, Irvine.
     –  Research on Data Security, Databases
•  Ravi published 10 research papers in the areas of databases, data security
    and data mining.
•  Ravi was invited to be a program committee member for IEEE ISI 2010, 2011
    and ICDE 2010 (demo track).
Query Rewrites

                 Brian Johnson
                Bay Area Search
                 July 27, 2011
Documents + Users

SEARCH
What Is A Query?

•  Queries are more than a text box

•    Keywords=Red Size 7 Shoes
•    Keywords=Red, Category=Shoes
•    Keywords=Red, Category=Shoes, Size=7
•    Many filter variables affect recall
•    Query, category, attributes current context dimension targets
•    Format, condition, location/distance, shipping, seller, price
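
The three forms of the "red size 7 shoes" query above can be sketched as one structure: free-text keywords plus filter constraints. This is a minimal illustration only. The `Query` class, the `matches` helper and the sample items are hypothetical names and data, not eBay's query model.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical structured query: keywords plus filter constraints."""
    keywords: str
    filters: dict = field(default_factory=dict)


def matches(item: dict, query: Query) -> bool:
    """An item is recalled if every keyword is in its title and every filter agrees."""
    title_tokens = item["title"].lower().split()
    if not all(tok in title_tokens for tok in query.keywords.lower().split()):
        return False
    return all(item.get(name) == value for name, value in query.filters.items())


# Made-up inventory for illustration.
items = [
    {"title": "Red leather shoes", "Category": "Shoes", "Size": "7"},
    {"title": "Red running shoes", "Category": "Shoes", "Size": "9"},
    {"title": "Red scarf", "Category": "Accessories"},
]


def recall_count(query: Query) -> int:
    return sum(matches(item, query) for item in items)


keywords_only = Query("Red")
with_category = Query("Red", {"Category": "Shoes"})
with_size = Query("Red", {"Category": "Shoes", "Size": "7"})
```

In this toy inventory each added filter shrinks the recalled set, which is the sense in which filter variables affect recall.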
Questions About Queries

•    Popularity/Rank
•    Supply
•    Demand
•    Click Through Rate (CTR)
•    Conversion
•    Rewrites/Expansions
•    Related Searches with CTR & Conversion
•    Category Supply/Demand/CTR/Sales
•    Product Supply/Demand/CTR/Sales
•    Top Products
•    Items (recalled, view, bin, bid, offer, watch, ask, purchase)
•    Autocompletes
•    Classification (broad, narrow, ambiguous, help, navigational)
•    Purchase Site
•    Frequency by day, day of week, time of day
•    Cross Border
•    Sales
•    Position distribution in user sessions
•    Result set size
•    Exit Rate
•    Exit Destination
Data Mining & Machine Learning

TRENDS
Query Rewrite Trends




Intelligence:    Human    ->   Machine
Data:            Small    ->   Big
Sources:         Few      ->   Many
Context:         Little   ->   Some
EXAMPLES
Example Query Services/Rewrites

•  Related Search
     canon sd1300is, canon sd1400 is, canon sd4000, canon sd1400is, canon sd, canon sd1300 is waterproof,
         canon sd 1300, canon

•    Stemming (ipod or ipods)
•    Spelling (cannon or canon)
•    Condition (new or condition=new)
•    Synonyms (boat carpet or marine carpet)
•    Space Synonyms (MarioKart > Mario Kart)
•    Item Specifics (blue or color=blue)
•    Acronyms (os = one size in CSA | Operating Systems in Electronics)
•    Category (shoes or category=63850)
•    Cross Border (site=0 and category =123) or (site=3 and category=456)
•    Fitment (fits model=X)
•    Term Removal (Harry Potter and the Order of the Phoenix (daily deal))


Context & Specificity

•  Beyond decontextualized single entities
•  Examples
      –  Stemming failures
           ○  (cowboy v cowboys) and (hat v hats)
           ○  Doesn’t work for cowboy hats & dallas cowboy caps/hats
      –  hp printer > (hp v “hewlett packard”) printer
      –  15 hp pump > 15 (hp v horsepower) pump
      –  motor bike > motor (bike v cycle)
      –  audi b6 > (audi v make=audi) & (b6 v platform=b6) v (product=789)
      –  the who != who the
      –  Time
           ○  Today: latest generation > latest generation v (generation=4)
           ○  Tomorrow: latest generation > latest generation v (generation=5)
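
A toy sketch of a context-dependent rewrite for the two hp examples above. The rule used here (a number immediately before "hp" means horsepower, otherwise Hewlett Packard) and the function name `expand_hp` are assumptions made for illustration. The slides do not say how the production system decides.

```python
def expand_hp(query: str) -> str:
    """Rewrite 'hp' into an OR group, using the previous token as context."""
    tokens = query.lower().split()
    rewritten = []
    for i, tok in enumerate(tokens):
        if tok == "hp":
            # Assumed rule: "15 hp" reads as a unit, a bare "hp" as a brand.
            prev_is_number = i > 0 and tokens[i - 1].isdigit()
            expansion = "horsepower" if prev_is_number else '"hewlett packard"'
            rewritten.append(f"(hp v {expansion})")
        else:
            rewritten.append(tok)
    return " ".join(rewritten)


print(expand_hp("hp printer"))   # (hp v "hewlett packard") printer
print(expand_hp("15 hp pump"))   # 15 (hp v horsepower) pump
```

The same token gets a different expansion depending on its neighbours, which a context-free synonym list cannot express.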
HOW
Architecture



               Online (Code + NoSQL Cache)




               Offline (Hadoop)




               Document & Behavioral Data
Better, Faster, Cheaper



Better
•    Better recall
•    Awesome related search suggestions
•    Mind reading spell corrections

Faster
•    <3 milliseconds per query
•    1.2 billion queries per day
•    1,000’s of queries per second on a single machine

Cheaper
•    Hadoop offline
•    Caching online
Metrics/Evaluation

•    Revenue (A/B Test)
•    Relevance (Recall, Precision, DCG, etc.)
•    Result Count
•    Result Set Overlap
•    Click Through Rate
•    Feedback (site links)
•    Human Judgment
•    Competitive/Benchmark data
•    “Gold” test sets
Thinking about rewrites

 •  Query length
 •  Intent identification
 •  Autocomplete, autosuggest
 •  Summarization
 •  Inference (ex: movie 9)
 •  Stemming
 •  Synonyms
 •  Spell checking
 •  Stopwords, noise words
 •  Abbreviations, acronyms
 •  Units, brands, sizes, dimensions
 •  Language detection
 •  Concept vs instantiation (ex: car vs honda)
 •  Phrases
 •  Bracketing
 •  Normalization
 •  Key term extraction
 •  Term relaxation / constraining
 •  Session context
 •  Trend detection
 •  Online feedback
 •  Temporal queries, recency
 •  Buzz
SYNONYMS
Synonym Candidates


Synonyms derived from top changes in successive queries

       frame                              frames
       lamp                               lamps
       case                               cases
       grill                              grille
       shoe                               shoes

Synonyms derived from top queries in item query clusters

       texas instruments ba ii plus       ti ba ii plus
       brighton handbag                   brighton purse
       lenovo x200                        thinkpad x200
       king bedspread                     king coverlet
       rockabilly dress                   swing dress
       1963 ford falcon                   63 falcon
       jessica simpson hair extensions    jessica simpson hairdo

Abbreviations/acronyms derived from query transitions

       stanford ky                        stanford kentucky
       dc sub                             dc subwoofer
       meridian ms                        meridian mississippi
       front royal va                     front royal virginia
       baseball pin                       baseball pinback
       snowboard helmet l                 snowboard helmet large
       motorcycle cam                     motorcycle camera
       diamond amp                        diamond amplifier
       active sub                         active subwoofer
       shapleigh me                       shapleigh maine
SPELLING
Spell Check – Offline



  •  Successive queries qi and qi’ are candidates for spell correction analysis
      if the edit distance is within 40% of the average query length.
      •  qi and qi’ may have tokens in common, called anchors.
      •  Use transitivity to remove intermediate queries.
  •  Create a bipartite graph for spell correction candidates.
  •  The same query can exist on both the source and sink sides of the graph.
  •  Compute the input and output degrees of each sink node, indicating how
      information flows in and out of a query.
  •  A correct spelling candidate is a sink node with far more flow into it
      than out of it.

  [Figure: bipartite graph linking queries q1 to q6 on one side with q1’, q2’ on the other]
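
The degree-counting idea above can be sketched as follows. This is a simplified illustration, not the production pipeline: it skips the anchor and transitivity steps, and the `min_ratio` cutoff for "far more flow in than out", the function names and the sample transitions are all assumptions.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def is_candidate(q1: str, q2: str, ratio: float = 0.4) -> bool:
    """Edit distance within 40% of the average query length."""
    avg_len = (len(q1) + len(q2)) / 2
    return q1 != q2 and edit_distance(q1, q2) <= ratio * avg_len


def likely_corrections(transitions, min_ratio: float = 3.0) -> set:
    """Queries whose in-degree is at least min_ratio times their out-degree."""
    in_deg, out_deg = Counter(), Counter()
    for source, sink in transitions:
        if is_candidate(source, sink):
            out_deg[source] += 1
            in_deg[sink] += 1
    return {q for q, n_in in in_deg.items() if n_in >= min_ratio * max(out_deg[q], 1)}


# Made-up successive-query pairs (query, next query).
transitions = [
    ("cannon camera", "canon camera"),
    ("canon camra", "canon camera"),
    ("cannon camera", "canon camera"),
    ("canon camera", "canon camera bag"),
]
corrections = likely_corrections(transitions)
```

Here "canon camera" receives three incoming edges and sends one, so it is kept as a correct spelling candidate, while "canon camera bag" is not.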
Spell Check – Online




  [Flowchart. Box labels, in reading order: query; "In the white list?" (if so:
  found a match); tokenize to tokens (wi-2, wi-1, wi); search in dictionary
  (N-gram index; edit distance, phonetics); a list of candidates; calculate
  contextual possibility; obtain entropy; obtain cosine similarity; priority
  queue; "Last?" (no: go to next; yes: get the best); result]
ACRONYMS
Acronyms


  •  Expand User Queries
        –  Increase recall without sacrificing precision
        –  Better deals for buyers
  •  Examples

        BAPE                       2,540 results
        OR(Bathing Ape, Bape)      2,987 results




                                      Rescue Project               26
Mining Acronyms From Query Reformulations

•  Learn from user behavioral data
•  Example

      UCB Sweatshirt                                    CSA
      University of California Berkeley Sweatshirt      CSA
Acronym Context & Specificity

•  Need to express context sensitive expansions
     –  Categorical
          ○  ATC > Armored Troop Carrier    in Toys and Hobbies
          ○  ATC > Artist Trading Card      in Art
          ○  ATC > Automatic Tool Change    in Business and Industrial
     –  Directional
          ○  Old > Antique
          ○  Yoga towels/mats > Yogitoes
Acronym/Abbreviation Mining

•  Acronyms/abbreviations are mined from raw text and query logs
•  Look for patterns of text
     •  long form (short form)
     •  short form (long form)
•  Employ intelligent matching algorithms to mine candidates

Example title: new cheap Playstation portable (PSP)
Acronym discovered: PSP -> PlayStation Portable

Candidates mined are fed through a machine learning classifier to remove the
false positives.

Category Based Expansions

     hp in Electronics         ->  Hewlett Packard
     hp in Cars and Trucks     ->  horsepower

System allows
•  Category based expansions
•  Directional expansions
•  Positive and negative expansions
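
A minimal sketch of category-based, directional expansion using the hp and ATC examples from these slides. The lookup table layout, the `expand` function and the `OR(long form, short form)` output string are assumptions for illustration. The slides do not describe the actual dictionary format.

```python
# Hypothetical dictionary keyed by (short form, category).
EXPANSIONS = {
    ("hp", "Electronics"): "Hewlett Packard",
    ("hp", "Cars and Trucks"): "horsepower",
    ("atc", "Toys and Hobbies"): "Armored Troop Carrier",
    ("atc", "Art"): "Artist Trading Card",
    ("atc", "Business and Industrial"): "Automatic Tool Change",
}


def expand(term: str, category: str) -> str:
    """Expand a short form only in a category where an expansion is known.

    The mapping is directional: short forms expand to long forms, but a long
    form typed by the user is not rewritten back to the short form.
    """
    long_form = EXPANSIONS.get((term.lower(), category))
    if long_form is None:
        return term
    return f"OR({long_form}, {term})"


print(expand("hp", "Electronics"))       # OR(Hewlett Packard, hp)
print(expand("hp", "Cars and Trucks"))   # OR(horsepower, hp)
print(expand("hp", "Books"))             # hp
```

Keying on the category keeps the Electronics expansion from leaking into Cars and Trucks queries, and leaving unknown pairs untouched acts as a conservative default.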
THANKS
&
QUESTIONS
Mining Acronyms/Abbreviations from Raw Text




               Ravi Chandra Jammalamadaka
                         eBay, Inc
                        07/27/2011
Talk Overview

•  Motivation
     –  Introduction of the acronym mining problem.
•  Related Work
     –  Algorithm overview.
•  eBay Acronym Mining algorithm.
     –  Architecture.
     –  Algorithm overview.
•  Results.
•  Conclusions.
Motivation

•     User queries are incomplete representations of users' information needs
      –  Spelling mistakes
          ○  Jetsky instead of Jetski
      –  Synonyms are not considered
          ○  PS3 and PlayStation 3 (acronym, the topic of this talk)
          ○  JetSki and Personal Watercraft
      –  Users are not experts in search engine technology
          ○  Example: Anniversary gifts for men




                               eBay, Inc.                      33
Need for Query Rewrites



                 JetSky                                  2 results

                     (Spelling Correction)

                 JetSki                                  23,782 results

                     (Synonym Expansion)

                 OR(Jetski, Personal WaterCraft)         24,151 results
Motivation: Acronyms/Abbreviation


       Uke -> Ukulele
       IQD -> Iraqi Dinar
Where can we find Acronyms?



  From Query Reformulations (i.e., how users change their queries):

          New Uke
          New Ukulele

  From Item Title/Descriptions:

          Grand Theft Auto III (GTA 3) (PlayStation 2, 2001)
          Grand Theft Auto IV (GTA 4) PS3 mint condition
          Warhawk (No Headset) PlayStation 3 (PS3) BRAND NEW!
          COLD LASER. Low Level Laser Therapy(LLLT) + Acupuncture
Related Work




Schwartz et al: Greedy Match Algorithm




Warhawk (No Headset) PlayStation 3 (PS3) BRAND NEW!




Identifying Abbreviation Definitions in Biomedical Text.


   •  Mining for patterns
       –  long form (short form)
       –  short form (long form)
       –  The long form is no more than min(|A| + 5, |A| * 2) words, where |A|
          is the number of characters in the short form.
       –  Roche et al. propose that this number be less than |A| * 3.
   •  The characters in the short form should match the long form in the same
       order, and the first character in the short form should be at the
       beginning of a word.
   •  Example:
       –  PS3 -> PlayStation 3
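
The matching rule above can be sketched as a right-to-left scan, following the approach published by Schwartz and Hearst. This is a simplified illustration: `find_pairs` handles only the "long form (short form)" pattern, the short-form filter is reduced to a length check, and both function names are chosen here.

```python
import re


def best_long_form(short_form: str, text_before: str):
    """Scan right to left, matching short-form characters in order.

    The first short-form character must match at the start of a word.
    Returns the matched long form, or None if no match exists.
    """
    s = len(short_form) - 1
    pos = len(text_before) - 1
    while s >= 0:
        c = short_form[s].lower()
        if c.isalnum():
            while pos >= 0 and (
                text_before[pos].lower() != c
                or (s == 0 and pos > 0 and text_before[pos - 1].isalnum())
            ):
                pos -= 1
            if pos < 0:
                return None
            pos -= 1
        s -= 1
    start = text_before.rfind(" ", 0, pos + 1) + 1
    return text_before[start:]


def find_pairs(title: str):
    """Return (short form, long form) pairs for 'long form (short form)'."""
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", title):
        short = m.group(1).strip()
        if len(short) < 2 or not short[0].isalnum():
            continue
        # Candidate window: at most min(|A| + 5, |A| * 2) words before "(".
        max_words = min(len(short) + 5, len(short) * 2)
        window = " ".join(title[: m.start()].split()[-max_words:])
        long_form = best_long_form(short, window)
        if long_form:
            pairs.append((short, long_form))
    return pairs


print(find_pairs("Warhawk (No Headset) PlayStation 3 (PS3) BRAND NEW!"))
# [('PS3', 'PlayStation 3')]
```

Because the scan accepts characters from anywhere inside a word, it is permissive by design, which is what gives the greedy match its high coverage.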

Schwartz et al

  •  Pros:
      –  Finds almost all abbreviations and acronyms
  •  Cons:
      –  High false positive rate.
         ○  Example: Foot Massage Diabetes Treatment (FEET)
      –  Suffers from the truncated long form problem.
         ○  Example: American Automobile Association (AAA)
Acronym-Expansion Recognition and Ranking on the Web


   •  First few characters match
   •  Ignore stop words
   •  Example:
       –  Cool -> Cooperation in Ontology and Linguistics.




     Alpa Jain, Silviu Cucerzan, Saliha Azzam. Acronym-Expansion
                 Recognition and Ranking on the Web.
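
One way to read the "first few characters match" rule is sketched below: after dropping stop words, every remaining word must contribute one or more of its leading characters, in order, and together they must spell the acronym. This is an interpretation written for illustration, not the method from the cited paper. The stop word list and function name are assumptions.

```python
STOP_WORDS = {"in", "and", "of", "the", "for", "on"}  # assumed list


def first_chars_match(acronym: str, expansion: str) -> bool:
    """True if the acronym is a concatenation of word prefixes, one per word."""
    words = [w.lower() for w in expansion.split() if w.lower() not in STOP_WORDS]
    target = acronym.lower()

    def consume(pos: int, idx: int) -> bool:
        # pos: next acronym character to match; idx: next word to use.
        if idx == len(words):
            return pos == len(target)
        word = words[idx]
        k = 0
        while k < len(word) and pos + k < len(target) and word[k] == target[pos + k]:
            k += 1
            # Try taking the first k characters of this word.
            if consume(pos + k, idx + 1):
                return True
        return False

    return consume(0, 0)


print(first_chars_match("Cool", "Cooperation in Ontology and Linguistics"))  # True
print(first_chars_match("PS3", "PlayStation 3"))                             # False
```

The second call fails because the "S" of PS3 sits in the middle of "PlayStation", which a prefix-only rule cannot reach.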




Jain et al

•  Pros:
    –  Low false positive rate
•  Cons:
    –  Does not do a good job at identifying abbreviations
    –  Misses out on a lot of actual acronyms
       ○  Will not find the PlayStation 3 and PS3 association.
eBay Acronym Mining Architecture



  [Architecture diagram. Components: User Data, Candidate Generator, Feature
  Extractor, Classifier, Dictionary, Human Judgment, A/B Test, Live on Site]
eBay Acronym/Abbreviation Mining Algorithm

•    Desirable Properties
      –  Find all abbreviations and acronyms, as the greedy match does
      –  Reduce the number of false positives
      –  Solve the truncated long form problem.
•    What makes a good acronym-expansion pair?
      –  Characters in the acronym are found at the beginning of the words.
      –  Expansions generally do not have words that are skipped or not
         represented in the acronym.
      –  Can a cost metric capture the intuition?
Cost Based Approach for Mining Abbreviations


    CIM ------- Computer Interface Module
           Total Cost: Low Cost

    PVC ------- PolyVinyl Chloride
           Total Cost: Medium Cost

    HSF ------- Heat shock transcription factor
           Total Cost: High Cost
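
The slides do not give the actual cost function, so the following is only a hypothetical one that reproduces the low / medium / high ordering of the three examples. It charges an assumed 1 for each acronym character matched in the middle of a word and an assumed 2 for each expansion word that is skipped. Both weights and the function names are illustrative.

```python
from functools import lru_cache

MID_WORD_COST = 1      # assumed penalty: character matched inside a word
SKIPPED_WORD_COST = 2  # assumed penalty: expansion word with no matched character


def is_subsequence(needle: str, haystack: str) -> bool:
    it = iter(haystack)
    return all(ch in it for ch in needle)


def expansion_cost(acronym: str, expansion: str):
    """Minimum cost of aligning acronym characters to expansion words, in order."""
    words = expansion.lower().split()
    acr = acronym.lower()

    @lru_cache(maxsize=None)
    def best(a_pos: int, w_idx: int):
        if w_idx == len(words):
            return 0 if a_pos == len(acr) else float("inf")
        word = words[w_idx]
        # Option 1: skip this word entirely.
        result = SKIPPED_WORD_COST + best(a_pos, w_idx + 1)
        # Option 2: match the next k acronym characters inside this word.
        for k in range(1, len(acr) - a_pos + 1):
            chunk = acr[a_pos:a_pos + k]
            if chunk[0] == word[0] and is_subsequence(chunk[1:], word[1:]):
                cost = (k - 1) * MID_WORD_COST   # first character at word start is free
            elif is_subsequence(chunk, word):
                cost = k * MID_WORD_COST
            else:
                break  # a longer chunk cannot match either
            result = min(result, cost + best(a_pos + k, w_idx + 1))
        return result

    return best(0, 0)


print(expansion_cost("CIM", "Computer Interface Module"))        # 0
print(expansion_cost("PVC", "PolyVinyl Chloride"))               # 1
print(expansion_cost("HSF", "Heat shock transcription factor"))  # 2
```

With these assumed weights, PVC pays for the mid-word "V" and HSF pays for the skipped word "transcription", while CIM matches word-initial characters only.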



Cost Based Recursive Algorithm


        Title: new American Automobile Association (AAA) map of mexico

        Objective: Find the longest form with the lowest cost

                   American Automobile Association (AAA)

        Min ( American Automobile Associ (AA) , American Automobile Associ (AAA) )
             + Cost so far
Salient Properties of the new algorithm

  •    If Cost > Threshold, then the long form is a false positive.
  •    As the cost threshold increases
        –  False positives increase
        –  The chance that a real acronym is not identified decreases
  •    As the cost threshold decreases
        –  False positives decrease
        –  The chance that a real acronym is not identified increases.
  •    At low thresholds, the algorithm behaves like the first few characters
        match.
  •    At high thresholds, the algorithm behaves like the greedy match
        algorithm.
Experiments

Sample Dataset: 2.5 million item titles

  Algorithm                     Total Candidates   False Positive Rate   Yield
  Greedy Match                  2548               39%                   1554
  First Few Characters Match    759                4%                    728
  Cost Based Match, k1          1223               14%                   1051
  Cost Based Match, k2          1604               16%                   1284
  Cost Based Match, k3          2023               20%                   1554
Removing false positives

•  Goal
    –  Develop a classification algorithm that classifies whether a
       candidate is an acronym or not.
•  Classification algorithm
    –  Decision trees
       ○  TreeNet data mining tool.
•  Candidates are tagged with many features.
•  The classifier learns on the tagged golden set.
•  New candidates are then run through the classifier.
Example of a Decision Tree



     Training Data

     Tid   Refund   Marital Status   Taxable Income   Cheat
     1     Yes      Single           125K             No
     2     No       Married          100K             No
     3     No       Single           70K              No
     4     Yes      Married          120K             No
     5     No       Divorced         95K              Yes
     6     No       Married          60K              No
     7     Yes      Divorced         220K             No
     8     No       Single           85K              Yes
     9     No       Married          75K              No
     10    No       Single           90K              Yes

     Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

     Refund
       Yes:  NO
       No:   MarSt
               Married:            NO
               Single, Divorced:   TaxInc
                                     < 80K:  NO
                                     > 80K:  YES

     Acknowledgements: George Kollios, gkollios@cs.bu.edu
Features: Neighborhood Similarity

  •  Rationale: Two synonym candidates A and B will tend to have similar
      neighbors (viz. keywords) surrounding them.

   Neighborhood similarity =
       |Neighbours(A) ∩ Neighbours(B)| / min(|Neighbours(A)|, |Neighbours(B)|)
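
A small sketch of the neighborhood similarity formula. It assumes that the neighbours of a term are the other tokens in titles containing that term. The slides do not define the neighbourhood window, so that choice, the function names and the sample titles are all assumptions.

```python
def neighbours(term: str, titles: list[str]) -> set[str]:
    """Tokens that co-occur with `term` in any title (assumed definition)."""
    found = set()
    for title in titles:
        tokens = title.lower().split()
        if term in tokens:
            found.update(t for t in tokens if t != term)
    return found


def neighbourhood_similarity(a: str, b: str, titles: list[str]) -> float:
    """Size of the shared neighbourhood over the size of the smaller one."""
    na, nb = neighbours(a, titles), neighbours(b, titles)
    smaller = min(len(na), len(nb))
    return len(na & nb) / smaller if smaller else 0.0


# Made-up titles for illustration.
titles = [
    "new uke soprano case",
    "ukulele soprano case",
    "ukulele tenor strings",
    "uke tenor",
]
print(neighbourhood_similarity("uke", "ukulele", titles))  # 0.75
```

Dividing by the smaller neighbourhood keeps a rare short form from being penalised just because its long form appears in many more titles.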




Features: Mutual Information

•    Rationale: The goal of this metric is to determine whether the candidates
      co-occur in descriptions significantly more often than they would by
      random chance.
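
The formula from this slide did not survive in the text, so the sketch below uses pointwise mutual information, log(P(a, b) / (P(a) P(b))), as one common way to compare observed co-occurrence with chance. Whether the talk used this exact variant is an assumption, as are the function name and the sample data.

```python
import math


def pmi(a: str, b: str, documents: list[set[str]]) -> float:
    """Pointwise mutual information of two terms over token-set documents."""
    n = len(documents)
    has_a = sum(1 for doc in documents if a in doc)
    has_b = sum(1 for doc in documents if b in doc)
    has_both = sum(1 for doc in documents if a in doc and b in doc)
    if not (has_a and has_b and has_both):
        return float("-inf")  # never observed together
    return math.log((has_both / n) / ((has_a / n) * (has_b / n)))


# Made-up descriptions, already tokenised into sets.
together = [{"ps3", "playstation", "console"}, {"ps3", "playstation"}, {"shoes"}, {"hat"}]
independent = [{"ps3", "shoes"}, {"ps3"}, {"shoes"}, {"hat"}]

print(pmi("ps3", "playstation", together))  # positive: more often than chance
print(pmi("ps3", "shoes", independent))     # 0.0: exactly as often as chance
```

A positive score means the pair appears together more often than independent terms would, which is the signal the rationale describes.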




Features: KL divergence

•  Rationale: Two synonym candidates will have similar
    category distributions of their inventory.
KL distance: Example

   Pair 1
     Ipods:     Electronics (50), Clothing Shoes and Accessories (1)
     Ipod:      Electronics (100), Clothing Shoes and Accessories (3)
     KL divergence: 0.83

   Pair 2
     Ipod:      Electronics (100), Clothing Shoes and Accessories (3)
     T-shirt:   Clothing Shoes and Accessories (1000), Uniforms (50)
     KL divergence: 128592.74
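
A sketch of KL divergence over category counts, using the counts from the example above. It normalises counts to probabilities and adds a small `epsilon` so that categories missing from one side do not cause a division by zero. The smoothing, the function name and the choice of natural log are assumptions, and this version does not reproduce the figures on the slide, which presumably come from a different scaling or smoothing.

```python
import math


def kl_divergence(p_counts: dict, q_counts: dict, epsilon: float = 1e-9) -> float:
    """KL(P || Q) over the union of categories, with additive smoothing."""
    categories = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + epsilon * len(categories)
    q_total = sum(q_counts.values()) + epsilon * len(categories)
    total = 0.0
    for category in categories:
        p = (p_counts.get(category, 0) + epsilon) / p_total
        q = (q_counts.get(category, 0) + epsilon) / q_total
        total += p * math.log(p / q)
    return total


ipods = {"Electronics": 50, "Clothing Shoes and Accessories": 1}
ipod = {"Electronics": 100, "Clothing Shoes and Accessories": 3}
tshirt = {"Clothing Shoes and Accessories": 1000, "Uniforms": 50}

similar = kl_divergence(ipods, ipod)      # small: nearly the same category mix
different = kl_divergence(ipod, tshirt)   # large: almost disjoint category mix
```

The absolute values depend on the smoothing, but the ordering matches the example: the Ipods/Ipod pair scores far lower than the Ipod/T-shirt pair.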
Classifier Decision Tree Example



     KL Distance
       > 2.5:  False Positive
       ≤ 2.5:  Neighbourhood Similarity
                 ≤ 0.2:  False Positive
                 > 0.2:  Mutual Information
                           > 0.003:  True Positive
                           ≤ 0.003:  False Positive
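
The example tree above reads directly as nested conditions. The function and parameter names below are chosen for illustration. The thresholds are the ones shown on the slide, which presents this tree as an example rather than as the full trained model.

```python
def classify(kl_distance: float,
             neighbourhood_similarity: float,
             mutual_information: float) -> str:
    """Walk the example decision tree from root to leaf."""
    if kl_distance > 2.5:
        return "False Positive"
    if neighbourhood_similarity <= 0.2:
        return "False Positive"
    if mutual_information > 0.003:
        return "True Positive"
    return "False Positive"


print(classify(0.83, 0.75, 0.01))   # True Positive
print(classify(3.10, 0.75, 0.01))   # False Positive
```

A candidate is accepted only when all three features agree: similar category distributions, shared neighbours, and co-occurrence above chance.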
Classifier Results

•  The false positive rate at the candidate generation stage is 20%.
•  The false positive rate after going through the classifier is 5.5%.
•  The remaining false positives are removed by human judges.
Conclusions

•  We presented the state-of-the-art algorithms for acronym
    mining and their limitations.
•  We presented a new cost-based algorithm for mining
    acronyms from raw text that seeks to address the limitations
    of the previous algorithms.
•  We presented a classifier approach to remove false
    positives.
•  We experimentally validated our approach and showed that it is a
    viable approach for mining acronyms.
References

  •    [1] Ariel S. Schwartz, Marti A. Hearst. A Simple Algorithm for Identifying
        Abbreviation Definitions in Biomedical Text.
  •    [2] Youngja Park, Roy J. Byrd. Hybrid Text Mining for Finding Abbreviations
        and Their Definitions.
  •    [3] Mathieu Roche, Violaine Prince. Managing the Acronym/Expansion
        Identification Process for Text-mining Applications.
  •    [4] Yee Fan Tan, Ergin Elmacioglu, Min-Yen Kan, Dongwon Lee. Efficient
        Web-Based Linkage of Short to Long Forms.
  •    [5] Alpa Jain, Silviu Cucerzan, Saliha Azzam. Acronym-Expansion Recognition
        and Ranking on the Web.
  •    [6] Xiaonan Ji, Gu Xu, James Bailey, Hang Li. Mining, Ranking and Using
        Acronym Patterns.
Thanks




2011 Search Query Rewrites - Synonyms & Acronyms

Weitere ähnliche Inhalte

Ähnlich wie 2011 Search Query Rewrites - Synonyms & Acronyms

Triggering and Managing Knowledge Panels for Brands and Companies - Jason Bar...
Triggering and Managing Knowledge Panels for Brands and Companies - Jason Bar...Triggering and Managing Knowledge Panels for Brands and Companies - Jason Bar...
Triggering and Managing Knowledge Panels for Brands and Companies - Jason Bar...Jason Barnard
 
Big Data, Analytics, and Content Recommendations on AWS
Big Data, Analytics, and Content Recommendations on AWSBig Data, Analytics, and Content Recommendations on AWS
Big Data, Analytics, and Content Recommendations on AWSAmazon Web Services
 
SVC101 Building Search into Your App - AWS re: Invent 2012
SVC101 Building Search into Your App - AWS re: Invent 2012SVC101 Building Search into Your App - AWS re: Invent 2012
SVC101 Building Search into Your App - AWS re: Invent 2012Amazon Web Services
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleSean Cribbs
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackTypenathanmarz
 
Introduction to Riak - Red Dirt Ruby Conf Training
Introduction to Riak - Red Dirt Ruby Conf TrainingIntroduction to Riak - Red Dirt Ruby Conf Training
Introduction to Riak - Red Dirt Ruby Conf TrainingSean Cribbs
 
Lessons Learnt From Working With Rails
Lessons Learnt From Working With RailsLessons Learnt From Working With Rails
Lessons Learnt From Working With Railsmartinbtt
 
Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Somnath Banerjee
 
Building a web framework: Django's design decisions
Building a web framework: Django's design decisionsBuilding a web framework: Django's design decisions
Building a web framework: Django's design decisionsJacob Kaplan-Moss
 
SARTA: Business Model Canvas Workshop
SARTA: Business Model Canvas WorkshopSARTA: Business Model Canvas Workshop
SARTA: Business Model Canvas WorkshopAlex Cowan
 
CloudCon Data Mining Presentation
CloudCon Data Mining PresentationCloudCon Data Mining Presentation
CloudCon Data Mining PresentationBrian Johnson
 
Inheritance - the myth of code reuse | Andrei Raifura | CodeWay 2015
Inheritance - the myth of code reuse | Andrei Raifura | CodeWay 2015Inheritance - the myth of code reuse | Andrei Raifura | CodeWay 2015
Inheritance - the myth of code reuse | Andrei Raifura | CodeWay 2015YOPESO
 
Schema matching for merging data feeds
Schema matching for merging data feedsSchema matching for merging data feeds
Schema matching for merging data feedsbutest
 
Darin Briskman_Amazon_June_9_2017_Presentation
Darin Briskman_Amazon_June_9_2017_PresentationDarin Briskman_Amazon_June_9_2017_Presentation
Darin Briskman_Amazon_June_9_2017_PresentationTriNimbus
 
Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...
Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...
Practical Data Science Workshop - Recommendation Systems - Collaborative Filt...Chris Fregly
 
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Spark Summit
 
wadar_poster_final
wadar_poster_finalwadar_poster_final
wadar_poster_finalGiorgio Orsi
 
QCon New York - Migrating to Cloud Native with Microservices
QCon New York - Migrating to Cloud Native with MicroservicesQCon New York - Migrating to Cloud Native with Microservices
QCon New York - Migrating to Cloud Native with MicroservicesAdrian Cockcroft
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jMax De Marzi
 
Venture Design Workshop: Business Model Canvas
Venture Design Workshop: Business Model CanvasVenture Design Workshop: Business Model Canvas
Venture Design Workshop: Business Model CanvasAlex Cowan
 


2011 Search Query Rewrites - Synonyms & Acronyms

  • 1. Bay Area Search Wednesday July 27, 2011
  • 2. Agenda
      • 6:30 Eat & Greet - Free Food & Beer
      • 7:00 Speaker #1 – Brian Johnson
      • 7:45 Speaker #2 – Ravi Jammalamadaka
      • Plan on 2 fabulous 45-minute presentations by excellent local search experts. Please suggest speakers or topics you would like to hear.
      • Great speakers, good food, fine beer, and everyone's favorite search term - Free, Free, Free :-)
      • Event will be held at the eBay campus just off 17/880 @ Hamilton in the main Community building. Look for lobby/flagpole.
      • 4th Wednesday of every month
      • http://www.meetup.com/Bay-Area-Search/
  • 3. How Can I Help?
      • Speakers
      • Feedback
      • Organizers
      • Videographers
  • 4. Brian Johnson
      • Brian is the Director of Engineering for Query Services at eBay. He has held this role since January 2011. Prior to that he managed the engineering teams for Query Understanding (metrics and crowdsourced human judgment), classification, data publishing, and browsing. Brian has been at eBay since 2002.
      • Prior to eBay, Brian was at (http://www.linkedin.com/in/brianscottjohnson):
          – Handspring - Managed the team working on email/IM/web browsing for one of the first smartphones (Treo)
          – Excite@Home - Director of Engineering for the Excite homepage
          – Synopsys - Engineer for chip design visualization
          – AT&T Bell Labs - Data visualization research
      • Brian received his PhD in Computer Science from the University of Maryland in 1993. His papers on visualizing hierarchical and categorical data with Treemaps have been cited hundreds of times.
      • Brian is a pleasure to listen to and I'm sure you'll appreciate his insights from the trenches regarding search query rewrite research and practice at eBay.
  • 5. Ravi Jammalamadaka
      • Ravi works in the query services team at eBay, looking at ways to rewrite user queries to improve both precision and recall.
      • Received his PhD from the University of California, Irvine.
          – Research on data security and databases
      • Ravi has published 10 research papers in the areas of databases, data security and data mining.
      • Ravi was invited to be a program committee member for IEEE ISI 2010, 2011 and ICDE 2010 (demo track).
  • 6. Query Rewrites, Brian Johnson, Bay Area Search, July 27, 2011
  • 8. What Is A Query?
      • Queries are more than a text box
      • Keywords=Red Size 7 Shoes
      • Keywords=Red, Category=Shoes
      • Keywords=Red, Category=Shoes, Size=7
      • Many filter variables affect recall
      • Query, category, attributes current context dimension targets
      • Format, condition, location/distance, shipping, seller, price
  • 9. Questions About Queries
      • Popularity/Rank
      • Supply
      • Demand
      • Click Through Rate (CTR)
      • Conversion
      • Rewrites/Expansions
      • Related Searches with CTR & Conversion
      • Category Supply/Demand/CTR/Sales
      • Product Supply/Demand/CTR/Sales
      • Top Products
      • Items (recalled, view, bin, bid, offer, watch, ask, purchase)
      • Autocompletes
      • Classification (broad, narrow, ambiguous, help, navigational)
      • Purchase Site
      • Frequency by day, day of week, time of day
      • Cross Border
      • Sales
      • Position distribution in user sessions
      • Result set size
      • Exit Rate
      • Exit Destination
  • 10. Data Mining & Machine Learning TRENDS
  • 11. Query Rewrite Trends
      • Intelligence: Human → Machine
      • Data: Small → Big
      • Sources: Few → Many
      • Context: Little → Some
  • 13. Example Query Services/Rewrites
      • Related Search: canon sd1300is, canon sd1400 is, canon sd4000, canon sd1400is, canon sd, canon sd1300 is waterproof, canon sd 1300, canon
      • Stemming (ipod or ipods)
      • Spelling (cannon or canon)
      • Condition (new or condition=new)
      • Synonyms (boat carpet or marine carpet)
      • Space Synonyms (MarioKart > Mario Kart)
      • Item Specifics (blue or color=blue)
      • Acronyms (os = one size in CSA | Operating Systems in Electronics)
      • Category (shoes or category=63850)
      • Cross Border (site=0 and category=123) or (site=3 and category=456)
      • Fitment (fits model=X)
      • Term Removal (Harry Potter and the Order of the Phoenix (daily deal))
  • 14. Context & Specificity
      • Beyond decontextualized single entities
      • Examples
          – Stemming failures
              ○ (cowboy v cowboys) and (hat v hats)
              ○ Doesn't work for cowboy hats & dallas cowboy caps/hats
          – hp printer > (hp v "hewlett packard") printer
          – 15 hp pump > 15 (hp v horsepower) pump
          – motor bike > motor (bike v cycle)
          – audi b6 > (audi v make=audi) & (b6 v platform=b6) v (product=789)
          – the who != who the
          – Time
              ○ Today: latest generation > latest generation v (generation=4)
              ○ Tomorrow: latest generation > latest generation v (generation=5)
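The "hp" examples on this slide can be sketched as a context-triggered rewrite: a token is expanded only when another token in the same query signals which meaning applies. The rule table, trigger words and function name below are hypothetical illustrations, not eBay's actual data or implementation.

```python
# Hypothetical rule table: token -> list of (trigger words, expansion).
# The entries are made up for illustration only.
CONTEXT_RULES = {
    "hp": [
        ({"printer", "laptop", "ink"}, "hewlett packard"),
        ({"pump", "motor", "engine"}, "horsepower"),
    ],
}


def rewrite(query: str) -> str:
    """Expand a token only if a trigger word appears elsewhere in the query."""
    tokens = query.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        others = set(tokens[:i] + tokens[i + 1:])
        # Pick the first rule whose trigger set overlaps the other tokens.
        expansion = next(
            (exp for triggers, exp in CONTEXT_RULES.get(tok, []) if triggers & others),
            None,
        )
        if expansion is None:
            out.append(tok)  # no context evidence: leave the token alone
        else:
            out.append(f'({tok} v "{expansion}")')
    return " ".join(out)
```

With these assumed rules, `rewrite("hp printer")` and `rewrite("15 hp pump")` expand "hp" differently, while a bare `rewrite("hp")` is left untouched because no context is available.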
  • 15. HOW
  • 16. Architecture
      • Online (Code + NoSQL Cache)
      • Offline (Hadoop)
      • Document & Behavioral Data
  • 17. Better, Faster, Cheaper
      • Better
          – Better recall
          – Awesome related search suggestions
          – Mind reading spell corrections
      • Faster
          – <3 milliseconds per query
          – 1.2 billion queries per day
          – 1,000s of queries per second on a single machine
      • Cheaper
          – Hadoop offline
          – Caching online
  • 18. Metrics/Evaluation
      • Revenue (A/B Test)
      • Relevance (Recall, Precision, DCG, etc.)
      • Result Count
      • Result Set Overlap
      • Click Through Rate
      • Feedback (site links)
      • Human Judgment
      • Competitive/Benchmark data
      • "Gold" test sets
  • 19. Thinking about rewrites
      • Query length
      • Language detection
      • Intent identification
      • Concept vs instantiation (ex: car vs honda)
      • Autocomplete, autosuggest
      • Phrases
      • Summarization
      • Bracketing
      • Inference (ex: movie 9)
      • Normalization
      • Stemming
      • Key term extraction
      • Synonyms
      • Term relaxation / constraining
      • Spell checking
      • Stopwords, noise words
      • Session context
      • Abbreviations, acronyms
      • Trend detection
      • Units, brands, sizes, dimensions
      • Online feedback
      • Temporal queries, recency
      • Buzz
  • 21. Synonym Candidates
      • Synonyms derived from top changes in successive queries
          – frame / frames
          – lamp / lamps
          – case / cases
          – grill / grille
          – shoe / shoes
      • Synonyms derived from top queries in item query clusters
          – texas instruments ba ii plus / ti ba ii plus
          – brighton handbag / brighton purse
          – lenovo x200 / thinkpad x200
          – king bedspread / king coverlet
          – rockabilly dress / swing dress
          – 1963 ford falcon / 63 falcon
          – jessica simpson hair extensions / jessica simpson hairdo
      • Abbreviations/acronyms derived from query transitions
          – stanford ky / stanford kentucky
          – dc sub / dc subwoofer
          – meridian ms / meridian mississippi
          – front royal va / front royal virginia
          – baseball pin / baseball pinback
          – snowboard helmet l / snowboard helmet large
          – motorcycle cam / motorcycle camera
          – diamond amp / diamond amplifier
          – active sub / active subwoofer
          – shapleigh me / shapleigh maine
  • 23. Spell Check – Offline
      • Successive queries qi and qi' are candidates for spell correction analysis if the edit distance is within 40% of the average query length.
          – qi and qi' may have tokens in common, called anchors.
          – Use transitivity to remove intermediate queries.
      • Create a bipartite graph for spell correction candidates.
      • The same query can exist on the source and sink sides of the graph.
      • Compute input and output degrees of each sink node, indicating how info flows in and out of a query.
      • A correct spelling candidate is a sink node with far more flow into it than out of it.
      [Figure: bipartite graph linking source queries q1 to q6 with sink queries q1' and q2']
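A minimal sketch of the offline steps listed on this slide, under stated assumptions: "within 40% of the average query length" is read as edit distance ≤ 0.4 × the mean character length of the two queries, flow is counted as the number of observed transitions, and "far more flow in than out" is approximated with a made-up ratio parameter `min_flow_ratio`. Anchors and the transitivity step are omitted.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def is_candidate(q: str, q_next: str, max_ratio: float = 0.4) -> bool:
    """Assumed reading of the 40% rule: distance <= 0.4 * average length."""
    avg_len = (len(q) + len(q_next)) / 2
    return q != q_next and edit_distance(q, q_next) <= max_ratio * avg_len


def correct_spellings(transitions, min_flow_ratio: float = 5.0):
    """transitions: iterable of (query, next_query, count) triples.

    Returns sink queries whose inflow is at least min_flow_ratio times
    their outflow (min_flow_ratio is a hypothetical tuning knob).
    """
    inflow, outflow = Counter(), Counter()
    for q, q_next, count in transitions:
        if is_candidate(q, q_next):
            outflow[q] += count
            inflow[q_next] += count
    return {q for q, flow_in in inflow.items()
            if flow_in >= min_flow_ratio * outflow[q]}
```

In this sketch, a query such as "canon" that mostly receives transitions from misspellings ends up as a sink with high inflow, while an unrelated follow-up query is filtered out by the edit-distance test.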
  • 24. Spell Check – Online
      [Flowchart. Recoverable box labels: query; tokenize to tokens (w_{i-2}, w_{i-1}, w_i); in the white list?; found a match; no, go to next; search in dictionary; N-gram index; calculate contextual possibility; obtain entropy; edit distance, phonetics; obtain cosine similarity; priority queue; last?; yes, get the best; a list of candidates; result]
  • 26. Acronyms
      • Expand User Queries
          – Increase recall without sacrificing precision
          – Better deals for buyers
      • Examples
          – BAPE: 2,540 results
          – OR(Bathing Ape, Bape): 2,987 results
      Rescue Project
  • 27. Mining Acronyms From Query Reformulations
      • Learn from user behavioral data
      • Example
          – UCB Sweatshirt (CSA) → University of California Berkeley Sweatshirt (CSA)
  • 28. Acronym Context & Specificity
      • Need to express context sensitive expansions
          – Categorical
              ○ ATC > Armored Troop Carrier in Toys and Hobbies
              ○ ATC > Artist trading card in Art
              ○ ATC > Automatic Tool Change in Business and Industrial
          – Directional
              ○ Old > Antique
              ○ Yoga towels/mats > Yogitoes
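One way to picture categorical and directional expansions is a lookup keyed by (term, category), where a category of `None` means "any category" and the absence of a reverse entry makes the expansion one-directional. The table layout and the `expand` function are assumptions for illustration; only the ATC and Old > Antique examples come from the slide, and the `OR(...)` output mirrors the notation used on slide 26.

```python
# Hypothetical expansion table: (short term, category or None) -> long form.
EXPANSIONS = {
    ("atc", "Toys and Hobbies"): "Armored Troop Carrier",
    ("atc", "Art"): "Artist trading card",
    ("atc", "Business and Industrial"): "Automatic Tool Change",
    ("old", None): "antique",  # directional: no ("antique", ...) entry exists
}


def expand(term: str, category: str) -> str:
    """Return an OR(...) expression if an expansion applies, else the term."""
    key = term.lower()
    # Prefer a category-specific expansion, then fall back to "any category".
    long_form = EXPANSIONS.get((key, category)) or EXPANSIONS.get((key, None))
    if long_form is None:
        return term
    return f"OR({long_form}, {term})"
```

Because lookups go from the short term only, "old" widens to "antique" but a query for "antique" is not widened back to "old".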
  • 29. Acronym/Abbreviation Mining and Category Based Expansions
      Acronym/Abbreviation Mining
      • Acronyms/abbreviations are mined from raw text and query logs
      • Look for patterns of text
          – long form (short form)
          – short form (long form)
      • Employ intelligent matching algorithms to mine candidates
      • Example title: System allows new cheap Playstation portable (PSP)
      • Acronym discovered: PSP -> PlayStation Portable
      • Candidates mined are fed through a machine learning classifier to remove the false positives
      Category Based Expansions
      • hp > Hewlett Packard in Electronics
      • hp > horsepower in Cars and Trucks
      • Category based expansions
      • Directional expansions
      • Positive and negative expansions
  • 31. Mining Acronyms/Abbreviations from Raw Text, Ravi Chandra Jammalamadaka, eBay, Inc., 07/27/2011
  • 32. Talk Overview
      • Motivation
          – Introduction of the acronym mining problem.
      • Related Work
          – Algorithm overview.
      • eBay acronym mining algorithm.
          – Architecture.
          – Algorithm overview.
      • Results.
      • Conclusions.
  • 33. Motivation
      • User queries are an incomplete representation of users' information needs
          – Spelling mistakes
              ○ Jetsky instead of Jetski
          – Synonyms are not considered
              ○ PS3 and PlayStation 3 (acronym, topic of this talk)
              ○ JetSki and Personal Watercraft
          – Users are not experts in search engine technology
              ○ Example: Anniversary gifts for men
      eBay, Inc.
  • 34. Need for Query Rewrites
      • JetSky: 2 results
      • Spelling Correction → JetSki: 23,782 results
      • Synonym Expansion → OR(Jetski, Personal WaterCraft): 24,151 results
  • 35. Motivation: Acronyms/Abbreviation
      • Uke → Ukulele
      • IQD → Iraqi Dinar
  • 36. Where can we find Acronyms?
      • From Item Title/Descriptions
          – Grand Theft Auto III (GTA 3) (PlayStation 2, 2001)
          – Grand Theft Auto IV (GTA 4) PS3 mint condition
          – Warhawk (No Headset) PlayStation 3 (PS3) BRAND NEW!
          – COLD LASER. Low Level Laser Therapy(LLLT) + Acupuncture
      • From Query Reformulations, i.e. how users change their queries
          – New Uke → New Ukulele
  • 37. Related Work
  • 38. Schwartz et al: Greedy Match Algorithm
      • Example title: Warhawk (No Headset) PlayStation 3 (PS3) BRAND NEW!
  • 39. Identifying Abbreviation Definitions in Biomedical Text
      • Mining for patterns
          – long form (short form)
          – short form (long form)
          – The long form is no more than min(|A| + 5, |A| * 2) words, where |A| is the number of characters in the short form.
          – Roche et al. propose that this number be less than |A| * 3.
      • The characters in the short form should match the long form in the same order, and the first character in the short form should be at the beginning of a word.
      • Example:
          – PS3 -> PlayStation 3
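The matching rule on this slide can be sketched as a right-to-left scan, in the spirit of the Schwartz and Hearst approach: each short-form character is matched backwards in the candidate text, and the first character must land at the start of a word. The regex, the short-form filter (2 to 10 characters, at most two words) and the function names are simplifying assumptions for this sketch, and only the "long form (short form)" pattern is handled.

```python
import re


def find_best_long_form(short_form: str, long_form: str):
    """Match short-form characters right to left; return the long form or None."""
    s = len(short_form) - 1
    l = len(long_form) - 1
    while s >= 0:
        c = short_form[s].lower()
        if c.isalnum():
            # Move left until the character matches; for the first short-form
            # character, also require that it starts a word.
            while (l >= 0 and long_form[l].lower() != c) or (
                s == 0 and l > 0 and long_form[l - 1].isalnum()
            ):
                l -= 1
            if l < 0:
                return None
            l -= 1
        s -= 1
    start = long_form.rfind(" ", 0, l + 1) + 1
    return long_form[start:]


def extract_pairs(title: str):
    """Find 'long form (short form)' candidates in a single title."""
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", title):
        short = m.group(1).strip()
        # Assumed filter: keep only compact short forms.
        if not 2 <= len(short) <= 10 or len(short.split()) > 2 or not short[0].isalnum():
            continue
        max_words = min(len(short) + 5, len(short) * 2)
        window = " ".join(title[: m.start()].split()[-max_words:])
        long_form = find_best_long_form(short, window)
        if long_form:
            pairs.append((short, long_form))
    return pairs
```

On the slide's Warhawk title this sketch pairs PS3 with PlayStation 3. It also shows the two weaknesses discussed on the next slide: "Foot Massage Diabetes Treatment (FEET)" is accepted as a match, and for AAA the scan stops at "Automobile Association", truncating the long form.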
  • 40. Schwartz et al
      • Pros:
          – Finds almost all abbreviations and acronyms
      • Cons:
          – High false positive rate.
              ○ Foot Massage Diabetes Treatment (FEET)
          – Suffers from the truncated long form problem.
              ○ Example: American Automobile Association (AAA)
  • 41. Acronym-Expansion Recognition and Ranking on the Web
      • First few characters match
      • Ignore stop words
      • Example:
          – Cool -> Cooperation in Ontology and Linguistics.
      Alpa Jain, Silviu Cucerzan, Saliha Azzam. Acronym-Expansion Recognition and Ranking on the Web.
  • 42. Jain et al
      • Pros:
          – Low false positive rate
      • Cons:
          – Does not do a good job at identifying abbreviations
          – Misses out on a lot of actual acronyms
              ○ Will not find the PlayStation 3 and PS3 association.
  • 43. eBay Acronym Mining Architecture
      [Diagram. Box labels: User Data; Candidate Generator; Feature Extractor; Classifier; Dictionary; Human Judgment; A/B Test; Live on Site]
  • 44. eBay Acronym/Abbreviation Mining Algorithm
      • Desirable properties
          – Find all abbreviations and acronyms, like the greedy match
          – Reduce the number of false positives
          – Solve the truncated long form problem.
      • What makes a good acronym-expansion pair?
          – Characters in the acronym are found at the beginning of the words.
          – Expansions generally do not have words that are skipped or not represented in the acronym.
          – Can a cost metric capture the intuition?
  • 45. Cost Based Approach for Mining Abbreviations
      • CIM: Computer Interface Module. Total cost: low
      • PVC: PolyVinyl Chloride. Total cost: medium
      • HSF: Heat shock transcription factor. Total cost: high
  • 46. Cost Based Recursive Algorithm
      • Title: new American Automobile Association (AAA) map of mexico
      • Objective: find the longest form with the lowest cost
      • American Automobile Association (AAA)
      • Min( American Automobile Associ (AA), American Automobile Associ (AAA) ) + Cost so far
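The slides do not define the cost function, so the following is only a sketch of how a cost-based recursive match could work, with invented penalties: 0 for an acronym character at the start of a word, `MID_WORD_COST` for a character matched inside a word, and `SKIP_WORD_COST` for a long-form word that contributes no character. All names and values here are assumptions, chosen so that the three examples on slide 45 come out as low, medium and high cost.

```python
from functools import lru_cache

SKIP_WORD_COST = 2  # assumed penalty: a long-form word is not represented
MID_WORD_COST = 1   # assumed penalty: character matched inside a word


def _is_subsequence(chars: str, word: str) -> bool:
    it = iter(word)
    return all(c in it for c in chars)


def _word_cost(chars: str, word: str) -> float:
    """Cost of drawing `chars` (in order) from one word."""
    if not _is_subsequence(chars, word):
        return float("inf")
    free = 1 if chars[0] == word[0] else 0  # word-initial match is free
    return (len(chars) - free) * MID_WORD_COST


def match_cost(acronym: str, long_form: str) -> float:
    """Lowest total cost of aligning the acronym with the long form."""
    acr = acronym.lower()
    words = long_form.lower().split()

    @lru_cache(maxsize=None)
    def best(i: int, w: int) -> float:
        if w == len(words):
            return 0.0 if i == len(acr) else float("inf")
        options = [SKIP_WORD_COST + best(i, w + 1)]  # skip this word
        for k in range(1, len(acr) - i + 1):         # take k chars from it
            options.append(_word_cost(acr[i:i + k], words[w]) + best(i + k, w + 1))
        return min(options)

    return best(0, 0)


def best_long_form(acronym: str, preceding_text: str, threshold: float = 1.0):
    """Pick the lowest-cost candidate, preferring the longest on ties."""
    words = preceding_text.split()
    if not words:
        return None
    candidates = []
    for n in range(1, len(words) + 1):
        long_form = " ".join(words[-n:])
        candidates.append((match_cost(acronym, long_form), -n, long_form))
    cost, _, long_form = min(candidates)
    return (long_form, cost) if cost <= threshold else None
```

Under these assumed penalties, the text preceding "(AAA)" in the slide's title resolves to the full "American Automobile Association" at cost 0, rather than the truncated form a greedy right-to-left match produces. Raising `threshold` admits more candidates (and more false positives); lowering it does the opposite.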
  • 47. Salient Properties of the new algorithm
      • If Cost > Threshold, then the long form is a false positive.
      • As the allowed cost increases
          – False positives increase
          – The chance that a real acronym is not identified decreases
      • As the allowed cost decreases
          – False positives decrease
          – The chance that a real acronym is not identified increases.
      • At low costs, the algorithm behaves like the first few characters match.
      • At high costs, the algorithm behaves like the greedy match algorithm.
  • 48. Experiments
      Sample dataset: 2.5 million item titles
        Algorithm                     Total Candidates   False Positive Rate   Yield
        Greedy Match                  2548               39%                   1554
        First Few Characters Match    759                4%                    728
        Cost Based Match, k1          1223               14%                   1051
        Cost Based Match, k2          1604               16%                   1284
        Cost Based Match, k3          2023               20%                   1554
  • 49. Removing false positives
      • Goal
          – Develop a classification algorithm that will classify whether a candidate is an acronym or not.
      • Classification algorithm
          – Decision trees
              ○ TreeNet data mining tool.
      • Candidates are tagged with many features.
      • The classifier learns on the tagged golden set.
      • New candidates are then run through the classifier.
  • 50. Example of a Decision Tree
      Training data:
        Tid   Refund   Marital Status   Taxable Income   Cheat
        1     Yes      Single           125K             No
        2     No       Married          100K             No
        3     No       Single           70K              No
        4     Yes      Married          120K             No
        5     No       Divorced         95K              Yes
        6     No       Married          60K              No
        7     Yes      Divorced         220K             No
        8     No       Single           85K              Yes
        9     No       Married          75K              No
        10    No       Single           90K              Yes
      Model: decision tree (splitting attributes: Refund, MarSt, TaxInc)
        Refund = Yes → NO
        Refund = No → MarSt
          MarSt = Married → NO
          MarSt = Single, Divorced → TaxInc
            TaxInc < 80K → NO
            TaxInc > 80K → YES
      Acknowledgements: George Kollios, gkollios@cs.bu.edu
  • 51. Features: Neighborhood Similarity
      • Rationale: two synonym candidates A and B will tend to have similar neighbors (viz. keywords) surrounding them.
      \[ \text{Neighborhood similarity}(A, B) = \frac{|\text{Neighbours}(A) \cap \text{Neighbours}(B)|}{\min(|\text{Neighbours}(A)|, |\text{Neighbours}(B)|)} \]
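A small sketch of this feature, assuming that a term's "neighbours" are simply the other whitespace-separated tokens in every title that contains the term. The tokenization and the choice of titles as the text source are assumptions; the ratio itself follows the formula on the slide.

```python
def neighbours(term: str, titles) -> set:
    """Collect tokens that co-occur with `term` in the given titles."""
    term_tokens = term.lower().split()
    n = len(term_tokens)
    found = set()
    for title in titles:
        tokens = title.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_tokens:
                found.update(tokens[:i] + tokens[i + n:])
    return found


def neighbourhood_similarity(a: str, b: str, titles) -> float:
    """|N(a) & N(b)| / min(|N(a)|, |N(b)|), or 0.0 if either set is empty."""
    na, nb = neighbours(a, titles), neighbours(b, titles)
    if not na or not nb:
        return 0.0
    return len(na & nb) / min(len(na), len(nb))
```

Dividing by the smaller neighbourhood keeps the score high when a rare term's few neighbours are all shared with a much more common term.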
  • 52. Features: Mutual Information
      • Rationale: the goal of this metric is to determine whether the candidates co-occur in the description significantly more often than they would by random chance.
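The slide gives no formula, so this sketch uses pointwise mutual information (PMI), one common way to compare observed co-occurrence with what independence would predict. Treating each description as one document and using case-insensitive substring matching are assumptions made for brevity.

```python
import math


def pmi(term_a: str, term_b: str, descriptions) -> float:
    """log( P(a, b) / (P(a) * P(b)) ) over a list of descriptions."""
    n = len(descriptions)
    has_a = has_b = has_both = 0
    for text in descriptions:
        lowered = text.lower()
        a = term_a.lower() in lowered
        b = term_b.lower() in lowered
        has_a += a
        has_b += b
        has_both += a and b
    if has_both == 0:
        return float("-inf")  # the terms never co-occur
    return math.log((has_both / n) / ((has_a / n) * (has_b / n)))
```

A value near zero means the pair co-occurs about as often as chance predicts; clearly positive values suggest the two forms are used together on purpose.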
  • 53. Features: KL divergence
      • Rationale: two synonym candidates will have similar category distributions of their inventory.
  • 54. KL distance: Example
      • Example 1 (KL divergence: 0.83)
          – Ipods: Electronics (50), Clothing Shoes and Accessories (1)
          – Ipod: Electronics (100), Clothing Shoes and Accessories (3)
      • Example 2 (KL divergence: 128592.74)
          – Ipod: Electronics (100), Clothing Shoes and Accessories (3)
          – T-shirt: Clothing Shoes and Accessories (1000), Uniforms (50)
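A sketch of comparing category distributions with the textbook KL divergence, using the counts from the example above. The additive smoothing constant is an assumption needed to avoid division by zero for categories missing on one side. This is not expected to reproduce the slide's values (0.83 and 128592.74), which presumably come from a different scaling or smoothing that the slides do not specify; only the ordering (similar pair low, dissimilar pair high) carries over.

```python
import math


def kl_divergence(counts_p: dict, counts_q: dict, smoothing: float = 1e-6) -> float:
    """KL(P || Q) between two category-count dictionaries."""
    categories = set(counts_p) | set(counts_q)
    total_p = sum(counts_p.values()) + smoothing * len(categories)
    total_q = sum(counts_q.values()) + smoothing * len(categories)
    kl = 0.0
    for c in categories:
        p = (counts_p.get(c, 0) + smoothing) / total_p
        q = (counts_q.get(c, 0) + smoothing) / total_q
        kl += p * math.log(p / q)
    return kl


ipods = {"Electronics": 50, "Clothing Shoes and Accessories": 1}
ipod = {"Electronics": 100, "Clothing Shoes and Accessories": 3}
tshirt = {"Clothing Shoes and Accessories": 1000, "Uniforms": 50}
```

Here `kl_divergence(ipods, ipod)` is close to zero, while `kl_divergence(ipod, tshirt)` is orders of magnitude larger because almost none of the "ipod" inventory sits in the categories where "t-shirt" inventory lives.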
  • 55. Classifier Decision Tree Example
      KL Distance
        > 2.5 → False Positive
        ≤ 2.5 → Neighbourhood Similarity
          ≤ 0.2 → False Positive
          > 0.2 → Mutual Information
            > 0.003 → True Positive
            ≤ 0.003 → False Positive
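Read as code, the example tree is three nested threshold checks. The branch assignment below is one reading of the flattened diagram (high KL distance, low neighbourhood similarity or low mutual information each mark a false positive), and the function name is hypothetical; the thresholds are the ones shown on the slide.

```python
def is_true_positive(kl_distance: float,
                     neighbourhood_similarity: float,
                     mutual_information: float) -> bool:
    """Walk the example tree from the root; True means 'keep the candidate'."""
    if kl_distance > 2.5:
        return False  # category distributions differ too much
    if neighbourhood_similarity <= 0.2:
        return False  # surrounding keywords do not overlap enough
    return mutual_information > 0.003  # must co-occur more than chance
```

In practice such a tree would be learned from the tagged golden set rather than written by hand; this sketch only makes the decision path of the example explicit.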
  • 56. Classifier Results
      • False positive rate at the candidate generation stage: 20%
      • False positive rate after going through the classifier: 5.5%
      • The remaining false positives are removed by human judges.
  • 57. Conclusions
      • We presented the state-of-the-art algorithms for acronym mining and their limitations.
      • We presented a new cost based algorithm for mining acronyms from raw text that seeks to address the limitations of the previous algorithms.
      • We presented a classifier approach to remove false positives.
      • We experimentally validated our approach and showed that it is viable for mining acronyms.
  • 58. References
      • [1] Ariel S. Schwartz, Marti A. Hearst. A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text.
      • [2] Youngja Park, Roy J. Byrd. Hybrid Text Mining for Finding Abbreviations and Their Definitions.
      • [3] Mathieu Roche, Violaine Prince. Managing the Acronym/Expansion Identification Process for Text-Mining Applications.
  • 59. References (2)
      • [4] Yee Fan Tan, Ergin Elmacioglu, Min-Yen Kan, Dongwon Lee. Efficient Web-Based Linkage of Short to Long Forms.
      • [5] Alpa Jain, Silviu Cucerzan, Saliha Azzam. Acronym-Expansion Recognition and Ranking on the Web.
      • [6] Xiaonan Ji, Gu Xu, James Bailey, Hang Li. Mining, Ranking and Using Acronym Patterns.