Mendes, Mika, Zaragoza, Blanco. Measuring Website Similarity using an Entity-Aware Click Graph   1/16




   Measuring Website Similarity using
     an Entity-Aware Click Graph


 Pablo N. Mendes1, Peter Mika2, Hugo Zaragoza2, Roi Blanco2

                                 1. Freie Universität Berlin
                              2. Yahoo! Research Barcelona


                             Nov 1st 2012, Maui, CIKM 2012



            Introduction: query log analysis
   ●   Query logs record user interaction with Web
       search engines
   ●   Query log analysis has been proven critical to
       improving search
   ●   For search engines
        –   Ranking, autosuggest, “Also try”, etc.

   ●   For site owners
        –   insight into user needs, allows optimizing Web
            presence, etc.



           Introduction: website similarity
   ●   Click graph: relating queries and websites,
          edges are clicks




   [Figures: click graph (left); site similarity graph, SG (right)]

   ●   Allows modeling website relatedness based on
       shared queries leading to each website pair
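
A minimal sketch of the idea above, not the authors' implementation: it builds a click graph from (query, site) pairs and derives a site similarity graph from shared queries. The Jaccard weighting, the function names, and the sample clicks are assumptions made for illustration; the slides do not state which overlap measure is used.

```python
from collections import defaultdict
from itertools import combinations


def build_click_graph(clicks):
    """Map each website to the set of queries that led to a click on it."""
    queries_by_site = defaultdict(set)
    for query, site in clicks:
        queries_by_site[site].add(query)
    return queries_by_site


def site_similarity_graph(queries_by_site):
    """Connect two sites when they share a query.

    Edge weight is the Jaccard overlap of their query sets (an assumption).
    """
    edges = {}
    for a, b in combinations(sorted(queries_by_site), 2):
        shared = queries_by_site[a] & queries_by_site[b]
        if shared:
            union = queries_by_site[a] | queries_by_site[b]
            edges[(a, b)] = len(shared) / len(union)
    return edges


# Hypothetical click data, for illustration only.
clicks = [
    ("brad pitt movies", "imdb.com"),
    ("brad pitt movies", "rottentomatoes.com"),
    ("brad pitt age", "wikipedia.org"),
    ("yahoo mail login", "imdb.com"),
]
sg = site_similarity_graph(build_click_graph(clicks))
```

Only sites that share a whole query string end up connected here, which is the limitation the next slides discuss.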



                             Problems: Sparsity
   ●   44% of queries occur only once even when
       considering a full year of data [1]

   ●   Using “shared queries” as a relatedness
       measure becomes tough in the long tail.




         [1] Baeza-Yates. Relating content through web usage. In HT ’09, 2009.



                  Problems: partial overlaps




  ●   Breaking up into words distorts semantics
        –   “Forest” vs “Forest Gump”
        –   “Pitt” vs “Brad Pitt”



                                        Introduction
 ●   >62% of queries contain entity name or type [20]




[20] Pound, Mika, & Zaragoza. Ad-hoc object retrieval in the web of data. In WWW’10, 2010.



                   Entity-aware Click Graph

  ●   Websites can share
      entities and/or
      modifiers
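
One way to picture the entity/modifier split is a longest-match lookup against a dictionary of entity names. This is a sketch under stated assumptions: the slides do not describe how entities are recognized, and the `ENTITIES` set and `split_query` helper are hypothetical stand-ins for a real entity list such as Freebase.

```python
# Hypothetical entity dictionary; a real system would use a large entity list.
ENTITIES = {"brad pitt", "yahoo"}


def split_query(query, entities=ENTITIES):
    """Return (entity, modifier) using the longest entity name in the query.

    Returns (None, query) when no known entity occurs.
    """
    tokens = query.lower().split()
    # Try longer spans first so a full name wins over any shorter match.
    for length in range(len(tokens), 0, -1):
        for start in range(len(tokens) - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in entities:
                rest = tokens[:start] + tokens[start + length:]
                return span, " ".join(rest)
    return None, " ".join(tokens)
```

Because the entity is kept as one unit, "pitt" alone does not match "brad pitt", which avoids the partial-overlap distortion described earlier.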


          Entity-aware Website Similarity
                      Graph


 ●   More connected
 ●   Preserves semantics
 ●   Allows analysis of
     how websites relate
     to entities and modifiers
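
The "more connected" claim can be illustrated with a small sketch that links sites through shared entities or shared modifiers and records which of the two caused each edge. The input format (pre-split triples), the function name, and the sample data are assumptions, not the authors' code.

```python
from collections import defaultdict
from itertools import combinations


def entity_aware_similarity(clicks):
    """clicks: iterable of (entity, modifier, site) triples.

    Returns {(site_a, site_b): {"entities": set, "modifiers": set}} for every
    pair of sites sharing at least one entity or one modifier.
    """
    entities = defaultdict(set)
    modifiers = defaultdict(set)
    for entity, modifier, site in clicks:
        if entity:
            entities[site].add(entity)
        if modifier:
            modifiers[site].add(modifier)
    sites = sorted(set(entities) | set(modifiers))
    graph = {}
    for a, b in combinations(sites, 2):
        shared_e = entities[a] & entities[b]
        shared_m = modifiers[a] & modifiers[b]
        if shared_e or shared_m:
            graph[(a, b)] = {"entities": shared_e, "modifiers": shared_m}
    return graph


# Hypothetical clicks: no two sites share a complete query string.
clicks = [
    ("brad pitt", "movies", "imdb.com"),
    ("brad pitt", "age", "wikipedia.org"),
    ("angelina jolie", "movies", "rottentomatoes.com"),
]
graph = entity_aware_similarity(clicks)
```

In this toy data a whole-query similarity graph would have no edges, while the entity-aware version links two pairs of sites and says why each pair is related.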



                                      Experiments
   ●   Website similarity
        –   Find top K similar sites
        –   Evaluation: two sites are “similar” if they are in the
            same category in ODP (Open Directory Project)

   ●   Website characteristics from the searcher POV
        –   What entities lead to a website
        –   What context words lead to a website
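
The website similarity experiment could be sketched as follows: retrieve the top K neighbours of a site and count how many fall in the same ODP category. Precision at K is an assumed metric here (the slides only say sites count as similar when they share a category), and the scores and category labels are invented for illustration.

```python
def top_k_similar(site, similarity, k):
    """similarity: {(site_a, site_b): score}. Return the k best neighbours."""
    scored = []
    for (a, b), score in similarity.items():
        if a == site:
            scored.append((score, b))
        elif b == site:
            scored.append((score, a))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def precision_at_k(site, similarity, category, k):
    """Fraction of the top-k neighbours in the same category as `site`."""
    neighbours = top_k_similar(site, similarity, k)
    target = category.get(site)
    if not neighbours or target is None:
        return 0.0
    hits = sum(1 for n in neighbours if category.get(n) == target)
    return hits / len(neighbours)


# Hypothetical similarity scores and ODP-style categories.
similarity = {
    ("imdb.com", "rottentomatoes.com"): 0.5,
    ("imdb.com", "wikipedia.org"): 0.2,
    ("imdb.com", "netflix.com"): 0.4,
}
category = {
    "imdb.com": "Arts/Movies",
    "rottentomatoes.com": "Arts/Movies",
    "netflix.com": "Arts/Movies",
    "wikipedia.org": "Reference",
}
```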



             Dataset Statistics: Query Log
   ●   1 month of queries from Yahoo!, 45M sessions
   ●   5M entities from Freebase



                                           Results 1
   ●   Similarity edge prediction



                                           Results 1
   ●   Similarity edge prediction with credit to partial
       category overlap
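
The slides do not define how partial category overlap is credited. One plausible reading, shown purely as an assumption, is to treat ODP categories as slash-separated paths and give credit proportional to the shared leading components. The function name and the example paths are hypothetical.

```python
def partial_category_credit(cat_a, cat_b):
    """Fraction of leading path components two ODP-style categories share."""
    parts_a = cat_a.split("/")
    parts_b = cat_b.split("/")
    shared = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        shared += 1
    # Normalize by the deeper path so identical categories score 1.0.
    return shared / max(len(parts_a), len(parts_b))
```

Under this reading, two sites filed under sibling subcategories earn some credit instead of counting as a complete miss.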


                                           Results 2
   [Figure: websites plotted by entropy of distribution of modifiers (x-axis)
    against entropy of distribution of entities (y-axis). Labeled regions:
    many entities, few modifiers; many entities, many modifiers;
    few entities, many modifiers.]
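
The two axes can be computed per website as the Shannon entropy of the entities (or modifiers) observed in queries leading to that site. The sketch below is an assumption about the calculation: the slides name the quantity but not the log base or any smoothing, so base 2 and raw counts are used.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical entities seen in queries leading to two sites.
broad_site = ["brad pitt", "angelina jolie", "tom hanks", "meryl streep"]
narrow_site = ["brad pitt", "brad pitt", "brad pitt", "brad pitt"]
```

A site reached through many different entities scores high on the entity axis, while a site reached through a single entity scores zero.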



                                           Results 2



                                         Conclusion
 ●   Recognizing entities in Web search logs allows for
     click graphs that account for internal composition of
     queries
 ●   New similarity graphs built from entity-aware click
     graphs enable more robust and flexible
     similarity analysis (evaluated for website similarity)

 ●   Future:
      –   Exploit the knowledge base (e.g. type hierarchy)
      –   More complex queries
      –   etc.



                                         Thank you!
●   Web: http://pablomendes.com
●   E-mail: pablo.mendes@fu-berlin.de
●   Twitter: @pablomendes
●   Slideshare: slideshare.net/pablomendes



    Questions?
