SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Dr. Searcher and Mr. Browser:
       A unified hyperlink-click graph
Barbara Poblete, Carlos Castillo, Aristides Gionis
    University Pompeu Fabra, Yahoo! Research
                                Barcelona, Spain
                                      CIKM 2008
In this talk




       New unified web graph: hyperlink-click graph
       Union of the hyperlink graph and the click graph
       Random walk generates robust results that match or are
       better than the results obtained on either individual graph
Motivation



     Two types of web graphs: click and hyperlink graph
     Represent two most common tasks of users:
     searching and browsing
     Correspond to two prototypical ways of looking for
     information: asking and exploring
     Edges capture semantic relations among nodes:
     e.g., similarity and authority endorsement
Motivation...


  Drawbacks:
    § Hyperlink graph: adversarial increase in PageRank scores,
      i.e., spam or link farms
    § Click graph: sparsity, inherent bias in web-search engine
      ranking, dependency on textual match and click spam

  Our claim: Hyperlink and click graphs are complementary and
  can be used to alleviate the shortcomings of each other
Motivation...


  Drawbacks:
    § Hyperlink graph: adversarial increase in PageRank scores,
      i.e., spam or link farms
    § Click graph: sparsity, inherent bias in web-search engine
      ranking, dependency on textual match and click spam

  Our claim: Hyperlink and click graphs are complementary and
  can be used to alleviate the shortcomings of each other
Contribution

  Introduce the hyperlink-click graph,
  union of hyperlink and click graph
  Consider random walk on the hyperlink-click graph

    1   Model user behavior more accurately
    2   Show that ranking with the hyperlink-click graph is similar
        to the best performance by either of the two graphs
    3   The unified graph compensates where either graph
        performs poorly
    4   more robust and fail-safe
Applications



  Uses of the hyperlink-click graph:

      Ranking of documents
      Query ranking and query recommendation
      Similarity search
      Spam detection
Web graphs
                               hyperlink-click graph


    clicked                     clicked     neighbor documents
    queries                   documents




              frequency(q)




                click graph

                                            hyperlink graph
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click graph
      PC = αNC + (1 − α)1C
  Random walk on the hyperlink-click graph
      PHC = αβNC + α(1 − β)NH + (1 − α)1
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click graph
      PC = αNC + (1 − α)1C
  Random walk on the hyperlink-click graph
      PHC = αβNC + α(1 − β)NH + (1 − α)1
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click graph
      PC = αNC + (1 − α)1C
  Random walk on the hyperlink-click graph
      PHC = αβNC + α(1 − β)NH + (1 − α)1
Web graphs
                               hyperlink-click graph


    clicked                     clicked     neighbor documents
    queries                   documents




              frequency(q)




                click graph

                                            hyperlink graph
Evaluation and results




      Validate utility of the random walk scores on the
      hyperlink-click graph
      Compare scores with those produced from the hyperlink
      and the click graph
Evaluation and results...



   We focus on two tasks in which a good ranking method
   should perform well:

       ranking high-quality documents and
       ranking pairs of documents

   The evaluation is centered on analyzing the dissimilarities
   among the different models
Evaluation and results...


   Dataset:
       Yahoo! query log
       9 K seed documents extracted from the query log
       - documents with at least 10 clicks
       61 K queries with at least one click to the seed documents
       Using a web crawl expanded to 144 million documents
       The expansion considers all in- and out-neighbors of all
       documents in the seed set seed set.
Evaluation and results...




   Two types of data:
       Without sponsored results: query log only with
       algorithmic results
       With sponsored results:
Evaluation and results...


   Random walk evaluation
      we compute scores on the complete datasets, but
      we evaluate only on documents in the intersection of the
      three graphs
      Two tasks:
        1   ranking high-quality documents (dmoz),
        2   ranking pairs of documents (user evaluation).
Evaluation and results...


   dmoz:
   ΠZ : Our first measure is the normalized sum of the π scores
        of DZ documents.

                                   d∈DZπ(d)
                          ΠZ =
                                  d∈DC π(d)

   ΓZ : Goodman-Kruskal Gamma Γ measure between the
        rankings
                               D −A
                           Γ=
                               D +A
Evaluation and results...


   User study:

       13 users
       1 710 accessments
       users expressed not neutral opinion in 32% of document
       pairs
       Use Γ to measure pairs of documents on which rankings
       agree with users
Evaluation and results...


   dmoz:
      Macro-evaluation: captures the overall scores of
      high-quality documents
       ΠZ and ΓZ are computed considering all the documents in
       the evaluation set
       Micro-evaluation: at query level
       ΠZ and ΓZ in the evaluation set are reduced to only those
       documents clicked from a particular query.

   (User study: micro-evaluation)
Evaluation and results...


   dmoz:
      Macro-evaluation: captures the overall scores of
      high-quality documents
       ΠZ and ΓZ are computed considering all the documents in
       the evaluation set
       Micro-evaluation: at query level
       ΠZ and ΓZ in the evaluation set are reduced to only those
       documents clicked from a particular query.

   (User study: micro-evaluation)
Evaluation and results...


   dmoz
     metric                        macro
              without sponsored results with sponsored results
       ΓZ         GC ≈ GHC > GH           GC > GHC > GH
       ΠZ         GH ≈ GHC > GC           GH ≈ GHC > GC

     metric                        micro
              without sponsored results with sponsored results
       ΓZ         GC ≈ GHC > GH           GHC ≈ GC > GH
       ΠZ         GHC ≈ GC > GH           GHC ≈ GH ≈ GC
Evaluation and results...




   User study

      metric    without sponsored results   with sponsored results
        Γ           GC > GHC > GH             GH > GHC > GC
Evaluation and results...
   Exclude documents with very similar scores
Evaluation and results...

         Summary of ΓZ values for dmoz:
         1.000
                                                                                                   click graph
                                                                                                   hyperlink-click graph
                                                                                                   hyperlink graph
         0.800




         0.600




         0.400
 value




         0.200




         0.000




         -0.200
                  Macro (with variations)   Macro (no variations)   Micro (with variations)   Micro (no variations)
Evaluation and results...

         Summary of ΠZ values for dmoz:
         0.900
                                                                                                  click graph
                                                                                                  hyperlink-click graph
         0.800                                                                                    hyperlink graph




         0.700



         0.600
 value




         0.500



         0.400



         0.300



         0.200
                 Macro (with variations)   Macro (no variations)   Micro (with variations)   Micro (no variations)
Conclusions



      We study random walk on a unified web graph
      Intended to model user searching and browsing behavior
      Several combinations of metrics are used for evaluation

  Experimental evaluation shows:
      The unified graph is always close to the best performance
      of either the click or the hyperlink graph
Future work



     Analyze how to deal with the inherent bias that exists in
     any ranking technique based on usage mining
     Study other web mining applications
       1   Link and click spam detection
       2   Similarity search
       3   Query recommendation
Thank you!

Weitere ähnliche Inhalte

Andere mochten auch

Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iLaks Lakshmanan
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivLaks Lakshmanan
 
Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiLaks Lakshmanan
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaMuhammad Imran
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...Carlos Castillo (ChaTo)
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaDavid Laniado
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiLaks Lakshmanan
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...Artificial Intelligence Institute at UofSC
 

Andere mochten auch (12)

Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-i
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-iv
 
Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-ii
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social Media
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...
 
Fairness-Aware Data Mining
Fairness-Aware Data MiningFairness-Aware Data Mining
Fairness-Aware Data Mining
 
Crisis Computing
Crisis ComputingCrisis Computing
Crisis Computing
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of Wikipedia
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iii
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
 
Social Media Mining and Retrieval
Social Media Mining and RetrievalSocial Media Mining and Retrieval
Social Media Mining and Retrieval
 
Discrimination Discovery
Discrimination DiscoveryDiscrimination Discovery
Discrimination Discovery
 

Ähnlich wie Dr. Searcher and Mr. Browser: A unified hyperlink-click graph

Web Crawling and Reinforcement Learning
Web Crawling and Reinforcement LearningWeb Crawling and Reinforcement Learning
Web Crawling and Reinforcement LearningFrancesco Gadaleta
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine LearningPradip Rahul
 
page rank explication et exemple formule
page rank explication et exemple  formulepage rank explication et exemple  formule
page rank explication et exemple formuleRamiHarrathi1
 
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfMarkov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfrayyverma
 
Optimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluationOptimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluationHan Jiang
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Margaret Wang
 
Linkanalysis handout
Linkanalysis handoutLinkanalysis handout
Linkanalysis handoutcsedays
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EEPing Yeh
 
Search engine page rank demystification
Search engine page rank demystificationSearch engine page rank demystification
Search engine page rank demystificationRaja R
 
PageRank & Searching
PageRank & SearchingPageRank & Searching
PageRank & Searchingrahulbindra
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsWQ Fan
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsAleksandr Chuklin
 
Page rank by university of michagain.ppt
Page rank by university of michagain.pptPage rank by university of michagain.ppt
Page rank by university of michagain.pptrayyverma
 
Internet 信息检索中的数学
Internet 信息检索中的数学Internet 信息检索中的数学
Internet 信息检索中的数学Xu jiakon
 

Ähnlich wie Dr. Searcher and Mr. Browser: A unified hyperlink-click graph (20)

Web Crawling and Reinforcement Learning
Web Crawling and Reinforcement LearningWeb Crawling and Reinforcement Learning
Web Crawling and Reinforcement Learning
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine Learning
 
page rank explication et exemple formule
page rank explication et exemple  formulepage rank explication et exemple  formule
page rank explication et exemple formule
 
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfMarkov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdf
 
Optimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluationOptimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluation
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461
 
Linkanalysis handout
Linkanalysis handoutLinkanalysis handout
Linkanalysis handout
 
Social (1)
Social (1)Social (1)
Social (1)
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EE
 
Dm page rank
Dm page rankDm page rank
Dm page rank
 
Search engine page rank demystification
Search engine page rank demystificationSearch engine page rank demystification
Search engine page rank demystification
 
PageRank & Searching
PageRank & SearchingPageRank & Searching
PageRank & Searching
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
 
Page rank2
Page rank2Page rank2
Page rank2
 
GitConnect
GitConnectGitConnect
GitConnect
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval Metrics
 
Mazhiming
MazhimingMazhiming
Mazhiming
 
Page rank by university of michagain.ppt
Page rank by university of michagain.pptPage rank by university of michagain.ppt
Page rank by university of michagain.ppt
 
Internet 信息检索中的数学
Internet 信息检索中的数学Internet 信息检索中的数学
Internet 信息检索中的数学
 
I04015559
I04015559I04015559
I04015559
 

Mehr von Carlos Castillo (ChaTo)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social MediaCarlos Castillo (ChaTo)
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Carlos Castillo (ChaTo)
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Carlos Castillo (ChaTo)
 

Mehr von Carlos Castillo (ChaTo) (20)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social Media
 
When no clicks are good news
When no clicks are good newsWhen no clicks are good news
When no clicks are good news
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
 
Big Crisis Data for ISPC
Big Crisis Data for ISPCBig Crisis Data for ISPC
Big Crisis Data for ISPC
 
Databeers: Big Crisis Data
Databeers: Big Crisis DataDatabeers: Big Crisis Data
Databeers: Big Crisis Data
 
Observational studies in social media
Observational studies in social mediaObservational studies in social media
Observational studies in social media
 
Natural experiments
Natural experimentsNatural experiments
Natural experiments
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
Link prediction
Link predictionLink prediction
Link prediction
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Graph Partitioning and Spectral Methods
Graph Partitioning and Spectral MethodsGraph Partitioning and Spectral Methods
Graph Partitioning and Spectral Methods
 
Finding Dense Subgraphs
Finding Dense SubgraphsFinding Dense Subgraphs
Finding Dense Subgraphs
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
 
Text Indexing / Inverted Indices
Text Indexing / Inverted IndicesText Indexing / Inverted Indices
Text Indexing / Inverted Indices
 
Indexing
IndexingIndexing
Indexing
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
 

Kürzlich hochgeladen

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdfJamie (Taka) Wang
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 

Kürzlich hochgeladen (20)

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
20200723_insight_release_plan_v6.pdf20200723_insight_release_plan_v6.pdf
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 

Dr. Searcher and Mr. Browser: A unified hyperlink-click graph

  • 1. Dr. Searcher and Mr. Browser: A unified hyperlink-click graph Barbara Poblete, Carlos Castillo, Aristides Gionis University Pompeu Fabra, Yahoo! Research Barcelona, Spain CIKM 2008
  • 2. In this talk New unified web graph: hyperlink-click graph Union of the hyperlink graph and the click graph Random walk generates robust results that match or are better than the results obtained on either individual graph
  • 3. Motivation Two types of web graphs: click and hyperlink graph Represent two most common tasks of users: searching and browsing Correspond to two prototypical ways of looking for information: asking and exploring Edges capture semantic relations among nodes: e.g., similarity and authority endorsement
  • 4. Motivation... Drawbacks: § Hyperlink graph: adversarial increase in PageRank scores, i.e., spam or link farms § Click graph: sparsity, inherent bias in web-search engine ranking, dependency on textual match and click spam Our claim: Hyperlink and click graphs are complementary and can be used to alleviate the shortcomings of each other
  • 5. Motivation... Drawbacks: § Hyperlink graph: adversarial increase in PageRank scores, i.e., spam or link farms § Click graph: sparsity, inherent bias in web-search engine ranking, dependency on textual match and click spam Our claim: Hyperlink and click graphs are complementary and can be used to alleviate the shortcomings of each other
  • 6. Contribution Introduce the hyperlink-click graph, union of hyperlink and click graph Consider random walk on the hyperlink-click graph 1 Model user behavior more accurately 2 Show that ranking with the hyperlink-click graph is similar to the best performance by either of the two graphs 3 The unified graph compensates where either graph performs poorly 4 more robust and fail-safe
  • 7. Applications Uses of the hyperlink-click graph: Ranking of documents Query ranking and query recommendation Similarity search Spam detection
  • 8. Web graphs hyperlink-click graph clicked clicked neighbor documents queries documents frequency(q) click graph hyperlink graph
  • 9. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  • 10. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  • 11. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  • 12. Web graphs hyperlink-click graph clicked clicked neighbor documents queries documents frequency(q) click graph hyperlink graph
  • 13. Evaluation and results Validate utility of the random walk scores on the hyperlink-click graph Compare scores with those produced from the hyperlink and the click graph
  • 14. Evaluation and results... We focus on two tasks in which a good ranking method should perform well: ranking high-quality documents and ranking pairs of documents The evaluation is centered on analyzing the dissimilarities among the different models
  • 15. Evaluation and results... Dataset: Yahoo! query log 9 K seed documents extracted from the query log - documents with at least 10 clicks 61 K queries with at least one click to the seed documents Using a web crawl expanded to 144 million documents The expansion considers all in- and out-neighbors of all documents in the seed set seed set.
  • 16. Evaluation and results... Two types of data: Without sponsored results: query log only with algorithmic results With sponsored results:
  • 17. Evaluation and results... Random walk evaluation we compute scores on the complete datasets, but we evaluate only on documents in the intersection of the three graphs Two tasks: 1 ranking high-quality documents (dmoz), 2 ranking pairs of documents (user evaluation).
  • 18. Evaluation and results... dmoz: ΠZ : Our first measure is the normalized sum of the π scores of DZ documents. d∈DZπ(d) ΠZ = d∈DC π(d) ΓZ : Goodman-Kruskal Gamma Γ measure between the rankings D −A Γ= D +A
  • 19. Evaluation and results... User study: 13 users 1 710 accessments users expressed not neutral opinion in 32% of document pairs Use Γ to measure pairs of documents on which rankings agree with users
  • 20. Evaluation and results... dmoz: Macro-evaluation: captures the overall scores of high-quality documents ΠZ and ΓZ are computed considering all the documents in the evaluation set Micro-evaluation: at query level ΠZ and ΓZ in the evaluation set are reduced to only those documents clicked from a particular query. (User study: micro-evaluation)
  • 21. Evaluation and results... dmoz: Macro-evaluation: captures the overall scores of high-quality documents ΠZ and ΓZ are computed considering all the documents in the evaluation set Micro-evaluation: at query level ΠZ and ΓZ in the evaluation set are reduced to only those documents clicked from a particular query. (User study: micro-evaluation)
  • 22. Evaluation and results... dmoz metric macro without sponsored results with sponsored results ΓZ GC ≈ GHC > GH GC > GHC > GH ΠZ GH ≈ GHC > GC GH ≈ GHC > GC metric micro without sponsored results with sponsored results ΓZ GC ≈ GHC > GH GHC ≈ GC > GH ΠZ GHC ≈ GC > GH GHC ≈ GH ≈ GC
  • 23. Evaluation and results... User study metric without sponsored results with sponsored results Γ GC > GHC > GH GH > GHC > GC
  • 24. Evaluation and results... Exclude documents with very similar scores
  • 25. Evaluation and results... Summary of ΓZ values for dmoz: 1.000 click graph hyperlink-click graph hyperlink graph 0.800 0.600 0.400 value 0.200 0.000 -0.200 Macro (with variations) Macro (no variations) Micro (with variations) Micro (no variations)
  • 26. Evaluation and results... Summary of ΠZ values for dmoz: 0.900 click graph hyperlink-click graph 0.800 hyperlink graph 0.700 0.600 value 0.500 0.400 0.300 0.200 Macro (with variations) Macro (no variations) Micro (with variations) Micro (no variations)
  • 27. Conclusions We study random walk on a unified web graph Intended to model user searching and browsing behavior Several combinations of metrics are used for evaluation Experimental evaluation shows: The unified graph is always close to the best performance of either the click or the hyperlink graph
  • 28. Future work Analyze how to deal with the inherent bias that exists in any ranking technique based on usage mining Study other web mining applications 1 Link and click spam detection 2 Similarity search 3 Query recommendation