Seminar Series
   Social Information Systems

                    Manos Papagelis
Department of Computer Science, University of Toronto
              papaggel@cs.toronto.edu




             Toronto, Spring, 2007
     Computer Science Department, University of Toronto   1
Presentation Outline

Part I: Exploiting Social Networks for Internet Search
Part II: An Experimental Study of the Coloring Problem on Human
  Subject Networks




Exploiting Social Networks for Internet Search
 Alan Mislove, Krishna Gummadi, and Peter Druschel, HotNets 2006



                                    Part I




Introduction

 ▪ Social Networking (SN)
  A new form of publishing and locating information
 ▪ Objective
  To understand whether the links in online social networks can be exploited
  by search engines to provide better results
 ▪ Contributions
   • Comparison of the mechanisms the Web and online SNs use for
        - Publishing: making information available to users
        - Locating: finding information
   • Results from an experiment with social network-based Web search
   • Challenges and opportunities in using social networks for
     Internet search

Web vs. SN (1/2)

Web
 ▪ Publishing: by placing documents on a Web server, where search engines
   then discover them through incoming links
 ▪ Locating: via search engines (which exploit the link graph)

Pros
 ▪ Very effective (incoming links are good indicators of importance)

Limitations
 ▪ Lack of freshness: new content is found only after it is linked and crawled
 ▪ Results are not personalized
 ▪ Unlinked pages are not indexed

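Search engines locate pages by exploiting the link graph, treating incoming links as votes of importance; PageRank, which also appears in the PeerSpective ranking, is the best-known example. A minimal power-iteration sketch, assuming a toy link graph with hypothetical page names and the conventional damping factor of 0.85 (a textbook formulation, not Google's production system):

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over a link graph (page -> outgoing links)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same baseline "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank uniformly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {
    "a.html": ["c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "orphan.html": [],  # published, but nobody links to it
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c.html (two incoming links)
```

Pages without incoming links (`b.html`, `orphan.html`) end up with the lowest scores, which mirrors the limitation that unlinked pages are hard to find through link-based search.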
Web vs. SN (2/2)

Social Networks
 ▪ Publishing: no explicit links between content items (photos, videos,
   blogs); instead, content is linked implicitly through the explicit
   links between users
 ▪ Locating:
   • Navigating the social network and browsing users' content
   • Keyword-based search for textual or tagged content
   • Browsing "Top-10" lists

Pros
 ▪ Helps a user find timely, relevant information by browsing adjacent
   regions of the network of users with similar interests
 ▪ Content is rated rapidly through the comments and feedback of a community

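Locating content by browsing adjacent regions of the network can be modeled as a bounded breadth-first search over user-to-user links that collects whatever each reached user has published. A minimal sketch with hypothetical users and items; the hop limit stands in for how far a user is willing to browse:

```python
from collections import deque
from typing import Dict, List, Set


def content_within_hops(friends: Dict[str, List[str]],
                        content: Dict[str, List[str]],
                        start: str, max_hops: int) -> Set[str]:
    """Collect items published by users at most max_hops links from start.

    Content items have no links of their own; they are reachable only
    through the explicit links between the users who published them.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found: Set[str] = set()
    while queue:
        user, hops = queue.popleft()
        found.update(content.get(user, []))
        if hops == max_hops:
            continue  # do not browse beyond the hop limit
        for friend in friends.get(user, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return found


friends = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"]}
content = {"ann": ["photo1"], "bob": ["video7"], "cat": ["blog3"]}
print(sorted(content_within_hops(friends, content, "ann", 1)))
# ['photo1', 'video7']
```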
Integration of Web Search and SN

 ▪ Web and SN information are disjoint
 ▪ There is no unified search tool that locates information across these
   different systems

PeerSpective: SN-based Web Search

 ▪ Technology:
   • Lucene text search engine and FreePastry P2P overlay
   • A lightweight HTTP proxy transparently indexes all URLs the user visits

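PeerSpective itself uses Lucene; the toy class below is only a stand-in showing the idea of a proxy building a per-user inverted index from the pages the user visits. The class name, the tokenizer, and the conjunctive query semantics are assumptions for illustration, not Lucene's API:

```python
import re
from collections import defaultdict
from typing import Dict, Set


class LocalIndex:
    """Toy per-user inverted index: term -> set of URLs (not Lucene)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _terms(text: str) -> list:
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url: str, text: str) -> None:
        # Hypothetically called by the proxy for every page the user visits.
        for term in self._terms(text):
            self.postings[term].add(url)

    def search(self, query: str) -> Set[str]:
        # Conjunctive query: URLs that contain every query term.
        terms = self._terms(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = LocalIndex()
index.add_page("http://example.org/bus", "Bus schedules for campus")
index.add_page("http://example.org/cs", "Computer science bus architecture")
print(index.search("bus schedules"))  # {'http://example.org/bus'}
```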
Searching Process

 ▪ A user submits a query to Google
 ▪ The user's proxy transparently forwards the query both to Google and to
   the proxies of the other users in the network
 ▪ Each proxy executes the query on its local index
 ▪ The results are collated and presented alongside Google's results
 ▪ PeerSpective ranking:
   Lucene score + PageRank + scores from users who previously viewed the result

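The slide lists the ranking inputs but not how they are combined. A minimal sketch of collating the proxies' answers, assuming each proxy returns (url, text score) pairs and the three signals are combined as a weighted sum; the weights and the "number of proxies that returned the URL" social signal are hypothetical placeholders, not values or formulas from the paper:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PeerResults = List[Tuple[str, float]]  # (url, local text score) from one proxy


def merge_peer_results(peer_results: Iterable[PeerResults],
                       pagerank: Dict[str, float],
                       w_text: float = 1.0, w_rank: float = 1.0,
                       w_social: float = 0.5) -> List[Tuple[str, float]]:
    """Collate per-proxy results into one ranked list (hypothetical weights).

    The social signal counts how many proxies returned a URL, as a stand-in
    for "users who previously viewed the result"; each proxy is assumed to
    list a URL at most once.
    """
    text_score: Dict[str, float] = defaultdict(float)
    viewers: Dict[str, int] = defaultdict(int)
    for results in peer_results:
        for url, score in results:
            text_score[url] = max(text_score[url], score)  # best local match
            viewers[url] += 1
    combined = [
        (url, w_text * text_score[url]
              + w_rank * pagerank.get(url, 0.0)
              + w_social * viewers[url])
        for url in text_score
    ]
    return sorted(combined, key=lambda item: item[1], reverse=True)


peers = [[("u1", 0.9), ("u2", 0.4)], [("u2", 0.5)]]
ranked = merge_peer_results(peers, {"u1": 0.1, "u2": 0.2})
print([url for url, _ in ranked])  # ['u2', 'u1']
```

Here `u2` overtakes `u1` despite a weaker text match because two proxies returned it, which is the kind of social boost the slide describes.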
Search Results Example




Experiments

 ▪ 10 graduate students shared the Web content they downloaded or viewed
 ▪ One-month-long experiment
 ▪ 200,000 distinct URLs
 ▪ 25% were of type text/html or application/pdf (so they can be indexed)

Reports on:
 ▪ Limits of hyperlink-based search
 ▪ Benefits of SN-based search

Limits of hyperlink-based search

 ▪ Fraction of visited URLs that are not indexed by Google, because the
   page is:
   • Too new (e.g., blogs)
   • Part of the Deep Web
   • Part of the Dark Web (no incoming links)

Results
 ▪ About 1/3 of the requested URLs cannot be retrieved through Google
 ▪ PeerSpective's indices cover 30% of the requested URLs
 ▪ 13.3% of the URLs were contained in PeerSpective but not in Google's
   index

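The coverage figures above are ratios over the set of requested URLs. A minimal sketch of how such fractions can be computed with set operations, using small made-up sets rather than the study's data:

```python
from typing import Dict, Set


def coverage_report(requested: Set[str], google: Set[str],
                    peerspective: Set[str]) -> Dict[str, float]:
    """Fractions of the requested URLs covered by each index."""
    if not requested:
        raise ValueError("requested must be non-empty")
    total = len(requested)
    in_ps = requested & peerspective
    return {
        "not_in_google": len(requested - google) / total,
        "in_peerspective": len(in_ps) / total,
        "only_in_peerspective": len(in_ps - google) / total,
    }


# Made-up example: 10 requested URLs, 7 of them indexed by Google.
requested = {f"url{i}" for i in range(10)}
google = {f"url{i}" for i in range(7)}
peerspective = {"url5", "url6", "url7"}
print(coverage_report(requested, google, peerspective))
# {'not_in_google': 0.3, 'in_peerspective': 0.3, 'only_in_peerspective': 0.1}
```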
Random samples of URLs not in Google and potential reasons

Benefits of SN-based Search

 ▪ Analysis of clicks on the results on the first page
   1,730 queries, of which 1,079 resulted in clicks

Results
 ▪ 86.5% of the clicked results were returned only by Google
 ▪ 5.7% of the clicked results were returned by both
 ▪ 7.7% of the clicked results were returned only by PeerSpective

Conclusions
 ▪ This 7.7% is significant: improving the results on the first page is
   considered the gold standard of Web search engineering
 ▪ There is an inherent advantage to using social links in Web search

Reasons for Clicks on PeerSpective

 ▪ Disambiguation
  A community tends to share definitions or interpretations of popular
  terms (e.g., "bus")
 ▪ Ranking
  SN information can bias the ranking algorithms toward the interests of
  the users (e.g., "CoolStreaming")
 ▪ Serendipity
  Ample opportunity to find interesting things without explicitly searching

Example of URLs found in PeerSpective

Opportunities and Challenges

 ▪ Privacy
   • Willingness of users to disclose information
   • Need for mechanisms to control information flow and anonymity
 ▪ Membership and clustering of SNs
   • Users may participate in many networks
   • Need to search with respect to the different clusters
 ▪ Content rating and ranking
   • New approaches to ranking search results
   • System architecture: centralized or distributed?

An Experimental Study of the Coloring Problem on Human
                   Subject Networks
    Michael Kearns, Siddharth Suri, Nick Montfort, Science, vol. 313, Aug 2006



                                        Part II




Experimental Study on Human Subject Networks

 ▪ Theoretical work suggests that structural properties of naturally
   occurring networks are important in shaping behavior and dynamics
   • E.g., hubs in networks are important for routing information
 ▪ Empirical structural properties established across many disciplines:
   • Small diameter ("six degrees of separation")
   • Local clustering of connectivity
   • Heavy-tailed distribution of connectivity (power-law distributions)
 ▪ Empirical studies of networks
   • Limitation: the networks are fixed and given, so alternative
     structures cannot be compared
   • Alternative approach: controlled laboratory studies

Experiment

 ▪ Experimental scenario
   • Distributed problem-solving from local information
 ▪ Experimental setting
   • 38 human subjects (network vertices)
   • Each subject controls the color of one vertex in the network
   • Networks: from simple to more complex
   • Goal: select a color different from those of all neighbors
   • Problem: graph coloring
   • Information available: variable (low, medium, high)

Graph Coloring Problem

 ▪ Graph coloring
  An assignment of "colors" to elements of a graph (e.g., vertices) such
  that no two adjacent elements are assigned the same color
 ▪ Graph coloring problem
  Find the minimum number of colors needed for an arbitrary graph (NP-hard)
 ▪ Chromatic number
  The least number of colors needed to color the graph

Example
 ▪ Vertex coloring
 ▪ A 3-coloring suffices for the example graph, but with fewer colors some
   pair of adjacent vertices would share the same color

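The definitions can be made concrete with a brute-force check on tiny graphs. The sketch below (function names are illustrative) verifies that a coloring is proper and finds the chromatic number by trying k = 1, 2, ... colors. It runs in exponential time, so it is only practical for a handful of vertices; no polynomial-time algorithm is known for the general, NP-hard problem.

```python
from itertools import product
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def is_proper(edges: Sequence[Edge], colors: Sequence[int]) -> bool:
    """True if no edge joins two vertices of the same color."""
    return all(colors[u] != colors[v] for u, v in edges)


def chromatic_number(n: int, edges: Sequence[Edge]) -> int:
    """Smallest k with a proper k-coloring, by exhaustive search (tiny graphs only)."""
    if n == 0:
        return 0
    for k in range(1, n + 1):
        # Try every assignment of k colors to the n vertices.
        for colors in product(range(k), repeat=n):
            if is_proper(edges, colors):
                return k
    raise AssertionError("unreachable: n colors always suffice")


def cycle_edges(n: int) -> List[Edge]:
    return [(i, (i + 1) % n) for i in range(n)]


print(chromatic_number(3, [(0, 1), (1, 2), (2, 0)]))  # 3 (triangle)
print(chromatic_number(6, cycle_edges(6)))            # 2 (even cycle)
print(chromatic_number(5, cycle_edges(5)))            # 3 (odd cycle)
```

The even-cycle case is consistent with the 2 colors required by the cycle-based networks in the study.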
Network Topologies




[Figure: six network topologies: Simple Cycle, 5-Chord Cycle, 20-Chord Cycle,
Leader Cycle, Pref. Att. v=2, Pref. Att. v=3]

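A sketch for generating some of these topologies and computing their average shortest-path distance. Assumptions, not taken from the slides: chords join uniformly random non-adjacent pairs, and preferential attachment starts from a clique on v+1 vertices, with each new vertex linking to v existing vertices chosen with probability proportional to degree. These are generic constructions, not necessarily the study's generators; the leader cycle is omitted because its construction is not described here. For the 38-vertex simple cycle the computation gives 361/37 ≈ 9.76, which matches the value reported for the simple cycle in the results table; the random graphs will vary with the seed.

```python
import random
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # vertex -> set of neighbors


def cycle(n: int) -> Graph:
    """Simple cycle: vertex i is linked to i-1 and i+1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_random_chords(g: Graph, k: int, rng: random.Random) -> Graph:
    """Add k chords between random non-adjacent vertex pairs (mutates g).

    k must not exceed the number of missing edges, or this never terminates.
    """
    nodes = list(g)
    added = 0
    while added < k:
        u, v = rng.sample(nodes, 2)
        if v not in g[u]:
            g[u].add(v)
            g[v].add(u)
            added += 1
    return g


def preferential_attachment(n: int, v: int, rng: random.Random) -> Graph:
    """Start from a clique on v+1 vertices; each new vertex links to v distinct
    existing vertices chosen with probability proportional to their degree."""
    g: Graph = {i: {j for j in range(v + 1) if j != i} for i in range(v + 1)}
    ends = [u for u in g for _ in g[u]]  # u appears deg(u) times
    for new in range(v + 1, n):
        targets: Set[int] = set()
        while len(targets) < v:
            targets.add(rng.choice(ends))
        g[new] = set()
        for t in targets:
            g[new].add(t)
            g[t].add(new)
            ends.extend([new, t])
    return g


def average_distance(g: Graph) -> float:
    """Mean shortest-path length over ordered pairs of distinct vertices
    (BFS from every vertex; assumes g is connected)."""
    total, pairs = 0, 0
    for source in g:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in g[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(42)
print(round(average_distance(cycle(38)), 2))  # 9.76
print(round(average_distance(add_random_chords(cycle(38), 20, rng)), 2))
print(round(average_distance(preferential_attachment(38, 2, rng)), 2))
```

Because a new vertex always adds exactly v links, the minimum degree of the preferential-attachment graphs is v, in line with the minimum link counts listed for Pref. Att. v=2 and v=3.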
Information View

[Figure: the three information views, each centered on the subject's own vertex
("YOU") and showing an "Overall Progress" indicator]
 ▪ Low: color of each neighbor
 ▪ Medium: number of links of each neighbor
 ▪ All: the whole network

Graph Properties and Experimental Results

Graph             Colors    Min    Max    Avg.       Avg. Exp.        # Exp.   No. of
                  Required  Links  Links  Distance   Duration (sec)   Solved   Changes
Simple Cycle      2         2      2      9.76       144.17           5/6      378
5-Chord Cycle     2         2      4      5.63       121.14           7/7      687
20-Chord Cycle    2         2      7      3.34       65.67            6/6      8265
Leader Cycle      2         3      19     2.31       40.86            7/7      8797
Pref. Att. v=2    3         2      13     2.63       219.67           2/6      1744
Pref. Att. v=3    4         3      22     2.08       154.83           4/6      4703

(Graph properties: Colors Required to Avg. Distance; experimental results:
Avg. Exp. Duration to No. of Changes)

1: Collective Performance

 ▪ Subjects could indeed solve the coloring problem across a wide
   range of networks
   • 31 of 38 experiments reached a solution in less than 300 seconds
   • Mean completion time: 82 sec
 ▪ Collective performance is affected by network structure
   • Preferential-attachment networks are harder than cycle-based networks
 ▪ Cycle-based networks:
   • Monotonic relationship between solution time and average
     network distance (smaller distance leads to shorter solution
     times)
 ▪ Adding random chords systematically reduces solution time

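The 31/38 figure and the monotonic trend can be cross-checked against the Graph Properties and Experimental Results table. A few lines of Python that re-enter the table's numbers by hand, using the average experiment duration as a proxy for solution time:

```python
# Rows re-entered by hand from the table:
# (graph, avg. distance, avg. experiment duration in sec, # solved, # run)
TABLE = [
    ("Simple Cycle", 9.76, 144.17, 5, 6),
    ("5-Chord Cycle", 5.63, 121.14, 7, 7),
    ("20-Chord Cycle", 3.34, 65.67, 6, 6),
    ("Leader Cycle", 2.31, 40.86, 7, 7),
    ("Pref. Att. v=2", 2.63, 219.67, 2, 6),
    ("Pref. Att. v=3", 2.08, 154.83, 4, 6),
]

solved = sum(row[3] for row in TABLE)
run = sum(row[4] for row in TABLE)
print(f"{solved}/{run} experiments solved")  # 31/38 experiments solved

# Cycle-based networks, ordered from smallest to largest average distance.
cycles = sorted((row for row in TABLE if "Cycle" in row[0]), key=lambda row: row[1])
durations = [row[2] for row in cycles]
print(durations == sorted(durations))  # True: shorter distance, shorter duration
```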
2: Human Performance vs. Artificial Distributed Heuristics

Heuristic considered:
 ▪ A vertex is selected at random
   • If some colors are not used by any of its neighbors, one of these
     colors is selected at random
   • Otherwise, a color is selected at random from all colors

Comparison measure
 ▪ Number of vertex color changes

Findings:
 ▪ Results are exactly reversed relative to the human subjects: lower
   average distance increases the difficulty for the heuristic
 ▪ Preferential-attachment networks are easier for the heuristic

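The heuristic described above can be sketched as follows. Assumptions not stated on the slide: vertices start with uniformly random colors, only actual color changes are counted, the run stops at the first proper coloring, and the function returns None if the hypothetical `max_steps` budget is exhausted:

```python
import random
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]  # vertex -> list of neighbors


def cycle(n: int) -> Graph:
    """Simple cycle on n >= 3 vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def count_conflicts(graph: Graph, colors: Dict[int, int]) -> int:
    """Number of edges whose two endpoints share a color."""
    return sum(1 for u in graph for v in graph[u] if u < v and colors[u] == colors[v])


def random_local_heuristic(graph: Graph, num_colors: int, seed: int = 0,
                           max_steps: int = 1_000_000
                           ) -> Optional[Tuple[Dict[int, int], int]]:
    """Run the heuristic; return (coloring, number of color changes) or None."""
    rng = random.Random(seed)
    vertices = list(graph)
    colors = {v: rng.randrange(num_colors) for v in vertices}  # assumed random start
    conflicts = count_conflicts(graph, colors)
    changes = 0
    for _ in range(max_steps):
        if conflicts == 0:
            return colors, changes
        v = rng.choice(vertices)  # a vertex is selected at random
        used = {colors[u] for u in graph[v]}
        unused = [c for c in range(num_colors) if c not in used]
        # Prefer a color no neighbor uses; otherwise pick any color at random.
        new = rng.choice(unused) if unused else rng.randrange(num_colors)
        if new != colors[v]:
            # Update the conflict count using only the edges at v.
            conflicts -= sum(1 for u in graph[v] if colors[u] == colors[v])
            conflicts += sum(1 for u in graph[v] if colors[u] == new)
            colors[v] = new
            changes += 1
    return (colors, changes) if conflicts == 0 else None


result = random_local_heuristic(cycle(10), num_colors=2, seed=1)
if result is not None:
    coloring, changes = result
    print(count_conflicts(cycle(10), coloring), changes)  # first number is 0
```

The returned change count corresponds to the comparison measure on this slide, so running the function on different topologies is one way to explore the reported reversal.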
3: Effects of Varying the Locality of the Information View

 ▪ Subjects were given information of varying locality
   • Low: their own and their neighbors' colors are visible
   • Medium: as in Low, plus information on the connectivity of each
     neighbor
   • High: the global coloring state is visible at all times

Findings:
 ▪ Increasing the amount of information
    • Reduces solution times for cycle-based networks
    • Increases solution times for preferential-attachment networks
    • Leads to rapid convergence to one of the two solutions in
      cycle-based networks

Information View Effect 1: Pref. Att. vs. Cycle-based Networks

[Chart: Avg. Experiment Duration. y-axis: Time (seconds), 0 to 350; x-axis:
Information View (Low, Medium, High); series: Cycles, Pref. Att.]

Information View Effect 2: Cycle-based Solution Convergence


[Figure: two panels]
 ▪ Low information view: the population oscillates between approaches to
   the two solutions
 ▪ High information view: rapid convergence to one of the two possible
   solutions

Individual Strategies

 ▪ Choosing colors that result in the fewest local conflicts
 ▪ Attempting to avoid conflicts with highly connected subjects
 ▪ Signaling behavior of subjects
 ▪ Deliberately introducing conflicts to escape local minima

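The first and last strategies correspond closely to min-conflicts local search with occasional random moves, a standard constraint-satisfaction technique. A sketch under that interpretation, assuming a random initial coloring and a hypothetical `noise` probability for the deliberate random moves; it illustrates the strategies, not the subjects' actual decision process:

```python
import random
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]  # vertex -> list of neighbors


def cycle(n: int) -> Graph:
    """Simple cycle on n >= 3 vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def clashes(graph: Graph, colors: Dict[int, int], v: int, c: int) -> int:
    """How many neighbors of v would share color c with v."""
    return sum(1 for u in graph[v] if colors[u] == c)


def min_conflicts(graph: Graph, num_colors: int, noise: float = 0.1,
                  seed: int = 0, max_steps: int = 100_000
                  ) -> Optional[Dict[int, int]]:
    """Min-conflicts search with random moves; None if max_steps runs out."""
    rng = random.Random(seed)
    colors = {v: rng.randrange(num_colors) for v in graph}
    for _ in range(max_steps):
        conflicted = [v for v in graph if clashes(graph, colors, v, colors[v]) > 0]
        if not conflicted:
            return colors  # proper coloring found
        v = rng.choice(conflicted)
        if rng.random() < noise:
            # Deliberately take a random, possibly worse, color to escape local minima.
            colors[v] = rng.randrange(num_colors)
        else:
            # Choose the color with the fewest local conflicts, breaking ties randomly.
            counts = [clashes(graph, colors, v, c) for c in range(num_colors)]
            best = min(counts)
            colors[v] = rng.choice([c for c in range(num_colors) if counts[c] == best])
    return None


coloring = min_conflicts(cycle(10), num_colors=2, seed=3)
print(coloring is not None)
```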
Questions?




Thanks!



