1. Seminar Series
Social Information Systems
Manos Papagelis
Department of Computer Science, University of Toronto
papaggel@cs.toronto.edu
Toronto, Spring, 2007
Computer Science Department, University of Toronto 1
2. Presentation Outline
Part I: Exploiting Social Networks for Internet Search
Part II: An Experimental Study of the Coloring Problem on Human
Subject Networks
3. Exploiting Social Networks for Internet Search
Alan Mislove, Krishna Gummadi, and Peter Druschel, HotNets 2006
Part I
4. Introduction
Social Networking (SN)
A new form of publishing and locating information
Objective
To understand whether these social links can be exploited by search
engines to provide better results
Contributions
• Comparison of the mechanisms in Web and online SN for
Publishing: Mechanisms to make information available to users
Locating: Mechanisms to find information
• Results from an experiment in social network-based Web Search
• Challenges and opportunities in using Social Networks for
Internet Search
5. Web vs. SN (1/2)
Web
Publishing: By placing documents on a Web server (and then seeking
incoming links)
Locating: Via Search engines (Exploiting the link graph)
Pros
Very Effective (incoming links are good indicators of importance)
Limitations
No fresh data
No personalized results
Unlinked pages are not indexed
6. Web vs. SN (2/2)
Social Networks
Publishing: No explicit links between content (photos, videos, blogs)
but implicit links between content through explicit links between
users.
Locating:
• Navigation through the social network and browsing users’
content
• Keyword based search for textual or tagged content
• Through "Top-10" lists
Pros
Helps a user find timely, relevant information by browsing adjacent
regions of the network of users with similar interests
Content is rated rapidly (by comments and feedback of a community)
7. Integration of Web Search and SN
Web and SN information is disjoint
No unified search tool that locates information across different
systems
8. PeerSpective: SN-based Web Search
Technology:
• Lucene text search engine and FreePastry P2P Overlay
• Lightweight HTTP Proxy transparently indexes all visited URLs of
user
9. Searching Process
A query is submitted by a user to Google
The proxy transparently forwards the query to both Google and the
Proxies of Users in the network
Each proxy executes the query on the local index
Results are then collated and presented alongside Google results
PeerSpective Ranking:
Lucene score + PageRank + scores from users who previously viewed the result
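The collation and ranking step can be sketched roughly as follows. This is only an illustration of the additive formula on the slide, not PeerSpective's actual code: the function name `peerspective_rank`, the `viewer_bonus` weight, and the choice to keep the best Lucene score per URL are all assumptions made for the example.

```python
from collections import defaultdict

def peerspective_rank(peer_hits, pagerank, viewer_bonus=1.0):
    """Combine per-peer hits into one ranked list (hypothetical sketch).

    peer_hits: list of (url, lucene_score) pairs, one pair per peer whose
               local index matched the query.
    pagerank:  dict mapping url -> PageRank-style score (assumed available).
    """
    lucene = defaultdict(float)
    viewers = defaultdict(int)
    for url, score in peer_hits:
        lucene[url] = max(lucene[url], score)  # assumption: keep best text score
        viewers[url] += 1                      # one more peer has viewed this URL
    combined = {
        url: lucene[url] + pagerank.get(url, 0.0) + viewer_bonus * viewers[url]
        for url in lucene
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Two peers viewed a.example/bus, one peer viewed b.example/bus.
hits = [("a.example/bus", 0.4), ("b.example/bus", 0.9), ("a.example/bus", 0.5)]
ranked = peerspective_rank(hits, {"a.example/bus": 0.2, "b.example/bus": 0.1})
```

In this toy input the page viewed by two peers ranks first even though its text score is lower, which is the kind of social bias the ranking is meant to introduce.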
11. Experiments
10 grad. students share downloaded or viewed Web content
One month long experiments
200,000 distinct URLs
25% were of type text/html or application/pdf (so they can be indexed)
Reports On:
Limits of hyperlink-based search
Benefits of SN-based Search
12. Limits of hyperlink-based search
Report on fraction of visited URLs that are not indexed by Google
• Pages too new to be indexed (blogs)
• Deep Web
• Dark Web (no links)
Results
About 1/3 of requests cannot be retrieved by Google
PeerSpective’s indices cover 30% of the requested URLs
13.3% of URLs were contained in PeerSpective but not in Google's
index
14. Benefits of SN-based Search
Experiments on clicks on results on first page
For 1730 queries (1079 resulted in clicks)
Results
86.5% of the clicked results were returned only by Google
5.7% of the clicked results were returned by both
7.7% of the clicked results were returned only by PeerSpective
Conclusions
This 7.7% is a gain over Google, which is considered to be the gold
standard of web search engineering
Inherent advantage of using social links in web search
15. Reasons for Clicks on PeerSpective
Disambiguation
Communities tend to share definitions or interpretations of popular
terms (e.g. "bus")
Ranking
SN information can bias the ranking algorithms to the interests of
users (CoolStreaming)
Serendipity
Ample opportunity to find interesting things without searching
17. Opportunities and Challenges
Privacy
• Willingness of users to disclose information
• Need for mechanisms to control information flow and anonymity
Membership and Clustering of SN
• Users may participate in many networks
• Need for searching with respect to the different clusters
Content rating and ranking
• New approaches to ranking search results
• System architecture: centralized or distributed?
18. An Experimental Study of the Coloring Problem on Human
Subject Networks
Michael Kearns, Siddharth Suri, Nick Montfort, SCIENCE, (313), Aug 2006
Part II
19. Experimental Study on Human Subject Networks
Theoretical work suggests that structural properties of naturally
occurring networks are important in shaping behavior and dynamics
• E.g. Hubs in networks are important in routing information
Empirical Structural Properties established by many disciplines
• Small Diameter (the “six” degrees of separation)
• Local clustering of connectivity
• Heavy-tailed distribution of connectivity (Power-law distributions)
Empirical Studies of Networks
• Limitation: Networks are fixed and given (no alternatives)
• Other approach: Controlled laboratory study
20. Experiment
Experimental Scenario
• Distributed problem-solving from local information
Experimental Setting
• 38 human subjects (network vertices)
• Each subject controls the color of a vertex in a network
• Networks: simple and more complex
• Goal: Select a different color from that of all neighbors
• Problem: Coloring problem
• Information Available: Variable (Low, Medium, High)
21. Graph Coloring Problem
Graph coloring
An assignment of "colors" to certain objects in a graph such that no
two adjacent objects are assigned the same color
Graph Coloring Problem
Find the minimum number of colors for an arbitrary graph (NP-hard)
Chromatic number
The least number of colors needed to color the graph
Example
Vertex coloring
A 3-coloring suits this graph but fewer colors
would result in adjacent vertices of the same
color
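To make the definitions concrete, here is a minimal sketch that checks whether a coloring is proper and finds the chromatic number by brute force. The function names and the two toy graphs are illustrative assumptions; the exhaustive search is only practical for very small graphs, which is consistent with the problem being NP-hard.

```python
from itertools import product

def is_proper(edges, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def chromatic_number(vertices, edges):
    """Smallest k for which some assignment of k colors is proper.

    Tries every assignment, so the cost grows as k ** len(vertices).
    """
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            if is_proper(edges, dict(zip(vertices, colors))):
                return k
    return 0  # only reached for an empty graph

# A triangle needs 3 colors; a 4-cycle needs only 2.
triangle = chromatic_number(["a", "b", "c"],
                            [("a", "b"), ("b", "c"), ("a", "c")])
square = chromatic_number([0, 1, 2, 3],
                          [(0, 1), (1, 2), (2, 3), (3, 0)])
```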
23. Information View
[Figure: the three information views shown to subjects, each with the subject's own vertex marked "YOU" and an overall progress bar. Low: color of each neighbor. Medium: number of links of each neighbor. All: the entire network.]
25. 1: Collective Performance
Subjects could indeed solve the coloring problem across a wide
range of networks
• 31/38 experiments ended in solution in less than 300 seconds
• 82 sec mean completion time
Collective Performance affected by network structure
• Preferential Attachment harder than Cycle-based networks
Cycle-based networks:
• Monotonic relationship between solution time and average
network distance (smaller distance leading to shorter solution
times)
Addition of random chords: Systematically reduces solution time
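The link between chords and average network distance can be illustrated with a small sketch. It builds a cycle, adds random chords, and measures the mean shortest-path length by breadth-first search. The generator, its parameters, and the choice of 38 vertices (matching the number of subjects) are assumptions for illustration, not the networks used in the study.

```python
import random
from collections import deque

def cycle_with_chords(n, chords, seed=0):
    """Adjacency sets for an n-cycle plus `chords` random extra edges."""
    rng = random.Random(seed)
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    added = 0
    while added < chords:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:          # skip pairs that are already linked
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def average_distance(adj):
    """Mean shortest-path length over all ordered pairs (connected graph)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

plain = average_distance(cycle_with_chords(38, 0))
chorded = average_distance(cycle_with_chords(38, 5))
```

Each added chord shortens at least one pair of paths, so `chorded` is strictly smaller than `plain`; the slide's finding is that human solution times fell along with this distance.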
26. 2: Human Performance VS Artificial Distributed Heuristics
Heuristic considered:
A vertex is randomly selected
• If there are colors unused in the neighborhood of this vertex, then
a color is selected randomly from the available ones
• If there are no unused colors, then a color is selected randomly
Comparison measure
Number of vertex color changes
Findings:
Results are exactly reversed relative to human subjects: lower average
distance increases the difficulty for the heuristic
Preferential attachment networks are easier for the heuristic
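The heuristic and its comparison measure can be sketched as below. This follows the slide's description only; details such as the random initial coloring, the stopping check, the `max_steps` cap, and the function name `run_heuristic` are assumptions added so the example is runnable.

```python
import random

def run_heuristic(adj, num_colors, seed=0, max_steps=100_000):
    """Run the random recoloring heuristic; return (coloring, color changes).

    adj maps each vertex to an iterable of its neighbors.
    Returns (None, changes) if no proper coloring is found in max_steps.
    """
    rng = random.Random(seed)
    colors = {v: rng.randrange(num_colors) for v in adj}  # assumed random start
    vertices = list(adj)
    changes = 0
    for _ in range(max_steps):
        if all(colors[u] != colors[w] for u in adj for w in adj[u]):
            return colors, changes
        v = rng.choice(vertices)                    # a vertex is randomly selected
        used = {colors[w] for w in adj[v]}
        unused = [c for c in range(num_colors) if c not in used]
        # Prefer a color no neighbor uses; otherwise pick any color at random.
        new = rng.choice(unused) if unused else rng.randrange(num_colors)
        if new != colors[v]:
            colors[v] = new
            changes += 1                            # the comparison measure
    return None, changes

# Hypothetical example: a 6-cycle with 3 colors available.
six_cycle = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
final_colors, num_changes = run_heuristic(six_cycle, 3, seed=1)
```

Running the same function on networks with different average distance, and comparing `num_changes`, is one way to reproduce the kind of comparison reported in the findings.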
27. 3: Effects on Varying the Locality of Information View
Variable locality information provided to subjects
• Low: Their own and neighboring colors are visible
• Medium: Their own and neighboring colors are visible, plus
the number of links of each neighbor
• High: global coloring state at all times
Findings:
Increased amount of information
• Reduces solution times for cycle-based networks
• Increases solution times for preferential attachment networks
• Rapid convergence to one of the two solutions in cycle-based
networks
28. Information View Effect 1: Pref. Att. VS Cycle-based Networks
[Figure: average experiment duration in seconds (0 to 350) by information view (Low, Medium, High), plotted separately for cycle-based and preferential attachment networks.]
29. Information View Effect 2: Cycle-based Solution Convergence
Low Information View: Population oscillates between approaches to the two solutions
High Information View: Rapid convergence to one of the two possible solutions
30. Individual Strategies
Choosing colors that result in the fewest local conflicts
Attempt to avoid conflicts with highly connected subjects
Signaling behavior of subjects
Introducing conflicts to avoid local minima
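The first strategy, choosing the color with the fewest local conflicts, can be written as a one-line rule. The sketch below is an assumed formalization for illustration; the function name and the toy neighborhood are hypothetical, and ties are broken by palette order.

```python
def fewest_conflicts_color(vertex, adj, colors, palette):
    """Pick the palette color shared with the fewest neighbors of `vertex`."""
    def conflicts(candidate):
        return sum(1 for w in adj[vertex] if colors[w] == candidate)
    return min(palette, key=conflicts)  # first color wins on ties

# Hypothetical neighborhood: two red neighbors and one blue neighbor.
adj = {"you": ["a", "b", "c"]}
colors = {"a": "red", "b": "red", "c": "blue"}
free_choice = fewest_conflicts_color("you", adj, colors, ["red", "blue", "green"])
forced_choice = fewest_conflicts_color("you", adj, colors, ["red", "blue"])
```

With three colors the subject can avoid every neighbor; with only two, the rule settles for the color that clashes with a single neighbor. The last strategy on the slide, deliberately introducing conflicts, is a departure from this greedy rule.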