Counting Fast
                               (Part II)

                                   Sergei Vassilvitskii
                                 Columbia University
                         Computational Social Science
                                       March 8, 2013



Thursday, March 14, 13
Last time

              Counting fast:
                – Quadratic time doesn’t scale
                – Sorting is slightly more than linear
                – Hashing allows you to do membership queries in constant time




Today

              Counting on Networks:
                – Large Graphs: Internet, Facebook, Twitter
                – Recommendation Graphs: Netflix, Amazon, etc.




Friends & Followers

              Given a network:
                – When do people become friends?
                – What factors influence this?


              Products:
                – People You May Know (PYMK). Reconnect people, help new users
                – Twitter’s who to follow?


              Recommendations:
                – Netflix, Amazon, etc. (Future lectures)




Triadic Closure

              Likely to become friends with:
                – People in similar groups
                – Friends of friends




Defining Tight Knit Circles

              Looking for tight-knit circles:
                – People whose friends are friends themselves


              Why?
                – Network Cohesion: Tightly knit communities foster more trust, social
                  norms. [Coleman ’88, Portes ’88]
                – Structural Holes: Individuals benefit from bridging [Burt ’04, ’07]




Clustering Coefficient



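       [Figure: two example nodes, one with cc = 0.5 and one with cc = 0.1]

             Given an undirected graph:
             - For each node v, cc(v) is the fraction of pairs of v’s neighbors that
               are neighbors themselves
             - Equivalent to counting the triangles containing the node, divided by
               the number of pairs of its neighbors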
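The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the dict-of-sets graph representation, the function name `clustering_coefficient`, and the convention that a node with fewer than two neighbors gets a coefficient of 0 are all assumptions.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs means coefficient 0
    # Each connected pair of neighbors closes one triangle through v.
    closed = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return closed / (k * (k - 1) / 2)


# Hypothetical toy graph: undirected, stored as a dict of neighbor sets.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

Here node "a" has three neighbor pairs, of which only (b, c) is connected, so its coefficient is 1/3.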


How to Count Triangles

              Sequential Version:
                    foreach v in V
                        foreach u,w in Adjacency(v)
                           if (u,w) in E
                              Triangles[v]++


              Running time:
                – For each vertex, look at all pairs of neighbors
                – Number of pairs ~ quadratic in the degree of the vertex


                – What happens if the degree is very large?
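A runnable sketch of the sequential version, assuming the graph is held in memory as a dict mapping each node to a set of neighbors (the representation, the function name, and the `pairs_checked` counter are illustrative additions). The counter makes the quadratic cost per vertex visible.

```python
from itertools import combinations


def count_triangles_naive(adj):
    """Per-vertex triangle counts, checking every pair of neighbors of every vertex."""
    triangles = {v: 0 for v in adj}
    pairs_checked = 0
    for v, neighbors in adj.items():
        for u, w in combinations(neighbors, 2):
            pairs_checked += 1          # one membership query per neighbor pair
            if w in adj[u]:             # is (u, w) an edge?
                triangles[v] += 1
    return triangles, pairs_checked


# Hypothetical toy graph: node 0 is a hub with four neighbors, and 0-1-2 is a triangle.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
triangles, pairs_checked = count_triangles_naive(adj)
```

The hub alone accounts for 6 of the 8 pairs checked; a vertex of degree d contributes d(d-1)/2 pairs.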




Parallel Version

              But use 1,000 machines!
                – Quadratic algorithms still don’t scale
                – Simple parallelization: process each vertex separately




              Naive parallelization does not help with data skew
                – Some nodes will have very high degree
                – Example: a node with 3.2 million followers generates about 10 trillion
                  (10^13) potential edges to check.
                – Even when generating 100M edges per second, this is 100K seconds,
                  about 28 hours, for one vertex!
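The arithmetic behind this example can be checked directly. The variable names are illustrative; the 10^13 figure corresponds to squaring the follower count (ordered pairs), while checking each unordered pair once would roughly halve it.

```python
followers = 3_200_000
rate = 100_000_000                                   # candidate edges generated per second

ordered_pairs = followers * followers                # about 1.0e13, "10 trillion"
unordered_pairs = followers * (followers - 1) // 2   # about 5.1e12 if each pair is checked once

seconds = ordered_pairs / rate                       # about 100K seconds
hours = seconds / 3600                               # a little over a day
```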




“Just 5 more minutes”

              On the LiveJournal Graph (5M nodes, 70M edges)
                – 80% of vertices are done after 5 min
                – 99% done after 35 min




Adapting the Algorithm

              Approach 1: Dealing with skew directly
                – Currently every triangle is counted 3 times (once per vertex)
                – Running time is quadratic in the degree of the vertex
                – Idea: count each triangle once, from the perspective of its lowest-degree vertex
                – Does this heuristic work?




How to Count Triangles Better

              Idea [Schank ’07]
                – Only pivot on nodes that have smaller degree than both neighbors.
                – Neighbors of high degree nodes tend to have small degrees




How to Count Triangles Better

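              foreach v in V
                 foreach u in Adjacency(v) with rank(u) > rank(v):
                     foreach w in Adjacency(v) with rank(w) > rank(u):
                       if (u,w) is an edge:
                          Triangles[v]++
                          Triangles[w]++
                          Triangles[u]++

              Here rank(x) = (deg(x), id(x)): compare degrees and break ties by node id,
              so each triangle is found exactly once, from its lowest-ranked vertex.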
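A Python sketch of the degree-ordered idea, under the same assumed dict-of-sets representation. Ties in degree are broken by node id (an assumption; any consistent rule works), and the two higher-ranked neighbors are visited in rank order so that each triangle is counted exactly once.

```python
def count_triangles_by_degree(adj):
    """Count each triangle once, pivoting on its lowest-ranked vertex."""
    # Rank = (degree, node id); the id is only a tie-breaker.
    rank = {v: (len(adj[v]), v) for v in adj}
    triangles = {v: 0 for v in adj}
    for v in adj:
        # Only neighbors ranked above v are candidates for pairing.
        higher = sorted((u for u in adj[v] if rank[u] > rank[v]), key=rank.get)
        for i, u in enumerate(higher):
            for w in higher[i + 1:]:
                if w in adj[u]:                 # (u, w) is an edge: triangle v-u-w
                    for x in (v, u, w):
                        triangles[x] += 1
    return triangles


# Hypothetical toy graphs: a hub with one triangle, and a complete graph on 4 nodes.
hub_graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

In `hub_graph` the hub never pivots at all: none of its neighbors has a higher rank, so its four neighbors generate no pairs.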




Does it make a difference?




Why does it help?

              Look at two different kinds of nodes:
                – Few friends:
                         • OK to be quadratic on small instances
                – Lots of friends
                         • Only care about number of friends with even more friends!
                         • Cannot have too many (can make this formal)
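One way to make the last point formal is a standard counting argument (not taken from the slides). The notation is an assumption: m is the number of edges, d(v) the degree of v, and d^+(v) the number of neighbors of v whose degree is at least d(v), for a node with d(v) >= 1.

```latex
% Degrees sum to 2m, so few vertices can have degree at least d(v):
\[
  \sum_{u} d(u) = 2m
  \;\Longrightarrow\;
  \bigl|\{\, u : d(u) \ge d(v) \,\}\bigr| \le \frac{2m}{d(v)}
\]
% The higher-degree neighbors of v are bounded both by d(v) and by that count:
\[
  d^{+}(v) \le \min\!\left( d(v),\ \frac{2m}{d(v)} \right) \le \sqrt{2m}
\]
% Total work over all pivots, using \sum_v d^{+}(v) \le 2m:
\[
  \sum_{v} d^{+}(v)^{2}
  \;\le\; \sqrt{2m} \sum_{v} d^{+}(v)
  \;\le\; \sqrt{2m} \cdot 2m
  \;=\; O\!\left(m^{3/2}\right)
\]
```

So even a node with millions of friends pairs up at most about sqrt(2m) of them.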




Break




Working in Parallel

              MapReduce (review):


              Map:
                – Decide how to group the data for computation

              Reduce:
                – Given the grouping, perform the computation
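The map / group / reduce flow can be imitated in a single process. This is a toy sketch for intuition only, not how a real MapReduce framework is invoked. The helper `run_mapreduce` and the degree-counting example are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy single-process MapReduce: map, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):       # Map: decide the grouping
            groups[key].append(value)
    output = []
    for key in sorted(groups):                  # stands in for the shuffle
        output.extend(reducer(key, groups[key]))  # Reduce: compute per group
    return output


def degree_mapper(edge):
    """Each endpoint of an undirected edge gains one neighbor."""
    u, v = edge
    yield u, 1
    yield v, 1


def degree_reducer(node, ones):
    yield node, sum(ones)


# Hypothetical edge list.
edges = [("Alice", "Bob"), ("Bob", "Joe"), ("Joe", "Mary"), ("Bob", "Mary")]
degrees = run_mapreduce(edges, degree_mapper, degree_reducer)
```

The example computes each node's degree from an edge list, which is the annotation the later steps assume is already available.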




Building People You May Know

              Friendships are undirected:
                – If Alice knows Bob, Bob knows Alice
                – Data stored as a list of all edges
                – Find all friends of friends
                – Score the possible pairs




Data

              Suppose you have the edges, with the degree of each endpoint:

              Node 1   Degree   Node 2       Degree
              Joe          56   Mary             78
              Alice       398   Bob             198
              Dan         983   Justin   11,985,234
              ...


              An alternate view is the data stored as an adjacency list
              (a node and its degree, followed by each neighbor and its degree):
              Joe   56     Mary 78    Don 99     Bill 1
              Alice 398    Kate 55    Bob 198    Mary 78
              ...


Previous Algorithm

              Adjacency list input.
                – Map:
                         • For each node and its neighbors, output all paths through the node
                – Reduce:
                         • none
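A sketch of this map-only step, assuming each input line has the form "node degree neighbor1 degree1 neighbor2 degree2 ..." as in the adjacency-list view of the data. The function name and the tuple output format are illustrative.

```python
from itertools import combinations


def map_adjacency(line):
    """Map-only step: emit every 2-path (u, center, w) through one node."""
    tokens = line.split()
    center = tokens[0]
    neighbors = tokens[2::2]        # skip the node's own degree and each neighbor's degree
    return [(u, center, w) for u, w in combinations(neighbors, 2)]


paths = map_adjacency("Joe 56 Mary 78 Don 99 Bill 1")
no_paths = map_adjacency("Bill 1 Joe 56")   # a single neighbor yields no path
```

The output grows quadratically with the number of neighbors on the line, which is the skew problem seen earlier.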




                [Figure: two example Map inputs and their outputs; the second input
                 produces no output]
Want to compute all open triads

              Data Needed:
                – Central node
                – Neighbors that have higher degree




                – Orient each edge to point to a node of higher degree, breaking ties
                  arbitrarily but consistently




Want to compute all open triads

              Map:
                – Orient each edge to point to a node of higher degree, breaking ties
                  arbitrarily but consistently
                – Given:  Joe 56 Mary 78
                – Output: <Key = Joe, Value = Mary>
                – Given:  Alice 398 Bob 198
                – Output: <Key = Bob, Value = Alice>

                     def map(key, value):
                        # value is one edge: "node1 degree1 node2 degree2"
                        split = value.split()
                        deg1 = int(split[1])   # compare degrees as numbers, not strings
                        deg2 = int(split[3])
                        if (deg2 > deg1 or
                            (deg2 == deg1 and split[0] < split[2])):
                             emit(split[0], split[2])
                        if (deg2 < deg1 or
                            (deg2 == deg1 and split[0] > split[2])):
                             emit(split[2], split[0])
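A self-contained version of the orientation step that can be run outside a MapReduce framework. The function name `orient_edge` and returning a tuple instead of calling `emit` are assumptions made for illustration. Degrees are parsed as integers, with thousands separators stripped, because comparing them as strings would rank "78" above "198".

```python
def orient_edge(line):
    """Return (key, value) with the edge pointing from lower to higher degree."""
    u, deg_u, v, deg_v = line.split()
    deg_u = int(deg_u.replace(",", ""))   # "11,985,234" -> 11985234
    deg_v = int(deg_v.replace(",", ""))
    # Compare (degree, name) so that equal degrees are broken consistently by name.
    if (deg_u, u) < (deg_v, v):
        return u, v
    return v, u
```

Because ties are broken by name, the result does not depend on the order in which the two endpoints appear on the line.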


Want to compute all open triads

              Aggregate (Shuffle):
                – Collect all values with same key (nodes with higher degree)

              Computation:
                – Generate all 2-paths (friend of a friend relationships)
                – Given: key= Joe, value={Mary, Justin, Alice}
                – Output:
                         • (key = Joe, Value = (Mary, Justin))
                         • (key = Joe, Value = (Mary, Alice))
                         • (key = Joe, Value = (Justin, Alice))

                    def reduce(key, values):
                       # emit each unordered pair of friends exactly once
                       for i in range(len(values)):
                          for j in range(i + 1, len(values)):
                             emit(key, (values[i], values[j]))
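The shuffle and reduce steps together can be sketched as one in-memory function. It assumes the oriented edges arrive as (lower-degree node, higher-degree node) pairs. The name `two_paths` is illustrative, and `itertools.combinations` is used so that each pair of friends is produced once rather than in both orders.

```python
from collections import defaultdict
from itertools import combinations


def two_paths(oriented_edges):
    """Group oriented edges by key, then emit each pair of values once."""
    groups = defaultdict(list)
    for low, high in oriented_edges:            # shuffle: collect values per key
        groups[low].append(high)
    for center, higher in groups.items():       # reduce: all pairs within a group
        for friend1, friend2 in combinations(higher, 2):
            yield center, (friend1, friend2)


# Hypothetical oriented edges, matching the Joe example above.
oriented = [("Joe", "Mary"), ("Joe", "Justin"), ("Joe", "Alice"), ("Bob", "Alice")]
result = list(two_paths(oriented))
```

Bob has only one higher-degree neighbor, so he produces no output.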


Comparing Algorithms

              Edgelist MapOnly Algorithm:
                – MapOnly
                – Output from some nodes is quadratic


              Edge-at-a-time Algorithm:
                – Map & Reduce
                – More balanced output from each node




Scoring

              Some suggestions are better than others:
                – Some people are already friends!
                – Or they used to be friends...
                – Connected through a friend with 1000s of friends
                – Connected through multiple friends
                – ...
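As one possible illustration of two of these points (skipping pairs who are already friends, and favoring pairs connected through multiple friends), candidate pairs can be scored by how many 2-paths connect them. This is a hypothetical heuristic, not the scoring used in the lecture or in any product. The function name and input formats are assumptions, and it does not down-weight paths through very high-degree friends.

```python
from collections import Counter


def score_candidates(two_paths, friends):
    """Score non-friend pairs by the number of 2-paths (mutual friends) between them."""
    scores = Counter()
    for _center, (a, b) in two_paths:
        pair = tuple(sorted((a, b)))    # treat (a, b) and (b, a) as the same pair
        if pair in friends:
            continue                    # already friends: nothing to suggest
        scores[pair] += 1
    return scores.most_common()


# Hypothetical inputs: 2-paths as (center, (friend1, friend2)), friendships as sorted pairs.
paths = [
    ("Joe", ("Mary", "Alice")),
    ("Bob", ("Alice", "Mary")),
    ("Joe", ("Mary", "Justin")),
    ("Dan", ("Alice", "Justin")),
]
friends = {("Alice", "Justin")}
ranked = score_candidates(paths, friends)
```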




Spring Break!




Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024
 
CAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxCAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptx
 

Computational Social Science, Lecture 08: Counting Fast, Part II

  • 1. Counting Fast (Part II) Sergei Vassilvitskii Columbia University Computational Social Science March 8, 2013 Thursday, March 14, 13
  • 2. Last time. Counting fast: – Quadratic time doesn’t scale – Sorting is slightly more than linear – Hashing allows you to do membership queries in constant time
  • 3. Today. Counting on Networks: – Large Graphs: Internet, Facebook, Twitter – Recommendation Graphs: Netflix, Amazon, etc.
  • 4. Friends & Followers. Given a network: – When do people become friends? – What factors influence this?
  • 5. Friends & Followers. Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK): reconnect people, help new users
  • 6. Friends & Followers. Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK): reconnect people, help new users – Twitter’s Who to Follow
  • 7. Friends & Followers. Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK): reconnect people, help new users – Twitter’s Who to Follow. Recommendations: – Netflix, Amazon, etc. (future lectures)
  • 8. Triadic Closure. Likely to become friends with: – People in similar groups – Friends of friends
  • 9. Defining Tight Knit Circles. Looking for tight-knit circles: – People whose friends are friends themselves. Why? – Network Cohesion: tightly knit communities foster more trust and social norms [Coleman ’88, Portes ’88] – Structural Holes: individuals benefit from bridging [Burt ’04, ’07]
  • 10. Clustering Coefficient. (Figure: two example nodes compared side by side.)
  • 11. Clustering Coefficient. (Figure: two example nodes, one with cc = 0.5 vs. one with cc = 0.1.) Given an undirected graph: – For each node v, it is the fraction of pairs of v’s neighbors that are neighbors themselves – Equivalent to counting the triangles containing the node, divided by the number of neighbor pairs
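
The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph fits in memory as a dict mapping each node to a set of neighbors; the `graph` layout, the toy data, and the function name are illustrative and not from the lecture.

```python
from itertools import combinations


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: no neighbor pairs means cc = 0
    # Each connected pair of neighbors closes one triangle through v.
    triangles = sum(1 for u, w in combinations(neighbors, 2) if w in graph[u])
    return triangles / (k * (k - 1) / 2)


# Hypothetical toy graph: a, b, c form a triangle; d hangs off a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = clustering_coefficient(graph, "a")  # 1 of a's 3 neighbor pairs is connected
cc_b = clustering_coefficient(graph, "b")  # b's only neighbor pair is connected
```

The numerator is exactly the number of triangles containing the node, which is why the rest of the lecture focuses on counting triangles.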
  • 12. How to Count Triangles. Sequential version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
      Figure: vertex v, with Triangles[v]=0.
  • 13. How to Count Triangles (build). Same pseudocode. Figure: vertex v with neighbors w and u, Triangles[v]=1.
  • 14. How to Count Triangles (build). Same pseudocode and figure as slide 13.
  • 15. How to Count Triangles. Running time of the sequential version: – For each vertex, look at all pairs of neighbors – Number of pairs ~ quadratic in the degree of the vertex – What happens if the degree is very large?
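
A runnable version of the sequential pseudocode might look like the sketch below. It assumes an in-memory adjacency dict of sets (so the `(u,w) in E` test becomes a hash lookup, as in the previous lecture); the function name and toy graph are illustrative.

```python
from itertools import combinations


def count_triangles(graph):
    """Per-node triangle counts by checking every pair of neighbors."""
    triangles = {v: 0 for v in graph}
    for v, neighbors in graph.items():
        # Quadratic in deg(v): all unordered pairs of v's neighbors.
        for u, w in combinations(neighbors, 2):
            if w in graph[u]:  # set membership, expected constant time
                triangles[v] += 1
    return triangles


# Hypothetical toy graph: a, b, c form a triangle; d hangs off a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
counts = count_triangles(graph)
```

Every triangle is discovered three times here, once from each of its corners, which is the redundancy the later slides remove.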
  • 16. Parallel Version. But use 1,000 machines! – Quadratic algorithms still don’t scale – Simple parallelization: process each vertex separately. Naive parallelization does not help with data skew: – Some nodes will have very high degree – Example: a vertex with 3.2 million followers must generate 10 trillion (10^13) potential edges to check – Even when generating 100M edges per second, this takes 100K seconds, or about 28 hours, for one vertex!
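
The arithmetic behind that example can be checked directly. The sketch below assumes the slide's 10^13 figure counts ordered pairs of followers (unordered pairs would be half as many); the variable names are illustrative.

```python
followers = 3_200_000
candidate_edges = followers ** 2   # ordered pairs of followers, roughly 1e13
rate = 100_000_000                 # candidate edges generated per second
seconds = candidate_edges / rate   # roughly 1e5 seconds
hours = seconds / 3600             # a bit over a day

print(f"{candidate_edges:.2e} candidates, {seconds:.0f} s, {hours:.1f} h")
```

Adding machines does not shorten this, because all of the work belongs to a single vertex and so lands on a single machine.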
  • 17. “Just 5 more minutes”. On the LiveJournal graph (5M nodes, 70M edges): – 80% of vertices are done after 5 min – 99% are done after 35 min
  • 18. Adapting the Algorithm. Approach 1: dealing with skew directly – Currently every triangle is counted 3 times (once per vertex) – Running time is quadratic in the degree of the vertex – Idea: count each triangle once, from the perspective of its lowest-degree vertex – Does this heuristic work?
  • 19. How to Count Triangles Better. Idea [Schank ’07]: – Only pivot on nodes that have smaller degree than both neighbors – Neighbors of high-degree nodes tend to have small degrees
  • 20. How to Count Triangles Better. Let rank(x) = (deg(x), id(x)), so that degree ties are broken consistently and each pair of neighbors is visited once:
        foreach v in V
          foreach u in Adjacency(v) with rank(u) > rank(v):
            foreach w in Adjacency(v) with rank(w) > rank(u):
              if (u,w) is an edge:
                Triangles[v]++
                Triangles[w]++
                Triangles[u]++
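
As a sketch of the degree-ordered idea, the version below pivots only on the lowest-ranked corner of each triangle and credits all three corners at once. It assumes an adjacency dict of sets with comparable node ids, and it breaks degree ties by node id; that tie-break and the function name are assumptions for illustration.

```python
def count_triangles_by_degree(graph):
    """Count each triangle once, pivoting on its lowest-ranked vertex."""
    # rank = (degree, node id); the id breaks degree ties consistently.
    rank = {v: (len(nbrs), v) for v, nbrs in graph.items()}
    triangles = {v: 0 for v in graph}
    for v, nbrs in graph.items():
        # Only neighbors ranked above v can form a triangle pivoted at v.
        higher = sorted((u for u in nbrs if rank[u] > rank[v]), key=rank.get)
        for i, u in enumerate(higher):
            for w in higher[i + 1:]:
                if w in graph[u]:
                    triangles[v] += 1
                    triangles[u] += 1
                    triangles[w] += 1
    return triangles


# Hypothetical toy graph: a, b, c form a triangle; d hangs off a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
counts = count_triangles_by_degree(graph)

# Complete graph on four nodes: every degree ties, so the id tie-break matters.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
k4_counts = count_triangles_by_degree(k4)
```

The high-degree node `a` does no pair generation at all in the toy graph, since none of its neighbors outrank it.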
  • 21. Does it make a difference? (Figure not captured in the transcript.)
  • 22. Why does it help? Look at two different kinds of nodes: – Few friends: • OK to be quadratic on small instances – Lots of friends: • Only care about the number of friends with even more friends! • There cannot be too many of them (this can be made formal)
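
One way to make the "cannot be too many" claim formal is the following sketch. The notation is standard rather than taken from the slides: m is the number of edges, d(x) the degree of x, and k the number of neighbors u_1, ..., u_k of v with d(u_i) ≥ d(v).

```latex
k \le d(v) \le d(u_i) \quad \text{for each } i
\;\Longrightarrow\;
k^2 \le \sum_{i=1}^{k} d(u_i) \le \sum_{x \in V} d(x) = 2m
\;\Longrightarrow\;
k \le \sqrt{2m}.
```

So no vertex pivots over more than √(2m) neighbors, however large its own degree. Since each edge is oriented toward exactly one endpoint, the values of k sum to m over all vertices, and the total number of pairs examined is at most √(2m) · m, that is, O(m^{3/2}).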
  • 23. Break
  • 24. Working in Parallel. MapReduce (review): Map: – Decide how to group the data for computation. Reduce: – Given the grouping, perform the computation
  • 25. Building People You May Know. Friendships are undirected: – If Alice knows Bob, Bob knows Alice – Data stored as a list of all edges – Find all friends of friends – Score the possible pairs
  • 26. Data. Suppose you have edges and degrees of each vertex:
        Joe    56     Mary    78
        Alice  398    Bob     198
        Dan    983    Justin  11,985,234
        ...
      An alternate view may be data stored as an adjacency list:
        Joe    56   | Mary 78, Don 99, Bill 1
        Alice  398  | Kate 55, Bob 198, Mary 78
        ...
  • 27. Previous Algorithm. Adjacency list input. – Map: • For each node and its neighbors, output all paths through the node – Reduce: • none. (Two worked examples appeared as figures: a map input of the form [node | neighbors] with its output, and a second input whose output is none.)
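
The map-only step can be sketched as a generator that takes one adjacency-list record and emits every 2-path through the node. The function name and the sample records are illustrative; a real job would receive its records from the MapReduce framework.

```python
from itertools import combinations


def map_adjacency(node, neighbors):
    """Map-only step: yield every 2-path u - node - w through `node`."""
    for u, w in combinations(sorted(neighbors), 2):
        yield node, (u, w)


paths = list(map_adjacency("Joe", ["Mary", "Don", "Bill"]))
no_paths = list(map_adjacency("Bill", ["Joe"]))  # one neighbor: nothing to pair
```

A node with d neighbors emits d(d-1)/2 records, so a single very popular node dominates the output.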
  • 28. How to Count Triangles Better. Idea [Schank ’07]: – Only pivot on nodes that have smaller degree than both neighbors – Neighbors of high-degree nodes tend to have small degrees
  • 29. Want to compute all open triads. Data needed: – Central node – Neighbors that have higher degree
  • 30. Want to compute all open triads (build). Same content as slide 29.
  • 31. Want to compute all open triads (build). Same content as slide 29.
  • 32. Want to compute all open triads. Data needed: – Central node – Neighbors that have higher degree – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
  • 33. Want to compute all open triads. Map: – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently – Given: Joe 56 Mary 78 – Output: <Key = Joe, Value = Mary> – Given: Alice 398 Bob 198 – Output: <Key = Bob, Value = Alice>
        map(key, value):
          split = value.split()
          # compare degrees as numbers, not as strings; drop thousands separators
          deg0 = int(split[1].replace(",", ""))
          deg1 = int(split[3].replace(",", ""))
          if deg1 > deg0 or (deg1 == deg0 and split[0] < split[2]):
            emit(split[0], split[2])
          if deg1 < deg0 or (deg1 == deg0 and split[0] > split[2]):
            emit(split[2], split[0])
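
The map step above can be exercised locally with a small generator. This is a sketch under stated assumptions: records are whitespace-separated `name degree name degree` lines as in the Data slide, there are no self-loops, and `parse_degree` and `map_edge` are hypothetical helper names.

```python
def parse_degree(text):
    """Degrees may carry thousands separators, e.g. 11,985,234."""
    return int(text.replace(",", ""))


def map_edge(line):
    """Yield (lower-ranked endpoint, higher-ranked endpoint) for one edge record."""
    name1, deg1, name2, deg2 = line.split()
    # Rank by numeric degree, then by name, so ties break consistently.
    rank1 = (parse_degree(deg1), name1)
    rank2 = (parse_degree(deg2), name2)
    if rank1 < rank2:
        yield name1, name2
    else:  # assumes no self-loops, so the ranks are never equal
        yield name2, name1


joe = list(map_edge("Joe 56 Mary 78"))
bob = list(map_edge("Alice 398 Bob 198"))
dan = list(map_edge("Dan 983 Justin 11,985,234"))
```

Comparing the degrees as integers matters: as strings, "56" sorts after "198", which would orient some edges the wrong way.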
  • 34. Want to compute all open triads. Aggregate (Shuffle): – Collect all values with the same key (the key’s neighbors of higher degree). Computation: – Generate all 2-paths (friend-of-a-friend relationships)
  • 35. Want to compute all open triads (build). Same content as slide 34; the generated 2-paths were shown as figures.
  • 36. Want to compute all open triads. Aggregate (Shuffle): – Collect all values with the same key (the key’s neighbors of higher degree). Computation: – Generate all 2-paths (friend-of-a-friend relationships) – Given: key = Joe, value = {Mary, Justin, Alice} – Output: • (key = Joe, Value = (Mary, Justin)) • (key = Joe, Value = (Mary, Alice)) • (key = Joe, Value = (Justin, Alice))
        reduce(key, values):
          values = list(values)
          for i in range(len(values)):
            for j in range(i + 1, len(values)):   # each unordered pair once
              emit(key, (values[i], values[j]))
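
The shuffle and reduce steps can be simulated in memory as below. This is only a local sketch: in a real MapReduce job the framework performs the grouping, and `shuffle`, `reduce_two_paths`, and the sample `mapped` pairs are illustrative names and data.

```python
from collections import defaultdict
from itertools import combinations


def shuffle(pairs):
    """Group mapped (key, value) pairs by key, as the framework's shuffle would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_two_paths(key, values):
    """Yield each unordered pair of the key's higher-degree neighbors once."""
    for friend1, friend2 in combinations(values, 2):
        yield key, (friend1, friend2)


# Output of the map step: each edge keyed by its lower-degree endpoint.
mapped = [("Joe", "Mary"), ("Joe", "Justin"), ("Bob", "Alice"), ("Joe", "Alice")]
two_paths = [
    out
    for key, values in shuffle(mapped).items()
    for out in reduce_two_paths(key, values)
]
```

Because each key only sees neighbors of higher degree, the quadratic pair generation in the reducer runs over short lists, even for edges that touch a very popular node.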
  • 37. Comparing Algorithms. Edgelist MapOnly Algorithm: – MapOnly – Output from some nodes is quadratic. Edge-at-a-time Algorithm: – Map & Reduce – More balanced output from each node
  • 38. Scoring. Some suggestions are better than others: – Some people are already friends! – Or they used to be friends... – Connected through a friend with 1000s of friends – Connected through multiple friends – ...
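
The lecture does not give a scoring formula, so the sketch below swaps in one common choice, the Adamic-Adar index: every shared friend adds to a pair's score, but a shared friend with a very large degree adds less. The function name, the input layout (2-paths as `(via, (a, b))`, a `degree` dict, and a set of existing friend pairs), and the sample data are all assumptions for illustration.

```python
import math
from collections import defaultdict


def score_candidates(two_paths, degree, friends):
    """Score candidate pairs from 2-paths of the form (via, (a, b)).

    Each shared friend `via` adds 1 / log(degree[via]); a node on a 2-path
    has degree of at least 2, so the logarithm is positive.
    Pairs that are already friends are skipped.
    """
    scores = defaultdict(float)
    for via, (a, b) in two_paths:
        pair = tuple(sorted((a, b)))
        if pair in friends:
            continue
        scores[pair] += 1.0 / math.log(degree[via])
    return dict(scores)


two_paths = [
    ("Joe", ("Mary", "Alice")),
    ("Justin", ("Mary", "Alice")),
    ("Joe", ("Mary", "Don")),
]
degree = {"Joe": 56, "Justin": 11_985_234}
friends = {("Don", "Mary")}  # existing friendships, stored as sorted pairs
scores = score_candidates(two_paths, degree, friends)
```

Here the connection through Joe contributes more than the one through Justin, and the pair that is already connected is dropped, matching two of the considerations on the slide.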
  • 39. Spring Break!