Counting Fast
                               (Part II)

                                   Sergei Vassilvitskii
                                 Columbia University
                         Computational Social Science
                                       March 8, 2013



Thursday, March 14, 13
Last time

              Counting fast:
                – Quadratic time doesn’t scale
                – Sorting is slightly more than linear
                – Hashing allows you to do membership queries in constant time




Today

              Counting on Networks:
                – Large Graphs: Internet, Facebook, Twitter
                – Recommendation Graphs: Netflix, Amazon, etc.




Friends & Followers

              Given a network:
                – When do people become friends?
                – What factors influence this?


              Products:
                – People You May Know (PYMK). Reconnect people, help new users
                – Twitter’s who to follow?


              Recommendations:
                – Netflix, Amazon, etc. (Future lectures)




Triadic Closure

              Likely to become friends with:
                – People in similar groups
                – Friends of friends




Defining Tight Knit Circles

              Looking for tight-knit circles:
                – People whose friends are friends themselves


              Why?
                – Network Cohesion: Tightly knit communities foster more trust, social
                  norms. [Coleman ’88, Portes ’88]
                – Structural Holes: Individuals benefit from bridging [Burt ’04, ’07]




Clustering Coefficient



       [Figure: two example nodes, one with cc = 0.5 vs. one with cc = 0.1]

             Given an undirected graph
             - For each node v, it’s the fraction of pairs of v’s neighbors who are
             neighbors themselves
             - Equivalent to the number of triangles containing v, divided by the
             number of pairs of v’s neighbors
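
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the adjacency-set representation, the function name `clustering_coefficient`, the toy graph, and the convention that a node with fewer than two neighbors gets 0 are all assumptions made here.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        return 0.0  # assumed convention: no neighbor pairs means cc = 0
    # Each connected pair of neighbors closes one triangle through v.
    closed = sum(1 for u, w in combinations(sorted(neighbors), 2) if w in adj[u])
    return closed / (d * (d - 1) / 2)


# Hypothetical toy graph, stored as adjacency sets of an undirected graph.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = clustering_coefficient(adj, "a")  # one connected pair (b, c) out of three
```

Node "a" has three neighbor pairs and only (b, c) is an edge, so its coefficient is one third.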


How to Count Triangles

              Sequential Version:
                    foreach v in V
                        foreach u,w in Adjacency(v)
                           if (u,w) in E
                              Triangles[v]++


              Running time:
                – For each vertex, look at all pairs of neighbors
                – Number of pairs ~ quadratic in the degree of the vertex


                – What happens if the degree is very large?
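
A runnable sketch of the sequential version, assuming the graph fits in memory as a dictionary of adjacency sets (the representation, the function name `count_triangles_naive`, and the toy graph are assumptions, not part of the slides):

```python
from itertools import combinations


def count_triangles_naive(adj):
    """Per-node triangle counts, checking every pair of neighbors of every node."""
    triangles = {v: 0 for v in adj}
    for v in adj:
        # The number of pairs examined here is quadratic in the degree of v.
        for u, w in combinations(adj[v], 2):
            if w in adj[u]:
                triangles[v] += 1
    return triangles


# Hypothetical toy graph with two triangles: (1, 2, 3) and (1, 3, 4).
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
naive_counts = count_triangles_naive(adj)
```

Each triangle is discovered three times, once from each of its corners, which is why the per-node counts sum to three times the number of triangles.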




Parallel Version

              But use 1,000 machines!
                – Quadratic algorithms still don’t scale
                – Simple parallelization: process each vertex separately




              Naive parallelization does not help with data skew
                – Some nodes will have very high degree
                – Example: a node with 3.2 million followers must generate about 10
                  trillion (10^13) potential edges to check.
                – Even when generating 100M edges per second, this takes 100K seconds
                  (~27 hours) for one vertex!
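
The arithmetic behind this example can be checked directly. The sketch below assumes the 10^13 figure counts ordered pairs of followers (the square of the degree); counting unordered pairs would roughly halve it. Variable names are illustrative only.

```python
followers = 3_200_000

# Ordered pairs of followers: about 1.0e13, the "10 trillion" on the slide.
candidate_pairs = followers ** 2
# Unordered pairs, for comparison: about half as many.
unordered_pairs = followers * (followers - 1) // 2

edges_per_second = 100_000_000
seconds = candidate_pairs / edges_per_second
hours = seconds / 3600
```

Without rounding to 10^13 the estimate comes out slightly above 28 hours, the same order of magnitude as the slide's figure.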




“Just 5 more minutes”

              On the LiveJournal Graph (5M nodes, 70M edges)
                – 80% of vertices are done after 5 min
                – 99% done after 35 min




Adapting the Algorithm

              Approach 1: Dealing with skew directly
                – Currently, every triangle is counted 3 times (once per vertex)
                – Running time quadratic in the degree of the vertex
                – Idea: Count each once, from the perspective of lowest degree vertex
                – Does this heuristic work?




How to Count Triangles Better

              Idea [Schank ’07]
                – Only pivot on nodes who have smaller degrees than both neighbors.
                – Neighbors of high degree nodes tend to have small degrees




How to Count Triangles Better

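              foreach v in V
                 foreach u in Adjacency(v) with deg(u) > deg(v):
                     foreach w in Adjacency(v) with deg(w) > deg(u):
                       # degree ties broken consistently (e.g., by node id),
                       # so each triangle is found once, at its lowest-degree node
                       if (u,w) is an edge:
                          Triangles[v]++
                          Triangles[w]++
                          Triangles[u]++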
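
One way to make the degree-ordered idea concrete is sketched below. It assumes an in-memory dictionary of adjacency sets and breaks degree ties by node id; the function name `count_triangles_by_degree` and the toy graph are hypothetical.

```python
def count_triangles_by_degree(adj):
    """Per-node triangle counts, finding each triangle once from its lowest-ranked node."""
    # Rank nodes by (degree, id) so that degree ties are broken consistently.
    rank = {v: (len(adj[v]), v) for v in adj}
    # For each node keep only the neighbors ranked above it, in rank order.
    higher = {
        v: sorted((u for u in adj[v] if rank[u] > rank[v]), key=rank.get)
        for v in adj
    }
    triangles = {v: 0 for v in adj}
    for v in adj:
        ranked = higher[v]
        for i, u in enumerate(ranked):
            for w in ranked[i + 1:]:
                if w in adj[u]:
                    # Credit all three corners of the triangle.
                    triangles[v] += 1
                    triangles[u] += 1
                    triangles[w] += 1
    return triangles


# Hypothetical toy graph with two triangles: (1, 2, 3) and (1, 3, 4).
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
ordered_counts = count_triangles_by_degree(adj)
```

High-degree nodes only pair up their even-higher-ranked neighbors, so the quadratic work is done where the neighbor lists are short.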




Does it make a difference?




Why does it help?

              Look at two different kinds of nodes:
                – Few friends:
                         • OK to be quadratic on small instances
                – Lots of friends
                         • Only care about number of friends with even more friends!
                         • Cannot have too many (can make this formal)
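
A sketch of how the "cannot have too many" claim can be made formal, using the standard degree-sum argument. The symbols m (number of edges) and h(v) (number of neighbors of v with degree at least deg(v)) are introduced here and do not appear on the slides.

```latex
% Degrees sum to twice the number of edges, so few nodes can have high degree:
\[
  \sum_{u \in V} \deg(u) = 2m
  \quad\Longrightarrow\quad
  \bigl|\{\, u : \deg(u) \ge d \,\}\bigr| \le \frac{2m}{d}.
\]
% A node v of degree d has at most d neighbors in total, hence
\[
  h(v) \le \min\!\left(d, \frac{2m}{d}\right) \le \sqrt{2m}.
\]
% Each edge is oriented toward exactly one endpoint, so \sum_v h(v) = m and
\[
  \sum_{v \in V} h(v)^2 \le \sqrt{2m} \sum_{v \in V} h(v) = \sqrt{2m}\, m = O\!\left(m^{3/2}\right).
\]
```

Under these assumptions the total number of neighbor pairs examined is bounded by a function of the edge count alone, regardless of how skewed the degrees are.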




Break




Working in Parallel

              MapReduce (review):


              Map:
                – Decide how to group the data for computation

              Reduce:
                – Given the grouping, perform the computation
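
The two phases can be imitated on a single machine to see how they fit together. The sketch below is a toy simulation, not a real MapReduce framework: `run_mapreduce`, `degree_mapper`, `degree_reducer`, and the edge list are all hypothetical names and data. It computes node degrees from an edge list, which is the per-vertex information used later in the lecture.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": collect values by key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def degree_mapper(edge):
    # Map: decide the grouping. Each endpoint of the edge gets a count of 1.
    u, v = edge
    yield u, 1
    yield v, 1


def degree_reducer(node, ones):
    # Reduce: given the grouping, perform the computation.
    yield node, sum(ones)


edges = [("Joe", "Mary"), ("Joe", "Alice"), ("Alice", "Bob")]
degrees = dict(run_mapreduce(edges, degree_mapper, degree_reducer))
```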




Building People You May Know

              Friendships are undirected:
                – If Alice knows Bob, Bob knows Alice
                – Data stored as a list of all edges
                – Find all friends of friends
                – Score the possible pairs




Data

              Suppose you have edges and degrees of each vertex:


              Joe     56       Mary     78
              Alice   398      Bob      198
              Dan     983      Justin   11,985,234
              ...


              An alternate view may be data stored as adjacency list:
              Joe     56       Mary 78     Don 99      Bill 1
              Alice   398      Kate 55     Bob 198     Mary 78
              ...


Previous Algorithm

              Adjacency list input.
                – Map:
                         • For each node and its neighbors, output all paths through the node
                – Reduce:
                         • none




                [Figure: two example map inputs and their outputs; the second input
                 produces no output]
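
A minimal sketch of this map-only step, assuming each input record is a node together with its list of neighbors. The function name `map_adjacency` and the example records are hypothetical.

```python
from itertools import combinations


def map_adjacency(node, neighbors):
    """Map-only step: every pair of neighbors forms a 2-path through the node."""
    for u, w in combinations(sorted(neighbors), 2):
        yield (u, node, w)


paths = list(map_adjacency("Joe", ["Mary", "Don", "Bill"]))
no_paths = list(map_adjacency("Bill", ["Joe"]))  # one neighbor: nothing to pair
```

A node with d neighbors emits d(d-1)/2 paths, so a single very high degree node dominates the output of the whole job.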
How to Count Triangles Better

              Idea [Schank ’07]
                – Only pivot on nodes who have smaller degrees than both neighbors.
                – Neighbors of high degree nodes tend to have small degrees




Want to compute all open triads

              Data Needed:
                – Central node
                – Neighbors that have higher degree




                – Orient each edge to point to a node of higher degree, breaking ties
                  arbitrarily but consistently




Want to compute all open triads

              Map:
                – Orient each edge to point to a node of higher degree, breaking ties
                  arbitrarily but consistently
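                – Given:  Joe 56 Mary 78
                – Output: <Key = Joe, Value = Mary>
                – Given:  Alice 398 Bob 198
                – Output: <Key = Bob, Value = Alice>

                     def map(key, value):
                        # value is one edge record: "node1 degree1 node2 degree2"
                        node1, deg1, node2, deg2 = value.split()
                        # compare degrees as integers, not as strings
                        deg1 = int(deg1.replace(",", ""))
                        deg2 = int(deg2.replace(",", ""))
                        # key = lower-degree endpoint; ties broken by name
                        if deg2 > deg1 or (deg2 == deg1 and node1 < node2):
                             emit(node1, node2)
                        if deg2 < deg1 or (deg2 == deg1 and node1 > node2):
                             emit(node2, node1)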
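
The orientation rule can be tried outside a MapReduce framework with a small standalone function. This is a sketch under stated assumptions: `orient_edge` is a hypothetical helper that returns the (key, value) pair instead of calling `emit`, and it parses degrees as integers after stripping thousands separators.

```python
def orient_edge(record):
    """Return (key, value) so that the edge points from the lower-ranked node to the higher."""
    node1, deg1, node2, deg2 = record.split()
    deg1 = int(deg1.replace(",", ""))
    deg2 = int(deg2.replace(",", ""))
    # Compare (degree, name) tuples: degree first, name as the consistent tie-break.
    if (deg1, node1) < (deg2, node2):
        return node1, node2
    return node2, node1


joe_edge = orient_edge("Joe 56 Mary 78")
alice_edge = orient_edge("Alice 398 Bob 198")
dan_edge = orient_edge("Dan 983 Justin 11,985,234")

# Why integers matter: as strings, "56" sorts after "198".
string_pitfall = "56" > "198"
```

Comparing the degree fields as raw strings would order them lexicographically and orient some edges the wrong way, which is why the conversion is done before the comparison.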


Want to compute all open triads

              Aggregate (Shuffle):
                – Collect all values with same key (nodes with higher degree)

              Computation:
                – Generate all 2-paths (friend of a friend relationships)
                – Given: key= Joe, value={Mary, Justin, Alice}
                – Output:
                         • (key = Joe, Value = (Mary, Justin))
                         • (key = Joe, Value = (Mary, Alice))
                         • (key = Joe, Value = (Justin, Alice))

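                    def reduce(key, values):
                       # values: the higher-degree neighbors collected for this key
                       values = list(values)
                       for i, friend1 in enumerate(values):
                          # pair each friend only with later ones, so every
                          # unordered pair is emitted exactly once
                          for friend2 in values[i + 1:]:
                             emit(key, (friend1, friend2))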
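
Putting the three steps together on one machine gives the sketch below. The function name `open_triads` and the edge records are hypothetical: the records are chosen so that Joe's group matches the example above, and degrees are written without thousands separators to keep the parsing short.

```python
from collections import defaultdict
from itertools import combinations


def open_triads(records):
    """Orient edges (map), group by key (shuffle), emit neighbor pairs (reduce)."""
    groups = defaultdict(list)
    for record in records:
        node_a, deg_a, node_b, deg_b = record.split()
        # Lower (degree, name) becomes the key; the other endpoint is the value.
        low, high = sorted([(int(deg_a), node_a), (int(deg_b), node_b)])
        groups[low[1]].append(high[1])
    triads = []
    for key in sorted(groups):
        for friend1, friend2 in combinations(groups[key], 2):
            triads.append((key, (friend1, friend2)))
    return triads


# Hypothetical edge records in "node degree node degree" form.
records = [
    "Joe 56 Mary 78",
    "Joe 56 Justin 11985234",
    "Joe 56 Alice 398",
    "Alice 398 Bob 198",
]
triads = open_triads(records)
```

Justin has by far the highest degree here, yet he never becomes a key, so his enormous neighbor list is never expanded into pairs.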


Comparing Algorithms

              Adjacency list MapOnly Algorithm:
                – MapOnly
                – Output from some nodes is quadratic


              Edge at a time Algorithm:
                – Map & Reduce
                – More balanced output from each node




Scoring

              Some suggestions are better than others:
                – Some people are already friends!
                – Or they used to be friends...
                – Connected through a friend with 1000s of friends
                – Connected through multiple friends
                – ...
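
Some of these considerations can be turned into a simple score. The sketch below is one possible choice, not the scoring used by any real product: it skips pairs that are already friends, adds up evidence from multiple mutual friends, and down-weights mutual friends with many connections using a 1/log(degree) weight (the Adamic-Adar index, swapped in here as an assumption). Past friendships are not modeled because the graph carries no history. The function name `score_candidates` and the toy graph are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def score_candidates(adj):
    """Score non-adjacent pairs by their mutual friends, weighted by 1 / log(degree)."""
    scores = defaultdict(float)
    for middle, friends in adj.items():
        if len(friends) < 2:
            continue  # no pair can be connected through this node
        # Assumed weighting: a mutual friend with many friends counts for less.
        weight = 1.0 / math.log(len(friends))
        for a, b in combinations(sorted(friends), 2):
            if b in adj[a]:
                continue  # already friends, not a useful suggestion
            scores[(a, b)] += weight
    return dict(scores)


# Hypothetical toy graph: a and b share two friends (m and n); a and c share one (m).
adj = {
    "a": {"m", "n"},
    "b": {"m", "n"},
    "c": {"m"},
    "m": {"a", "b", "c"},
    "n": {"a", "b"},
}
scores = score_candidates(adj)
```

The pair (a, b) is connected through two mutual friends and outranks (a, c), which is connected through one.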




Spring Break!




Student Profile Sample - We help schools to connect the data they have, with ...
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 

Computational Social Science, Lecture 08: Counting Fast, Part II

  • 1. Counting Fast (Part II) Sergei Vassilvitskii Columbia University Computational Social Science March 8, 2013 Thursday, March 14, 13
  • 2. Last time. Counting fast: – Quadratic time doesn’t scale – Sorting is slightly more than linear – Hashing allows you to do membership queries in constant time
  • 3. Today. Counting on Networks: – Large Graphs: Internet, Facebook, Twitter – Recommendation Graphs: Netflix, Amazon, etc.
  • 4. Friends & Followers. Given a network: – When do people become friends? – What factors influence this?
  • 5. Friends & Followers. Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users
  • 6. Friends & Followers. Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users – Twitter’s who to follow?
  • 7. Friends & Followers. Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users – Twitter’s who to follow? Recommendations: – Netflix, Amazon, etc. (Future lectures)
  • 8. Triadic Closure. Likely to become friends with: – People in similar groups – Friends of friends
  • 9. Defining Tight Knit Circles. Looking for tight-knit circles: – People whose friends are friends themselves. Why? – Network Cohesion: Tightly knit communities foster more trust, social norms. [Coleman ’88, Portes ’88] – Structural Holes: Individuals benefit from bridging [Burt ’04, ’07]
  • 10. Clustering Coefficient. [Figure: two example graphs compared side by side]
  • 11. Clustering Coefficient. cc( ) = 0.5 vs. cc( ) = 0.1 [the graphs being compared were figures]. Given an undirected graph: – For each node v, it’s the fraction of pairs of v’s neighbors who are neighbors themselves – Equivalent, up to normalization by the number of neighbor pairs, to counting the triangles containing the node
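The definition on slide 11 can be sketched in a few lines of Python. This is an illustration only: the function name `clustering_coefficient`, the dict-of-sets representation `adj`, the toy graph, and the convention of returning 0.0 for nodes with fewer than two neighbors are all assumptions, not taken from the lecture.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = sorted(adj[v])
    k = len(neighbors)
    if k < 2:
        # Assumed convention: the ratio is undefined here, so report 0.0.
        return 0.0
    # Each connected pair (u, w) closes one triangle v-u-w.
    closed = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return closed / (k * (k - 1) / 2)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = clustering_coefficient(adj, "a")  # one closed pair out of three
cc_b = clustering_coefficient(adj, "b")  # the only pair (a, c) is closed
```

The numerator is exactly the number of triangles containing the node, which is why the rest of the lecture focuses on counting triangles.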
  • 12. How to Count Triangles. Sequential Version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
      [Figure labels: v; Triangles[v]=0]
  • 13. How to Count Triangles. Sequential Version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
      [Figure labels: v, w, u; Triangles[v]=1]
  • 14. How to Count Triangles. Sequential Version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
      [Figure labels: v, w, u; Triangles[v]=1]
  • 15. How to Count Triangles. Sequential Version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
      Running time: – For each vertex, look at all pairs of neighbors – Number of pairs ~ quadratic in the degree of the vertex – What happens if the degree is very large?
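A minimal runnable rendering of the sequential pseudocode, assuming the graph is held as a dict mapping each node to a set of neighbors (the name `count_triangles` and the toy graph are illustrative, not from the slides):

```python
from itertools import combinations

def count_triangles(adj):
    """Per-node triangle counts; work per node is quadratic in its degree."""
    triangles = {v: 0 for v in adj}
    for v in adj:
        # Look at every unordered pair of v's neighbors.
        for u, w in combinations(sorted(adj[v]), 2):
            if w in adj[u]:  # constant-time membership test on a set
                triangles[v] += 1
    return triangles

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
counts = count_triangles(graph)
```

Note that the single triangle is discovered three times, once from each of its vertices, and that a node of degree d costs d(d-1)/2 pair checks.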
  • 16. Parallel Version. But use 1,000 machines! – Quadratic algorithms still don’t scale – Simple parallelization: process each vertex separately. Naive parallelization does not help with data skew – Some nodes will have very high degree – Example: 3.2 Million followers, must generate 10 Trillion (10^13) potential edges to check. – Even if generating 100M edges per second this is 100K seconds ~ 27 hours for one vertex!
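The arithmetic behind the 27-hour figure can be checked directly. The variable names below are illustrative; the inputs (3.2 million followers, 100 million candidate edges per second) are the slide's.

```python
followers = 3_200_000
rate_per_second = 100_000_000  # candidate edges generated per second

# Squaring the degree gives the slide's "10 trillion" order of magnitude.
ordered_pairs = followers ** 2                       # 1.024e13
# Counting each unordered pair once halves that, still about 5e12.
unordered_pairs = followers * (followers - 1) // 2

# Using the slide's rounded figure of 10^13 candidate edges:
seconds = 10 ** 13 // rate_per_second                # 100,000 seconds
hours = seconds / 3600                               # just under 28 hours
```

Either way of counting leaves a single vertex dominating the running time, which is the data-skew problem the slide describes.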
  • 17. “Just 5 more minutes”. On the LiveJournal Graph (5M nodes, 70M edges): – 80% of vertices are done after 5 min – 99% done after 35 min
  • 18. Adapting the Algorithm. Approach 1: Dealing with skew directly – Currently every triangle counted 3 times (once per vertex) – Running time quadratic in the degree of the vertex – Idea: Count each once, from the perspective of lowest degree vertex – Does this heuristic work?
  • 19. How to Count Triangles Better. Idea [Schank ’07]: – Only pivot on nodes who have smaller degrees than both neighbors. – Neighbors of high degree nodes tend to have small degrees
  • 20. How to Count Triangles Better.
        foreach v in V
          foreach u in Adjacency(v) with deg(u) > deg(v):
            foreach w in Adjacency(v) with deg(w) > deg(v):
              if (u,w) is an edge:
                Triangles[v]++
                Triangles[w]++
                Triangles[u]++
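A hedged Python sketch of the degree-ordered idea follows. It departs from the slide's pseudocode in two labeled ways: ties in degree are broken by node name so that triangles among equal-degree nodes are not missed, and each unordered pair (u, w) is visited once rather than twice. The names `count_triangles_degree_ordered` and `rank` are illustrative.

```python
from itertools import combinations

def count_triangles_degree_ordered(adj):
    """Count each triangle once, pivoting on its lowest-ranked vertex."""
    # Assumed total order: (degree, node name), so ties are broken consistently.
    rank = {v: (len(adj[v]), v) for v in adj}
    triangles = {v: 0 for v in adj}
    for v in adj:
        # Only neighbors that rank above v can form a triangle pivoted at v.
        higher = sorted(u for u in adj[v] if rank[u] > rank[v])
        for u, w in combinations(higher, 2):
            if w in adj[u]:
                triangles[v] += 1
                triangles[u] += 1
                triangles[w] += 1
    return triangles

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
ordered_counts = count_triangles_degree_ordered(graph)
```

In this toy graph the highest-degree node `a` has no higher-ranked neighbors, so it generates no pairs at all; the triangle is found from `b`, the lowest-ranked of its three vertices.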
  • 21. Does it make a difference?
  • 22. Why does it help? Look at two different kinds of nodes: – Few friends: • OK to be quadratic on small instances – Lots of friends: • Only care about number of friends with even more friends! • Cannot have too many (can make this formal)
  • 23. Break
  • 24. Working in Parallel. MapReduce (review): Map: – Decide how to group the data for computation. Reduce: – Given the grouping, perform the computation
  • 25. Building People You May Know. Friendships are undirected: – If Alice knows Bob, Bob knows Alice – Data stored as a list of all edges – Find all friends of friends – Score the possible pairs
  • 26. Data. Suppose you have edges and degrees of each vertex:
        Joe 56 Mary 78
        Alice 398 Bob 198
        Dan 983 Justin 11,985,234
        ...
      An alternate view may be data stored as adjacency list: Joe 56 Mary 78 Don 99 Bill 1 Alice 398 Kate 55 Bob 198 Mary 78 ...
  • 27. Previous Algorithm. Adjacency list input. – Map: • For each node and its neighbors, output all paths through the node – Reduce: • none [Two worked Map/Output examples on this slide were figures; the second has Output: None]
  • 28. How to Count Triangles Better. Idea [Schank ’07]: – Only pivot on nodes who have smaller degrees than both neighbors. – Neighbors of high degree nodes tend to have small degrees
  • 29. Want to compute all open triads. Data Needed: – Central node – Neighbors that have higher degree
  • 30. Want to compute all open triads. Data Needed: – Central node – Neighbors that have higher degree
  • 31. Want to compute all open triads. Data Needed: – Central node – Neighbors that have higher degree
  • 32. Want to compute all open triads. Data Needed: – Central node – Neighbors that have higher degree – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
  • 33. Want to compute all open triads. Map: – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently – Given: Joe 56 Mary 78 – Output: <Key = Joe, Value = Mary> – Given: Alice 398 Bob 198 – Output: <Key = Bob, Value = Alice>
        map(key, value):
          split = value.split()
          if split[3] > split[1] or (split[3] == split[1] and split[0] < split[2]):
            emit(split[0], split[2])
          if split[3] < split[1] or (split[3] == split[1] and split[0] > split[2]):
            emit(split[2], split[0])
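The map step can be sketched as a plain Python function, outside any particular MapReduce framework. Two details are assumptions added here: degrees are parsed as integers (the slide's pseudocode compares the split strings directly, which would order "78" above "398"), and thousands separators such as those in "11,985,234" are stripped before parsing. The name `orient_edge` is illustrative.

```python
def orient_edge(line):
    """Return (key, value) with the edge pointing at the higher-degree node."""
    name_a, deg_a, name_b, deg_b = line.split()
    # Assumption: degrees may carry thousands separators, e.g. "11,985,234".
    deg_a = int(deg_a.replace(",", ""))
    deg_b = int(deg_b.replace(",", ""))
    # Compare (degree, name) so ties are broken consistently by name.
    if (deg_a, name_a) < (deg_b, name_b):
        return (name_a, name_b)
    return (name_b, name_a)

first = orient_edge("Joe 56 Mary 78")
second = orient_edge("Alice 398 Bob 198")
third = orient_edge("Dan 983 Justin 11,985,234")
```

Keying on the lower-degree endpoint means that, after the shuffle, each key receives only its higher-degree neighbors, which keeps the per-key groups small even for very popular accounts.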
  • 34. Want to compute all open triads. Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree). Computation: – Generate all 2-paths (friend of a friend relationships)
  • 35. Want to compute all open triads. Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree). Computation: – Generate all 2-paths (friend of a friend relationships) [the example 2-paths on this slide were figures]
  • 36. Want to compute all open triads. Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree). Computation: – Generate all 2-paths (friend of a friend relationships) – Given: key = Joe, value = {Mary, Justin, Alice} – Output: • (key = Joe, Value = (Mary, Justin)) • (key = Joe, Value = (Mary, Alice)) • (key = Joe, Value = (Justin, Alice))
        reduce(key, values):
          for friend1 : values
            for friend2 : values
              emit(key, (friend1, friend2))
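The shuffle and reduce steps can be imitated in memory as below. This is a sketch, not a MapReduce job: `shuffle` and `two_paths` are hypothetical helper names, and the reducer uses `itertools.combinations` so that each unordered pair is emitted once, matching the three outputs listed on the slide (the nested loops as written in the pseudocode would also emit reversed pairs and self-pairs).

```python
from collections import defaultdict
from itertools import combinations

def shuffle(pairs):
    """Group values by key, as the shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def two_paths(key, values):
    """Emit one record per unordered pair of the key's higher-degree neighbors."""
    return [(key, pair) for pair in combinations(values, 2)]

# Oriented edges as the map step might produce them (illustrative data).
oriented = [("Joe", "Mary"), ("Joe", "Justin"), ("Joe", "Alice"), ("Bob", "Alice")]
groups = shuffle(oriented)
joe_paths = two_paths("Joe", groups["Joe"])
bob_paths = two_paths("Bob", groups["Bob"])
```

Each emitted record is a candidate friend-of-a-friend pair connected through the key; a key with a single higher-degree neighbor, like Bob here, produces nothing.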
  • 37. Comparing Algorithms. Edgelist MapOnly Algorithm: – MapOnly – Output from some nodes is quadratic. Edge at a time Algorithm: – Map & Reduce – More balanced output from each node
  • 38. Scoring. Some suggestions are better than others: – Some people are already friends! – Or they used to be friends... – Connected through a friend with 1000s of friends – Connected through multiple friends – ...
  • 39. Spring Break!