Networkx & Gephi Tutorial
#pydata
Gilad Lotan | @gilgul
#gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy
#palestine, #OWS, #immigration, #abortion
#republican, #dems, #economics, #amnesty
#Debates / Ohio
• Politicos
• Ohio based Media
• OSU Students
• Node network properties
  – from immediate connections
    • indegree (example: indegree=3)
      how many directed edges (arcs) terminate at a node
    • outdegree (example: outdegree=2)
      how many directed edges (arcs) originate at a node
    • degree (in or out) (example: degree=5)
      number of edges incident on a node
  – from the entire graph
    • centrality (betweenness, closeness)

Source: Lada Adamic (SI508-F08)
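
As a minimal sketch of the three degree properties, the snippet below builds a small directed graph in NetworkX in which one node has indegree 3, outdegree 2 and degree 5. The node names are hypothetical and chosen only to reproduce those numbers.

```python
import networkx as nx

# Hypothetical directed graph: node "n" receives 3 arcs and sends 2.
G = nx.DiGraph()
G.add_edges_from([
    ("a", "n"), ("b", "n"), ("c", "n"),  # arcs that terminate at n
    ("n", "d"), ("n", "e"),              # arcs that originate at n
])

indegree = G.in_degree("n")    # arcs pointing at n
outdegree = G.out_degree("n")  # arcs leaving n
degree = G.degree("n")         # for a DiGraph: indegree + outdegree

print(indegree, outdegree, degree)
```

For an undirected `nx.Graph` only `G.degree` is available, since edges have no direction.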
Example Graph Types
• Complete Graph



• Bipartite Graph
  – Vertices can be divided into two disjoint sets
  – Ex: students & schools
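
Both graph types can be sketched with NetworkX. The student and school names below are made up for illustration; the `bipartite=0/1` node attribute is the labeling convention NetworkX uses for the two node sets, not something the slides prescribe.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Complete graph: every pair of the 5 nodes is connected.
K5 = nx.complete_graph(5)
complete_edges = K5.number_of_edges()  # n * (n - 1) / 2

# Bipartite graph: hypothetical students on one side, schools on the other.
B = nx.Graph()
students = ["ana", "ben", "chen"]
schools = ["OSU", "NYU"]
B.add_nodes_from(students, bipartite=0)
B.add_nodes_from(schools, bipartite=1)
# Edges only ever join a student to a school, never two nodes of the same set.
B.add_edges_from([("ana", "OSU"), ("ben", "OSU"), ("chen", "NYU"), ("ana", "NYU")])

is_two_sided = bipartite.is_bipartite(B)
print(complete_edges, is_two_sided)
```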
Social Network Attributes
• Scale Free
  – Degree distribution follows a power law
  – Barabási et al. ('99): mapped the topology of a portion
    of the web

• Small World
  – Most nodes are not neighbors, but any node can be reached
    from any other in a small number of hops
  – Watts & Strogatz ('98)
  – Properties: cliques, sub-networks with high clustering
    coefficient, most pairs of nodes connected by at least
    one short path
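
NetworkX ships random-graph generators for both models, so the two attributes can be explored on synthetic data. The sketch below is only illustrative: the sizes, `m`, `k`, `p` and the seed are arbitrary choices, not values from the talk.

```python
import networkx as nx

# Scale free: Barabási-Albert preferential attachment.
# Each new node attaches to m existing nodes, favoring high-degree ones.
ba = nx.barabasi_albert_graph(n=500, m=2, seed=42)
degrees = sorted((d for _, d in ba.degree()), reverse=True)
max_degree = degrees[0]                       # a few hubs have very high degree
median_degree = degrees[len(degrees) // 2]    # most nodes have low degree

# Small world: Watts-Strogatz ring lattice with random rewiring.
# The "connected" variant retries until the graph is connected.
ws = nx.connected_watts_strogatz_graph(n=500, k=10, p=0.1, seed=42)
avg_clustering = nx.average_clustering(ws)
avg_path_length = nx.average_shortest_path_length(ws)

print(max_degree, median_degree)
print(avg_clustering, avg_path_length)
```

The gap between the maximum and median degree hints at the heavy-tailed distribution, while the Watts-Strogatz graph combines noticeable clustering with short average paths.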
(Zachary) Karate club graph

Social network of friendships between 34 members of a karate club at a US university in the 1970s.

Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
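
The karate club graph is bundled with NetworkX, so it can be loaded without any data file. A short sketch that loads it and counts the two factions recorded in each node's `club` attribute:

```python
from collections import Counter

import networkx as nx

G = nx.karate_club_graph()

n_members = G.number_of_nodes()      # club members
n_friendships = G.number_of_edges()  # friendship ties

# Each node carries a "club" attribute naming the faction it joined
# after the split.
factions = Counter(nx.get_node_attributes(G, "club").values())

print(n_members, n_friendships)
print(factions)
```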
Graph Measures
• Centrality
  – Betweenness
  – Closeness
  – Eigenvector
  – Degree


• Clustering Coefficient (clique)
• Modularity
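
Each of these measures has a NetworkX counterpart. The sketch below computes them on the karate club graph. Using greedy modularity maximization to obtain a partition is an assumption made here for illustration; the slides do not say which community detection method was used (Gephi has its own modularity tool).

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Centrality measures: each returns a dict {node: score}.
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
degree = nx.degree_centrality(G)

# Clustering coefficient: per node, and averaged over the graph.
clustering = nx.clustering(G)
avg_clustering = nx.average_clustering(G)

# Modularity of a partition found by greedy modularity maximization
# (one possible method, chosen here only as an example).
communities = community.greedy_modularity_communities(G)
modularity = community.modularity(G, communities)

top_by_degree = max(degree, key=degree.get)
print(top_by_degree, avg_clustering, modularity)
```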
Graph Layout
• Open Ord
  – Better distinguishes clusters
• Yifan Hu
• Force Atlas
• Fruchterman Reingold
  – Graph as a system of mass particles
    (nodes:particles, edges:springs)
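
The layouts listed above are Gephi's. On the Python side, NetworkX offers `spring_layout`, which positions nodes with the Fruchterman-Reingold force-directed algorithm. A minimal sketch follows; it assumes NumPy is installed (the layout code needs it), and the iteration count and seed are arbitrary.

```python
import networkx as nx

G = nx.karate_club_graph()

# Fruchterman-Reingold: nodes repel each other like particles,
# edges pull their endpoints together like springs.
pos = nx.spring_layout(G, iterations=50, seed=42)

# pos maps each node to a 2D coordinate.
x0, y0 = pos[0]
print(len(pos), x0, y0)
```

The resulting dictionary can be passed to `nx.draw(G, pos)` when matplotlib is available.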
Networkx
Graph Generators
Generate Twitter Graph
graphml file
• nodes
• edges
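
A GraphML file of this kind can be produced with NetworkX and then opened in Gephi. The sketch below is not the code from the talk: the user names, the choice of mentions as edges, and the `mentions_received` attribute are all hypothetical.

```python
import os
import tempfile

import networkx as nx

# Hypothetical data: (author, mentioned_user, number_of_mentions)
mentions = [("alice", "bob", 3), ("bob", "carol", 1), ("carol", "alice", 2)]

G = nx.DiGraph()
for author, target, count in mentions:
    G.add_edge(author, target, weight=count)

# Node attributes are written as data fields on each GraphML node.
for user in G.nodes:
    G.nodes[user]["mentions_received"] = G.in_degree(user)

path = os.path.join(tempfile.mkdtemp(), "twitter_mentions.graphml")
nx.write_graphml(G, path)

# Read the file back to confirm nodes, edges and attributes survived.
reloaded = nx.read_graphml(path)
print(reloaded.number_of_nodes(), reloaded.number_of_edges())
```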
Twitter Users with Python in their Bios
• 2 days of Twitter data (Oct 24th and 25th)
• Total: 4246 users (62k tweets)
• @mikanyan1 tweeted 795 times
Pythonistas on Twitter
• Spanish Speakers
• English / European
• Chinese
• Python (the snake)
• Japanese
• Musicians, Artists
Twitter User Community: Data Science
• Grepped from Twitter bios over 1 week:
"data science|data scientist|machine learning|data strateg"


• 1053 Users
• 14k Tweets
• Most tweeting users:
   – @data_nerd (659)
   – @Chantel_Esworth (562)
   – @Da5_12 (253)
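
The bio filter can be sketched with Python's `re` module using the same alternation pattern. The bios below are invented, and case-insensitive matching is an assumption; the slide does not say how case was handled.

```python
import re

# Same alternation as on the slide; IGNORECASE is an assumption.
PATTERN = re.compile(
    r"data science|data scientist|machine learning|data strateg",
    re.IGNORECASE,
)

# Hypothetical user bios.
bios = {
    "user_a": "Data Scientist and coffee drinker",
    "user_b": "I like snakes",
    "user_c": "Machine learning researcher",
    "user_d": "Head of data strategy",
}

# Keep users whose bio matches any of the alternatives.
matched = sorted(user for user, bio in bios.items() if PATTERN.search(bio))
print(matched)
```

The truncated alternative `data strateg` matches both "data strategy" and "data strategist".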
Dataists on Twitter
Thank You

   Gilad Lotan
 Twitter: @gilgul
Github: giladlotan

Networkx & Gephi Tutorial #Pydata NYC


Editor's notes

  1. Homophily
  2. Endogenous Trend – information spread
  3. Exogenous information spread
  4. Hashtags have emerged as a way for people to gather around topics or events.
  5. Hashtag co-occurrence:
     - Mitt Romney: #gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy
     - Newt Gingrich: #palestine, #OWS, #immigration, #abortion (he famously said: "Stop whining, take a bath and get a job!")
     - Equal: #republican, #dems, #economics, #amnesty
  6. Networkx supports
  7. Zachary's Karate Club Graph describes the friendships between the members of a US karate club in the 1970s. The significant feature of this social network is that the club president and the instructor were involved in a dispute (some might say: a fight) over the issue of how much to charge for lessons. This split the club into two factions, one centred around the president, and the other centred around the instructor.
  8. Graph measures:
     - Betweenness – number of shortest paths between all pairs of vertices that pass through the node (positioning)
     - Closeness – how quickly information can spread from node s to all other nodes sequentially (distance of s from all other actors in the network)
     - Eigenvector – measure of the influence of a node (as in PageRank, connections to high-scoring nodes contribute more to the score)
     - Clustering Coefficient – measure of the degree to which nodes in a graph tend to cluster together (a value of 1 means the node's neighborhood is a clique)
  9. NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks. NetworkX was born in May 2002. The original version was designed and written by Aric Hagberg, Dan Schult, and Pieter Swart in 2002 and 2003. The first public release was in April 2005.
  10. Python – user description; 2 days of Twitter data