Social Network Analysis
An overview
Presentation by @dougneedham
Introduction
 @dougneedham
 Data Guy - Started as a DBA in the Marine Corps, evolved to Architect,
now Data Scientist.
 Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.
 I have a strong relational/traditional background.
 Perpetual Student
 Learning new things challenges our assumptions. Forces us to take a
new perspective on “old” problems. Eventually maybe even shows us
that there is a better way to solve a problem.
Why study social networks?
 It is cool.
 The concepts around Social Network Analysis can be applied to many
interesting problems in a variety of business verticals.
 The foundation of Social Network Analysis is Graph theory.
 Solving Crime
 Some examples: Introduction to Graph_Theory
What is Social Network Analysis?
 “Social network analysis (SNA) is a strategy for investigating social
structures through the use of network and graph theories. It
characterizes networked structures in terms of nodes (individual actors,
people, or things within the network) and the ties or edges
(relationships or interactions) that connect them. Examples of social
structures commonly visualized through social network analysis include
social media networks, friendship and acquaintance networks, kinship,
disease transmission, and sexual relationships. These networks are often
visualized through sociograms in which nodes are represented as points
and ties are represented as lines.” – Wikipedia
 https://en.wikipedia.org/wiki/Social_network_analysis
Example from Wikipedia:
"Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
https://commons.wikimedia.org/wiki/File:Kencf0618FacebookNetwork.jpg#/media/File:Kencf0618FacebookNetwork.jpg
A little History
 The 7 Bridges of Königsberg
 Every tome on Graph theory or Network analysis devotes a small
portion of its pages to the 7 Bridges of Königsberg.
 If I don’t cover this with you, the gods of mathematics will strike me
down, and never allow me to do analysis again in the future.
The Bridges
The Problem
 Folks enjoyed their Sunday afternoon strolls across the bridges, and
occasionally people would wonder whether some route crossed every
bridge exactly once.
 Eventually Leonhard Euler was brought into the debate about the
problem.
 Euler used Vertices to represent the land masses and edges (or arcs, at
the time) to represent bridges. He realized that every land mass had an
odd number of bridges, and that a walk crossing each bridge exactly once
allows at most two such vertices, so the problem was unsolvable.
 Sarada Herke provides one of the best explanations of the solution:
Solution to Königsberg
 And here is the cool thing about mathematicians. If we tell you
something is impossible, we have to tell you why in a way you can
understand. In explaining why, Euler also founded the branch of
mathematics we now call Graph Theory.
 http://en.wikipedia.org/wiki/Leonhard_Euler
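Euler's argument is easy to check by counting. Here is a minimal Python sketch, assuming the usual layout of the seven bridges; the single-letter names for the land masses are illustrative labels, not something from the slides. It counts the bridges touching each land mass and applies the rule that a connected graph has a walk using every edge exactly once only when zero or two vertices have odd degree.

```python
from collections import Counter

# The four land masses (vertices) and seven bridges (edges) of Königsberg.
# "A" is the central island; "B", "C", "D" are the other land masses.
# These labels are assumptions made for this sketch.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges between A and B
    ("A", "C"), ("A", "C"),  # two bridges between A and C
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def odd_degree_vertices(edges):
    """Return the sorted vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """True if a connected multigraph has a walk using every edge once."""
    return len(odd_degree_vertices(edges)) in (0, 2)


print(odd_degree_vertices(BRIDGES))  # all four land masses are odd
print(has_euler_walk(BRIDGES))       # so no such walk exists
```

The function assumes the graph is connected, which holds for the bridges; a general checker would need to test connectivity as well.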
Why analyze Facebook data?
 Facebook is something that most people use.
 It is easy to see the relationships and the concepts of the
Graph/Network are intuitive to people who are looking at their “own”
network.
 The main idea is, if you can understand your own friend data, you can
learn the concepts quickly, then apply these same concepts to more
complicated problems.
 We will talk a little about some complicated topics at the end.
A few terms
 Stand back, we are going to talk about math!
 Basically we are talking about a bunch of dots joined together by lines
 Vertex – Dot on a graph
 Edge – Line connecting two vertices
 Edge_Label – this is a term I coined originally related to Data Structure Graphs that
helps trace a path. If you label your edges, and you have multiple edges with the same
label in a Graph you can quite easily identify walks, paths, and cycles through your
graph.
 Triangle – 3 Vertices, 3 Edges
 Square – 4 Vertices, 4 edges
 Open Triangle - 3 Vertices, 2 edges
 A lot of things are networks if you look at them the right way.
 Mark Newman has given a number of well-done presentations about
Network analysis, available on YouTube.
 https://www.youtube.com/watch?v=lETt7IcDWLI
More terms
 Transitivity – The friend of my friend is my friend. Really?
 Homophily – the tendency of similar things to be connected to each other
 Directed Graphs – or Digraphs
 Contagion – How do things “spread” through a network?
 Let’s rearrange things: how does the layout affect understanding?
 Order of a graph – number of vertices
 Size of the graph – number of edges
 This is not just data visualization, it can also be used for prediction.
https://www.youtube.com/watch?v=rwA-y-XwjuU
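Several of these terms reduce to simple counting. Here is a rough Python sketch on a made-up friendship graph (the names are invented for illustration). It reports the order and size of the graph, counts closed and open triangles, and computes transitivity as the share of connected triples that are closed, which is one common definition.

```python
from itertools import combinations

# A small undirected friendship graph; the names are hypothetical.
EDGES = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]


def adjacency(edges):
    """Build a vertex -> set-of-neighbours map for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triad_counts(adj):
    """Count connected triples, once per centre vertex, as (closed, open)."""
    closed = open_ = 0
    for neighbours in adj.values():
        for a, b in combinations(sorted(neighbours), 2):
            if b in adj[a]:
                closed += 1   # the two neighbours also know each other
            else:
                open_ += 1    # an open triangle centred on this vertex
    return closed, open_


adj = adjacency(EDGES)
order = len(adj)            # number of vertices
size = len(EDGES)           # number of edges
closed, open_ = triad_counts(adj)
triangles = closed // 3     # each triangle is seen from all three corners
transitivity = closed / (closed + open_)

print(order, size, triangles, open_, transitivity)
```

In this toy graph only three of the five "friend of my friend" triples are closed, which is the "Really?" behind the transitivity bullet.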
Final terms
 Centrality – Hub and Authority
 This is almost a whole topic by itself, since there are different types of
Centrality:
 Degree Centrality – Simple: the Vertex with the highest degree (the most
edges) is the most central.
 Eigenvector Centrality – A Vertex is important if it is connected to
other important Vertices.
 PageRank – similar to Eigenvector Centrality, only scaled: if a given
vertex is closely connected to a very high PageRank vertex, it is itself given a
high PageRank.
 Serious nutshell definitions.
 Shortest path – How are two vertices connected?
 Longest Path – Tracing the flow of an interesting item through a large
collection of applications.
Why is a path important? More on this
later…
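To make the centrality definitions a little more concrete, here is a minimal Python sketch of degree centrality and a plain power-iteration PageRank. The graph, the vertex names, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration; Gephi's own implementation may differ in detail.

```python
# Directed "who points at whom" graph; vertex names are hypothetical.
# Every vertex appears as a key, even if it had no outgoing links.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def degree_centrality(links):
    """Total degree (incoming plus outgoing edges) per vertex."""
    degree = {v: 0 for v in links}
    for source, targets in links.items():
        for target in targets:
            degree[source] += 1
            degree[target] += 1
    return degree


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; ranks sum to 1."""
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in links}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A vertex with no outgoing edges spreads its rank evenly.
                for v in links:
                    new_rank[v] += damping * rank[source] / n
        rank = new_rank
    return rank


degree = degree_centrality(LINKS)
rank = pagerank(LINKS)
print(max(degree, key=degree.get), max(rank, key=rank.get))
```

Here both measures pick the same vertex, but on larger graphs they can disagree: a vertex with few edges can still rank highly if those edges come from highly ranked vertices.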
The Original Joke This is me in different stores
The Math doesn’t change.
 One thing I like about Graphs –
 The Math does not change.
 The math behind Graph theory can be a little intense, but it does not
change regardless of the scale of the graph.
 Once you understand how to “do the math” on a small graph, the
same math applies whether it is a graph of the people in this
room, or a graph of the people on this planet.
 Now, let me introduce you to a tool that does much of the
Mathematics for you…
But first, Netvizz…
 Netvizz is a tool that extracts data from different sections of the Facebook Platform.
 It provides an interface to the Facebook Graph API
 https://www.youtube.com/watch?v=3vkKPcN7V7Q
 For the version of data we will be looking at, I was able to extract friendship connections.
Facebook has since changed their permissions such that you can no longer extract this
information.
 However, there are some other interesting things you can do with Netvizz.
 If you manage a Facebook Group, this might be interesting.
 For this particular talk we are going to focus on Gephi interpretation. If we want to have a
more in-depth talk on Facebook and the Graph API that Facebook has opened, we can
discuss that at another time.
 To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You
can de-authorize it later.)
 You will have a number of options: group data, page data, page like network, search, and
link stats.
 Click “group data”
 Select a group; if you need a sample id, use: 39462256584
 It runs for a bit, then dumps to a zip file.
 Save the file, then extract it.
 Open Gephi, and use Gephi to import your GDF file.
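If you would rather peek inside the GDF file before handing it to Gephi, here is a minimal Python sketch. It assumes the usual GDF layout: a `nodedef>` header line followed by node rows, then an `edgedef>` header line followed by edge rows. The sample text is hand-written for this sketch, and the columns in a real Netvizz export will likely differ.

```python
import csv
import io

# A tiny hand-written sample in GDF layout; the rows are invented.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
3,Carol
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""


def parse_gdf(text):
    """Split GDF text into a list of node dicts and a list of edge dicts."""
    nodes, edges = [], []
    target, columns = None, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        first = row[0]
        if first.startswith("nodedef>") or first.startswith("edgedef>"):
            # A header line switches the section we are filling.
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Keep only the column name: "name VARCHAR" -> "name".
            columns = [cell.strip().split(" ")[0] for cell in row]
        elif target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges


nodes, edges = parse_gdf(SAMPLE_GDF)
print(len(nodes), "vertices,", len(edges), "edges")
```

For a real export you would read the extracted file from disk and pass its text to `parse_gdf`.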
Gephi
http://gephi.github.io/
From the website: “Gephi is an
interactive visualization and exploration
platform for all kinds of networks and
complex systems, dynamic and
hierarchical graphs.”
Java 1.7 is required; you may have to set
this in Gephi.conf.
Depending on the size of the network
you are studying, you may need to
increase the memory available to Java
in Gephi.conf.
Gephi Startup
Gephi – Open GML file
Gephi – After opening
Layout
Behavior Options
After running
Partitioning
Metrics
 Remember all those numbers we spoke about?
 Here are many of them.
Data Table
Configure Labels
Here is the layout with the labels as number of connections
Add Background
Visualization
File->Export-> SVG/PDF/PNG…
Export to Excel
How do we use this?
 Finding bottlenecks.
 You have to ignore the fact that everyone on this graph is connected
to you for a moment.
 How would someone get a message to another given person?
 They would have to pass it to someone either they both know, or pass
the message to someone who is more likely to be connected to the
target of the message.
 This was the heart of Milgram’s experiment that gave us the concept of
6 degrees of separation.
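The message-passing question can be sketched in a few lines of Python. This is only an illustration on an invented ego network, where `"me"` stands for the person who exported the data. It drops that person from the graph, as suggested above, and then uses a breadth-first search to find the shortest chain of friends between two people.

```python
from collections import deque

# Friendship edges from a made-up ego network; "me" knows everyone.
FRIENDSHIPS = [
    ("me", "ann"), ("me", "bob"), ("me", "cat"), ("me", "dan"),
    ("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
]


def build_graph(edges, ignore=None):
    """Undirected adjacency map, skipping every edge that touches `ignore`."""
    graph = {}
    for u, v in edges:
        if ignore in (u, v):
            continue
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the chain of people, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for friend in sorted(graph.get(person, ())):
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None


with_me = shortest_path(build_graph(FRIENDSHIPS), "ann", "dan")
without_me = shortest_path(build_graph(FRIENDSHIPS, ignore="me"), "ann", "dan")
print(with_me)
print(without_me)
```

With the ego vertex removed, every message from ann to dan has to pass through bob and cat, which is the kind of bottleneck the layout in Gephi makes visible.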
Other Analysis
 What else can be done with Social Network Analysis?
 How about risk exposure to banks?
 http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm
Application to Business Intelligence
 What if the Vertices are not people?
 What if the Edges are not mutual connections?
 Jonathan and others over the past few meetings have done a great
job at explaining the underpinnings of how a particular BI framework is
put together.
 Within a Data Architecture there are lots of moving pieces: ETL, FTP,
SFTP, Web-Services, External data feeds. Data moving into Data Marts
and Data Warehouses. Data moving between applications.
 Let’s imagine how to visualize this using the information we just gained.
Data Structure Graph
 A Data Structure Graph is a group of atomic entities that are related to
each other, stored in a repository, then moved from one persistence
layer to another, rendered as a Graph.
 A group of atomic entities.
 Related to each other.
 Stored in a repository.
 Moved from one persistence layer to another.
 Rendered as a Graph.
Introducing Data Structure Graphs
 Data Structure Graph Level 1 (DSG-L1)– This is roughly like an Entity
Relationship Diagram (ERD) Tables are Vertices, Foreign Keys are Edges.
 Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an
application. Each Edge is data transfer. Roughly equivalent to what we
used to call Data Flow diagrams.
 Data Structure Graph Dependency (DSG-D) – Each vertex is a
job, script, program, or process that is dependent on something
happening in sequence before it can do its work.
 A DSG-L1 can show you which of your tables are likely to have the most
interesting query performance.
 A DSG-L2 can show you where the most work is going on in
your Enterprise.
 A DSG-D can show you the sequence of events that need to take
place in order for something to be completed.
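The DSG-D idea, finding the sequence of events that has to happen, is what a topological sort computes. Here is a small Python sketch using the standard library's `graphlib` module (Python 3.9 or later). The job names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs that must finish before it can start.
# All job names are made up for this sketch.
DEPENDS_ON = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_warehouse": {"extract_orders", "extract_customers"},
    "build_data_mart": {"load_warehouse"},
    "refresh_dashboard": {"build_data_mart"},
}

# static_order() yields every job after all of its prerequisites.
run_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(run_order)
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful finding in its own right for a dependency graph.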
New Project, Data Table, Import data.
Load as “Edges Table” Source, Target (required)
Choose Create Missing Nodes
After a few calculations and layout runs
PageRank – Which application is most important?
A few more tweaks
Where is that Node with the highest PageRank?
Remember paths?
The Original Joke This is me in different stores
Dijkstra's algorithm
 Some of you may have heard of Dijkstra’s algorithm.
 It is a method for finding the shortest path between two nodes on a
Graph.
 This is a great optimization technique, but what if you need to find the
longest path?
 What “edge_label” has the most influence on my organization?
 Iterate through each Edge_Label, create a subgraph that consists of
only the nodes this Edge_Label touches, then calculate the diameter of
that Graph.
 The data point represented by a given Edge_label that has the longest
path has the most “value” to your organization.
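The Edge_Label procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the Shiny app. The applications and labels are invented, edges are treated as undirected and unweighted, so a breadth-first search stands in for Dijkstra's algorithm, and the diameter is taken as the longest shortest path found inside each label's subgraph.

```python
from collections import deque

# (source application, target application, Edge_Label); all hypothetical.
TRANSFERS = [
    ("crm", "staging", "customer"),
    ("staging", "warehouse", "customer"),
    ("warehouse", "mart", "customer"),
    ("mart", "dashboard", "customer"),
    ("erp", "staging", "invoice"),
    ("staging", "warehouse", "invoice"),
]


def subgraph_for(transfers, label):
    """Undirected adjacency map of only the edges carrying `label`."""
    graph = {}
    for source, target, edge_label in transfers:
        if edge_label == label:
            graph.setdefault(source, set()).add(target)
            graph.setdefault(target, set()).add(source)
    return graph


def eccentricity(graph, start):
    """Largest hop count from `start` to any vertex it can reach."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def diameter(graph):
    """Longest shortest path, measured in hops, over all start vertices."""
    return max(eccentricity(graph, node) for node in graph)


def diameters_by_label(transfers):
    labels = {label for _, _, label in transfers}
    return {label: diameter(subgraph_for(transfers, label)) for label in labels}


result = diameters_by_label(TRANSFERS)
most_influential = max(result, key=result.get)
print(result, most_influential)
```

In this toy data the "customer" entity travels through the most applications, so by the rule above it would be the one with the most “value” to the organization.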
https://dougneedham.shinyapps.io/DataStructureGraph
Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You
can see how an individual data entity flows through an organization.
My book
Goes through a number of examples of doing a Graph analysis of a fictional organization.
Consider the following:
 If you need assistance, send a message to the group, or contact me
directly (I am easy to find @dougneedham)
 Network/Graph Analysis is cool.
 It can show you some interesting things about your data that you may
not have considered.
 Due thought should be put towards a network analysis project.
 Organizing the data requires a bit of thought. (From -> To vertices is just
a start).
 Directed graph, undirected, bigraph? Setup work needs to be done.
 Tools help with the detailed calculations, and show the paths, walks,
etc.
What did I leave out?
 Graphs that change over time – What happens when you remove a single
Edge or Vertex?
 Growth of a Network – Erdős–Rényi versus Barabási–Albert models (Random
versus Preferential Attachment)
 Scale Free networks – Graphs that conform to Power laws. (These are
intrinsically Social Networks, but I didn’t give much detail)
 Comparing two networks – If you have the same number of edges and
nodes, are two graphs the same? Is one graph isomorphic to another?
 Contagion – Ceteris paribus, how will things (information, viruses,
data, disease…) spread through the network? (Since a DSG represents
different types of Edges based on Edge_Label, Contagion should not affect
this type of network entirely.)
 Large Graphs – GraphX, a part of Apache Spark, is best used for this
purpose.
 The strength of Weak Ties Paradox
 Social Capital
Finally… Want to do Data Science?
 Challenge for members of the audience.
 1. Download Gephi.
 2. Put together a simple CSV (Source, Target, Edge_Label) that describes
your own data environment.
 3. Load it in Gephi and have Gephi run the metrics, and perform the auto
layout.
 4. Answer this question: Did you get what you expected?
 5. Get a colleague to do the same thing, compare the images. How similar
are they?
 Here is my hypothesis: If you have more than 5 data applications, including
Hadoop, and Data Warehouse infrastructure, your Graph will follow the
rules of preferential attachment. (To<->From ETL tools don’t count in the
analysis)
 Tweet me @dougneedham #DataStructureGraph (anonymized, of course.)
 What does your Graph look like?
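As a starting point for the challenge, here is a minimal Python sketch that produces the Source, Target, Edge_Label CSV and tallies how many vertices have each degree. The applications and labels are hypothetical placeholders for your own environment. A distribution with a few heavily connected vertices and many lightly connected ones is what preferential attachment would tend to produce, though a graph this small cannot confirm the hypothesis.

```python
import csv
import io
from collections import Counter

# Hypothetical data environment: (Source, Target, Edge_Label) rows.
ROWS = [
    ("crm", "warehouse", "customer"),
    ("erp", "warehouse", "invoice"),
    ("web_logs", "hadoop", "clickstream"),
    ("hadoop", "warehouse", "clickstream"),
    ("warehouse", "mart", "customer"),
    ("warehouse", "reporting", "invoice"),
]


def to_csv(rows):
    """Render the rows as CSV text with the header Gephi's import expects."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(rows)
    return buffer.getvalue()


def vertex_degrees(rows):
    """Number of edges touching each vertex, ignoring direction."""
    degree = Counter()
    for source, target, _ in rows:
        degree[source] += 1
        degree[target] += 1
    return degree


csv_text = to_csv(ROWS)
degrees = vertex_degrees(ROWS)
distribution = Counter(degrees.values())  # degree -> number of vertices

print(csv_text)
print(sorted(distribution.items()))
```

To load the result in Gephi, write `csv_text` to a file, for example with `open("edges.csv", "w")`, and import that file as an edges table.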
Final Thoughts – Questions?

 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 

Recently uploaded (20)

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 

Social Network Analysis Introduction including Data Structure Graph overview.

  • 1. Social Network Analysis An overview Presentation by @dougneedham
  • 2. Introduction  @dougneedham  Data Guy - Started as a DBA in the Marine Corps, evolved to Architect, now Data Scientist.  Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.  I have a strong relational/traditional background.  Perpetual Student  Learning new things challenges our assumptions. Forces us to take a new perspective on “old” problems. Eventually maybe even shows us that there is a better way to solve a problem.
  • 3. Why study social networks?  It is cool.  The concepts around Social Network Analysis can be applied to many interesting problems in a variety of business verticals.  The foundation of Social Network Analysis is Graph theory.  Solving Crime  Some examples: Introduction to Graph_Theory
  • 4. What is Social Network Analysis?  “Social network analysis (SNA) is a strategy for investigating social structures through the use of network and graph theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network analysis include social media networks, friendship and acquaintance networks, kinship, disease transmission, and sexual relationships. These networks are often visualized through sociograms in which nodes are represented as points and ties are represented as lines.” – Wikipedia  https://en.wikipedia.org/wiki/Social_network_analysis
  • 5. Example From wiki: "Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File: Kencf0618FacebookNetwork.jpg#/medi a/File:Kencf0618FacebookNetwork.jpg
  • 6. A little History  The 7 Bridges of Königsberg  Every tome on Graph theory or Network analysis devotes a small portion of its time to the 7 Bridges of Königsberg.  If I don’t cover this with you, the gods of mathematics will strike me down, and never allow me to do analysis again in the future.
  • 8. The Problem  Folks enjoyed their Sunday afternoon strolls across the bridges, but occasionally people would wonder if one particular route was more efficient than another.  Eventually Leonhard Euler was brought into the debate about the efficiency problem.  Euler used Vertices to represent the land masses and edges (or arcs, at the time) to represent bridges. He realized the odd number of edges per vertex made the problem unsolvable.  Sarada Herke provides one of the best explanations of the solution: Solution to Königsberg  And here is the cool thing about mathematicians. If we tell you something is impossible, we have to tell you why in a way you can understand it. But he also invented the branch of mathematics we today call Graph Theory.  http://en.wikipedia.org/wiki/Leonhard_Euler
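
Euler's parity argument is easy to check by hand, but here is a minimal sketch in Python anyway. It assumes the classical form of the puzzle (one walk that crosses every bridge exactly once) and the usual layout of four land masses, labelled here A (the island), B, C and D. The labels and function names are mine, for illustration only.

```python
from collections import Counter

# The seven bridges as a multigraph edge list (labels A-D are illustrative).
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def vertex_degrees(edges):
    """Count how many edge ends touch each vertex."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)

def has_euler_walk(edges):
    """True if a connected multigraph has a walk using every edge exactly once.

    Assumes the graph is connected; only the degree parity is checked.
    """
    odd = sum(1 for d in vertex_degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)

degrees = vertex_degrees(BRIDGES)
print(degrees)
print(has_euler_walk(BRIDGES))
```

All four land masses end up with an odd degree, which is more than the two odd vertices such a walk can tolerate.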
  • 9. Why analyze Facebook data?  Facebook is something that most people use.  It is easy to see the relationships and the concepts of the Graph/Network are intuitive to people who are looking at their “own” network.  The main idea is, if you can understand your own friend data, you can learn the concepts quickly, then apply these same concepts to more complicated problems.  We will talk a little about some complicated topics at the end.
  • 10. A few terms  Stand back, we are going to talk about math!  Basically we are talking about a bunch of dots joined together by lines  Vertex – Dot on a graph  Edge – Line connecting two dots  Edge_Label – this is a term I coined originally related to Data Structure Graphs that helps trace a path. If you label your edges, and you have multiple edges with the same label in a Graph you can quite easily identify walks, paths, and cycles through your graph.  Triangle – 3 Vertices, 3 Edges  Square – 4 Vertices, 4 edges  Open Triangle – 3 Vertices, 2 edges  A lot of things are networks if you look at them the right way.  Mark Newman has given a number of excellent presentations about Network analysis, available on YouTube.  https://www.youtube.com/watch?v=lETt7IcDWLI
  • 11. More terms  Transitivity – The friend of my friend is my friend. Really?  Homophily – how things are similar  Directed Graphs – or Digraphs  Contagion – How do things “spread” through a network?  Let’s rearrange things, how does the layout affect understanding?  Order of a graph – number of vertices  Size of the graph – number of edges  This is not just data visualization, it can also be used for prediction. https://www.youtube.com/watch?v=rwA-y-XwjuU
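
To make order, size and transitivity concrete, here is a small sketch on a made-up friendship graph. The names, the edge list and the helper functions are all hypothetical. Transitivity is computed the usual way, as the share of "open or closed triangles" that are actually closed.

```python
from itertools import combinations

# Hypothetical undirected friendships: one triangle plus one extra friend.
EDGES = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]

def adjacency(edges):
    """Build an undirected adjacency map: vertex -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def transitivity(adj):
    """Fraction of connected triples (two edges sharing a vertex) that are closed."""
    triples = 0  # pairs of neighbours around a centre vertex, open or closed
    closed = 0   # triples whose two end vertices are also joined
    for neighbours in adj.values():
        for a, b in combinations(sorted(neighbours), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0

adj = adjacency(EDGES)
order = len(adj)    # number of vertices
size = len(EDGES)   # number of edges
print(order, size, transitivity(adj))
```

Each full triangle is counted once per corner, so a graph made only of triangles scores 1.0 and a graph with no triangles scores 0.0.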
  • 12. Final terms  Centrality – Hub and Authority  This is almost a whole topic by itself, since there are different types of Centrality:  Degree Centrality – Simple, the Vertex with the highest degree (the most edges) is the most central.  Eigenvector Centrality – How important a particular Vertex is to a given network.  PageRank – similar to Eigenvector Centrality, only scaled, and if a given vertex is closely connected to a very high PageRank vertex, it is itself given a high PageRank.  Serious nutshell definitions.  Shortest path – How are two vertices connected?  Longest Path – Tracing the flow of an interesting item through a large collection of applications.
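
Gephi calculates these for you, but a rough sketch shows what is going on underneath. This is a plain power-iteration PageRank, not Gephi's implementation. The damping value of 0.85, the iteration count and the toy edge list are assumptions of mine.

```python
def degree_centrality(edges):
    """Raw degree per vertex: the number of edge ends touching it."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree

def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list (illustrative sketch)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for u, v in edges:
        out_links[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank

# Hypothetical directed graph: three vertices point at "hub", which points back at "a".
EDGES = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
degrees = degree_centrality(EDGES)
ranks = pagerank(EDGES)
print(degrees)
print(max(ranks, key=ranks.get))
```

Notice that "a" ends up ranked well above "b" and "c" even though all three have one outgoing edge: it is the only one the hub points back to.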
  • 13. Why is a path important? More on this later… The Original Joke This is me in different stores
  • 14. The Math doesn’t change.  One thing I like about Graphs –  The Math does not change.  The math behind Graph theory can be a little intense, but it does not change regardless of the scale of the graph.  Once you understand how to “do the math” on a small graph, those same Maths apply to a Graph whether it is a graph of the people in this room, or a graph of the people on this planet.  Now, let me introduce you to a tool that does much of the Mathematics for you…
  • 15. But first, Netvizz…  Netvizz is a tool that extracts data from different sections of the Facebook Platform.  It provides an interface to the Facebook Graph API  https://www.youtube.com/watch?v=3vkKPcN7V7Q  For the version of data we will be looking at, I was able to extract friendship connections. Facebook has since changed their permissions such that you can no longer extract this information.  However, there are some other interesting things you can do with Netvizz.  If you manage a Facebook Group, this might be interesting.  For this particular talk we are going to focus on Gephi interpretation. If we want to have a more in-depth talk on Facebook and the Graph API that Facebook has opened, we can discuss that at another time.  To get this yourself go into Facebook and search for: Netvizz. (You have to authorize it. You can un-authorize it later.)  You will have a number of options: group data, page data, page like network, search, and link stats.  Click “group data”  Select a group; if you need a sample id, use: 39462256584  It runs for a bit, then dumps to a zip file.  Save the file, then extract it.  Open Gephi, and use Gephi to import your GDF file.
  • 16. Gephi http://gephi.github.io/ From the website: “Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.” Java 1.7 required, you may have to set this in Gephi.conf Depending on the size of the network you are studying you may need to increase the memory available to Java in Gephi.conf
  • 18. Gephi – Open GML file
  • 19. Gephi – After opening
  • 24.
  • 25. Metrics  Remember all those numbers we spoke about?  Here are many of them.
  • 28. Here is the layout with the labels as number of connections
  • 32. How do we use this?  Finding bottlenecks.  You have to ignore the fact that everyone on this graph is connected to you for a moment.  How would someone get a message to another given person?  They would have to pass it to someone either they both know, or pass the message to someone who is more likely to be connected to the target of the message.  This was the heart of Milgram’s experiment that gave us the concept of 6 degrees of separation.
  • 33. Other Analysis  What else can be done with Social Network Analysis?  How about risk exposure to banks?  http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm
  • 34.
  • 35. Application to Business Intelligence  What if the Vertices are not people ?  What if the Edges are not mutual connections?  Jonathan and others over the past few meetings have done a great job at explaining the underpinnings of how a particular BI framework is put together.  Within a Data Architecture there are lots of moving pieces. ETL, FTP, SFTP, Web-Services, External data feeds. Data moving into Data Marts, and Data Warehouses. Data Moving between applications.  Let’s imagine how to visualize this using the information we just gained.
  • 36. Data Structure Graph  A Data Structure Graph is a group of atomic entities that are related to each other, stored in a repository, then moved from one persistence layer to another, rendered as a Graph.  A group of atomic entities.  Related to each other.  Stored in a repository.  Moved from one persistence layer to another.  Rendered as a Graph.
  • 37. Introducing Data Structure Graphs  Data Structure Graph Level 1 (DSG-L1) – This is roughly like an Entity Relationship Diagram (ERD). Tables are Vertices, Foreign Keys are Edges.  Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an application. Each Edge is data transfer. Roughly equivalent to what we used to call Data Flow diagrams.  Data Structure Graph Dependency (DSG-D) – Each vertex is a job, script, program, or process that is dependent on something happening in sequence before it can do its work.  A DSG-L1 can show you where you are going to have the most interesting query performance of your tables.  A DSG-L2 can show you where the most work is going on in your Enterprise.  A DSG-D can show you the sequence of events that need to take place in order for something to be completed.
  • 38. New Project, Data Table, Import data.
  • 39. Load as “Edges Table” Source, Target (required)
  • 41. After a few calculations and layout runs
  • 42. PageRank – Which application is most important?
  • 43. A few more tweaks
  • 44. Where is that Node with the highest PageRank?
  • 45. Remember paths? The Original Joke This is me in different stores
  • 46. Dijkstra's algorithm  Some of you may have heard of Dijkstra’s algorithm.  It is a method for finding the shortest path between two nodes on a Graph.  This is a great optimization technique, but what if you need to find the longest path?  What “Edge_Label” has the most influence on my organization?  Iterate through each Edge_Label, create a subgraph that consists of only the nodes this Edge_Label touches, then calculate the diameter of that Graph.  The data point represented by the Edge_Label that has the longest path has the most “value” to your organization.
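
The Edge_Label loop can be sketched roughly as follows. The application names and labels are hypothetical. I am also assuming each subgraph keeps only the edges carrying that label, that edges are treated as undirected, and that "longest path" means the diameter (the longest of the shortest paths), found with a breadth-first search from every node.

```python
from collections import deque

# Hypothetical (source, target, edge_label) rows from a small data environment.
EDGES = [
    ("crm", "staging", "customer"),
    ("staging", "warehouse", "customer"),
    ("warehouse", "reporting", "customer"),
    ("billing", "warehouse", "invoice"),
    ("warehouse", "finance_mart", "invoice"),
    ("web", "staging", "clickstream"),
]

def subgraph_for_label(edges, label):
    """Undirected adjacency map built only from edges carrying this label."""
    adj = {}
    for source, target, edge_label in edges:
        if edge_label == label:
            adj.setdefault(source, set()).add(target)
            adj.setdefault(target, set()).add(source)
    return adj

def diameter(adj):
    """Longest shortest path, in hops, between any two mutually reachable nodes."""
    longest = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        longest = max(longest, max(dist.values()))
    return longest

def diameter_by_label(edges):
    """Diameter of the per-label subgraph, for every Edge_Label present."""
    labels = sorted({label for _, _, label in edges})
    return {label: diameter(subgraph_for_label(edges, label)) for label in labels}

result = diameter_by_label(EDGES)
print(result)
print(max(result, key=result.get))
```

In this toy data the "customer" entity travels through the most hops, so by the rule above it would be the most valuable data point.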
  • 47. https://dougneedham.shinyapps.io/DataStructureGraph Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You can see how an individual data entity flows through an organization.
  • 48. My book goes through a number of examples of doing a Graph analysis of a fictional organization.
  • 49. Consider the following:  If you need assistance, send a message to the group, or contact me directly (I am easy to find @dougneedham)  Network/Graph Analysis is cool.  It can show you some interesting things about your data that you may not have considered.  Due thought should be put towards a network analysis project.  Organizing the data requires a bit of thought. (From -> To vertices is just a start).  Directed graph, undirected, bigraph? Setup work needs to be done.  Tools help with the detailed calculations, and show the paths, walks, etc.
  • 50. What did I leave out?  Graphs that change over time – What happens when you remove a single Edge or Vertex?  Growth of a Network – Erdos-Renyi versus Barabasi-Albert models (Random versus Preferential Attachment)  Scale Free networks – Graphs that conform to Power laws. (These are intrinsically Social Networks, but I didn’t give much detail)  Comparing two networks – If you have the same number of edges and nodes, are two graphs the same? Is one graph an isomorphism of another?  Contagion – Ceteris paribus how will things (information, viruses, data, disease…) spread through the network. (Since a DSG represents different types of Edges based on Edge_Label, Contagion should not affect this type of network entirely.)  Large Graphs – GraphX, a part of Apache Spark, is best used for this purpose.  The strength of Weak Ties Paradox  Social Capital
  • 51. Finally… Want to do Data Science?  Challenge for members of the audience.  1. Download Gephi.  2. Put together a simple CSV: Source, Target, Edge_Label that describes your own data environment.  3. Load it in Gephi and have Gephi run the metrics, and perform the auto layout.  4. Answer this question: Did you get what you expected?  5. Get a colleague to do the same thing, compare the images. How similar are they?  Here is my hypothesis: If you have more than 5 data applications, including Hadoop, and Data Warehouse infrastructure, your Graph will follow the rules of preferential attachment. (To<->From ETL tools don’t count in the analysis)  Tweet me @dougneedham #DataStructureGraph (anonymized, of course.)  What does your Graph look like?
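
For step 2, one way to produce the CSV is with a few lines of Python. This is only a sketch: the rows and the `edges.csv` file name are made up, and the header follows the Source, Target, Edge_Label columns named above.

```python
import csv
import io

# Hypothetical rows describing data movement between applications.
ROWS = [
    ("crm", "staging", "customer"),
    ("staging", "warehouse", "customer"),
    ("billing", "warehouse", "invoice"),
]

def edges_to_csv(rows):
    """Render (source, target, edge_label) rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = edges_to_csv(ROWS)
print(csv_text)

# To save it for import, uncomment the next two lines:
# with open("edges.csv", "w", newline="") as handle:
#     handle.write(csv_text)
```

Replace the rows with your own applications and data entities, then load the file as an “Edges Table”.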
  • 52. Final Thoughts – Questions?