1. MATHIEU BASTIAN
DATA VISUALIZATION SUMMIT,
SAN FRANCISCO, APRIL 11-12, 2013
2. BIG GRAPH DATA
• The story of big graph data is just starting
• BIG GRAPH DATA
BIG DATA GRAPHS
[Word cloud: distributed systems, complex, storage, databases, indexation, large datasets, algorithm, cloud computing, Hadoop, analytics, real-time visualization]
6. BIG DATA
• “The Petabyte age”
• All industries and domains can leverage big data
Health, Government, Finance, Technology
• Big Data => Big Problems
• The focus is on building the technology to handle big data and big
graph data (e.g., graph databases)
• Seeking efficient analysis of ever more complex systems
7. GRAPHS
• Graphs are everywhere, and it’s easy to collect graph data
• The world is more complex and interconnected than we thought
Source: Collective Dynamics of Small-World Networks, D. Watts, S. Strogatz, Nature 393, 440-442 (1998)
8. NETWORK SCIENCE
• The study of graphs has grown explosively over the last 15 years
• Networks have properties and patterns one can study
• Robustness – How resistant is a network to random attacks?
• Contagion – How fast does a disease or gossip spread in a network?
• Communities – How many communities exist in a network?
• Centrality – Who is the most central individual in a network?
• If you read one of these books, you understand Network Science
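A minimal Python sketch of two of the questions above, centrality and robustness, on a made-up five-person network. The names, the edge list, and the helper functions are illustrative assumptions and not part of the talk. Degree centrality is only one of several centrality measures, and removing a single node is only a crude robustness probe.

```python
from collections import deque

# Hypothetical toy friendship network (undirected edges).
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "cat"), ("dan", "eve")]


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def connected_components(adjacency):
    """Group nodes into connected components using breadth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


def remove_node(adjacency, node):
    """Return a copy of the graph without `node` and its edges."""
    return {n: nbrs - {node} for n, nbrs in adjacency.items() if n != node}


adjacency = build_adjacency(EDGES)
centrality = degree_centrality(adjacency)
most_central = max(centrality, key=centrality.get)

# Centrality: who has the most direct connections?
print(most_central, centrality[most_central])
# Robustness: how many pieces remain if that person is removed?
print(len(connected_components(remove_node(adjacency, most_central))))
```

On real data the same questions are usually answered with a graph library or a tool such as Gephi; the point here is only that each question maps to a concrete computation.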
9. GRAPHS HELP SOLVE PROBLEMS
• Saddam Hussein Network (2003)
C. Wilson. Searching for Saddam: a five-part series on how the US military
used social networking to capture the Iraqi dictator. 2010.
www.slate.com/id/2245228/
10. GRAPHS HELP SOLVE PROBLEMS
• Predicting and controlling infectious disease
Naoki Masuda, Petter Holme. Predicting and controlling infectious disease
epidemics using temporal networks.
http://f1000.com/prime/reports/b/5/6/
Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual
networks in a male homosexual community in Iceland. J Acquir Immune
Defic Syndr. 1992, 5:374–81.
11. GRAPHS HELP SOLVE PROBLEMS
• Recommendation systems
Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
12. GRAPHS HELP SOLVE PROBLEMS
• Recipe recommendation using ingredient networks
Credit: http://www.ladamic.com/wordpress/?p=294
13. GRAPHS HELP SOLVE PROBLEMS
• Power grid
Credit: http://www.npr.org/templates/story/story.php?storyId=110997398
14. SMALL GRAPHS
• The famous “Zachary’s Karate Club” study (1977) involved only 34 nodes
• It could be drawn by hand on paper
Zachary’s Karate Club (1977). W. W. Zachary, An information flow model for conflict and fission in small
groups, Journal of Anthropological Research 33, 452-473 (1977).
15. MEDIUM GRAPHS
• Your own Facebook or LinkedIn social network
• The Harlem Shake: Anatomy of a Viral Meme
Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html
16. LARGE GRAPHS
• The Internet Map (~350,000 domains)
• DBPedia (~290M relationships)
• Friendster Social Network dataset* (1.8B edges)
Internet Map (http://internet-map.net)
* http://snap.stanford.edu/data/index.html
17. IMPLICIT GRAPHS
• Graphs can be explicit or implicit
• Explicit: The network exists in nature (social networks, food webs,
airline networks)
• Implicit: The network is derived from other data (word networks,
co-authorship)
• Example of an implicit graph:
• Each document has a set of tags
• Create a link between two tags when they appear on the same document
• Aggregate all links across all documents
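The tag example above can be sketched in a few lines of Python. The documents and tags below are invented for illustration, and `cooccurrence_edges` is a hypothetical helper name. Each pair of tags on the same document becomes a link, and the counter aggregates links across documents into edge weights.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each document has a set of tags.
DOCUMENTS = {
    "doc1": {"graphs", "visualization", "gephi"},
    "doc2": {"graphs", "visualization"},
    "doc3": {"graphs", "hadoop"},
}


def cooccurrence_edges(documents):
    """Return a Counter mapping (tag_a, tag_b) to a co-occurrence count.

    Tags are sorted so each undirected pair is always stored in the
    same order, which lets the counts aggregate across documents.
    """
    edge_weights = Counter()
    for tags in documents.values():
        for tag_a, tag_b in combinations(sorted(tags), 2):
            edge_weights[(tag_a, tag_b)] += 1
    return edge_weights


edges = cooccurrence_edges(DOCUMENTS)
for (tag_a, tag_b), weight in edges.most_common():
    print(tag_a, tag_b, weight)
```

The number of pairs grows quadratically with the number of tags per document, so on large collections one would typically cap or filter tags before building the graph.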
18. SIMILARITY GRAPHS
• Graph of all the co-occurrences between LinkedIn Skills (2011)
19. VISUALIZATION
• Visualization and statistics are the two basic toolkits one can use
on graphs
• Complex questions are asked when studying graphs
• Easy (Excel can do this!)
• Min, max, average, quartiles
• Exact queries, search
• Harder (Visualization can do this!)
• Patterns, trends, correlations
• Changes over time, context
• Anomalies, data errors
• Geographical representation
20. GRAPH VISUALIZATION
• Due to the size of graphs and the complexity of questions,
visualization is the natural tool to understand what’s going on
“We are more easily persuaded by the reasons we
ourselves discover than by those which are given to us by
others.”
Blaise Pascal
Let me play with the data!
Direct manipulation
21. DATA EXPLORATION AND INTERACTION
• Use visualization and statistics to discover new hypotheses
• Exploratory data analysis
“The greatest value of a picture is when it forces us
to notice what we never expected to see.”
John Tukey
• The user interface is centered around the human
• Empowers the user to understand the structure and patterns in
the data
• The machine augments the human
• How?
• Overview and details, zoom and pan interface
• Interactive, direct manipulation
22. MAP YOUR DATA
• Iterative process to transform relational data into a map
• Use color, size and position to highlight, group and set up a
hierarchy
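As a rough illustration of mapping data to visual attributes, the Python sketch below scales node size linearly with degree and picks node color from a group label. The node names, group labels, hex colors, size range, and the `map_visual_attributes` helper are all assumptions made for this example. Position is left out because it would come from a layout algorithm.

```python
def map_visual_attributes(degrees, groups, palette,
                          min_size=5.0, max_size=30.0):
    """Return {node: {"size": float, "color": str}}.

    Size is interpolated linearly between min_size and max_size based
    on degree; color is looked up from the node's group in `palette`.
    """
    low, high = min(degrees.values()), max(degrees.values())
    span = (high - low) or 1  # avoid dividing by zero when all degrees match
    styled = {}
    for node, degree in degrees.items():
        size = min_size + (degree - low) / span * (max_size - min_size)
        styled[node] = {"size": size, "color": palette[groups[node]]}
    return styled


# Hypothetical inputs.
DEGREES = {"a": 1, "b": 3, "c": 5}
GROUPS = {"a": "team1", "b": "team1", "c": "team2"}
PALETTE = {"team1": "#1f77b4", "team2": "#ff7f0e"}

styled = map_visual_attributes(DEGREES, GROUPS, PALETTE)
for node, attributes in styled.items():
    print(node, attributes)
```

In practice this mapping is adjusted iteratively: change the attribute that drives size or color, look at the map, and repeat.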
23. FROM INFORMATION TO KNOWLEDGE
• Exploring networks interactively and iterating often provide
“Eureka” moments for domain experts
24. BIG GRAPH DATA
• Big graph data doesn’t necessarily mean you’re visualizing or
analyzing a large graph
• Small graphs can be extracted from large graphs and analyzed
• Small graphs can be extracted from non-graph data as well
• Graphs are just nodes and relationships after all
• Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi
(Josh Wills, Cloudera, 2012)
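One common way to extract a small graph from a large one is to keep only the neighborhood of a node of interest (an "ego network"). The Python sketch below is a minimal version under stated assumptions: the edge list is a made-up stand-in for data that would normally live in a database or on a cluster, and `ego_network` is a hypothetical helper, not something from the talk or from Gephi.

```python
from collections import deque


def ego_network(edges, center, radius=1):
    """Return the edges among nodes within `radius` hops of `center`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search that stops expanding at the requested radius.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nbr in adjacency.get(node, ()):
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)

    kept = set(distance)
    return [(a, b) for a, b in edges if a in kept and b in kept]


# Hypothetical "large" graph, kept tiny so the result is easy to check.
BIG_EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]

small = ego_network(BIG_EDGES, center=1, radius=1)
print(small)
```

The extracted edge list is small enough to load into a desktop tool for interactive exploration, which is the workflow the slide describes.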
25. GEPHI
• Built to solve large graph visualization problems.
• Open source tool for Windows, Mac OS X and Linux
• Large international community involved
• The latest version has been downloaded more than 100,000 times
• Extensible with plug-ins
• Available at http://gephi.org
26. GEPHI
[Diagram: Data Edition, Visual Mapping, Filter, Visualization, Statistics, Layout, Timeline]
27. SIGMA.JS
• Open-source lightweight JavaScript library to draw graphs
• Uses HTML5 Canvas
• Dynamically displays graphs that can be generated on the fly
• Available at http://sigmajs.org
Sigma.js v0.1
28. SUMMARY
• Big graph data = Relational Big Data
• Graphs are everywhere!
• Graphs have fascinating structure and patterns one can analyze
• Visualization is a natural tool for such complex data and complex
questions
• On graphs, visualization done right allows interaction and
iteration. Play.
• The hard part is to extract a small or medium graph from big data
• Open source tools like Gephi or Sigma.js are a good start
29. Become a graph evangelist!
QUESTIONS?
Mathieu Bastian (@mathieubastian)
30. REFERENCES & LINKS
Join the Social Network Analysis class by Lada Adamic on Coursera
https://www.coursera.org/course/sna
Support the Gephi Consortium
http://consortium.gephi.org
Computational Information Design, Ben Fry (2004)
http://benfry.com/phd/
The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab
http://atlas.media.mit.edu/
The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy
http://arxiv.org/abs/1303.0045
The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007)
http://www.pnas.org/content/104/21/8685.full
What does your intranet look like?
http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic
http://arxiv.org/abs/1111.3919
US Presidents Inaugural Speeches 1969-2013 Text Network Analysis
http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
10 Reasons Why We Visualise Data
http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data
Sigma.js, Alexis Jacomy et al.
http://sigmajs.org
Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási
http://www.amazon.com/gp/product/0452284392/
Six Degrees: The Science of a Connected Age, Duncan J. Watts
http://www.amazon.com/gp/product/0393325423/
Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan
http://www.amazon.com/gp/product/0393324427
Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler
http://www.amazon.com/dp/product/0316036137
Atelier Iceberg – Gephi
http://www.slideshare.net/ateliericeberg/gephi-17680699
Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler
http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
Network Maps Board on Pinterest, Mathieu Bastian
http://pinterest.com/mathieubastian/network-maps/
Network Science Book, Albert-László Barabási
http://barabasilab.neu.edu/networksciencebook
Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera
https://github.com/cloudera/ades