VISUALIZE BIG GRAPH DATA
MATHIEU BASTIAN

DATA VISUALIZATION SUMMIT,
SAN FRANCISCO, APRIL 11-12, 2013
BIG GRAPH DATA
    •  The story of big graph data is just starting
    •  BIG GRAPH DATA combines BIG DATA and GRAPHS

    [Word cloud: distributed systems, complex, storage, databases, indexation,
    large datasets, algorithm, cloud computing, Hadoop, analytics, real-time,
    visualization]

BIG DATA
    •  “The Petabyte age”
    •  All industries and domains can leverage big data: health, government,
       finance, technology

    •  Big Data => Big Problems
    •  The focus is on building technology to handle big data and big
       graph data (e.g., graph databases)
    •  Seeking efficient analysis of ever more complex systems



GRAPHS
    •  Graphs are everywhere, and it’s easy to collect graph data
    •  The world is more complex and interconnected than we thought




        Source: Collective Dynamics of ‘Small-World’ Networks, D. J. Watts, S. H. Strogatz, Nature 393, 440-442 (1998)


NETWORK SCIENCE
    •  The study of graphs has exploded over the last 15 years
    •  Networks have properties and patterns one can study
      •  Robustness: how resistant is a network to random attacks?
      •  Contagion: how fast does a disease or gossip spread through a network?
      •  Communities: how many communities exist in a network?
      •  Centrality: who is the most central individual in a network?
    •  Reading any one of these books is a good way into network science
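Two of the properties listed above, centrality and robustness, become concrete once a graph is stored as an adjacency structure. The sketch below uses a made-up six-node graph; the names `EDGES`, `build_adjacency`, `degree_centrality` and `largest_component_size` are illustrative, not from any library. It computes degree centrality and probes robustness by deleting nodes and measuring the largest connected component that survives.

```python
from collections import defaultdict, deque

# Hypothetical toy graph (undirected edge list), invented for illustration.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("e", "f")]

def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def degree_centrality(adj):
    """Degree divided by (n - 1), so a node linked to everyone scores 1.0."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        # breadth-first search over one component
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

adj = build_adjacency(EDGES)
centrality = degree_centrality(adj)
print(max(centrality, key=centrality.get))  # a
print(largest_component_size(adj))         # 6: the graph is connected
print(largest_component_size(adj, {"a"}))   # 3: removing the hub splits it
```

Removing randomly chosen nodes instead of the hub would model random failures; removing high-degree nodes models a targeted attack. Scale-free networks tolerate the former much better than the latter.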




GRAPHS HELP SOLVE PROBLEMS
    •  Saddam Hussein Network (2003)




       Source: C. Wilson, Searching for Saddam: a five-part series on how the US military
       used social networking to capture the Iraqi dictator, Slate, 2010. www.slate.com/id/2245228/



GRAPHS HELP SOLVE PROBLEMS
    •  Predicting and controlling infectious disease




       Naoki Masuda, Petter Holme. Predicting and controlling infectious disease
       epidemics using temporal networks. http://f1000.com/prime/reports/b/5/6/

       Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual
       networks in a male homosexual community in Iceland. J Acquir Immune
       Defic Syndr. 1992, 5:374–81.
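Contagion on a network can be sketched with a discrete-time SI (susceptible-infected) model. This is a simplification under stated assumptions: the graph is static, unlike the temporal networks Masuda and Holme study. The contact graph `ADJ` is hypothetical, and a single transmission probability `p` applies per contact per step. `simulate_si` is an illustrative name, not a library function.

```python
import random

# Hypothetical contact graph (adjacency lists), invented for illustration.
ADJ = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def simulate_si(adj, seeds, p=0.3, steps=10, rng=None):
    """Discrete-time SI model on a static graph.

    At every step each infected node infects each susceptible neighbour
    with probability p. Returns the number of infected nodes after each step.
    """
    if rng is None:
        rng = random.Random(0)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        # sorted() keeps the sequence of random draws reproducible for a given seed
        for node in sorted(infected):
            for nbr in adj.get(node, []):
                if nbr not in infected and rng.random() < p:
                    newly_infected.add(nbr)
        infected |= newly_infected  # synchronous update: new cases spread next step
        history.append(len(infected))
    return history

print(simulate_si(ADJ, {"a"}, p=1.0, steps=4))  # [1, 3, 4, 5, 5]
```

With `p=1.0` the epidemic advances exactly one breadth-first layer per step. With `p < 1`, averaging many seeded runs gives an epidemic curve, and removing or immunizing well-connected nodes before running it is a simple way to explore control strategies.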




GRAPHS HELP SOLVE PROBLEMS
    •  Recommendation systems






     Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
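A graph-based recommender can be sketched as a traversal over a bipartite user-movie graph. Start from a user, walk to the movies they liked, then to the other users who liked those movies, and finally to movies the first user has not seen. Counting how many paths reach each candidate gives a ranking. This is a minimal illustration with made-up data; `LIKES` and `recommend` are hypothetical names, and this is not the code from the linked post.

```python
from collections import Counter

# Hypothetical "user liked movie" edges, invented for illustration.
LIKES = {
    "ann": {"Alien", "Heat", "Up"},
    "bob": {"Alien", "Heat"},
    "cat": {"Alien", "Brazil"},
    "dan": {"Up", "Coco"},
}

def recommend(user, likes, k=3):
    """Rank unseen items by the number of user->item->user->item paths reaching them."""
    # invert the bipartite graph: item -> users who liked it
    fans = {}
    for u, items in likes.items():
        for item in items:
            fans.setdefault(item, set()).add(u)
    seen = likes[user]
    scores = Counter()
    for item in seen:
        for other in fans[item] - {user}:
            for candidate in likes[other] - seen:
                scores[candidate] += 1
    # highest score first; ties broken alphabetically so the output is deterministic
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]

print(recommend("bob", LIKES))  # ['Up', 'Brazil']
```

"Up" wins because two paths reach it (via Alien and via Heat, both through ann), while "Brazil" is reached only once.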


GRAPHS HELP SOLVE PROBLEMS
    •  Recipe recommendation using ingredient networks






     Credit: http://www.ladamic.com/wordpress/?p=294


GRAPHS HELP SOLVE PROBLEMS
    •  Power grid






     Credit: http://www.npr.org/templates/story/story.php?storyId=110997398


SMALL GRAPHS
    •  The famous “Zachary’s Karate Club” study (1977) involved only 34
       nodes
    •  It could be drawn by hand on paper




       Zachary’s Karate Club (1977). Source: W. W. Zachary, An information flow model for
       conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).



MEDIUM GRAPHS
    •  Your own Facebook or LinkedIn social network
    •  The Harlem Shake: Anatomy of a Viral Meme





       Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html




LARGE GRAPHS
    •  The Internet Map (~350,000 domains)
    •  DBpedia (~290M relationships)
    •  Friendster Social Network dataset* (1.8B edges)





       Internet Map (http://internet-map.net)
                                                  * http://snap.stanford.edu/data/index.html
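Graphs at the Friendster scale do not fit comfortably in a desktop tool, but simple statistics can still be computed in a single pass. SNAP distributes its datasets as plain-text edge lists, with `#` comment lines and one whitespace-separated node pair per line. Assuming that layout, the sketch below streams the lines and counts degrees, so memory grows with the number of nodes rather than edges. The sample data and the name `degree_counts` are made up for illustration.

```python
import io
from collections import Counter

def degree_counts(lines):
    """Stream an edge list ('#' comments, one 'u v' pair per line) and count
    how many listed pairs touch each node, without building adjacency lists."""
    degrees = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        u, v = line.split()[:2]
        degrees[u] += 1
        degrees[v] += 1
    return degrees

# Hypothetical four-edge sample in SNAP-like layout.
SAMPLE = io.StringIO("# toy edge list\n1\t2\n1\t3\n2\t3\n3\t4\n")
deg = degree_counts(SAMPLE)
print(deg.most_common(1))  # [('3', 3)]
```

On a real download you would pass an open file instead, e.g. `with open(path) as f: degree_counts(f)`, or `gzip.open(path, "rt")` for a compressed file, where `path` is wherever you saved it. Whether each undirected edge is listed once or twice depends on the dataset, so check its description before reading these counts as degrees.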



IMPLICIT GRAPHS
    •  Graphs can be explicit or implicit
      •  Explicit: the network exists in its own right (social networks, food webs,
         airline networks)
      •  Implicit: the network is derived from other data (word networks,
         co-authorship networks)


    •  Example of an implicit graph:
        •  Each document in a collection has a set of tags
        •  Create a link between two tags whenever they appear on the same document
        •  Aggregate all links across all documents
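The three steps of this implicit-graph example translate almost line for line into code. Below is a minimal sketch with hypothetical documents; `DOCS` and `tag_cooccurrence` are illustrative names. Each pair of tags on a document contributes one link, and aggregating across documents turns the link count into an edge weight.

```python
from collections import Counter
from itertools import combinations

# Hypothetical documents and their tags, invented for illustration.
DOCS = [
    {"graph", "visualization", "gephi"},
    {"graph", "hadoop"},
    {"graph", "visualization"},
    {"hadoop", "big data"},
]

def tag_cooccurrence(docs):
    """One undirected link per pair of tags sharing a document; the edge
    weight is the number of documents that carry both tags."""
    weights = Counter()
    for tags in docs:
        # sorted() gives each unordered pair a single canonical (a, b) key
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return weights

edges = tag_cooccurrence(DOCS)
print(edges[("graph", "visualization")])  # 2
```

The resulting weighted edge list is exactly what a tool like Gephi needs to draw the tag network.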




SIMILARITY GRAPHS
    •  Graph of all the co-occurrences between LinkedIn skills (2011)
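Raw co-occurrence counts favor very common items, so similarity graphs are often built from a normalized score instead. One such choice, assumed here and not necessarily the one behind the LinkedIn map, is the Jaccard similarity between the sets of profiles that list each skill, keeping only edges above a threshold. The profiles, skills, the 0.5 threshold and the name `similarity_graph` are invented for illustration.

```python
from itertools import combinations

# Hypothetical profiles -> skills, invented for illustration.
PROFILES = {
    "p1": {"java", "hadoop"},
    "p2": {"java", "hadoop", "pig"},
    "p3": {"photoshop", "illustrator"},
    "p4": {"java", "spring"},
}

def similarity_graph(profiles, threshold=0.5):
    """Link two skills when the Jaccard similarity of the sets of profiles
    listing them is at least `threshold`."""
    holders = {}  # skill -> set of profiles that list it
    for pid, skills in profiles.items():
        for skill in skills:
            holders.setdefault(skill, set()).add(pid)
    edges = {}
    for a, b in combinations(sorted(holders), 2):
        shared = len(holders[a] & holders[b])
        if shared == 0:
            continue  # never co-occur, no edge
        score = shared / len(holders[a] | holders[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges

edges = similarity_graph(PROFILES)
print(sorted(edges))  # [('hadoop', 'java'), ('hadoop', 'pig'), ('illustrator', 'photoshop')]
```

"java" and "spring" co-occur once, but "java" is common enough that their Jaccard score (1/3) falls below the threshold, which is the kind of noise the normalization is meant to remove.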




VISUALIZATION
    •  Visualization and statistics are the two basic toolkits one can use
       on graphs
    •  Studying graphs raises complex questions


    •  Easy (Excel can do this!)
      •  Min, max, average, quartiles
      •  Exact queries, search


    •  Harder (Visualization can do this!)
      •  Patterns, trends, correlations
      •  Changes over time, context
      •  Anomalies, data errors
      •  Geographical representation
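The "easy" statistics are indeed a few lines of code. The sketch below computes them over a hypothetical degree sequence; `DEGREES` and `summarize` are illustrative names, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

# Hypothetical degree sequence of a small graph, invented for illustration.
DEGREES = [1, 1, 2, 2, 2, 3, 3, 4, 5, 17]

def summarize(values):
    """Min, max, mean and quartiles: the 'easy' statistics from the slide."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # Python 3.8+
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

print(summarize(DEGREES))
```

Here the mean (4) sits well above the median (2.5) because of one node of degree 17. The numbers hint that something unusual is present, but they do not say which node it is, what it connects, or whether it is a data error. Those are the "harder" questions the slide leaves to visualization.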



GRAPH VISUALIZATION
    •  Given the size of graphs and the complexity of the questions,
       visualization is the natural tool for understanding what’s going on

                “We are more easily persuaded by the reasons we
                ourselves discover than by those which are given to us by
                others.” Blaise Pascal
                       Let me play with the data!




 Direct manipulation



DATA EXPLORATION AND INTERACTION
    •  Use visualization and statistics to discover new hypotheses
      •  Exploratory data analysis
        “The greatest value of a picture is when it forces us
        to notice what we never expected to see.”

        John Tukey

    •  The user interface is centered around the human
    •  Empowers the user to understand the structure and patterns in
       the data
    •  The machine augments the human
    •  How?
      •  Overview and details, zoom and pan interface
      •  Interaction through direct manipulation


MAP YOUR DATA
    •  Iterative process to transform relational data into a map




    •  Use color, size and position to highlight, group and set up a
       hierarchy
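Mapping data attributes to visual variables comes down to a few small functions. Below is a minimal sketch with hypothetical node attributes and an arbitrary palette. Size is interpolated linearly from degree to set up a hierarchy, and color is assigned per group to show grouping. `NODES`, `PALETTE` and `visual_mapping` are illustrative names.

```python
# Hypothetical node attributes, invented for illustration.
NODES = {
    "a": {"degree": 12, "group": "finance"},
    "b": {"degree": 3, "group": "health"},
    "c": {"degree": 7, "group": "finance"},
}
# Arbitrary categorical palette (hex RGB); any distinct colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

def visual_mapping(nodes, min_size=4.0, max_size=24.0):
    """Size encodes degree (linear ranking); color encodes group (partition)."""
    degrees = [attrs["degree"] for attrs in nodes.values()]
    lo, hi = min(degrees), max(degrees)
    span = (hi - lo) or 1  # avoid dividing by zero when all degrees are equal
    groups = sorted({attrs["group"] for attrs in nodes.values()})
    color_of = {g: PALETTE[i % len(PALETTE)] for i, g in enumerate(groups)}
    style = {}
    for node, attrs in nodes.items():
        t = (attrs["degree"] - lo) / span  # 0.0 for the smallest degree, 1.0 for the largest
        style[node] = {
            "size": min_size + t * (max_size - min_size),
            "color": color_of[attrs["group"]],
        }
    return style

style = visual_mapping(NODES)
print(style["a"])  # {'size': 24.0, 'color': '#1f77b4'}
```

Position, the third variable on the slide, usually comes from a layout algorithm rather than from a direct attribute mapping.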




FROM INFORMATION TO KNOWLEDGE
    •  Exploring networks interactively and iterating often gives domain
       experts “Eureka” moments




                                                           Eureka




BIG GRAPH DATA
    •  Big graph data doesn’t necessarily mean you’re visualizing or
       analyzing a large graph
    •  Small graphs can be extracted from large graphs and analyzed
    •  Small graphs can be extracted from non-graph data as well
    •  Graphs are just nodes and relationships after all


    •  Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi
       (Josh Wills, Cloudera, 2012)
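One common way to extract a small graph from a large one is to keep a node's neighborhood within a few hops, its ego network. The sketch below is an in-memory illustration that assumes the adjacency map fits in RAM. On genuinely big data this step would typically run on a distributed system such as Hadoop, as in the example on this slide. The toy graph `ADJ` and the name `ego_network` are hypothetical.

```python
from collections import deque

# Hypothetical adjacency map standing in for a much larger graph.
ADJ = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}

def ego_network(adj, center, radius=1):
    """Return (nodes, edges) of the subgraph within `radius` hops of `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the requested radius
        for nbr in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    # keep every edge whose two endpoints both survived, stored once as a sorted pair
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj.get(u, []) if v in nodes}
    return nodes, edges

nodes, edges = ego_network(ADJ, "b")
print(sorted(nodes))  # ['a', 'b', 'c', 'e']
```

Other extraction strategies follow the same pattern of "select nodes, then keep the induced edges", for example filtering by degree, by attribute, or by time window.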




GEPHI
    •  Built to solve large graph visualization problems
    •  Open-source tool for Windows, Mac OS X and Linux
    •  Large international community involved
    •  The latest version has been downloaded more than 100,000 times
    •  Extensible with plug-ins
    •  Available at http://gephi.org
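Getting data into Gephi is the first practical step. Gephi reads GEXF, the XML graph format developed by the Gephi project. The sketch below writes a minimal GEXF 1.2 document using only the standard library and emits just the core elements (nodes, edges, edge weights); node attributes and dynamics are omitted. The function name `to_gexf` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(nodes, edges):
    """Serialize a weighted, undirected graph as a minimal GEXF 1.2 document.

    nodes: dict mapping node id -> label
    edges: iterable of (source id, target id, weight)
    """
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": str(label)})
    edges_el = ET.SubElement(graph, "edges")
    for i, (source, target, weight) in enumerate(edges):
        ET.SubElement(edges_el, "edge", {
            "id": str(i), "source": str(source), "target": str(target), "weight": str(weight),
        })
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")

xml_text = to_gexf({"graph": "graph", "hadoop": "hadoop"}, [("graph", "hadoop", 1)])
```

Writing `xml_text` to a file such as `tags.gexf` (a hypothetical name) with `open("tags.gexf", "w", encoding="utf-8")` produces a file you can open in Gephi.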




GEPHI
    [Diagram: Gephi's main features: data edition, visual mapping, filter,
    visualization, statistics, layout, timeline]
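Layout is the step that turns an abstract graph into a picture. Gephi offers several force-directed algorithms, among them Fruchterman-Reingold and ForceAtlas. The sketch below is a simplified, unbounded Fruchterman-Reingold-style layout, not Gephi's implementation. Node names, edges, the iteration count and the cooling factor are illustrative choices.

```python
import math
import random

def spring_layout(nodes, edges, iterations=100, size=1.0, seed=0):
    """Fruchterman-Reingold-style sketch: every pair of nodes repels, every
    edge attracts, and a cooling 'temperature' caps each move."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = size / 10
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # repulsive force k^2 / d between every pair of nodes (O(n^2))
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # attractive force d^2 / k along every edge
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # move each node along its displacement, capped by the temperature
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
        temperature *= 0.95  # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}

# Hypothetical graph: two triangles joined by the edge c-d.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
layout = spring_layout(NODES, EDGES)
```

The all-pairs repulsion loop is what makes naive force-directed layouts slow on large graphs. Production layouts typically approximate it, for example with a Barnes-Hut quadtree.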

SIGMA.JS
    •  Open-source lightweight JavaScript library to draw graphs
    •  Uses HTML5 Canvas
    •  Dynamically displays graphs, which can be generated on the fly
    •  Available at http://sigmajs.org




                                                   Sigma.js v0.1


SUMMARY
    •  Big graph data = Relational Big Data
    •  Graphs are everywhere!
    •  Graphs have fascinating structure and patterns one can analyze
    •  Visualization is a natural tool for such complex data and complex
       questions
    •  On graphs, visualization done right allows interaction and
       iteration. Play.
    •  The hard part is to extract a small or medium graph from big data
    •  Open source tools like Gephi or Sigma.js are a good start




Become a graph evangelist!




                    QUESTIONS?

                   Mathieu Bastian (@mathieubastian)



REFERENCES & LINKS
    •  Join the Social Network Analysis class by Lada Adamic on Coursera
       https://www.coursera.org/course/sna
    •  Support the Gephi Consortium
       http://consortium.gephi.org
    •  Computational Information Design, Ben Fry (2004)
       http://benfry.com/phd/
    •  The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab
       http://atlas.media.mit.edu/
    •  The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy
       http://arxiv.org/abs/1303.0045
    •  The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007)
       http://www.pnas.org/content/104/21/8685.full
    •  What does your intranet look like?
       http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
    •  Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic
       http://arxiv.org/abs/1111.3919
    •  US Presidents Inaugural Speeches 1969-2013 Text Network Analysis
       http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
    •  10 Reasons Why We Visualise Data
       http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data
    •  Sigma.js, Alexis Jacomy et al.
       http://sigmajs.org
    •  Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási
       http://www.amazon.com/gp/product/0452284392/
    •  Six Degrees: The Science of a Connected Age, Duncan J. Watts
       http://www.amazon.com/gp/product/0393325423/
    •  Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan
       http://www.amazon.com/gp/product/0393324427
    •  Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler
       http://www.amazon.com/dp/product/0316036137
    •  Atelier Iceberg – Gephi
       http://www.slideshare.net/ateliericeberg/gephi-17680699
    •  Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler
       http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
    •  Network Maps Board on Pinterest, Mathieu Bastian
       http://pinterest.com/mathieubastian/network-maps/
    •  Network Science Book, Albert-László Barabási
       http://barabasilab.neu.edu/networksciencebook
    •  Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera
       https://github.com/cloudera/ades






  • 1. MATHIEU BASTIAN •  DATA VISUALIZATION SUMMIT, SAN FRANCISCO, APRIL 11-12, 2013
  • 2–5. BIG GRAPH DATA •  The story of big graph data is just starting •  BIG GRAPH DATA: BIG DATA and GRAPHS •  DISTRIBUTED SYSTEMS, COMPLEX, STORAGE, DATABASES, INDEXATION, LARGE DATASETS, ALGORITHM, CLOUD COMPUTING, HADOOP, ANALYTICS, REAL-TIME, VISUALIZATION
  • 6. BIG DATA •  “The Petabyte Age” •  All industries and domains can leverage big data: health, government, finance, technology •  Big data => big problems •  Focus on building the technology to handle big data and big graph data (e.g., graph databases) •  Seeking efficient analysis of ever more complex systems
  • 7. GRAPHS •  Graphs are everywhere, and it’s easy to collect graph data •  The world is more complex and interconnected than we thought •  Source: Collective Dynamics of Small-World Networks, D. Watts, S. Strogatz, Nature 393, 440-442 (1998)
  • 8. NETWORK SCIENCE •  The study of graphs has exploded over the last 15 years •  Networks have properties and patterns one can study •  Robustness: how resistant is a network to random attacks? •  Contagion: how fast does a disease or gossip spread through a network? •  Communities: how many communities exist in a network? •  Centrality: who is the most central individual in a network? •  Read one of these books and you will understand network science
  • 9. GRAPHS HELP SOLVE PROBLEMS •  Saddam Hussein network (2003) •  Source: C. Wilson, Searching for Saddam: a five-part series on how the US military used social networking to capture the Iraqi dictator, 2010. www.slate.com/id/2245228/
  • 10. GRAPHS HELP SOLVE PROBLEMS •  Predicting and controlling infectious disease •  Source: Naoki Masuda, Petter Holme, Predicting and controlling infectious disease epidemics using temporal networks, http://f1000.com/prime/reports/b/5/6/ •  Source: Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual networks in a male homosexual community in Iceland. J Acquir Immune Defic Syndr. 1992, 5:374–81.
  • 11. GRAPHS HELP SOLVE PROBLEMS •  Recommendation systems •  Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
  • 12. GRAPHS HELP SOLVE PROBLEMS •  Recipe recommendation using ingredient networks •  Credit: http://www.ladamic.com/wordpress/?p=294
  • 13. GRAPHS HELP SOLVE PROBLEMS •  Power grid •  Credit: http://www.npr.org/templates/story/story.php?storyId=110997398
  • 14. SMALL GRAPHS •  The famous “Zachary’s Karate Club” study (1977) involved only 34 nodes •  It could be drawn by hand on paper •  Source: W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977)
  • 15. MEDIUM GRAPHS •  Your own Facebook or LinkedIn social network •  The Harlem Shake: Anatomy of a Viral Meme, Gilad Lotan, http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html
  • 16. LARGE GRAPHS •  The Internet Map (~350,000 domains), http://internet-map.net •  DBpedia (~290M relationships) •  Friendster social network dataset* (1.8B edges) •  * http://snap.stanford.edu/data/index.html
  • 17. IMPLICIT GRAPHS •  Graphs can be explicit or implicit •  Explicit: the network exists in nature (social networks, food webs, airline networks) •  Implicit: the network is derived from other data (word networks, co-authorship) •  Example of an implicit graph: •  Each document in a set carries a set of tags •  Create a link whenever two tags appear on the same document •  Aggregate all links across all documents
  • 18. SIMILARITY GRAPHS •  Graph of all co-occurrences between LinkedIn skills (2011)
  • 19. VISUALIZATION •  Visualization and statistics are the two basic toolkits one can use on graphs •  Studying graphs raises complex questions •  Easy (Excel can do this!): min, max, average, quartiles; exact queries, search •  Harder (visualization can do this!): patterns, trends, correlations; changes over time, context; anomalies, data errors; geographical representation
  • 20. GRAPH VISUALIZATION •  Given the size of graphs and the complexity of the questions, visualization is the natural tool for understanding what’s going on •  “We are more easily persuaded by the reasons we ourselves discover than by those which are given to us by others.” (Blaise Pascal) •  Let me play with the data! •  Direct manipulation
  • 21. DATA EXPLORATION AND INTERACTION •  Use visualization and statistics to discover new hypotheses •  Exploratory data analysis: “The greatest value of a picture is when it forces us to notice what we never expected to see.” (John Tukey) •  The user interface is centered on the human •  It empowers the user to understand the structure and patterns in the data •  The machine augments the human •  How? •  Overview and details, zoom-and-pan interface •  Interactive, direct manipulation
  • 22. MAP YOUR DATA •  An iterative process to transform relational data into a map •  Use color, size and position to highlight, group and set up a hierarchy
  • 23. FROM INFORMATION TO KNOWLEDGE •  Exploring networks interactively and iterating often provides “Eureka” moments for domain experts
  • 24. BIG GRAPH DATA •  Big graph data doesn’t necessarily mean you’re visualizing or analyzing a large graph •  Small graphs can be extracted from large graphs and analyzed •  Small graphs can be extracted from non-graph data as well •  Graphs are just nodes and relationships, after all •  Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi (Josh Wills, Cloudera, 2012)
  • 25. GEPHI •  Built to solve large graph visualization problems •  Open-source tool for Windows, Mac OS X and Linux •  Large international community involved •  The latest version has been downloaded more than 100,000 times •  Extensible with plug-ins •  Available at http://gephi.org
  • 26. GEPHI •  Data edition •  Visual mapping •  Filter •  Visualization •  Statistics •  Layout •  Timeline
  • 27. SIGMA.JS •  Open-source, lightweight JavaScript library for drawing graphs •  Uses HTML5 Canvas •  Dynamically displays graphs that can be generated on the fly •  Available at http://sigmajs.org (Sigma.js v0.1)
  • 28. SUMMARY •  Big graph data = relational big data •  Graphs are everywhere! •  Graphs have fascinating structures and patterns one can analyze •  Visualization is a natural tool for such complex data and complex questions •  On graphs, visualization done right allows interaction and iteration. Play. •  The hard part is extracting a small or medium graph from big data •  Open-source tools like Gephi or Sigma.js are a good start
  • 29. Become a graph evangelist! •  QUESTIONS? •  Mathieu Bastian (@mathieubastian)
  • 30. REFERENCES & LINKS
      •  Join the Social Network Analysis class by Lada Adamic on Coursera: https://www.coursera.org/course/sna
      •  Sigma.js, Alexis Jacomy et al.: http://sigmajs.org
      •  Support the Gephi Consortium: http://consortium.gephi.org
      •  Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási: http://www.amazon.com/gp/product/0452284392/
      •  Computational Information Design, Ben Fry (2004): http://benfry.com/phd/
      •  Six Degrees: The Science of a Connected Age, Duncan J. Watts: http://www.amazon.com/gp/product/0393325423/
      •  The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab: http://atlas.media.mit.edu/
      •  Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan: http://www.amazon.com/gp/product/0393324427
      •  The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy: http://arxiv.org/abs/1303.0045
      •  Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler: http://www.amazon.com/dp/product/0316036137
      •  The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007): http://www.pnas.org/content/104/21/8685.full
      •  Atelier Iceberg – Gephi: http://www.slideshare.net/ateliericeberg/gephi-17680699
      •  Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler: http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
      •  What does your intranet look like?: http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
      •  Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic: http://arxiv.org/abs/1111.3919
      •  Network Maps Board on Pinterest, Mathieu Bastian: http://pinterest.com/mathieubastian/network-maps/
      •  Network Science Book, Albert-László Barabási: http://barabasilab.neu.edu/networksciencebook
      •  US Presidents Inaugural Speeches 1969-2013 Text Network Analysis: http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
      •  Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera: https://github.com/cloudera/ades
      •  10 Reasons Why We Visualise Data: http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data
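Slide 8 lists centrality among the questions network science asks. The slide names no specific measure; the sketch below assumes normalized degree centrality (a node's degree divided by n - 1), the simplest common choice. It uses only the Python standard library, and the node names are hypothetical:

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors) from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Normalized degree centrality: degree / (n - 1) for each node."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


if __name__ == "__main__":
    # Hypothetical toy network: "alice" is connected to everyone else.
    edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
    scores = degree_centrality(build_adjacency(edges))
    most_central = max(scores, key=scores.get)
    print(most_central, scores[most_central])  # alice 1.0
```

Degree is the cheapest centrality to compute; measures such as betweenness or closeness need shortest-path searches across the whole graph, which is where large graphs become expensive.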
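The contagion question on slide 8 (how fast does a disease or gossip spread?) is often explored by simulation. This is a minimal sketch under assumed parameters: a discrete-time susceptible-infected (SI) model in which `beta` is a hypothetical per-contact transmission probability, run on a toy path graph:

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def simulate_si(adj, seed_node, beta, steps, rng):
    """Discrete-time SI (susceptible-infected) spreading process.

    At each step, every infected node independently infects each susceptible
    neighbor with probability `beta`. Infected nodes never recover.
    Returns the infected count after each step (index 0 = initial state).
    """
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for node in infected:
            for nbr in adj.get(node, ()):
                if nbr not in infected and rng.random() < beta:
                    newly_infected.add(nbr)
        infected |= newly_infected  # update only after the whole step
        history.append(len(infected))
    return history


if __name__ == "__main__":
    # Hypothetical path graph 0-1-2-3-4, with the outbreak starting at node 0.
    path = build_adjacency([(i, i + 1) for i in range(4)])
    print(simulate_si(path, 0, beta=1.0, steps=5, rng=random.Random(42)))  # [1, 2, 3, 4, 5, 5]
    print(simulate_si(path, 0, beta=0.3, steps=10, rng=random.Random(42)))
```

With `beta=1.0` every contact transmits, so on the path the infection advances exactly one hop per step: spreading speed is bounded by graph distance from the seed, and lower `beta` only slows it further.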
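Slide 8's robustness question can be probed by deleting random nodes and watching how much of the network stays connected. A minimal sketch, assuming resistance is measured as the size of the largest connected component relative to the original node count (one common convention, not necessarily the one the talk had in mind):

```python
import random
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted (BFS)."""
    seen = set(removed)  # treat removed nodes as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def random_failure_curve(adj, fractions, rng):
    """For each fraction f, delete int(f * n) random nodes and return the
    largest remaining component as a share of the original n nodes."""
    nodes = list(adj)
    n = len(nodes)
    return [largest_component_size(adj, set(rng.sample(nodes, int(f * n)))) / n
            for f in fractions]


if __name__ == "__main__":
    # Hypothetical ring of 10 nodes: removing nodes 0 and 5 leaves two paths of 4.
    ring = build_adjacency([(i, (i + 1) % 10) for i in range(10)])
    print(largest_component_size(ring, {0, 5}))  # 4
    print(random_failure_curve(ring, [0.0, 0.1, 0.3, 0.5], random.Random(7)))
```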
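Slide 11 credits a graph-based movie recommender. The sketch below is not that engine; it is a simpler, swapped-in technique: a two-hop walk on a bipartite user-item graph (user, liked items, other fans of those items, their other items), scoring each unseen item by the number of paths that reach it. The data and function name are hypothetical:

```python
from collections import Counter, defaultdict


def recommend(likes, user, top_k=3):
    """Recommend items via a two-hop walk on a bipartite user-item graph.

    Walk user -> liked items -> other users who liked them -> their items,
    counting how many paths reach each item the user has not liked yet.
    """
    fans = defaultdict(set)  # item -> users who liked it
    for u, items in likes.items():
        for item in items:
            fans[item].add(u)

    already_liked = likes.get(user, set())
    scores = Counter()
    for item in already_liked:
        for other in fans[item]:
            if other == user:
                continue
            for candidate in likes[other]:
                if candidate not in already_liked:
                    scores[candidate] += 1
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


if __name__ == "__main__":
    likes = {  # hypothetical ratings
        "ann": {"Alien", "Blade Runner"},
        "ben": {"Alien", "Blade Runner", "Brazil"},
        "cat": {"Blade Runner", "Brazil", "Heat"},
        "dan": {"Heat", "Ronin"},
    }
    print(recommend(likes, "ann"))  # ['Brazil', 'Heat']
```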
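Slide 17 describes how to derive an implicit graph from tagged documents. The three steps (pair up the tags on each document, create one link per pair, aggregate links across documents) translate almost directly into Python; the tags below are hypothetical:

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence_graph(documents):
    """Build an implicit, weighted, undirected graph from tagged documents.

    Each document links every pair of distinct tags it carries; links are then
    aggregated across documents, so an edge's weight is the number of documents
    in which both tags appear.
    """
    weights = Counter()
    for tags in documents:
        # Sorting gives each unordered pair one canonical key, e.g. ("gephi", "graph").
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    docs = [  # hypothetical tagged documents
        ["graph", "visualization", "gephi"],
        ["graph", "hadoop"],
        ["graph", "visualization"],
    ]
    graph = tag_cooccurrence_graph(docs)
    print(graph.most_common(1))  # [(('graph', 'visualization'), 2)]
```

A document with k distinct tags yields k(k - 1)/2 links, so heavily tagged documents dominate the edge count; that quadratic growth is one reason implicit graphs get big quickly.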
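Raw co-occurrence counts, as on slide 18, tend to favor very common items. The slide does not say how the LinkedIn skills graph was weighted; as an assumption, this sketch normalizes each pair by Jaccard similarity and drops weak links, using hypothetical skill profiles:

```python
from collections import Counter
from itertools import combinations


def jaccard_similarity_graph(profiles, min_similarity=0.0):
    """Weight each co-occurring pair (a, b) by Jaccard similarity.

    jaccard(a, b) = |profiles with a and b| / |profiles with a or b|
                  = co(a, b) / (count(a) + count(b) - co(a, b))
    Pairs scoring below `min_similarity` are dropped to keep the graph sparse.
    """
    count = Counter()  # item -> number of profiles listing it
    co = Counter()     # (a, b) -> number of profiles listing both
    for items in profiles:
        unique = sorted(set(items))
        count.update(unique)
        co.update(combinations(unique, 2))
    edges = {}
    for (a, b), both in co.items():
        similarity = both / (count[a] + count[b] - both)
        if similarity >= min_similarity:
            edges[(a, b)] = similarity
    return edges


if __name__ == "__main__":
    profiles = [  # hypothetical skill lists
        ["python", "hadoop"],
        ["python", "hadoop", "java"],
        ["java", "python"],
        ["excel"],
    ]
    for pair, sim in jaccard_similarity_graph(profiles, min_similarity=0.5).items():
        print(pair, round(sim, 2))
```

The threshold is the main lever for turning a dense co-occurrence graph into something readable on screen; its value here is arbitrary.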
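On a graph, the "easy" statistics from slide 19 (min, max, average, quartiles) are typically computed on the degree distribution. A short sketch using Python's `statistics` module (`statistics.quantiles` needs Python 3.8 or later; its default "exclusive" method is used) on a hypothetical star graph:

```python
import statistics
from collections import Counter


def degree_summary(edges):
    """Min, max, mean and quartiles of node degrees for an undirected edge list.

    Assumes the edge list touches at least two distinct nodes.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = sorted(degree.values())
    q1, median, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


if __name__ == "__main__":
    star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]  # hypothetical star graph
    print(degree_summary(star))
```

These numbers say the star has one outlier, but not which node it is or how it connects to the rest; that is the gap the slide assigns to visualization.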
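Slide 22 suggests encoding data with color, size and position. For size, one simple option (an assumption, not a rule from the talk) is to map a node attribute such as degree linearly onto a display range; `min_size` and `max_size` are hypothetical defaults:

```python
def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map a numeric node attribute (e.g. degree) onto [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every node has the same value: give them all the middle size.
        mid = (min_size + max_size) / 2
        return {node: mid for node in values}
    span = max_size - min_size
    return {node: min_size + (v - lo) / (hi - lo) * span for node, v in values.items()}


if __name__ == "__main__":
    degrees = {"alice": 3, "bob": 2, "carol": 2, "dave": 1}  # hypothetical degrees
    print(scale_sizes(degrees))  # {'alice': 50.0, 'bob': 30.0, 'carol': 30.0, 'dave': 10.0}
```

On heavy-tailed degree distributions a linear map shrinks almost every node to the minimum, so a square-root or log transform of `values` before scaling is often a better fit.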
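Slides 24 and 28 stress that the hard part is extracting a small graph from a large one. One common extraction is the k-hop neighborhood (ego network) of a node. The single-machine sketch below assumes the graph fits in memory, unlike the Hadoop pipeline cited on slide 24, and its function names are hypothetical:

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def ego_network(adj, center, radius):
    """Edges of the subgraph induced by all nodes within `radius` hops of `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the requested radius
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    kept = set(dist)
    # Induced subgraph: keep every edge whose two endpoints were both reached.
    return {tuple(sorted((u, v))) for u in kept for v in adj.get(u, ()) if v in kept}


if __name__ == "__main__":
    big = build_adjacency([(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)])  # stand-in for a large graph
    print(sorted(ego_network(big, 0, 1)))  # [(0, 1), (0, 2), (1, 2)]
```

In graphs with hubs, the neighborhood can grow explosively after two or three hops, so keep `radius` small.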
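To hand an extracted graph to Gephi (slides 25 and 26), one simple route is a CSV edge table. As we understand it, Gephi's spreadsheet importer recognizes `Source` and `Target` columns and an optional `Weight` column; the function name below is hypothetical:

```python
import csv
import io


def to_gephi_edge_csv(weighted_edges, out):
    """Write {(source, target): weight} as CSV with Source, Target and Weight columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weighted_edges.items()):
        writer.writerow([source, target, weight])


if __name__ == "__main__":
    buffer = io.StringIO()
    to_gephi_edge_csv({("graph", "visualization"): 2, ("graph", "hadoop"): 1}, buffer)
    print(buffer.getvalue(), end="")
    # Source,Target,Weight
    # graph,hadoop,1
    # graph,visualization,2
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.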