Large Scale Social Networks Analysis – LS SNA
   Rui Sarmento           João Gama
           Tiago Cunha           Albert Bifet

            LIAAD/INESC TEC
         FEP - University of Porto

              April 13, 2013
Outline – LS SNA                                            2/19
1.   Motivation
2.   Software Tools
     –   State of the art – Recent Evolution
     –   PEGASUS
     –   Graphlab
     –   Snap (Stanford Network Analysis Platform)
     –   Other Tools
3.   Case Study
     –   Network of companies and financial organizations
     –   Some Numbers
     –   Algorithms and Used tools
     –   Processing Time
4.   Summary & Conclusions
1.Motivation – LS SNA                3/19
Generic Problem:
  Nowadays, the huge amounts of data
  available pose problems for analysis with
  regular hardware and/or software.
Example Facts:
  “We have produced more data in the last two
  years than in all of prior history so we are
  witnessing a Big Bang of Data” – Tim
  McGuire, McKinsey
1.Motivation – LS SNA                4/19
Solution:
  Emerging technologies, like modern models
  for parallel computing, multicore computers
  or even clusters of computers, can be very
  useful for analyzing massive network data.
1.Motivation – LS SNA                                          5/19
Particular Case Study:
  CrunchBase database (accessed May 2012)
• Network A of companies and financial organizations/funds, e.g.:

                         Y ---------- X

           »   Company Y has a connection to investment fund X
• Network B of persons and companies, e.g.:

                         A ---------- Y

           »   Person A has a connection to company Y
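
A minimal sketch of how the two networks could be held in memory, assuming each connection is stored as a pair in an edge list. The entity names and the helper functions `build_adjacency` and `node_degrees` are placeholders for illustration, not CrunchBase records or part of any tool in these slides.

```python
from collections import defaultdict

# Hypothetical edge lists; the names are placeholders, not real CrunchBase data.
network_a = [("Company Y", "Fund X"), ("Company Z", "Fund X")]      # company - fund
network_b = [("Person A", "Company Y"), ("Person B", "Company Y")]  # person - company


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return dict(adjacency)


def node_degrees(adjacency):
    """Return the degree (number of distinct neighbours) of every node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


adj_a = build_adjacency(network_a)
degrees_a = node_degrees(adj_a)
print(degrees_a["Fund X"])  # number of companies connected to Fund X
```

An in-memory map like this works for small samples; the tools discussed in the following slides exist because this approach stops being practical as the network grows.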
1.Motivation – LS SNA                     6/19
What can we do?
- We want to analyze the behavior of entities in terms of
   their relationships and other influences.
- We want to determine characteristics of the network
   both from the ego-centered (single node) point of
   view and for the network as a whole.
What is the problem?
- It takes too much time (many hours or even days)
   to do this with common software like Gephi or R, even
   on a good PC.
2. Software Tools – LS SNA 7/19
• State of the art – Recent Evolution
2001 – Boost Graph Library (C++)
2005 – Parallel BGL (C++), Hadoop (Java)
2007 – Development of Graphlab Starts
2008 – SNAP Small-world Network Analysis and
  Partitioning (C, OpenMP)
  .
  .
2013 – Several Graph Frameworks using Hadoop
  and/or HDFS
2. Software Tools – LS SNA 8/19
• PEGASUS
  – Computation framework written in Java
  – An open-source graph-mining system with
    massive scalability
  – Depends on Hadoop
  – Graph-oriented tool
2. Software Tools – LS SNA 9/19
• Graphlab API
  – Computation framework written in C++
  – Computation in GraphLab is applied to dependent
    records which are stored as vertices in a large
    distributed data-graph
  – Computation in GraphLab is expressed as vertex-
    programs which are executed in parallel on each
    vertex and can interact with neighboring vertices.
  – GraphLab programs interact by directly reading the
    state of neighboring vertices and by modifying the
    state of adjacent edges.
  – HDFS Integration: Access your data directly from HDFS
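
The vertex-program idea can be sketched in plain Python. This is not GraphLab's C++ API: the functions `pagerank_vertex_program` and `run_vertex_programs`, the toy edge list, and the choice of PageRank as the per-vertex computation are all assumptions made for illustration. Each call reads only the state of a vertex's neighbours, which is what lets a real engine run the updates in parallel.

```python
def pagerank_vertex_program(vertex, ranks, in_neighbours, out_degree, damping=0.85):
    """Compute a new rank for one vertex by reading its in-neighbours' state."""
    gathered = sum(ranks[u] / out_degree[u] for u in in_neighbours[vertex])
    return (1.0 - damping) + damping * gathered


def run_vertex_programs(edges, iterations=20):
    """Apply the vertex program to every vertex for a fixed number of rounds."""
    vertices = sorted({v for edge in edges for v in edge})
    in_neighbours = {v: [] for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for src, dst in edges:
        in_neighbours[dst].append(src)
        out_degree[src] += 1

    ranks = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        # Every vertex is updated from the previous round's state only,
        # so the updates within a round are independent of each other.
        ranks = {
            v: pagerank_vertex_program(v, ranks, in_neighbours, out_degree)
            for v in vertices
        }
    return ranks


# Hypothetical directed toy graph; every vertex has at least one outgoing edge.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = run_vertex_programs(toy_edges)
print(sorted(ranks.items()))
```

The sketch runs the rounds sequentially on one machine; it only illustrates the programming model, not the distributed execution.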
2. Software Tools – LS SNA 10/19
• Snap (Stanford Network Analysis Platform)
  – Not Parallel however…
  – SNAP library is written in C++ and optimized for
    maximum performance and compact graph
    representation
  – It easily scales to massive networks with hundreds
    of millions of nodes, and billions of edges
  – …although some algorithms in Snap might be slow
    due to complexity
2. Software Tools – LS SNA 11/19
• Other Tools (Summary)
  – Several more tools are available:
     • Giraph – graph oriented
     • RHadoop (package for R and Hadoop) – generic tool


  => All of these tools depend on Hadoop, which
    seems to be more and more commonly adopted
2. Software Tools – LS SNA 12/19
Algorithms available from the software install (graph analysis):

Pegasus                          Graphlab                    Snap
Degree                           approximate diameter        Cascades
PageRank                         kcore                       Centrality
Random Walk with Restart (RWR)   pagerank                    Cliques
Radius                           connected component         Community
Connected Components             simple coloring             Concomp
                                 directed triangle count     Forestfire
                                 format convert              Graphgen
                                 sssp                        Graphhash
                                 undirected triangle count   Kcores
                                                             Kronem
                                                             Krongen
                                                             Kronfit
                                                             Maggen
                                                             Magfit
                                                             Motifs
                                                             Ncpplot
                                                             Netevol
                                                             Netinf
                                                             Netstat
                                                             Mkdatasets
                                                             infopath
3. Case Study – LS SNA                           13/19
   => Some Numbers
• Network of companies and financial organizations/funds
     1. Number of firms: 88,269
     2. Number of investment funds: 7,697
• Network of persons and companies
     1. Number of persons: 118,394
3. Case Study – LS SNA                        14/19
 => Algorithms and Tools Used
     – Node Degree with PEGASUS
     – Friends of Friends with Hadoop Map-Reduce
     – Centrality Measures with Snap (Stanford Network
        Analysis Platform)
     – Triangle Counting with Graphlab
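
Two of the algorithms listed above can be sketched on a single machine to show what is being computed. The code below is an illustration under stated assumptions, not the Hadoop job or the Graphlab toolkit used in the case study: the function names, the toy graph, and the in-memory map and reduce steps are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_friend_pairs(person, friends):
    """Map step: every pair of this person's friends has `person` in common."""
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), person


def reduce_common_friends(mapped):
    """Reduce step: collect the common friends emitted for each pair."""
    grouped = defaultdict(set)
    for pair, common in mapped:
        grouped[pair].add(common)
    return grouped


def friends_of_friends(adjacency):
    """Return pairs that share a friend but are not directly connected."""
    mapped = (
        kv
        for person, friends in adjacency.items()
        for kv in map_friend_pairs(person, friends)
    )
    grouped = reduce_common_friends(mapped)
    return {
        pair: common
        for pair, common in grouped.items()
        if pair[1] not in adjacency[pair[0]]
    }


def count_triangles(adjacency):
    """Count each undirected triangle once by intersecting neighbour sets."""
    total = 0
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if u < v:
                # Only count the third vertex w when u < v < w.
                total += sum(1 for w in neighbours & adjacency[v] if w > v)
    return total


# Hypothetical undirected toy graph: a triangle a-b-c plus a pendant vertex d.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
fof = friends_of_friends(toy)
triangles = count_triangles(toy)
print(fof, triangles)
```

In a real Map-Reduce job the grouping in `reduce_common_friends` would be done by the framework's shuffle phase rather than by a dictionary in memory.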
3. Case Study – LS SNA   15/19
 => Processing Time
4. Summary & Conclusions – LS SNA                16/19
• Summary & Conclusions
  – This paper summarizes which tools to consider when
    studying large graphs.
  – We are witnessing a proliferation of software
    tools aimed at the analysis of large-scale graphs.
  – Networks that were once impractical to analyze
    can now be handled with the right tools.
References I – LS SNA                                    17/19
• APACHE. 2012. Apache Giraph [Online]. The Apache Software Foundation.
  Available: http://incubator.apache.org/giraph/.
• GRAPHLAB. Graphlab The Abstraction [Online]. Available:
  http://graphlab.org/home/abstraction/ [Accessed 2012].
• GRAPHLAB. 2012. Graph Analytics Toolkit [Online]. Available:
  http://graphlab.org/toolkits/graph-analytics/ [Accessed 2012].
• HOLMES, A. 2012. Hadoop In Practice, Manning.
• LESKOVEC, J. Stanford Network Analysis Platform [Online]. Available:
  http://snap.stanford.edu/snap/ [Accessed 12-2012].
• MAZZA, G. 2012. FrontPage - Hadoop Wiki [Online]. Available:
  http://wiki.apache.org/lucene-hadoop/ [Accessed 11-2012].
• THANEDAR, V. 2012. API Documentation [Online]. Available:
  http://developer.crunchbase.com/docs [Accessed 04-2012].
References II – LS SNA                                 18/19
• UNIVERSITY, C. M. 2012. Project Pegasus [Online]. Available:
  http://www.cs.cmu.edu/~pegasus/ [Accessed 2012].
• WASHINGTON, U. O. What is Hadoop? [Online]. Available:
  http://escience.washington.edu/get-help-now/what-hadoop [Accessed
  05-03-2013].
• OWENS, J. R. 2013. Hadoop Real-World Solutions Cookbook. PACKT
  Publishing.
• MCGUIRE, T. Big Data Better Decisions [Online]. Available:
  http://www.slideshare.net/McK_CMSOForum/big-data-and-advanced-
  analytics [Accessed 05-03-2013].
END – LS SNA          19/19



         Thank You!
         Questions?

MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Large scale social networks analysis joclad 2013

  • 1. Large Scale Social Networks Analysis – LS SNA. Rui Sarmento, João Gama, Tiago Cunha, Albert Bifet. LIAAD/INESC TEC; FEP - University of Porto. April 13, 2013
  • 2. Outline – LS SNA
      1. Motivation
      2. Software Tools
         – State of the art – Recent Evolution
         – PEGASUS
         – Graphlab
         – Snap (Stanford Network Analysis Platform)
         – Other Tools
      3. Case Study
         – Network of companies and financial organizations
         – Some Numbers
         – Algorithms and Tools Used
         – Processing Time
      4. Summary & Conclusions
  • 3. 1. Motivation – LS SNA
      Generic Problem: The huge amounts of data now available pose problems for analysis with regular hardware and/or software.
      Example Fact: “We have produced more data in the last two years than in all of prior history so we are witnessing a Big Bang of Data” – Tim McGuire, McKinsey
  • 4. 1. Motivation – LS SNA
      Solution: Emerging technologies, like modern models for parallel computing, multicore computers or even clusters of computers, can be very useful for analyzing massive network data.
  • 5. 1. Motivation – LS SNA
      Particular Case Study: CrunchBase database (accessed May 2012)
      – Network A of companies and financial organizations/funds, e.g. nodes Y and X: company Y has a connection to investment fund X
      – Network B of persons and companies, e.g. nodes A and Y: person A has a connection to company Y
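The two networks on this slide can be held in memory as adjacency maps. The following is a minimal sketch, not the authors' code: the edge list is made up for illustration (only the pair Y, X comes from the slide), and treating the connection as undirected is an assumption.

```python
from collections import defaultdict

# Hypothetical edge list for Network A: (company, investment fund) pairs.
# Only ("Y", "X") mirrors the slide; the other pairs are invented examples.
edges = [("Y", "X"), ("Y", "W"), ("Z", "X")]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for company, fund in edge_list:
        adjacency[company].add(fund)
        adjacency[fund].add(company)
    return dict(adjacency)


adjacency = build_adjacency(edges)
```

Network B (persons and companies) would use the same structure with (person, company) pairs.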
  • 6. 1. Motivation – LS SNA
      What can we do?
      – We want to analyze the behavior of entities in terms of relationships or other influences.
      – We want to determine characteristics of the network from the point of view of both the individual node (self-centered) and the network as a whole.
      What is the problem?
      – It takes too much time (many hours or even days) with standard software like Gephi or R, even on a good PC.
  • 7. 2. Software Tools – LS SNA
      State of the art – Recent Evolution
      2001 – Boost Graph Library (C++)
      2005 – Parallel BGL (C++), Hadoop (Java)
      2007 – Development of Graphlab starts
      2008 – SNAP, Small-world Network Analysis and Partitioning (C, OpenMP)
      ...
      2013 – Several graph frameworks using Hadoop and/or HDFS
  • 8. 2. Software Tools – LS SNA
      PEGASUS
      – Computation framework written in Java
      – An open-source graph-mining system with massive scalability
      – Depends on Hadoop
      – Graph-oriented tool
  • 9. 2. Software Tools – LS SNA
      Graphlab API
      – Computation framework written in C++
      – Computation in GraphLab is applied to dependent records, which are stored as vertices in a large distributed data-graph.
      – Computation in GraphLab is expressed as vertex-programs, which are executed in parallel on each vertex and can interact with neighboring vertices.
      – GraphLab programs interact by directly reading the state of neighboring vertices and by modifying the state of adjacent edges.
      – HDFS integration: access your data directly from HDFS
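The vertex-program idea can be illustrated without GraphLab itself. The sketch below is a plain, sequential Python simulation and does not use the GraphLab API: the function names, the round-based scheduling, and the toy graph are all assumptions made for illustration. Each vertex reads the state of its neighbours and adopts the smallest label it sees, which labels connected components.

```python
def min_label_program(vertex, state, neighbours):
    """Vertex program: adopt the smallest label among self and neighbours."""
    return min([state[vertex]] + [state[n] for n in neighbours])


def run_vertex_programs(adjacency, program, max_rounds=100):
    """Apply `program` to every vertex in rounds until no state changes.

    A real framework would run the per-vertex calls in parallel; here they
    run one after another, reading only the previous round's state.
    """
    state = {v: v for v in adjacency}  # every vertex starts with its own id
    for _ in range(max_rounds):
        new_state = {v: program(v, state, adjacency[v]) for v in adjacency}
        if new_state == state:
            break
        state = new_state
    return state


# Hypothetical toy graph with two components: {1, 2, 3} and {4, 5}.
graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
labels = run_vertex_programs(graph, min_label_program)
```

Vertices that end up with the same label belong to the same connected component.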
  • 10. 2. Software Tools – LS SNA
      Snap (Stanford Network Analysis Platform)
      – Not parallel; however...
      – The SNAP library is written in C++ and optimized for maximum performance and compact graph representation.
      – It easily scales to massive networks with hundreds of millions of nodes and billions of edges.
      – ...although some algorithms in Snap might be slow due to their complexity.
  • 11. 2. Software Tools – LS SNA
      Other Tools (Summary)
      – Several more tools are available:
        • Giraph – graph-oriented
        • RHadoop (package for R and Hadoop) – generic tool
      => The tools listed here depend on Hadoop, which seems to be more and more commonly adopted.
  • 12. 2. Software Tools – LS SNA
      Algorithms available from software install (graph analysis), by tool:
      – Pegasus: Degree, PageRank, Random Walk with Restart (RWR), Radius, Connected Components
      – Graphlab: approximate diameter, kcore, pagerank, connected component, simple coloring, directed triangle count, format convert, sssp, undirected triangle count
      – Snap: Cascades, Centrality, Cliques, Community, Concomp, Forestfire, Graphgen, Graphhash, Kcores, Kronem, Krongen, Kronfit, Maggen, Magfit, Motifs, Ncpplot, Netevol, Netinf, Netstat, Mkdatasets, infopath
  • 13. 3. Case Study – LS SNA
      => Some Numbers
      – Network of companies and financial organizations/funds
        1. Number of firms: 88,269
        2. Number of investment funds: 7,697
      – Network of persons and companies
        1. Number of persons: 118,394
  • 14. 3. Case Study – LS SNA
      => Algorithms and Tools Used
      – Node Degree with PEGASUS
      – Friends of Friends with Hadoop Map-Reduce
      – Centrality Measures with Snap (Stanford Network Analysis Platform)
      – Triangle Counting with Graphlab
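As an illustration of the Friends of Friends step, the sketch below mimics the map and reduce phases in plain Python. It is not the Hadoop job used in the case study: the function names, the "direct"/"common" markers, and the toy graph are assumptions. The mapper emits every pair of neighbours of a node as sharing a common friend and flags existing direct links; the reducer keeps only pairs that are not already connected and counts their common friends.

```python
from collections import defaultdict
from itertools import combinations


def mapper(node, neighbours):
    """Emit (pair, marker) records for one node and its neighbour set."""
    for n in neighbours:
        # Flag pairs that are already directly connected.
        yield tuple(sorted((node, n))), "direct"
    for a, b in combinations(sorted(neighbours), 2):
        # a and b share `node` as a common friend.
        yield (a, b), "common"


def reducer(pair, values):
    """Return the common-friend count, or None if the pair is already linked."""
    if "direct" in values:
        return None
    return len(values)


def friends_of_friends(adjacency):
    """Run map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for node, neighbours in adjacency.items():
        for key, value in mapper(node, neighbours):
            grouped[key].append(value)
    result = {}
    for pair, values in grouped.items():
        count = reducer(pair, values)
        if count is not None:
            result[pair] = count
    return result


# Hypothetical toy graph: D knows only B; A and C both know B.
graph = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
fof = friends_of_friends(graph)
```

In a real Hadoop job the grouping step is performed by the framework's shuffle phase, and the mapper and reducer run on separate machines.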
  • 15. 3. Case Study – LS SNA
      => Processing Time
  • 16. 4. Summary & Conclusions – LS SNA
      – This paper summarizes which tools to consider when studying big graphs.
      – We are witnessing a proliferation of software tools aimed at the analysis of large-scale graphs.
      – What was once a problem when dealing with these networks is solved with the right tools.
  • 17. References I – LS SNA
      – APACHE. 2012. Apache Giraph [Online]. The Apache Software Foundation. Available: http://incubator.apache.org/giraph/.
      – GRAPHLAB. Graphlab The Abstraction [Online]. Available: http://graphlab.org/home/abstraction/ [Accessed 2012].
      – GRAPHLAB. 2012. Graph Analytics Toolkit [Online]. Available: http://graphlab.org/toolkits/graph-analytics/ [Accessed 2012].
      – HOLMES, A. 2012. Hadoop In Practice, Manning.
      – LESKOVEC, J. Stanford Network Analysis Platform [Online]. Available: http://snap.stanford.edu/snap/ [Accessed 12-2012].
      – MAZZA, G. 2012. FrontPage - Hadoop Wiki [Online]. Available: http://wiki.apache.org/lucene-hadoop/ [Accessed 11-2012].
      – THANEDAR, V. 2012. API Documentation [Online]. Available: http://developer.crunchbase.com/docs [Accessed 04-2012].
  • 18. References II – LS SNA
      – UNIVERSITY, C. M. 2012. Project Pegasus [Online]. Available: http://www.cs.cmu.edu/~pegasus/ [Accessed 2012].
      – WASHINGTON, U. O. What is Hadoop? [Online]. Available: http://escience.washington.edu/get-help-now/what-hadoop [Accessed 05-03-2013].
      – OWENS, J. R. 2013. Hadoop Real-World Solutions Cookbook. PACKT Publishing.
      – MCGUIRE, T. Big Data Better Decisions [Online]. Available: http://www.slideshare.net/McK_CMSOForum/big-data-and-advanced-analytics [Accessed 05-03-2013].
  • 19. END – LS SNA. Thank You! Questions?