Graph databases, the Web of Data
        storage engines


        Pere Urbón Bayes
          Senior Software Engineer
                Independent

             purbon@purbon.com
             purbon.com
             in/purbon
             @purbon

          February of 2010
Graph databases, the Web of Data
        storage engines
●   We are going to talk about
       –   Graph databases, facts and definitions.
       –   Graph database vendors.
       –   Use cases and applications, graph theory.




                    The web of data storage engines - DataDevRoom - Fosdem 2011   2
Graph databases, the Web of Data
        storage engines

“A graph database is a database that uses graph
 structures with nodes, edges, and properties to
         represent and store information.

  General graph databases that can store any
   graph are distinct from specialized graph
  databases such as triple stores and network
                  databases.”
                                                                             Wikipedia


Graph Database
                          Property graph

●   Abstractions
          –   Nodes
          –   Relationships
          –   Properties on both.
    Example: John Smith liked http://www.example.com on 01/10/11
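
The three abstractions above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the API of any vendor named in this talk. The class names (`Node`, `Relationship`, `PropertyGraph`), the `LIKED` relationship type, and the property keys are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def add_node(self, **properties: Any) -> Node:
        # Nodes carry arbitrary key/value properties.
        node = Node(node_id=len(self.nodes), properties=dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
        # Relationships are typed, directed, and carry properties too.
        rel = Relationship(start, end, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node, rel_type: str) -> List[Relationship]:
        return [r for r in self.relationships if r.start is node and r.rel_type == rel_type]


# The "liked" statement from the slide: two nodes, one relationship,
# with the date stored as a property on the relationship.
graph = PropertyGraph()
john = graph.add_node(name="John Smith")
page = graph.add_node(url="http://www.example.com")
graph.relate(john, "LIKED", page, at="01/10/11")

likes = graph.outgoing(john, "LIKED")
print(likes[0].end.properties["url"], likes[0].properties["at"])
```

Storing the date on the relationship rather than on either node is what "properties on both" buys: the same person can like the same page's neighbours at different times without duplicating nodes.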




Graph databases
                                         Facts

Connectivity

[Figure: connectivity over the decades (1990's, 2010's, 2020's). Labels, from low to high connectivity: text files; blogs, social networks; tagging, folksonomies; RDF, ontologies, Linked Data; everything connected]

Graph databases
                                       Facts

Size of

[Figure: chart over the decades (1990's, 2010's, 2020's). Source:]
     http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion
Graph databases
                                 Facts

Performance

[Figure labels: lists; graph-like structures (semantic web, semantic reasoning, linked data); unstructured; performance slowdown]

Graph databases
                      Performance comparison

Query             RDBMS       OIM         GraphDB
Q1: count         20.38       17.35       0
Q2: projection    17.34       43.7        33.19
Q3: scan          32.76       174.64      3.14
Q4: values        12.28       20.77       0.01
Q5: select        7.34        5.43        0.84
Q6: hubs          >3 hours    >3 hours    624.68

                  RDBMS       OIM         GraphDB
data              27.36 GB    54 GB       9.69 GB
overhead          10.9        21.51       3.86
load              52891 s     17543 s     95579 s
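
The slide does not define "overhead". One reading that fits the numbers is that it is the ratio of stored size to raw input size. The short check below tests that assumption by dividing each data size by its overhead; if the reading holds, all three systems should imply roughly the same raw input. This is an interpretation, not something the source states.

```python
# Figures copied from the size table above: (stored data in GB, overhead).
systems = {
    "RDBMS": (27.36, 10.9),
    "OIM": (54.0, 21.51),
    "GraphDB": (9.69, 3.86),
}

# Assumption: overhead = stored size / raw input size,
# so the raw input size would be stored size / overhead.
implied_raw_gb = {name: size / overhead for name, (size, overhead) in systems.items()}

for name, raw in implied_raw_gb.items():
    print(f"{name}: implied raw input of about {raw:.2f} GB")
```

All three quotients come out near 2.51 GB, which is consistent with the three systems having loaded the same dataset.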


Graph databases
                             Vendors

●   Neo4j (neo4j.org)

●   Embedded, disk-based, fully transactional
    Java persistence engine that stores data
    structured in graphs rather than in tables.
●   Dual-licensed: AGPL and commercial.
●   High availability, scalability, concurrency, etc.

Graph databases
                              Vendors

●   InfiniteGraph



●   A distributed, scalable, high-performance
    commercial graph database written in Java,
    built on the experience of Objectivity Inc.
●   More info: http://www.infinitegraph.com/
Graph databases
                           Vendors

●   OrientDB

●   An embedded, pure-Java, fast, transactional,
    scalable document-graph storage engine.
●   Schema-free, ACID, support for SQL and JSON.
●   Apache License 2.0
●   More info: http://www.orientechnologies.com/

Graph databases
                     More Vendors

●   Dex: The high performance graph database.
●   HyperGraphDB: An AI and semantic web graph
    database.
●   Infogrid: The Internet graph database.
●   Sones: SaaS .NET graph database.
●   AllegroGraph: The semantic graph database.
●   VertexDB: High performance database server.

Graph Theory
                            analytics

●   Clustering                            ●   Task planning
    (Communities)                         ●   Scheduling
●   Social connections                    ●   Process assignment
●   Hubs                                  ●   Routing
●   Graph Mining                          ●   Logistics
●   Centrality measures                   ●   League planning
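
As a small illustration of two items from this list, hubs and centrality measures, the sketch below computes degree centrality on an undirected graph and picks the best-connected nodes. Degree centrality is only one of several centrality measures and is chosen here for brevity. The function names and the sample edge list are made up for the example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return degree / (n - 1) for every node of an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


def hubs(edges, top=1):
    """Return the `top` nodes with the highest degree centrality."""
    scores = degree_centrality(edges)
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores, key=lambda node: (-scores[node], node))[:top]


# Hypothetical social graph used only for this illustration.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("dan", "eve")]
print(degree_centrality(edges))
print(hubs(edges))
```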


Graph Theory
                         Applications

●   Pattern Recognition
●   Dependency analysis
●   Impact analysis
●   Network flow
        –   Traffic analysis and optimization
        –   Delivery optimization
●   Optimization of tasks
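
Dependency and impact analysis both reduce to reachability questions on a graph. The sketch below is one possible way to answer "what is affected if this item changes?": invert the dependency edges and walk them breadth-first. The `impacted_by` helper and the component names are hypothetical.

```python
from collections import defaultdict, deque


def impacted_by(dependencies, changed):
    """Return everything that depends, directly or transitively, on `changed`.

    `dependencies` maps each item to the items it needs.
    """
    # Invert the edges: for each item, who depends on it?
    dependants = defaultdict(set)
    for item, needs in dependencies.items():
        for needed in needs:
            dependants[needed].add(item)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependant in dependants[current]:
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    # With cyclic dependencies the walk can return to the start; exclude it.
    seen.discard(changed)
    return seen


# Hypothetical components used only for this illustration.
dependencies = {
    "app": ["web", "db"],
    "web": ["http"],
    "db": ["storage"],
    "report": ["db"],
}
print(sorted(impacted_by(dependencies, "storage")))
```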

Graph Like
                          Applications

●   Recommendations
       –   Heuristics (PageRank)
       –   Local
               ●   Shortest Paths
               ●   Hammock Functions
               ●   Walks
               ●   Search algorithms
               ●   Shooting stars
               ●   K-nearest neighbours
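
To make the PageRank entry concrete, here is a minimal power-iteration sketch in plain Python. It follows the textbook formulation with a damping factor and spreads the rank of pages without outgoing links evenly over all pages. The parameter defaults and the sample link graph are assumptions for the example, and production recommenders would use a tuned library implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point at "c", and "c" points back at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```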

Graph Like
                      Applications




●   Location based services
●   Hubs
●   Spatial databases
●   Logical (multi-)index construction


Web
                       Trending Topics

●   Semantic web
        –   RDF (OWL) Store
        –   RDF-Sail
        –   SPARQL
●   Linked data (Open Data)
●   Link analysis
●   Structure mining
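
An RDF store holds subject-predicate-object triples, and a SPARQL query is built from triple patterns with variables. The toy matcher below shows that idea on an in-memory list of tuples. It is not a SPARQL engine and uses no RDF library; the `ex:` names and the `match` helper are invented for the example. In SPARQL, the first pattern would read roughly `SELECT ?page WHERE { ex:john ex:liked ?page }`.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in a pattern must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Hypothetical triples used only for this illustration.
triples = [
    ("ex:john", "ex:liked", "http://www.example.com"),
    ("ex:john", "ex:knows", "ex:mary"),
    ("ex:mary", "ex:liked", "http://www.example.org"),
]

print(list(match(triples, ("ex:john", "ex:liked", "?page"))))
print(list(match(triples, ("?who", "ex:liked", "?page"))))
```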

Graph databases
                                   Performance
                   HPC Scalable Graph Analysis Benchmark IWGD 2010

Scale 15
Kernel        DEX       Neo4j     Jena      HyperGraphDB
Load (s)      7.44      697       141       +24h
Scan (s)      0.0010    2.71      0.689
2-Hops (s)    0.0120    0.0260    0.443
BC (s)        14.8      8.24      138
Size (MB)     30        17        207

Scale 20
Kernel        DEX       Neo4j     Jena      HyperGraphDB
Load (s)      317       32094     4560      +24h
Scan (s)      0.005     751       18.6
2-Hops (s)    0.033     0.0230    0.4580
BC (s)        617       7027      59512
Size (MB)     893       539       6656
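
The slide does not describe what each kernel computes. As a rough illustration of the kind of query a "2-Hops" kernel times, the sketch below collects every node reachable within two steps of a start node. The benchmark's actual definition may differ; the `k_hops` helper and the sample graph are assumptions for the example.

```python
def k_hops(adjacency, start, k=2):
    """Return the nodes reachable from `start` in at most `k` steps."""
    frontier = {start}
    seen = {start}
    for _ in range(k):
        # Expand one level, skipping nodes already visited.
        frontier = {
            neighbour
            for node in frontier
            for neighbour in adjacency.get(node, ())
            if neighbour not in seen
        }
        seen |= frontier
    return seen - {start}


# Hypothetical directed graph used only for this illustration.
adjacency = {"a": ["b"], "b": ["c", "e"], "c": ["d"], "e": ["a"]}
print(sorted(k_hops(adjacency, "a")))
```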

Graph databases
        XI FOSDEM Dinner

Interested in graph databases and NoSQL, and
        attending FOSDEM this year?

              Meeting point:
                   20:00
      In front of Le Roy d'Espagne
              Grand Place 1
                  Brussels

Graph databases
        Moviepilot is hiring


Interested in movies, data analytics, Ruby, Git, and
            open source? Join us!

Moviepilot is a leading provider of discovery
 services for movies and TV series, based in
                     Berlin.

  Interested? Talk to @jannis or @purbon
Graph databases, the Web of Data
        storage engines

             Questions?

        Pere Urbón Bayes
          Senior Software Engineer
                Independent

                purbon@purbon.com


          February of 2010

