Sparksee Graph Database
Technology overview
April 2014
*Sparsity Technologies — Powering Extreme Data sparsity–technologies.com
Index
Graph Databases
Introduction to Sparksee
Sparksee Internals
Performance analysis
High scalability
HPC-SGAB Benchmark

Graph Databases
Graphs are everywhere
— A growing number of huge networks such as the Web, social networks, biological systems, GPS…
— Very large graphs
— Interest in analyzing the interrelations between the entities in these networks

Classical graph representation
— Adjacency matrix
    Very large N×N sparse matrix; no labels, no multigraph, no attributes
— Adjacency list
    No labels, no attributes, still space-consuming
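To make the two classical representations concrete, here is a minimal Python sketch for a hypothetical four-vertex graph (the vertex numbering and the edge list are made up for illustration). It suggests why the matrix costs N×N cells regardless of the edge count and cannot record a second parallel edge, while the list stores one entry per edge but still has no place for labels or attributes.

```python
# Hypothetical toy graph: 4 vertices, directed edges given as (tail, head).
# The edge (0, 1) appears twice to mimic a multigraph.
N = 4
edges = [(0, 1), (0, 1), (1, 2), (3, 2)]

# Adjacency matrix: N x N cells, almost all zero for sparse graphs.
matrix = [[0] * N for _ in range(N)]
for tail, head in edges:
    matrix[tail][head] = 1  # a 0/1 cell cannot tell one edge from two

# Adjacency list: one entry per edge, grouped by tail vertex.
adj_list = {v: [] for v in range(N)}
for tail, head in edges:
    adj_list[tail].append(head)

matrix_cells = N * N                                   # storage grows with N squared
matrix_edges = sum(sum(row) for row in matrix)         # parallel edge is lost
list_entries = sum(len(heads) for heads in adj_list.values())
```

In this toy case the matrix needs 16 cells to record 3 distinguishable edges, while the list keeps all 4 edges in 4 entries.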
Classical graph storage
— Relational database
    Fixed schema or very large tables for nodes and edges; not suitable for path traversals and graph exploration
— XML
    XML data is stored in the form of trees
    Much work done on finding exact or approximate patterns (subtrees)
    Not designed for complex graph queries
— RDF
    Widely adopted standard for manipulating graph-like data
    Strong support from large vendors
    SPARQL has become a de facto standard
New approaches to graph analysis
— Complex analysis computations on very large distributed graphs
    Map-reduce (Pegasus)
    Vertex-centric computation model (Pregel)
— Graph databases: database functionality to store and query graph-like data
    Graph storage in the file system of a single computer node with a buffer pool (Neo4j, HypergraphDB, OrientDB, InfiniteGraph)
    Multiple servers accessible through a load balancer (Neo4j HA)
Requirements for graph databases
— Data and schema represented as a graph
— Data operations based on graph operations
— Graph-based integrity restrictions
— Multigraphs
— Attributes attached to both vertices and edges
— Graph queries combining edge traversals with attribute accesses
— Diversity of workloads
— Efficient secondary memory management

Introduction to Sparksee
Sparksee
IS a high-performance, out-of-core graph database management system
FOR large-scale labeled and attributed multigraphs
BASED ON vertical partitioning and collections of object identifiers stored as bitmaps
Sparksee — Characteristics
— Graph split into small structures
    Only the significant parts are moved to main memory (caching)
— Object identifiers (oids) instead of complex objects
    Reduces memory requirements
— Specific structures to improve traversals
    Index the edges and the neighbors of each node
— Attribute indices
    Improve queries based on value filters
— Implemented in C++
    Different APIs (Java, .NET, etc.) through wrappers
Sparksee — Capabilities
Efficiency
    Very compact representation using bitmaps. Highly compressible data structures.
Capacity
    More than 100 billion vertices and edges in a single multicore computer.
Performance
    Subsecond response in recommendation queries.
Scalability
    High throughput for concurrent queries.
Consistency
    Partial transactional support with recovery.
Multiplatform
    Linux, Windows, Mac OS X, mobile
Logical graph model
Labeled
    a label (type) for each vertex and edge
Directed
    edges can have a fixed direction, from tail to head
Attributed
    variable list of attributes for each vertex and edge
Multigraph
    multiple edges between two vertices
Sparksee — Architecture
[Figure: architecture diagram. Labels shown: C++, Java, .NET, Python and mobile applications; SparkseeJava, SparkseeNet and SparkseePython (SWIG); SparkseeCpp – Graph Algorithms; DEXCORE; GDB; GRAPH; DATA STRUCTURES; BUFFERPOOL; PLATFORM.]

Sparksee internals
Graph representation
We define a graph G = (V, E, L, T, H, A1, …, Ap) as:
    LABELS      L  = {(o, l) | o ∈ (V ∪ E) ∧ l ∈ string}
    TAILS       T  = {(e, t) | e ∈ E ∧ t ∈ V}
    HEADS       H  = {(e, h) | e ∈ E ∧ h ∈ V}
    ATTRIBUTES  Ai = {(o, c) | o ∈ (V ∪ E) ∧ c ∈ {int, string, ...}}
With this representation:
— the graph is split into multiple lists of pairs
— the first element of each pair is always a vertex or an edge
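The definition can be sketched in a few lines of Python. This is only an illustration on a small hypothetical graph of two articles and one image; the names `labels`, `tails`, `heads`, `attr_title` and both helper functions are assumptions made for this sketch and are not part of the Sparksee API. Each component of G is a plain set of pairs whose first element is a vertex or an edge.

```python
# Hypothetical example: two articles, one image, two edges.
V = {"v1", "v2", "v3"}
E = {"e1", "e2"}

labels = {("v1", "ARTICLE"), ("v2", "ARTICLE"), ("v3", "IMAGE"),
          ("e1", "REF"), ("e2", "CONTAINS")}          # L
tails = {("e1", "v1"), ("e2", "v2")}                  # T: edge -> tail vertex
heads = {("e1", "v2"), ("e2", "v3")}                  # H: edge -> head vertex
attr_title = {("v1", "Europe"), ("v2", "Barcelona")}  # one attribute set A_i


def first_elements_are_objects(pairs):
    """Check that every pair starts with a vertex or an edge."""
    return all(o in V | E for o, _ in pairs)


def out_neighbors(v):
    """Follow the edges whose tail is v by joining T and H on the edge."""
    out_edges = {e for e, t in tails if t == v}
    return {h for e, h in heads if e in out_edges}
```

A traversal is then a join between T and H on the edge identifier: `out_neighbors("v1")` walks edge e1 from v1 to v2.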
Graph representation
[Figure: example graph of articles and images, shown with its lists of pairs L, T, H, A_id, A_title, A_nlc, A_filename and A_tag.]
Value sets
A value set groups all the pairs of the original set that share the same value into a single pair made of the value and the set of objects that have that value.
L
    pairs:     (v1, ARTICLE), (v2, ARTICLE), (v3, ARTICLE), (v4, ARTICLE), (v5, IMAGE), (v6, IMAGE), (e1, BABEL), (e2, BABEL), (e3, REF), (e4, REF), (e5, CONTAINS), (e6, CONTAINS), (e7, CONTAINS)
    value set: (ARTICLE, {v1, v2, v3, v4}), (BABEL, {e1, e2}), (CONTAINS, {e5, e6, e7}), (IMAGE, {v5, v6}), (REF, {e3, e4})
T
    pairs:     (e1, v1), (e2, v2), (e3, v4), (e4, v4), (e5, v3), (e6, v3), (e7, v4)
    value set: (v1, {e1}), (v2, {e2}), (v3, {e5, e6}), (v4, {e3, e4, e7})
H
    pairs:     (e1, v3), (e2, v3), (e3, v3), (e4, v3), (e5, v5), (e6, v6), (e7, v6)
    value set: (v3, {e1, e2, e3, e4}), (v5, {e5}), (v6, {e6, e7})
A_id
    pairs:     (v1, 1), (v2, 2), (v3, 3), (v4, 4), (v5, 1), (v6, 2)
    value set: (1, {v1, v5}), (2, {v2, v6}), (3, {v3}), (4, {v4})
A_title
    pairs:     (v1, Europa), (v2, Europe), (v3, Europe), (v4, Barcelona)
    value set: (Barcelona, {v4}), (Europa, {v1}), (Europe, {v2, v3})
A_nlc
    pairs:     (v1, ca), (v2, fr), (v3, en), (v4, en), (e1, en), (e2, en)
    value set: (ca, {v1}), (en, {v3, v4, e1, e2}), (fr, {v2})
A_filename
    pairs:     (v5, europe.png), (v6, bcn.jpg)
    value set: (bcn.jpg, {v6}), (europe.png, {v5})
A_tag
    pairs:     (e4, continent)
    value set: (continent, {e4})
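The grouping step from pairs to value sets can be sketched as follows. This is a minimal Python illustration, with a hypothetical helper name `to_value_set`, applied to the NLC and TAILS pairs of the example above. It uses ordinary sets rather than the bitmaps Sparksee uses internally.

```python
from collections import defaultdict


def to_value_set(pairs):
    """Group (object, value) pairs into {value: set of objects}."""
    grouped = defaultdict(set)
    for obj, value in pairs:
        grouped[value].add(obj)
    return dict(grouped)


# Pairs of the NLC attribute and of TAILS, copied from the example above.
nlc_pairs = [("v1", "ca"), ("v2", "fr"), ("v3", "en"),
             ("v4", "en"), ("e1", "en"), ("e2", "en")]
tail_pairs = [("e1", "v1"), ("e2", "v2"), ("e3", "v4"), ("e4", "v4"),
              ("e5", "v3"), ("e6", "v3"), ("e7", "v4")]

nlc_value_set = to_value_set(nlc_pairs)
tails_value_set = to_value_set(tail_pairs)
```

Six NLC pairs collapse into three entries, one per distinct language code.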
Bitmap representation
— Each vertex and edge is identified by a unique and immutable oid (object identifier)
— Each vertex or edge set is stored in a bitmap structure:
    Each position in a bitmap corresponds to the oid of an object
    Reduced amount of space (compression techniques)
    Very efficient binary logic operations
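A rough way to see why bitmaps make set operations cheap is to use Python integers as uncompressed bitmaps. The sketch below is illustrative only: the oid assignment (vertices v1..v6 mapped to oids 1..6) and the helper names are assumptions, and Sparksee's compressed bitmap structure is not modeled here.

```python
def to_bitmap(oids):
    """Set bit i for every oid i in the collection."""
    bitmap = 0
    for oid in oids:
        bitmap |= 1 << oid
    return bitmap


def to_oids(bitmap):
    """List the positions of the set bits, in increasing order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Assumed oid assignment for illustration: vertices v1..v6 get oids 1..6.
articles = to_bitmap([1, 2, 3, 4])    # objects labeled ARTICLE
english = to_bitmap([3, 4])           # vertices whose NLC value is 'en'
titled_europe = to_bitmap([2, 3])     # vertices whose TITLE is 'Europe'

# Set intersection and union become single bitwise operations.
english_europe_articles = articles & english & titled_europe
europe_or_english = english | titled_europe
```

The intersection of three sets is two `&` operations on machine words, independent of how the sets were built.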
Value set representation
— A value set is represented as two maps
    One maps each distinct value to a vertex or edge set
    The other maps each vertex or edge oid to its value
[Figure: Example of a bitmap-based representation]

[Figure: Integrity rules]
Value set operations
domain    returns the set of distinct values
objects   returns the set of vertices or edges associated with a value
lookup    returns the set of values associated with a set of objects
insert    adds a vertex or edge to the collection of objects of a value
remove    removes a vertex or edge from the collection of objects of a value
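The five operations can be sketched on top of a two-map layout (value to objects, object to value). The class below is a toy illustration under stated assumptions: the name `ValueSet`, the method signatures, and the rule that an object holds at most one value per value set are choices made for this sketch, and Python dicts and sets stand in for Sparksee's maps and bitmaps.

```python
class ValueSet:
    """Toy value set kept as two maps: value -> objects and object -> value."""

    def __init__(self):
        self.by_value = {}   # value -> set of vertices/edges
        self.by_object = {}  # vertex/edge -> value

    def insert(self, obj, value):
        self.remove(obj)  # assumption: at most one value per object
        self.by_value.setdefault(value, set()).add(obj)
        self.by_object[obj] = value

    def remove(self, obj):
        value = self.by_object.pop(obj, None)
        if value is not None:
            self.by_value[value].discard(obj)
            if not self.by_value[value]:
                del self.by_value[value]  # keep the domain free of empty values

    def domain(self):
        return set(self.by_value)

    def objects(self, value):
        return set(self.by_value.get(value, ()))

    def lookup(self, objs):
        return {self.by_object[o] for o in objs if o in self.by_object}


nlc = ValueSet()
for obj, lang in [("v1", "ca"), ("v2", "fr"), ("v3", "en"), ("v4", "en")]:
    nlc.insert(obj, lang)
```

Keeping both maps means `objects` and `lookup` are each a direct map access, at the cost of updating two structures on every insert or remove.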
Graph query examples
— Number of articles
    |objects(LABELS, 'ARTICLE')|
— Out-degree of the English article 'Europe'
    |objects(TAILS, objects(TITLE, 'Europe') ∩ objects(NLC, 'en') ∩ objects(LABELS, 'ARTICLE'))|
— Articles with references to the image with filename 'bcn.jpg'
    {lookup(TAILS, x) | x ∈ objects(HEADS, objects(FILENAME, 'bcn.jpg') ∩ objects(LABELS, 'IMAGE'))}
— Count the articles of each language
    {(x, y) | x ∈ domain(NLC) ∧ y = |objects(NLC, x) ∩ objects(LABELS, 'ARTICLE')|}
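These four queries can be evaluated on the example graph with a few lines of Python. The sketch below is not Sparksee code: value sets are plain dicts from value to a set of objects, and `domain`, `objects` and `lookup` are simplified stand-ins written for this illustration (here `objects` also accepts a set of values, which is how the out-degree query uses it).

```python
# Value sets of the example graph as plain dicts: value -> set of objects.
LABELS = {"ARTICLE": {"v1", "v2", "v3", "v4"}, "IMAGE": {"v5", "v6"},
          "BABEL": {"e1", "e2"}, "REF": {"e3", "e4"},
          "CONTAINS": {"e5", "e6", "e7"}}
TAILS = {"v1": {"e1"}, "v2": {"e2"}, "v3": {"e5", "e6"},
         "v4": {"e3", "e4", "e7"}}
HEADS = {"v3": {"e1", "e2", "e3", "e4"}, "v5": {"e5"}, "v6": {"e6", "e7"}}
TITLE = {"Barcelona": {"v4"}, "Europa": {"v1"}, "Europe": {"v2", "v3"}}
NLC = {"ca": {"v1"}, "en": {"v3", "v4", "e1", "e2"}, "fr": {"v2"}}
FILENAME = {"bcn.jpg": {"v6"}, "europe.png": {"v5"}}


def domain(vs):
    return set(vs)


def objects(vs, values):
    """Objects associated with one value or with any value in a set."""
    if not isinstance(values, (set, frozenset)):
        values = {values}
    return set().union(*(vs.get(v, set()) for v in values))


def lookup(vs, objs):
    """Values associated with a set of objects."""
    return {value for value, members in vs.items() if members & set(objs)}


articles = objects(LABELS, "ARTICLE")
num_articles = len(articles)
europe_en = objects(TITLE, "Europe") & objects(NLC, "en") & articles
out_degree = len(objects(TAILS, europe_en))
bcn = objects(FILENAME, "bcn.jpg") & objects(LABELS, "IMAGE")
referencing = lookup(TAILS, objects(HEADS, bcn))
per_language = {x: len(objects(NLC, x) & articles) for x in domain(NLC)}
```

On this data the English article 'Europe' is v3, its out-degree is 2 (edges e5 and e6), and the articles pointing at bcn.jpg are v3 and v4.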
Implementation details
— Bitmaps are compressed by grouping the bits into clusters of 32 consecutive bits (up to 137 billion objects per graph)
— Locality is improved by generating consecutive oids for each distinct vertex or edge label
— A sorted tree structure of bitmap clusters speeds up the insert, remove, and binary logic operations
— Maps are implemented using B+ trees
— The tail, head and attribute value sets are split into specific value sets for each label

Performance analysis
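The clustering idea can be illustrated with a sparse bitmap that keeps only its non-empty 32-bit clusters. This is a sketch under assumptions, not Sparksee's implementation: a Python dict keyed by cluster index replaces the sorted tree mentioned on the slide, and the class and method names are invented here. The capacity line reflects one plausible reading of the 137 billion figure (2^32 cluster indexes times 32 bits), which the slide itself does not spell out.

```python
CLUSTER_BITS = 32


class ClusteredBitmap:
    """Sparse bitmap that stores only the non-empty 32-bit clusters."""

    def __init__(self):
        self.clusters = {}  # cluster index -> 32-bit integer

    def insert(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        self.clusters[index] = self.clusters.get(index, 0) | (1 << bit)

    def remove(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(index, 0) & ~(1 << bit)
        if word:
            self.clusters[index] = word
        else:
            self.clusters.pop(index, None)  # drop clusters that become empty

    def __contains__(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        return bool((self.clusters.get(index, 0) >> bit) & 1)

    def intersect(self, other):
        """Bitwise AND, visiting only clusters present in both bitmaps."""
        result = ClusteredBitmap()
        for index in sorted(self.clusters.keys() & other.clusters.keys()):
            word = self.clusters[index] & other.clusters[index]
            if word:
                result.clusters[index] = word
        return result


# Assumed reading of the capacity figure: 2**32 cluster indexes x 32 bits.
max_objects = 2**32 * CLUSTER_BITS

a, b = ClusteredBitmap(), ClusteredBitmap()
for oid in (3, 40, 1_000_000):
    a.insert(oid)
for oid in (40, 1_000_000, 2_000_000):
    b.insert(oid)
common = a.intersect(b)
```

Three widely spread oids occupy only three clusters, and the intersection touches just the clusters both bitmaps share, which is the effect that consecutive oids per label are meant to amplify.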
Queries
Q1: Find the article with the largest outdegree and traverse its shortest path tree
Q2: Recommend articles related to the most popular one
Q3: Find new images for articles from translations in other languages
Q4: Find, for each different language, the number of articles and images referenced
Q5: For each article with images, materialize the count of images
Q6: Remove all articles without images

                                     Q1  Q2  Q3  Q4  Q5  Q6
k-hops and path traversals           +   +
graph pattern matching                       +
aggregations and edge connectivity               +
graph transformation                                 +   +
Performance out-of-core
Wikipedia benchmark, out-of-core, 1 GB buffer pool.

                  MonetDB    MySQL      Neo4j (*)   Sparksee
Graph size (GB)   12.00      15.72      42.00       16.98
Load (h)          Error      1.36       8.99        2.89
Q1 (s)            4,801.6    > 12 h     > 12 h      120.5
Q2 (s)            3,788.4    13,841.6   > 12 h      205.4
Q3 (s)            458.9      33.0       481.0       10.8
Q4 (s)            279.3      45.0       > 12 h      144.9
Q5 (s)            267.4      930.3      > 12 h      140.9
Q6 (s)            Error      10,707.0   > 12 h      25,791.6

(*) Java VM with 45 GB
Query statistics

Query  Results      Edge trav.    Edge trav./sec  Mem (MB)   Bitmaps
Q1     624,525      236,387,207   1,987,616.30    832.19     42.97%
Q2     5            261,735,954   1,270,747.94    2,974.50   48.59%
Q3     51,780       1,536,698     143,885.58      320.81     48.00%
Q4     254          4,987,879     33,984.32       245.13     77.67%
Q5     2,401,597    5,934,724     42,072.39       319.00     80.64%
Q6     52,380,949   281,433,106   37,434.27       11,583.88  67.76%
Bitmap memory usage

                Size (MB)  Q1        Q2        Q3        Q4        Q5      Q6
LABELS          13.56      11.60     11.60     11.60     11.60     11.60   1.51
TAILS           1,272.32   1,030.90  857.09    229.67    164.79    164.79  90.18
HEADS           633.98     -         506.98    -         -         -       47.09
Attr. ID        122.77     -         -         -         -         -       0.85
Attr. TITLE     835.92     -         -         -         -         -       10.87
Attr. NLC       3,618.49   -         -         791.29    833.64    -       617.15
Attr. FILENAME  769.79     -         -         -         -         -       -
Attr. TAG       31.94      -         -         -         -         -       2.29
TOTAL           7,298.77   1,042.50  1,375.67  1,032.56  1,010.03  176.39  769.94
[Figure: Analysis of bitmap usage]

[Figure: Bitmap size distribution]

[Figure: Out-of-core stress test]

[Figure: RMAT/Query 1 scalability test. 2^28 is out-of-core (2 billion edges)]
[Figure: SNA Benchmark — Q1, Q6, Q9 and Q12]

High scalability
[Figure: High scalability test — mirror servers in Amazon Elastic with a load balancer]

HPC-SGAB Benchmark
Definition
— HPC-SGAB: Bader et al. 2009
    Measured in TEPS: traversed edges per second
— Graph
    Synthetic (R-MAT)
    Power-law distribution
    Average: 8 edges/node
— Operations
    Kernel 1: load graph and create indexes
    Kernel 2: find the edge(s) with maximum weight
    Kernel 3: k-hops
    Kernel 4: betweenness centrality (Brandes algorithm)
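As a rough illustration of Kernel 3 and of the TEPS metric, the Python sketch below runs a breadth-first search limited to k hops and divides the number of edges it touched by the elapsed time. The toy adjacency dict, the function name `k_hops` and the timing approach are assumptions for this sketch; the benchmark specification defines its own graph generator and measurement rules.

```python
import time
from collections import deque


def k_hops(adjacency, source, k):
    """Breadth-first search limited to k hops.

    Returns the vertices reached (excluding the source) and the number of
    edges traversed while exploring.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    traversed = 0
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand vertices already k hops away
        for neighbor in adjacency.get(vertex, ()):
            traversed += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {source}, traversed


# Hypothetical toy graph; the benchmark itself uses synthetic R-MAT graphs.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}

start = time.perf_counter()
reached, edges_traversed = k_hops(graph, 0, 2)
elapsed = time.perf_counter() - start
teps = edges_traversed / elapsed if elapsed > 0 else float("inf")
```

With k = 2 the search reaches vertices 1, 2 and 3 after traversing 4 edges; vertex 4 is three hops away and is left out.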
Experimental setup
— Systems tested
    Sparksee (formerly DEX)
    Neo4j
    HypergraphDB
    Jena (RDF)
— Platform
    Single computer with two quad-core Xeon E5410 processors
    11 GB RAM
    LFF 2.25 TB disk
    Single-threaded
— Default benchmark configuration
[Figure: Summary of results]

[Figure: Kernel 1 - Load time]

[Figure: Kernel 4 - Betweenness centrality]
Thanks!
Q&A!

More Related Content

What's hot

Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Chris Fregly
 

What's hot (19)

5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
 
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
Spark, Similarity, Approximations, NLP, Recommendations - Boulder Denver Spar...
 
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
Advanced Apache Spark Meetup Approximations and Probabilistic Data Structures...
 
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
Barcelona Spain Apache Spark Meetup Oct 20, 2015: Spark Streaming, Kafka, MLl...
 
Chicago Spark Meetup 03 01 2016 - Spark and Recommendations
Chicago Spark Meetup 03 01 2016 - Spark and RecommendationsChicago Spark Meetup 03 01 2016 - Spark and Recommendations
Chicago Spark Meetup 03 01 2016 - Spark and Recommendations
 
Toronto Spark Meetup Dec 14 2015
Toronto Spark Meetup Dec 14 2015Toronto Spark Meetup Dec 14 2015
Toronto Spark Meetup Dec 14 2015
 
Spark Summit East NYC Meetup 02-16-2016
Spark Summit East NYC Meetup 02-16-2016  Spark Summit East NYC Meetup 02-16-2016
Spark Summit East NYC Meetup 02-16-2016
 
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
Advanced Apache Spark Meetup Spark and Elasticsearch 02-15-2016
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
 
DC Spark Users Group March 15 2016 - Spark and Netflix Recommendations
DC Spark Users Group March 15 2016 - Spark and Netflix RecommendationsDC Spark Users Group March 15 2016 - Spark and Netflix Recommendations
DC Spark Users Group March 15 2016 - Spark and Netflix Recommendations
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Dallas DFW Data Science Meetup Jan 21 2016
Dallas DFW Data Science Meetup Jan 21 2016Dallas DFW Data Science Meetup Jan 21 2016
Dallas DFW Data Science Meetup Jan 21 2016
 
Sydney Spark Meetup Dec 08, 2015
Sydney Spark Meetup Dec 08, 2015Sydney Spark Meetup Dec 08, 2015
Sydney Spark Meetup Dec 08, 2015
 
Melbourne Spark Meetup Dec 09 2015
Melbourne Spark Meetup Dec 09 2015Melbourne Spark Meetup Dec 09 2015
Melbourne Spark Meetup Dec 09 2015
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
 
Budapest Big Data Meetup Nov 26 2015
Budapest Big Data Meetup Nov 26 2015Budapest Big Data Meetup Nov 26 2015
Budapest Big Data Meetup Nov 26 2015
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Copenhagen Spark Meetup Nov 25, 2015
Copenhagen Spark Meetup Nov 25, 2015Copenhagen Spark Meetup Nov 25, 2015
Copenhagen Spark Meetup Nov 25, 2015
 

Similar to Sparksee Technology overview

Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
Anubhav Jain
 

Similar to Sparksee Technology overview (20)

Polyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivotPolyglot Graph Databases using OCL as pivot
Polyglot Graph Databases using OCL as pivot
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
How to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
 
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
In Memory Data Pipeline And Warehouse At Scale - BerlinBuzzwords 2015
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
 
Riak at The NYC Cloud Computing Meetup Group
Riak at The NYC Cloud Computing Meetup GroupRiak at The NYC Cloud Computing Meetup Group
Riak at The NYC Cloud Computing Meetup Group
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Intro to Apache Spark and Scala, Austin ACM SIGKDD, 7/9/2014
Intro to Apache Spark and Scala, Austin ACM SIGKDD, 7/9/2014Intro to Apache Spark and Scala, Austin ACM SIGKDD, 7/9/2014
Intro to Apache Spark and Scala, Austin ACM SIGKDD, 7/9/2014
 
Scala 20140715
Scala 20140715Scala 20140715
Scala 20140715
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Vectorized R Execution in Apache Spark
Vectorized R Execution in Apache SparkVectorized R Execution in Apache Spark
Vectorized R Execution in Apache Spark
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 

Recently uploaded

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 

Sparksee Technology overview

  • 1. Sparksee Graph Database. Technology overview. April 2014. Sparsity Technologies: Powering Extreme Data (sparsity-technologies.com)
  • 2. Index: Graph Databases; Introduction to Sparksee; Sparksee Internals; Performance analysis; High scalability; HPC-SGAB Benchmark
  • 3. Graph Databases
  • 4. Graphs are everywhere
      - Increasing number of huge networks such as the Web, social networks, biological systems, GPS…
      - Very large graphs
      - Interest in analyzing the interrelations between the entities in these networks
  • 5. Classical graph representation
      - Adjacency matrix: very large N×N sparse matrix; no labels, no multigraph, no attributes
      - Adjacency list: no labels, no attributes, still space consuming
  • 6. Classical graph storage
      - Relational database: fixed schema or a very large table for nodes and edges; not suitable for path traversals and graph exploration
      - XML: data is stored in the form of trees; much work done on finding exact or approximate patterns (subtrees); not designed for complex graph queries
      - RDF: widely adopted standard for manipulating graph-like data; broad support from large vendors; SPARQL has become a de facto standard
  • 7. New approaches to graph analysis
      - Complex analysis computations on very large distributed graphs: MapReduce (Pegasus); vertex-centric computation model (Pregel)
      - Graph databases, which offer database functionality to store and query graph-like data: graph storage in the file system of a single computer node with a buffer pool (Neo4j, HypergraphDB, OrientDB, InfiniteGraph); multiple servers accessible through a load balancer (Neo4j HA)
  • 8. Requirements for graph databases
      - Data and schema represented as a graph
      - Data operations based on graph operations
      - Graph-based integrity restrictions
      - Multigraphs
      - Attributes attached to both vertices and edges
      - Graph queries combining edge traversals with attribute accesses
      - Diversity of workloads
      - Efficient secondary memory management
  • 9. Introduction to Sparksee
  • 10. Sparksee IS a high-performance, out-of-core graph database management system FOR large-scale labeled and attributed multigraphs, BASED ON vertical partitioning and collections of object identifiers stored as bitmaps.
  • 11. Sparksee: Characteristics
      - Graph split into small structures: only the significant parts are moved to main memory (caching)
      - Object identifiers (oids) instead of complex objects: reduces memory requirements
      - Specific structures to improve traversals: index the edges and the neighbors of each node
      - Attribute indices: improve queries based on value filters
      - Implemented in C++: different APIs (Java, .NET, etc.) through wrappers
  • 12. Sparksee: Capabilities
      - Efficiency: very compact representation using bitmaps; highly compressible data structures
      - Capacity: more than 100 billion vertices and edges in a single multicore computer
      - Performance: subsecond response in recommendation queries
      - Scalability: high throughput for concurrent queries
      - Consistency: partial transactional support with recovery
      - Multiplatform: Linux, Windows, Mac OS X, mobile
  • 13. Logical graph model
      - Labeled: a label (type) for each vertex and edge
      - Directed: edges can have a fixed direction, from tail to head
      - Attributed: variable list of attributes for each vertex and edge
      - Multigraph: multiple edges between two vertices
  • 14. Sparksee: Architecture (diagram). Components shown: C++, Java, .NET, Python and mobile applications; SparkseeJava, SparkseeNet and SparkseePython wrappers (SWIG); SparkseeCpp and graph algorithms; DEXCORE; GDB; graph data structures; buffer pool; platform.
  • 15. Sparksee internals
  • 16. Graph representation. We define a graph G = (V, E, L, T, H, A_1, …, A_p) as:
      LABELS      L = {(o, l) | o ∈ (V ∪ E) ∧ l ∈ string}
      TAILS       T = {(e, t) | e ∈ E ∧ t ∈ V}
      HEADS       H = {(e, h) | e ∈ E ∧ h ∈ V}
      ATTRIBUTES  A_i = {(o, c) | o ∈ (V ∪ E) ∧ c ∈ {int, string, ...}}
      With this representation:
      - the graph is split into multiple lists of pairs
      - the first element of each pair is always a vertex or an edge
  • 17. Graph representation (example; the lists are cut off on the slide)
      L:         (v1, ARTICLE), (v2, ARTICLE), …
      T:         (e1, v1), (e2, v2), (e3, v4), …
      H:         (e1, v3), (e2, v3), (e3, v3), …
      Aid:       (v1, 1), (v2, 2), (v3, 3), (v4, 4), …
      Atitle:    (v1, Europa), (v2, Europe), …
      Anlc:      (v1, ca), (v2, fr), (v3, en), …
      Afilename: (v5, europe.png), …
      Atag:      (e4, continent)
  • 18. Value sets. A value set groups all pairs of the original set that share the same value into a single pair between that value and the set of objects holding it.
      L:         (v1, ARTICLE), (v2, ARTICLE), (v3, ARTICLE), (v4, ARTICLE), (v5, IMAGE), (v6, IMAGE), (e1, BABEL), (e2, BABEL), (e3, REF), (e4, REF), (e5, CONTAINS), (e6, CONTAINS), (e7, CONTAINS)
                 → (ARTICLE, {v1, v2, v3, v4}), (BABEL, {e1, e2}), (CONTAINS, {e5, e6, e7}), (IMAGE, {v5, v6}), (REF, {e3, e4})
      T:         (e1, v1), (e2, v2), (e3, v4), (e4, v4), (e5, v3), (e6, v3), (e7, v4)
                 → (v1, {e1}), (v2, {e2}), (v3, {e5, e6}), (v4, {e3, e4, e7})
      H:         (e1, v3), (e2, v3), (e3, v3), (e4, v3), (e5, v5), (e6, v6), (e7, v6)
                 → (v3, {e1, e2, e3, e4}), (v5, {e5}), (v6, {e6, e7})
      Aid:       (v1, 1), (v2, 2), (v3, 3), (v4, 4), (v5, 1), (v6, 2)
                 → (1, {v1, v5}), (2, {v2, v6}), (3, {v3}), (4, {v4})
      Atitle:    (v1, Europa), (v2, Europe), (v3, Europe), (v4, Barcelona)
                 → (Barcelona, {v4}), (Europa, {v1}), (Europe, {v2, v3})
      Anlc:      (v1, ca), (v2, fr), (v3, en), (v4, en), (e1, en), (e2, en)
                 → (ca, {v1}), (en, {v3, v4, e1, e2}), (fr, {v2})
      Afilename: (v5, europe.png), (v6, bcn.jpg)
                 → (bcn.jpg, {v6}), (europe.png, {v5})
      Atag:      (e4, continent)
                 → (continent, {e4})
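The grouping step on slide 18 can be sketched in a few lines of Python. This is only an illustration of the transformation, not Sparksee code: the function name `to_value_set` and the use of plain strings as object identifiers are assumptions made for the example, and the data is the TAILS list from the slide.

```python
from collections import defaultdict

def to_value_set(pairs):
    """Group (object, value) pairs into {value: set of objects}."""
    grouped = defaultdict(set)
    for obj, value in pairs:
        grouped[value].add(obj)
    return dict(grouped)

# TAILS pairs (edge, tail vertex) copied from slide 18.
TAILS_PAIRS = [
    ("e1", "v1"), ("e2", "v2"), ("e3", "v4"), ("e4", "v4"),
    ("e5", "v3"), ("e6", "v3"), ("e7", "v4"),
]

tails = to_value_set(TAILS_PAIRS)
for value in sorted(tails):
    print(value, sorted(tails[value]))
```

The printed groups correspond to the TAILS value set listed on the slide: one entry per tail vertex, each with the set of edges leaving it.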
  • 19. Bitmap representation
      - Each vertex and edge is identified by a unique and immutable oid (object identifier)
      - Each vertex or edge set is stored in a bitmap structure: each position in a bitmap corresponds to the oid of an object; reduced amount of space (compression techniques); very efficient binary logic operations
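As a rough illustration of why bitmaps make set operations cheap, the sketch below stores a set of oids in a Python integer, with bit position equal to oid. The oid numbering (v1..v6 as 1..6, e1..e7 as 7..13) and the helper names are assumptions made for this example; the slides do not show how Sparksee assigns oids.

```python
def to_bitmap(oids):
    """Return an integer whose bit i is set when oid i is in the set."""
    bitmap = 0
    for oid in oids:
        bitmap |= 1 << oid
    return bitmap

def to_oids(bitmap):
    """Return the sorted list of oids whose bits are set."""
    oids, position = [], 0
    while bitmap:
        if bitmap & 1:
            oids.append(position)
        bitmap >>= 1
        position += 1
    return oids

# Hypothetical numbering: v1..v6 -> 1..6, e1..e7 -> 7..13.
article = to_bitmap([1, 2, 3, 4])   # objects labeled ARTICLE
english = to_bitmap([3, 4, 7, 8])   # objects with NLC = 'en' (v3, v4, e1, e2)

print(to_oids(article & english))   # intersection: English articles
print(to_oids(article | english))   # union
print(to_oids(article & ~english))  # difference: non-English articles
```

Each set operation is a single bitwise instruction over the whole bitmap rather than a loop over the members, which is the property the slide refers to as "very efficient binary logic operations".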
  • 20. Value set representation. A value set is represented as two maps:
      - one maps each different value to a vertex or edge set
      - the other maps each vertex or edge to a value oid
  • 21. Example of a bitmap-based representation (figure)
  • 22. Integrity rules (figure)
  • 23. Value set operations
      - domain: returns the set of distinct values
      - objects: returns the set of vertices or edges associated with a value
      - lookup: returns the set of values associated with a set of objects
      - insert: adds a vertex or edge to the collection of objects of a value
      - remove: removes a vertex or edge from the collection of objects of a value
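The five operations can be sketched on top of the two-map layout described on slide 20. The class below is a hypothetical in-memory model, not the Sparksee API: it uses Python dicts and sets where the slides describe B+ trees and bitmaps, and it assumes each object holds at most one value per value set.

```python
from collections import defaultdict

class ValueSet:
    """Toy value set: one map value -> objects, one map object -> value."""

    def __init__(self, pairs=()):
        self._objects_of = defaultdict(set)  # value -> set of objects
        self._value_of = {}                  # object -> value
        for obj, value in pairs:
            self.insert(obj, value)

    def domain(self):
        """Set of distinct values."""
        return set(self._objects_of)

    def objects(self, value):
        """Set of vertices or edges associated with a value."""
        return set(self._objects_of.get(value, ()))

    def lookup(self, objs):
        """Set of values associated with a set of objects."""
        return {self._value_of[o] for o in objs if o in self._value_of}

    def insert(self, obj, value):
        """Associate obj with value, replacing any previous value."""
        self.remove(obj)
        self._objects_of[value].add(obj)
        self._value_of[obj] = value

    def remove(self, obj):
        """Drop obj from the collection of objects of its value."""
        if obj not in self._value_of:
            return
        value = self._value_of.pop(obj)
        members = self._objects_of[value]
        members.discard(obj)
        if not members:          # keep domain() free of empty values
            del self._objects_of[value]

title = ValueSet([("v1", "Europa"), ("v2", "Europe"),
                  ("v3", "Europe"), ("v4", "Barcelona")])
print(sorted(title.domain()))
print(sorted(title.objects("Europe")))
print(sorted(title.lookup({"v1", "v4"})))
```

Keeping both maps in sync on every insert and remove is what lets `objects` and `lookup` each run as a single map access.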
  • 24. Graph query examples
      - Number of articles: |objects(LABELS, 'ARTICLE')|
      - Out-degree of the English article 'Europe': |objects(TAILS, objects(TITLE, 'Europe') ∩ objects(NLC, 'en') ∩ objects(LABELS, 'ARTICLE'))|
      - Articles with references to the image with filename 'bcn.jpg': {lookup(TAILS, x) | x ∈ objects(HEADS, objects(FILENAME, 'bcn.jpg') ∩ objects(LABELS, 'IMAGE'))}
      - Count of the articles in each language: {(x, y) | x ∈ domain(NLC) ∧ y = |objects(NLC, x) ∩ objects(LABELS, 'ARTICLE')|}
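The four queries can be evaluated against the example graph of slide 18 with a small Python model. Everything here is a sketch under assumptions: `build`, `domain`, `objects` and `lookup` are hypothetical helpers over plain dicts, and `objects` is allowed to take either a single value or a set of values, as the nested calls on the slide suggest.

```python
def build(pairs):
    """Return (value -> objects, object -> value) maps for a list of pairs."""
    by_value, by_object = {}, {}
    for obj, value in pairs:
        by_value.setdefault(value, set()).add(obj)
        by_object[obj] = value
    return by_value, by_object

def domain(vs):
    return set(vs[0])

def objects(vs, values):
    values = values if isinstance(values, set) else {values}
    return set().union(*(vs[0].get(v, set()) for v in values))

def lookup(vs, objs):
    return {vs[1][o] for o in objs if o in vs[1]}

# Example data from slide 18.
LABELS = build([("v1", "ARTICLE"), ("v2", "ARTICLE"), ("v3", "ARTICLE"),
                ("v4", "ARTICLE"), ("v5", "IMAGE"), ("v6", "IMAGE"),
                ("e1", "BABEL"), ("e2", "BABEL"), ("e3", "REF"), ("e4", "REF"),
                ("e5", "CONTAINS"), ("e6", "CONTAINS"), ("e7", "CONTAINS")])
TAILS = build([("e1", "v1"), ("e2", "v2"), ("e3", "v4"), ("e4", "v4"),
               ("e5", "v3"), ("e6", "v3"), ("e7", "v4")])
HEADS = build([("e1", "v3"), ("e2", "v3"), ("e3", "v3"), ("e4", "v3"),
               ("e5", "v5"), ("e6", "v6"), ("e7", "v6")])
TITLE = build([("v1", "Europa"), ("v2", "Europe"),
               ("v3", "Europe"), ("v4", "Barcelona")])
NLC = build([("v1", "ca"), ("v2", "fr"), ("v3", "en"),
             ("v4", "en"), ("e1", "en"), ("e2", "en")])
FILENAME = build([("v5", "europe.png"), ("v6", "bcn.jpg")])

articles = objects(LABELS, "ARTICLE")

n_articles = len(articles)
europe_en = objects(TITLE, "Europe") & objects(NLC, "en") & articles
out_degree = len(objects(TAILS, europe_en))
bcn = objects(FILENAME, "bcn.jpg") & objects(LABELS, "IMAGE")
referencing = lookup(TAILS, objects(HEADS, bcn))
per_language = {x: len(objects(NLC, x) & articles) for x in domain(NLC)}

print(n_articles, out_degree, sorted(referencing), sorted(per_language.items()))
```

Every query reduces to a handful of `objects` and `lookup` calls combined with set intersections, which in the bitmap representation become bitwise ANDs.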
  • 25. Implementation details
      - Bitmaps are compressed by grouping the bits into clusters of 32 consecutive bits (up to 137 billion objects per graph)
      - Locality is improved by generating consecutive oids for each distinct vertex or edge label
      - A sorted tree structure of bitmap clusters speeds up the insert, remove, and binary logic operations
      - Maps are implemented using B+ trees
      - The tail, head and attribute value sets have been split into specific value sets for each label
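A minimal sketch of the clustering idea follows, assuming that only non-empty 32-bit clusters are stored and that a cluster is addressed by a 32-bit index; that assumption would give 2^32 × 32 = 137,438,953,472 addressable objects, which matches the "137 billion" figure on the slide. The class name and the use of a Python dict in place of the sorted tree of clusters are choices made for this example only.

```python
CLUSTER_BITS = 32
MAX_OBJECTS = 2**32 * CLUSTER_BITS  # assumes 32-bit cluster indices

class ClusteredBitmap:
    """Sparse bitmap: only clusters with at least one set bit are stored."""

    def __init__(self, oids=()):
        self.clusters = {}  # cluster index -> 32-bit word (stand-in for the sorted tree)
        for oid in oids:
            self.add(oid)

    def add(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        self.clusters[index] = self.clusters.get(index, 0) | (1 << bit)

    def discard(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(index, 0) & ~(1 << bit)
        if word:
            self.clusters[index] = word
        else:
            self.clusters.pop(index, None)  # drop clusters that became empty

    def __and__(self, other):
        result = ClusteredBitmap()
        # Only clusters present on both sides can contribute to the intersection.
        for index in self.clusters.keys() & other.clusters.keys():
            word = self.clusters[index] & other.clusters[index]
            if word:
                result.clusters[index] = word
        return result

    def oids(self):
        return [index * CLUSTER_BITS + bit
                for index in sorted(self.clusters)
                for bit in range(CLUSTER_BITS)
                if (self.clusters[index] >> bit) & 1]

a = ClusteredBitmap([3, 40, 1_000_000])
b = ClusteredBitmap([5, 40, 1_000_000])
print((a & b).oids(), len(a.clusters), MAX_OBJECTS)
```

Because empty clusters take no space, a set with a few widely scattered oids costs a few words rather than one bit per possible oid, and assigning consecutive oids per label keeps related objects in the same clusters.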
  • 26. Performance analysis
  • 27. Queries
      - Q1: Find the article with the largest out-degree and traverse its shortest path tree
      - Q2: Recommend articles related to the most popular one
      - Q3: Find new images for articles from translations in other languages
      - Q4: Find, for each different language, the number of articles and images referenced
      - Q5: For each article with images, materialize the count of images
      - Q6: Remove all articles without images
      Query categories (number of queries marked on the slide): k-hops and path traversals (2); graph pattern matching (1); aggregations and edge connectivity (1); graph transformation (2)
  • 28. Performance out-of-core: Wikipedia benchmark, out-of-core, 1 GB buffer pool
                         MonetDB     MySQL       Neo4j (*)   Sparksee
      Graph size (GB)    12.00       15.72       42.00       16.98
      Load (h)           Error       1.36        8.99        2.89
      Q1 (s)             4,801.6     > 12 h      > 12 h      120.5
      Q2 (s)             3,788.4     13,841.6    > 12 h      205.4
      Q3 (s)             458.9       33.0        481.0       10.8
      Q4 (s)             279.3       45.0        > 12 h      144.9
      Q5 (s)             267.4       930.3       > 12 h      140.9
      Q6 (s)             Error       10,707.0    > 12 h      25,791.6
      (*) Java VM with 45 GB
  • 29. Query statistics
      Query   results       edge trav.     edge trav./sec   mem MB       bitmaps
      Q1      624,525       236,387,207    1,987,616.30     832.19       42.97%
      Q2      5             261,735,954    1,270,747.94     2,974.50     48.59%
      Q3      51,780        1,536,698      143,885.58       320.81       48.00%
      Q4      254           4,987,879      33,984.32        245.13       77.67%
      Q5      2,401,597     5,934,724      42,072.39        319.00       80.64%
      Q6      52,380,949    281,433,106    37,434.27        11,583.88    67.76%
  • 30. Bitmap memory usage (MB; a dash marks a value set not used by the query)
                       Size        Q1          Q2          Q3          Q4          Q5        Q6
      LABELS           13.56       11.60       11.60       11.60       11.60       11.60     1.51
      TAILS            1,272.32    1,030.90    857.09      229.67      164.79      164.79    90.18
      HEADS            633.98      -           506.98      -           -           -         47.09
      Attr. ID         122.77      -           -           -           -           -         0.85
      Attr. TITLE      835.92      -           -           -           -           -         10.87
      Attr. NLC        3,618.49    -           -           791.29      833.64      -         617.15
      Attr. FILENAME   769.79      -           -           -           -           -         -
      Attr. TAG        31.94       -           -           -           -           -         2.29
      TOTAL            7,298.77    1,042.50    1,375.67    1,032.56    1,010.03    176.39    769.94
  • 31. Analysis of bitmap usage (figure)
  • 32. Bitmap size distribution (figure)
  • 33. Out-of-core stress test (figure)
  • 34. RMAT / Query 1 scalability test (figure). 2^28 is out-of-core (2 billion edges).
  • 35. SNA Benchmark: Q1, Q6, Q9 and Q12 (figure)
  • 36. High scalability
  • 37. High scalability test: mirror servers in Amazon Elastic with a load balancer (figure)
  • 38. HPC-SGAB Benchmark
  • 39. Definition
      - HPC-SGAB: Bader et al. 2009; measured in TEPS (traversed edges per second)
      - Graph: synthetic (R-MAT); power-law distribution; average of 8 edges per node
      - Operations:
          Kernel 1: load graph and create indexes
          Kernel 2: find the edge(s) with maximum weight
          Kernel 3: k-hops
          Kernel 4: betweenness centrality (Brandes algorithm)
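To make the TEPS metric concrete, the sketch below runs a k-hops traversal (the operation named in Kernel 3) as a breadth-first search and divides the number of edges it visits by the elapsed time. It is not the benchmark's reference implementation: the adjacency-list input, the function name `k_hops`, and the choice to count every outgoing edge examined as "traversed" are assumptions made for the example. The small graph reuses the tail/head pairs from slide 18.

```python
import time
from collections import deque

def k_hops(adjacency, source, k):
    """Return (vertices reachable within k hops of source, edges traversed)."""
    distance = {source: 0}
    queue = deque([source])
    traversed = 0
    while queue:
        u = queue.popleft()
        if distance[u] == k:        # do not expand beyond k hops
            continue
        for v in adjacency.get(u, ()):
            traversed += 1          # every examined edge counts as traversed
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return set(distance) - {source}, traversed

# Directed multigraph from slide 18: e3 and e4 both go from v4 to v3.
adjacency = {
    "v1": ["v3"],
    "v2": ["v3"],
    "v3": ["v5", "v6"],
    "v4": ["v3", "v3", "v6"],
}

start = time.perf_counter()
reached, traversed = k_hops(adjacency, "v4", 2)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(sorted(reached), traversed)
print(f"TEPS: {traversed / elapsed:.0f}")
```

On a graph this small the TEPS value is dominated by timer noise; the figure becomes meaningful only on large inputs such as the R-MAT graphs used by the benchmark.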
  • 40. Experimental setup
      - Systems tested: Sparksee (formerly DEX); Neo4j; HypergraphDB; Jena (RDF)
      - Platform: single computer with two quad-core Xeon E5410 processors; 11 GB RAM; LFF 2.25 TB disk; single-threaded
      - Default benchmark configuration
  • 41. Summary of results (figure)
  • 42. Kernel 1: Load time (figure)
  • 43. Kernel 4: Betweenness centrality (figure)
  • 44. Bibliography
      - R. Angles, A. Prat, D. Dominguez, J.L. Larriba. Benchmarking database systems for social network applications (GRADES 2013)
      - N. Martínez, V. Muntés, S. Gómez, M.A. Águila, D. Dominguez, J.L. Larriba. Efficient Graph Management Based On Bitmap Indices (IDEAS 2012)
      - N. Martínez, S. Gómez, F. Escalé. DEX: a High-Performance Graph Database Management System (GDM 2011)
      - D. Dominguez, P. Urbón, A. Giménez, S. Gómez, N. Martínez, J.L. Larriba. Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark (IWDG 2010)
      - N. Martínez, V. Muntés, S. Gómez, J. Nin, M.A. Sánchez, J. Larriba. Dex: High-performance Exploration on Large Graphs for Information Retrieval (CIKM 2007)
  • 45. Thanks! Q&A