Graph Databases & Neo4J
Girish Khanzode
Graph Databases
• Graph Based NoSQL Database
• Property Graph Model
• Neo4j
• Neo4j Architecture
• Data Storage
• Programmatic Data Access
• Core API
• Lucene
• Auto Index lifecycle
• Traversers API
• Cypher
• Graph Algorithms
• Neo4j HA
• Cache Sharding
• References
Graphs
• A collection of nodes (things) and edges (relationships) that
connect pairs of nodes
– Suitable for any data that is related
• Can attach properties (key-value pairs) on nodes and
relationships
• Relationships connect two nodes and both nodes and
relationships can hold an arbitrary amount of key-value pairs
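The property graph described above can be sketched in a few lines of plain Java. This is only an illustrative in-memory model, not Neo4j's API: the class names `Node` and `Relationship` and the helper `connect` are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory property graph; all names here are hypothetical, not Neo4j's API.
class PropertyGraphSketch {

    /** A node: a bag of key-value properties plus the relationships attached to it. */
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> relationships = new ArrayList<>();
    }

    /** A relationship: connects exactly two nodes, has a type and its own properties. */
    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Connects two nodes and registers the relationship on both of them. */
    static Relationship connect(Node start, Node end, String type) {
        Relationship r = new Relationship(start, end, type);
        start.relationships.add(r);
        end.relationships.add(r);
        return r;
    }

    public static void main(String[] args) {
        Node alice = new Node();
        alice.properties.put("name", "Alice");
        Node bob = new Node();
        bob.properties.put("name", "Bob");

        Relationship knows = connect(alice, bob, "KNOWS");
        knows.properties.put("since", 2010);

        // prints: Alice -[KNOWS]-> Bob
        System.out.println(alice.properties.get("name") + " -[" + knows.type + "]-> "
                + knows.end.properties.get("name"));
    }
}
```

Each node keeps direct references to its relationships, so moving to a neighbour is a pointer hop rather than a lookup in a separate table.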
Graph Relations are Universal
Graph
Graphs
• Well-understood patterns and algorithms
– Studied since Leonhard Euler's Seven Bridges of Königsberg (1736)
– Codd's Relational Model (1970)
• Knowledge graph - beyond links, search is smarter when considering how things
are related
• Facebook graph search – people interested in finding things in their part of the
world
• Bing + Britannica: referencing and cross-referencing
• People - relationships to people, to organizations, to places, to things - personal
graph
A Graph Database
• Relationships are first-class citizens
• NoSQL database optimized for connected data
– Social networking, logistics networks, recommendation engines
– Relationships are as important as the records
– Can be up to about 1000 times faster than an RDBMS for queries over connected data
• Uses graph structures with nodes, edges and properties to store data
• Open source graph databases - Neo4j, InfiniteGraph, InfoGrid, OrientDB
• Very fast querying across records
Graph Database
A Graph Database
• Transactional with the usual operations
• RDBMS - can tell sales in last year
• Graph database – can tell customer which book to buy next
• Index-free adjacency
– Every node holds direct pointers to its adjacent elements
• Edges hold most of the important information and relations
– nodes to other nodes
– nodes to properties
Graph Based NoSQL Database
• No rigid format of SQL or the tables and columns representation
• Uses a flexible graphical representation - addresses scalability concerns
• Data can be easily transformed from one model to the other using a
graph based NoSQL database
• Nodes are organised by some relationships with one another represented
by edges between the nodes
• Both nodes and the relationships have some defined properties
Graph Based NoSQL Database
• Labelled, directed, attributed multi-graph - the graph contains nodes that
are labelled and carry properties, and these nodes are related to one
another through directed edges
• While relational database models can replicate graph models, each
edge would require a join, which is a costly proposition
Advantages
• Easier Relationships Analysis
• Very fast for associative data sets
– Like social networks
• Map more directly to object oriented applications
– Object classification and Parent->Child relationships
Disadvantages
• If data is just tabular with not much relationship between the
data, graph databases do not fare well
• OLAP support for graph databases not mature
Performance Experiment
• Check whether a path exists between two people in a social network
• 1000 persons
• Average 50 friends per person
• pathExists(a, b) limited to depth 4
Database              # persons    Query time
Relational database   1,000        2,000 ms
Neo4j                 1,000        2 ms
Neo4j                 1,000,000    2 ms
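The query in this experiment can be sketched as a depth-limited breadth-first search. The code below is a plain Java illustration over adjacency lists, not the code used in the experiment. The name `pathExists` comes from the slide, while the signature and the `friends` map are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PathExistsSketch {

    /** Returns true if b is reachable from a using at most maxDepth friendship hops. */
    static boolean pathExists(Map<Integer, List<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        Deque<Integer> frontier = new ArrayDeque<>();
        frontier.add(a);

        // Expand one level of friends per iteration, up to the depth limit.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<Integer> next = new ArrayDeque<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, List.of())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        // A chain 1 - 2 - 3 - 4 - 5 - 6, stored as adjacency lists in both directions.
        Map<Integer, List<Integer>> friends = Map.of(
                1, List.of(2),
                2, List.of(1, 3),
                3, List.of(2, 4),
                4, List.of(3, 5),
                5, List.of(4, 6),
                6, List.of(5));
        System.out.println(pathExists(friends, 1, 5, 4)); // true: 4 hops, within the limit
        System.out.println(pathExists(friends, 1, 6, 4)); // false: 5 hops, beyond the limit
    }
}
```

The work done depends on the neighbourhood explored around `a`, not on the total number of persons stored. That is one way to read the unchanged Neo4j query time between 1,000 and 1,000,000 persons.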
Property Graph Model
name: the Doctor
age: 907
species: Time Lord
first name: Rose
last name: Tyler
vehicle: Skoda
model: Type 40
Graphs - Whiteboard-friendly
• No need for the decomposition, ER design and normalization /
de-normalization that an RDBMS requires
Neo4j
• A Graph Database
• A Property Graph containing Nodes, Relationships with Properties on
both
• Manage complex, highly connected data
• Scalable - High-performance with High-Availability
– Traverse 1,000,000+ relationships / second on commodity hardware
• Server with REST API, or Embeddable on the JVM
Neo4j
• Full ACID transactions
• Schema free, bottom-up data model design
• Stable
• Easier than RDBMS since no need for normalization
• Implemented in Java
• Open Source
Neo4j
• Schema free – Data does not have to adhere to any convention
• Support for a wide variety of languages - Java, Python, Perl, Scala, Cypher
• A graph database can be thought of as a key-value store, with full support
for relationships.
• Graph databases don’t avoid design efforts
• Good design still requires effort
Why Neo4J?
• The internet is a network of pages connected to each other.
What is a better way to model that than in graphs?
• No time lost fighting with less expressive data-stores
• Easy to implement experimental features
• A single instance of Neo4j can house at most 34 billion nodes,
34 billion relationships and 68 billion properties
Neo4j Architecture
[Figure: architecture diagram with the labels Core API, REST API, JVM Language Bindings (Java, Ruby, Clojure…), Traversal Framework, Graph Matching, Caches, Memory-Mapped (N)IO, Filesystem]
Software Architecture
Data Storage
• Neo4j stores graph data in a number of different store files
• Each store file contains the data for a specific part of the
graph
– neostore.nodestore.db
– neostore.relationshipstore.db
– neostore.propertystore.db
– neostore.propertystore.db.index
– neostore.propertystore.db.strings
– neostore.propertystore.db.arrays
Node Store
• Size: 9 bytes
– 1st byte - in-use flag
– Next 4 bytes - ID of first relationship
– Last 4 bytes - ID of first property of node
• Fixed size records enable fast lookups
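Why fixed-size records give fast lookups can be sketched as follows: the byte offset of a record is just the node ID multiplied by the record size. The code uses the 9-byte layout listed above, but the class, the method names and the use of an in-memory `ByteBuffer` in place of the store file are assumptions for illustration, not Neo4j's implementation.

```java
import java.nio.ByteBuffer;

class NodeRecordSketch {
    static final int RECORD_SIZE = 9; // 1 + 4 + 4 bytes, as listed above

    /** Byte offset of a node record: one multiplication, no search. */
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    /** Writes one record into the slot that belongs to nodeId. */
    static void write(ByteBuffer store, int nodeId, boolean inUse, int firstRelId, int firstPropId) {
        int offset = (int) offsetOf(nodeId);
        store.put(offset, (byte) (inUse ? 1 : 0)); // in-use flag
        store.putInt(offset + 1, firstRelId);      // ID of first relationship
        store.putInt(offset + 5, firstPropId);     // ID of first property
    }

    /** Reads the ID of the first relationship of nodeId. */
    static int firstRelationshipId(ByteBuffer store, int nodeId) {
        return store.getInt((int) offsetOf(nodeId) + 1);
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocate(10 * RECORD_SIZE); // stand-in for the store file
        write(store, 3, true, 42, 7);
        System.out.println(offsetOf(3));                  // 27
        System.out.println(firstRelationshipId(store, 3)); // 42
    }
}
```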
Relationship store
• neostore.relationshipstore.db
• Size: 33 bytes
• 1st byte - In use flag
• Next 8 bytes - IDs of the nodes at the start and end of the relationship
• 4 bytes - Pointer to the relationship type
• 16 bytes - pointers for the next and previous relationship records for each of the start and end nodes
• 4 bytes - next property ID (property chain)
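The next and previous pointers let all relationships of a node be read as a linked chain. The sketch below models only the "next" pointers and follows them for one node. The `Rel` class, the `NONE` marker and the array used as a store are simplifying assumptions, not Neo4j's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class RelationshipChainSketch {
    static final int NONE = -1; // hypothetical "no further record" marker

    /** Simplified relationship record: only the "next" pointers are modelled. */
    static final class Rel {
        final int startNode;
        final int endNode;
        final int startNext; // next relationship in the start node's chain
        final int endNext;   // next relationship in the end node's chain

        Rel(int startNode, int endNode, int startNext, int endNext) {
            this.startNode = startNode;
            this.endNode = endNode;
            this.startNext = startNext;
            this.endNext = endNext;
        }
    }

    /** Collects the IDs of all relationships of a node by following its chain. */
    static List<Integer> relationshipsOf(int node, int firstRel, Rel[] store) {
        List<Integer> ids = new ArrayList<>();
        int current = firstRel;
        while (current != NONE) {
            ids.add(current);
            Rel r = store[current];
            // Follow the pointer that belongs to this node's side of the record.
            current = (r.startNode == node) ? r.startNext : r.endNext;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Three nodes (0, 1, 2) and three relationships: 0->1, 0->2, 1->2.
        Rel[] store = {
            new Rel(0, 1, 1, 2),       // rel 0
            new Rel(0, 2, NONE, 2),    // rel 1
            new Rel(1, 2, NONE, NONE)  // rel 2
        };
        System.out.println(relationshipsOf(0, 0, store)); // [0, 1]
        System.out.println(relationshipsOf(1, 0, store)); // [0, 2]
        System.out.println(relationshipsOf(2, 1, store)); // [1, 2]
    }
}
```

The starting point `firstRel` corresponds to the "ID of first relationship" field in the node record.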
Relationships Storage
Data Size
Data                 Size
nodes                2^35 (∼ 34 billion)
relationships        2^35 (∼ 34 billion)
properties           2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion)
relationship types   2^15 (∼ 32 000)
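The limits in this table read as powers of two (2^35, 2^36 to 2^38, and 2^15), that is, the number of distinct IDs that fit in a given number of bits. A short check of the arithmetic, where the helper name `capacity` is an assumption:

```java
class CapacitySketch {
    /** Number of distinct record IDs addressable with the given number of bits. */
    static long capacity(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println(capacity(35)); // 34359738368  (about 34 billion)
        System.out.println(capacity(36)); // 68719476736  (about 68 billion)
        System.out.println(capacity(38)); // 274877906944 (about 274 billion)
        System.out.println(capacity(15)); // 32768        (about 32 000)
    }
}
```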
Neo4j API – Logical View
Programmatic Data Access
• Java APIs - JVM languages bind to the same APIs
• JRuby, Jython, Clojure, Scala…
• Manage nodes and relationships
• Indexing – find data without traversal
• Traversing
• Path finding
• Pattern matching
Core API
• Deals with graphs in terms of their fundamentals
• Nodes - properties
– KV Pairs
• Relationships
– Start node
– End node
– Properties
• KV Pairs
Create Node
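GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
    // Create a node and attach a key-value property to it
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "the Doctor");
    tx.success(); // mark the transaction as successful
} finally {
    tx.finish(); // commits if success() was called, otherwise rolls back
}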
Create Relationships
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "The Doctor");
    Node susan = db.createNode();
    susan.setProperty("firstname", "Susan");
    susan.setProperty("lastname", "Campbell");
    // Directed relationship: susan -[COMPANION_OF]-> theDoctor
    susan.createRelationshipTo(theDoctor, DynamicRelationshipType.withName("COMPANION_OF"));
    tx.success();
} finally {
    tx.finish();
}
Index a Graph
• Graphs themselves are indexes
• Can create short-cuts to well-known nodes
• In program, keep a reference to any interesting node
• Indexes offer flexibility in what constitutes an “interesting
node”
Lucene
• The default index implementation for Neo4j
– Default implementation for IndexManager
• Supports many indexes per database
• Each index supports nodes or relationships
• Supports exact and regex-based matching
• Supports scoring
– Number of hits in the index for a given item
– Great for recommendations
Create a Node Index
GraphDatabaseService db = …
Index<Node> planets = db.index().forNodes("planets");
Callouts: Index<Node> - type; "planets" - index name; forNodes() - create or retrieve
Create a Relationship Index
GraphDatabaseService db = …
Index<Relationship> enemies = db.index().forRelationships("enemies");
Callouts: Index<Relationship> - type; "enemies" - index name; forRelationships() - create or retrieve
Exact Matches
GraphDatabaseService db = …
Index<Node> actors = db.index().forNodes("actors");
Node rogerDelgado = actors.get("actor", "Roger Delgado").getSingle();
Callouts: "actor" - key; "Roger Delgado" - value to match; getSingle() - first match only
Query Matches
GraphDatabaseService db = …
Index<Node> species = db.index().forNodes("species");
IndexHits<Node> speciesHits = species.query("species", "S*n");
Callouts: "species" - key; "S*n" - query
Transactions to Mutate Indexes
• Mutating access is still protected by transactions which cover both index and graph
GraphDatabaseService db = …
Transaction tx = db.beginTx();
try {
    Node nixon = db.createNode();
    nixon.setProperty("character", "Richard Nixon");
    // The index update runs in the same transaction as the graph update
    db.index().forNodes("characters").add(nixon,
        "character", nixon.getProperty("character"));
    tx.success();
} finally {
    tx.finish();
}
Auto Index lifecycle
• Auto index - stays consistent with the graph data
• Specify the property names to index when the index is created
• If a node, relationship or property is removed from the graph, it is removed
from the index
• If the database is started with auto indexing enabled but with different auto-indexed
properties than in the last run, already auto-indexed entities are
removed from the index as they are worked upon
• Re-indexing is a manual operation
– Existing properties are not indexed unless touched
Auto Index lifecycle
AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
nodeAutoIndex.startAutoIndexingProperty("species");
nodeAutoIndex.setEnabled( true );
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
Node -> Relationship Indexes Supported
Core API
• Basic (nodes, relationships)
• Fast
• Imperative
• Flexible - Easily intermix mutating operations
Traversers API
• Mechanism for querying the graph by navigating from a starting node to
related nodes according to an algorithm
• Expressive
• Fast
• Declarative (mostly)
• Opinionated
Cypher - A Graph Query Language
• Query Language for Neo4j
• A declarative graph pattern matching language
– SQL for graphs
– Tabular results
• aggregation, ordering and limits
• Mutating operations
• CRUD
• Easy to formulate queries based on relationships
• Many features stem from improving pain points of SQL like join tables
Cypher - A Graph Query Language
Cypher
Query
• Query:
MATCH (n:Crew)-[r:KNOWS*]-(m)
WHERE n.name = 'Neo'
RETURN n AS Neo, r, m
Operations
• Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT
• Where clause
start doctor=node:characters(name = 'Doctor')
match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode)
where actor.actor = 'Tom Baker' and episode.title =~ /.*Dalek.*/
return episode.title
• Ordering
– order by <property>
– order by <property> desc
Graph Algorithms
• Neo4j has built-in algorithms
• Callable through JVM and REST APIs
• Higher level of abstraction
• Graph Matching
– Look for patterns in a data set - retail analytics
– Higher-level abstraction than raw traversers
• REST API
– Access the server
• Binary protocol
– JSON as default format
Neo4j HA - High Availability Cluster
• A scalability package known as high availability or HA that
uses a master-slave cluster architecture
– Full data redundancy
– Service fault tolerance
– Linear read scalability
– Master-slave replication
• Single data-centre or global zones
– tolerance for high-latency
Neo4j HA
• Redundancy - improved uptime
– automatic failover
• In a Neo4j HA cluster the full graph is replicated to each instance in the
cluster.
• Full dataset is replicated across the entire cluster to each server
• Read operations can be done locally on each slave
• Read capacity of the HA cluster increases linearly with the number of
servers
Neo4j HA
HA Cluster Architecture
• Cluster performs automatic master election
• Supports master-slave replication for clustering and DR
across sites
HA Cluster Architecture
Write to a Master
• All write operations are co-ordinated by the master
• Writes to the master are fast
• Slaves eventually catch up
Write to a Master
Write to a Slave
• Writes to a slave cause a synchronous transaction
with the master
• Other slaves eventually catch up
Write to a Slave
Server Overload Problem
• Unlike other classes of NoSQL database, a graph does not
have predictable lookup since it is a highly mutable structure
• We want to co-locate related nodes for traversal
performance, but we don’t want to place so many connected
nodes on the same database that it becomes heavily loaded
• The black-hole problem - popular nodes get lumped together
on a single instance, but there is low point cut
Server Overload Problem
Thinly Spread Network
• The opposite is also true: we don’t want connected nodes spread too widely
across different database instances, since that incurs a substantial
performance penalty at runtime as traversals cross the (relatively high-latency)
network
• Load-leveling alone can lead to many relationships crossing instances
• These are very expensive to traverse, networks are many orders of
magnitude slower than in-memory traversals
Thinly Spread Network
Minimal Point Cut
• The best approach is to balance a graph across database instances by
creating a minimum point cut for a graph, where graph nodes are placed
such that there are few relationships that span shards
• Good strategy is to take a local view of the graph (no global locks) and
work incrementally (short bursts)
• Take into account use patterns
• Unlike other NoSQL stores, graphs are not predictable so we cannot use
techniques like consistent hashing for scale out
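To make the goal concrete, the sketch below counts how many relationships span shards for a given placement of nodes. It only measures a placement and does not compute a minimum cut. The array-based representation and the name `crossShardRelationships` are assumptions made for this example.

```java
class ShardCutSketch {

    /**
     * Counts relationships whose two end nodes sit on different shards.
     * relationships[i] = {nodeA, nodeB}; shardOfNode[n] = shard index of node n.
     */
    static int crossShardRelationships(int[][] relationships, int[] shardOfNode) {
        int crossing = 0;
        for (int[] rel : relationships) {
            if (shardOfNode[rel[0]] != shardOfNode[rel[1]]) {
                crossing++;
            }
        }
        return crossing;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by the single relationship 2-3.
        int[][] relationships = {
            {0, 1}, {1, 2}, {0, 2},
            {3, 4}, {4, 5}, {3, 5},
            {2, 3}
        };
        int[] byCluster = {0, 0, 0, 1, 1, 1}; // each triangle on its own shard
        int[] byHash = {0, 1, 0, 1, 0, 1};    // node ID modulo 2, ignoring structure

        System.out.println(crossShardRelationships(relationships, byCluster)); // 1
        System.out.println(crossShardRelationships(relationships, byHash));    // 5
    }
}
```

Placing each tightly connected cluster on one shard leaves a single relationship crossing instances, while a structure-blind placement (as hashing would give) cuts five of the seven.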
Minimal Point Cut
Cache Sharding
• A strategy for large data sets of terabyte scale
• Mandates consistent request routing
• For instance, requests for user A are always sent to server 1,
while requests for user B are always sent to server 2 and so on
• The key assumption is that requests for user A typically touch
parts of the graph around user A, such as his or her friends,
preferences, likes and so on
Cache Sharding
• This means that the neighbourhood of the graph around user
A will be cached on server 1, while the neighbourhood around
user B will be cached on server 2
• By employing consistent routing of requests, the caches of all
servers in the HA cluster can be utilized maximally
• Strategy is highly effective for managing a large graph that
does not fit in RAM
Consistent Routing
• Always try to route related requests to the same server to hopefully
benefit from warm caches
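One simple way to route consistently is to derive the server from a stable hash of the user ID, so that repeated requests for the same user reach the same cache. This is a minimal sketch under that assumption. The method `serverFor` is hypothetical, and a real deployment would more likely do this in a load balancer.

```java
class ConsistentRoutingSketch {

    /** Maps a user ID to a server index; the same user always lands on the same server. */
    static int serverFor(String userId, int serverCount) {
        // floorMod keeps the result in [0, serverCount) even for negative hash codes.
        return Math.floorMod(userId.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        int servers = 3;
        for (String user : new String[] {"userA", "userB", "userA"}) {
            System.out.println(user + " -> server " + serverFor(user, servers));
        }
    }
}
```

Note that this routes requests only. Every server in the HA cluster still holds the full graph, and changing the number of servers remaps many users to cold caches.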
Domain Specific Sharding
• Not as easy to shard graphs as documents or KV stores
• High performance graph databases are limited in terms of the data set size that
can be handled by a single machine
• Replicas speed up reads and improve availability, but the data set size
stays limited to a single machine’s disk/memory
• No perfect algorithm exists but domain insight of expert helps
Domain Specific Sharding
• Some domains can shard easily (geo, most web apps) using consistent
routing approach and cache sharding
– Geo - where the connections between cities are few compared with the
connections within the cities. So can place cities or countries on different
nodes
• Eventually (at petabyte scale) data cannot practically be replicated
• Need to shard data across machines
References
1. http://www.neo4j.org
2. http://www.neo4j.org/learn/cypher
3. Bachman, Michal (2013). GraphAware - Towards Online Analytical Processing in Graph Databases.
http://graphaware.com/assets/bachman-msc-thesis.pdf
4. Hunger, Michael (2012). Cypher and Neo4j. http://vimeo.com/83797381
5. Mistry, Deep. Neo4j: A Developer’s Perspective.
http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective
6. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs
7. Parallel Breadth First Search on GPU Clusters
8. DB-Engines Ranking of Graph DBMS
Thank You
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode

 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Graph Databases & Neo4J Overview

  • 2. Graph Databases • Graph Based NoSQL Database • Property Graph Model • Neo4j • Neo4j Architecture • Data Storage • Programmatic Data Access • Core API • Lucene • Auto Index lifecycle • Traversers API • Cypher • Graph Algorithms • Neo4j HA • Cache Sharding • References
  • 3. Graphs • A collection of nodes (things) and edges (relationships) that connect pairs of nodes – Suitable for any data that is related • Properties (key-value pairs) can be attached to both nodes and relationships • Each relationship connects two nodes, and nodes and relationships can each hold an arbitrary number of key-value pairs
  • 6. Graphs • Well-understood patterns and algorithms – Studied since Leonhard Euler's Seven Bridges of Königsberg (1736) – Codd's Relational Model (1970) • Knowledge graph - beyond links, search is smarter when it considers how things are related • Facebook graph search – people interested in finding things in their part of the world • Bing + Britannica: referencing and cross-referencing • People - relationships to people, to organizations, to places, to things - personal graph
  • 7. A Graph Database • Relationships are first-class citizens • NoSQL database optimized for connected data – Social networking, logistics networks, recommendation engines – Relationships are as important as the records – Can be around 1000 times faster than an RDBMS for queries over connected data (see the performance experiment on slide 14) • Uses graph structures with nodes, edges and properties to store data • Examples of graph databases - Neo4j, InfiniteGraph, InfoGrid, OrientDB • Very fast querying across records
  • 9. A Graph Database • Transactional with the usual operations • RDBMS - can report last year's sales • Graph database – can suggest which book a customer should buy next • Index-free adjacency – Every node holds direct pointers to its adjacent elements, so no index lookup is needed to follow a relationship • Edges hold most of the important information and relations – nodes to other nodes – nodes to properties
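Index-free adjacency can be illustrated with a small in-memory sketch. This is not Neo4j's implementation; the class `AdjNode` and its methods are hypothetical names chosen for the example. Each node keeps direct references to its neighbours, so a friends-of-friends query only follows references and never consults a global index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory node: each node keeps direct references to its neighbours.
class AdjNode {
    final String name;
    final List<AdjNode> neighbours = new ArrayList<>();

    AdjNode(String name) {
        this.name = name;
    }

    // Create an undirected connection by storing a reference on both sides.
    void connect(AdjNode other) {
        neighbours.add(other);
        other.neighbours.add(this);
    }

    // Friends-of-friends found by following references only; no index lookup is involved.
    List<String> friendsOfFriends() {
        List<String> result = new ArrayList<>();
        for (AdjNode friend : neighbours) {
            for (AdjNode fof : friend.neighbours) {
                boolean isSelf = fof == this;
                boolean isDirectFriend = neighbours.contains(fof);
                if (!isSelf && !isDirectFriend && !result.contains(fof.name)) {
                    result.add(fof.name);
                }
            }
        }
        return result;
    }
}
```

The cost of the query depends on the size of the neighbourhood being visited, not on the total number of nodes stored.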
  • 10. Graph Based NoSQL Database • No rigid format of SQL or the tables and columns representation • Uses a flexible graphical representation - addresses scalability concerns • Data can be easily transformed from one model to the other using a graph based NoSQL database • Nodes are organised by some relationships with one another represented by edges between the nodes • Both nodes and the relationships have some defined properties
  • 11. Graph Based NoSQL Database • Labelled, directed, attributed multi-graph - the graph contains labelled nodes with properties, and directed edges show how those nodes relate to one another • A relational database model can represent a graph, but following each edge requires a join, which is costly
  • 12. Advantages • Easier Relationships Analysis • Very fast for associative data sets – Like social networks • Map more directly to object oriented applications – Object classification and Parent->Child relationships
  • 13. Disadvantages • If the data is mostly tabular, with few relationships between records, graph databases do not fare well • OLAP support for graph databases is not mature
  • 14. Performance Experiment • Compute whether a path exists between two people in a social network • 1000 persons • Average 50 friends per person • pathExists(a, b) limited to depth 4
      Database              # persons    Query time
      Relational database   1,000        2000 ms
      Neo4j                 1,000        2 ms
      Neo4j                 1,000,000    2 ms
  • 15. Property Graph Model (figure) • Example properties shown on the nodes: name: the Doctor, age: 907, species: Time Lord; first name: Rose, last name: Tyler; vehicle: Skoda, model: Type 40
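The slide does not show how pathExists(a, b) was implemented. A minimal sketch of one way to do it is a breadth-first search that stops after a fixed number of hops. The class `FriendGraph` and its method names are assumptions made for this example and are not part of the Neo4j API.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical adjacency-set graph of person IDs.
class FriendGraph {
    private final Map<Integer, Set<Integer>> friends = new HashMap<>();

    void addFriendship(int a, int b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search that gives up once maxDepth hops have been explored.
    boolean pathExists(int from, int to, int maxDepth) {
        if (from == to) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(from);
        Queue<Integer> frontier = new ArrayDeque<>();
        frontier.add(from);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<Integer> next = new ArrayDeque<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Collections.emptySet())) {
                    if (friend == to) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend); // first time seen: explore it in the next round
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```

Because the search only touches the neighbourhood of the start node, its cost is driven by the depth limit and the number of friends per person rather than by the total number of persons, which is consistent with the flat Neo4j timings in the table.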
  • 16. Graphs - Whiteboard-friendly • No decomposition, ER design, or normalization / de-normalization as needed with an RDBMS
  • 17. Neo4j • A Graph Database • A Property Graph containing Nodes, Relationships with Properties on both • Manage complex, highly connected data • Scalable - High-performance with High-Availability – Traverse 1,000,000+ relationships / second on commodity hardware • Server with REST API, or Embeddable on the JVM
  • 18. Neo4j • Full ACID transactions • Schema free, bottom-up data model design • Stable • Easier than RDBMS since no need for normalization • Implemented in Java • Open Source
  • 19. Neo4j • Schema free – Data does not have to adhere to any convention • Support for a wide variety of languages - Java, Python, Perl, Scala, plus the Cypher query language • A graph database can be thought of as a key-value store, with full support for relationships. • Graph databases do not remove the need for design; a good model still requires effort
  • 20. Why Neo4J? • The internet is a network of pages connected to each other. What is a better way to model that than in graphs? • No time lost fighting with less expressive data-stores • Easy to implement experimental features • A single instance of Neo4j can house up to about 34 billion nodes, 34 billion relationships and at least 68 billion properties (see the data size limits on slide 27)
  • 21. Neo4j Architecture (diagram) • Components shown: REST API, JVM Language Bindings (Java, Ruby, Clojure…), Graph Matching, Traversal Framework, Core API, Caches, Memory-Mapped (N)IO, Filesystem
  • 23. Data Storage • Neo4j stores graph data in a number of different store files • Each store file contains the data for a specific part of the graph – neostore.nodestore.db – neostore.relationshipstore.db – neostore.propertystore.db – neostore.propertystore.db.index – neostore.propertystore.db.strings – neostore.propertystore.db.arrays
  • 24. Node Store • Size: 9 bytes – 1st byte - in-use flag – Next 4 bytes - ID of first relationship – Last 4 bytes - ID of first property of node • Fixed size records enable fast lookups
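Fixed-size records make lookups fast because the position of a record can be computed from its ID instead of being searched for. The sketch below follows the 9-byte layout listed on the slide; the class `NodeRecordSketch`, its method names, and the use of a `ByteBuffer` in place of the real store file are assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical reader/writer for the 9-byte node record described on the slide.
class NodeRecordSketch {
    static final int RECORD_SIZE = 9; // 1 in-use byte + 4-byte relationship ID + 4-byte property ID

    // The record for a node starts at nodeId * RECORD_SIZE; no search is needed.
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    static void write(ByteBuffer store, int nodeId, boolean inUse, int firstRelId, int firstPropId) {
        int pos = (int) offsetOf(nodeId);
        store.put(pos, (byte) (inUse ? 1 : 0));
        store.putInt(pos + 1, firstRelId);
        store.putInt(pos + 5, firstPropId);
    }

    static boolean inUse(ByteBuffer store, int nodeId) {
        return store.get((int) offsetOf(nodeId)) == 1;
    }

    static int firstRelationshipId(ByteBuffer store, int nodeId) {
        return store.getInt((int) offsetOf(nodeId) + 1);
    }

    static int firstPropertyId(ByteBuffer store, int nodeId) {
        return store.getInt((int) offsetOf(nodeId) + 5);
    }
}
```

Reading node 3, for example, means jumping straight to byte offset 27 and decoding the nine bytes found there.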
  • 25. Relationship store • neostore.relationshipstore.db • Record size: 33 bytes • 1st byte - in-use flag • Next 8 bytes - IDs of the nodes at the start and end of the relationship • 4 bytes - pointer to the relationship type • 16 bytes - pointers to the next and previous relationship records for each of the start and end nodes (the relationship chain) • 4 bytes - ID of the first property (the start of the property chain)
  • 27. Data Size
      nodes                2^35 (∼ 34 billion)
      relationships        2^35 (∼ 34 billion)
      properties           2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion)
      relationship types   2^15 (∼ 32 000)
  • 28. Neo4j API – Logical View
  • 29. Programmatic Data Access • Java APIs - JVM languages bind to the same APIs • JRuby, Jython, Clojure, Scala… • Manage nodes and relationships • Indexing – find data without traversal • Traversing • Path finding • Pattern matching
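The next and previous pointers in each relationship record form a doubly linked list per node, and the node record points at the head of its list. The sketch below models that idea with IDs in memory rather than bytes on disk. The class `RelationshipChainSketch` and all of its members are hypothetical, and the sketch assumes a relationship never connects a node to itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of per-node relationship chains.
class RelationshipChainSketch {
    static final int NONE = -1;

    // One relationship record, mirroring the pointer fields listed on the slide.
    static class Rel {
        final int start;
        final int end;
        int startPrev = NONE, startNext = NONE; // neighbours in the start node's chain
        int endPrev = NONE, endNext = NONE;     // neighbours in the end node's chain

        Rel(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    final List<Rel> rels = new ArrayList<>();
    final int[] firstRel; // per node: ID of the first relationship in its chain

    RelationshipChainSketch(int nodeCount) {
        firstRel = new int[nodeCount];
        Arrays.fill(firstRel, NONE);
    }

    // Add a relationship and push it onto the head of both nodes' chains.
    int connect(int start, int end) {
        int id = rels.size();
        Rel r = new Rel(start, end);
        rels.add(r);
        r.startNext = firstRel[start];
        setPrev(firstRel[start], start, id);
        firstRel[start] = id;
        r.endNext = firstRel[end];
        setPrev(firstRel[end], end, id);
        firstRel[end] = id;
        return id;
    }

    // Walk one node's chain from its first relationship to the end.
    List<Integer> relationshipsOf(int node) {
        List<Integer> ids = new ArrayList<>();
        for (int id = firstRel[node]; id != NONE; id = next(id, node)) {
            ids.add(id);
        }
        return ids;
    }

    private int next(int relId, int node) {
        Rel r = rels.get(relId);
        return r.start == node ? r.startNext : r.endNext;
    }

    private void setPrev(int relId, int node, int prev) {
        if (relId == NONE) {
            return;
        }
        Rel r = rels.get(relId);
        if (r.start == node) {
            r.startPrev = prev;
        } else {
            r.endPrev = prev;
        }
    }
}
```

Listing the relationships of a node therefore costs one pointer hop per relationship of that node, independent of how many relationships the whole store contains.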
  • 30. Core API • Deals with graphs in terms of their fundamentals • Nodes - properties – KV Pairs • Relationships – Start node – End node – Properties • KV Pairs
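  • 31. Create Node
      import org.neo4j.graphdb.GraphDatabaseService;
      import org.neo4j.graphdb.Node;
      import org.neo4j.graphdb.Transaction;
      import org.neo4j.kernel.EmbeddedGraphDatabase;

      // Open (or create) an embedded database stored under /tmp/neo
      GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
      Transaction tx = db.beginTx();
      try {
          Node theDoctor = db.createNode();
          theDoctor.setProperty("character", "the Doctor");
          tx.success();   // mark the transaction as successful
      } finally {
          tx.finish();    // commit if marked successful, otherwise roll back
      }
  • 32. Create Relationships
      import org.neo4j.graphdb.DynamicRelationshipType;

      Transaction tx = db.beginTx();
      try {
          Node theDoctor = db.createNode();
          theDoctor.setProperty("character", "The Doctor");
          Node susan = db.createNode();
          susan.setProperty("firstname", "Susan");
          susan.setProperty("lastname", "Campbell");
          // Directed relationship: susan -[COMPANION_OF]-> theDoctor
          susan.createRelationshipTo(theDoctor, DynamicRelationshipType.withName("COMPANION_OF"));
          tx.success();
      } finally {
          tx.finish();
      }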
  • 33. Index a Graph • Graphs themselves are indexes • Can create short-cuts to well-known nodes • In program, keep a reference to any interesting node • Indexes offer flexibility in what constitutes an “interesting node”
  • 34. Lucene • The default index implementation for Neo4j – Default implementation for IndexManager • Supports many indexes per database • Each index supports nodes or relationships • Supports exact and regex-based matching • Supports scoring – Number of hits in the index for a given item – Great for recommendations
  • 35. Create a Node Index
      GraphDatabaseService db = …
      Index<Node> planets = db.index().forNodes("planets");
    – Index<Node> is the type being indexed, "planets" is the index name, and forNodes() creates the index or retrieves it if it already exists
  • 36. Create a Relationship Index
      GraphDatabaseService db = …
      Index<Relationship> enemies = db.index().forRelationships("enemies");
    – Index<Relationship> is the type being indexed, "enemies" is the index name, and forRelationships() creates the index or retrieves it if it already exists
  • 37. Exact Matches
      GraphDatabaseService db = …
      Index<Node> actors = db.index().forNodes("actors");
      Node rogerDelgado = actors.get("actor", "Roger Delgado").getSingle();
    – "actor" is the key, "Roger Delgado" is the value to match, and getSingle() returns the single match (or null if there is none)
  • 38. Query Matches
      GraphDatabaseService db = …
      Index<Node> species = db.index().forNodes("species");
      IndexHits<Node> speciesHits = species.query("species", "S*n");
    – "species" is the key and "S*n" is the query
  • 39. Transactions to Mutate Indexes • Mutating access is still protected by transactions, which cover both the index and the graph
      GraphDatabaseService db = …
      Transaction tx = db.beginTx();
      try {
          Node nixon = db.createNode();
          nixon.setProperty("character", "Richard Nixon");
          db.index().forNodes("characters").add(nixon, "character", nixon.getProperty("character"));
          tx.success();
      } finally {
          tx.finish();
      }
  • 40. Auto Index lifecycle • An auto index stays consistent with the graph data • Specify the property names to index when configuring the auto indexer • If a node, relationship or property is removed from the graph, it is also removed from the index • If the database is started with auto indexing enabled but with different auto-indexed properties than in the last run, entries for properties that are no longer auto-indexed are removed as the entities are touched • Re-indexing is manual – Existing properties are not indexed until they are touched
  • 41. Auto Index lifecycle
      AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
      nodeAutoIndex.startAutoIndexingProperty("species");
      nodeAutoIndex.setEnabled(true);
      ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
    – Node and relationship indexes are supported
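The lifecycle rules above can be modelled with a small in-memory sketch. It is not the Neo4j auto indexer; the class `AutoIndexSketch` and its internals are hypothetical, and only the method name `startAutoIndexingProperty` is borrowed from the slide. It shows two of the rules: index entries follow property changes, and a property written before auto indexing was enabled is not indexed until it is touched again.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of an auto index that tracks property changes on nodes.
class AutoIndexSketch {
    private final Set<String> indexedKeys = new HashSet<>();
    private final Map<Long, Map<String, Object>> nodeProps = new HashMap<>();
    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<>();

    void startAutoIndexingProperty(String key) {
        indexedKeys.add(key); // existing values are NOT re-indexed here
    }

    void setProperty(long nodeId, String key, Object value) {
        Object old = nodeProps.computeIfAbsent(nodeId, k -> new HashMap<>()).put(key, value);
        if (!indexedKeys.contains(key)) {
            return;
        }
        if (old != null) {
            unindex(key, old, nodeId); // drop the stale entry before adding the new one
        }
        index.computeIfAbsent(key, k -> new HashMap<>())
             .computeIfAbsent(value, v -> new HashSet<>())
             .add(nodeId);
    }

    void removeProperty(long nodeId, String key) {
        Map<String, Object> props = nodeProps.get(nodeId);
        if (props == null) {
            return;
        }
        Object old = props.remove(key);
        if (old != null) {
            unindex(key, old, nodeId); // removal from the graph removes it from the index
        }
    }

    Set<Long> get(String key, Object value) {
        Map<Object, Set<Long>> byValue = index.getOrDefault(key, Collections.emptyMap());
        return byValue.getOrDefault(value, Collections.emptySet());
    }

    private void unindex(String key, Object value, long nodeId) {
        Map<Object, Set<Long>> byValue = index.get(key);
        if (byValue == null) {
            return;
        }
        Set<Long> ids = byValue.get(value);
        if (ids != null) {
            ids.remove(nodeId);
        }
    }
}
```

Setting the property again on a node that predates the auto index is what "touching" means in this sketch: the write goes through `setProperty`, which adds the missing index entry.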
  • 42. Core API • Basic (nodes, relationships) • Fast • Imperative • Flexible - Easily intermix mutating operations
  • 43. Traversers API • Query the graph by navigating from a starting node to related nodes according to a traversal algorithm • Expressive • Fast • Declarative (mostly) • Opinionated
  • 44. Cypher - A Graph Query Language • Query Language for Neo4j • A declarative graph pattern matching language – SQL for graphs – Tabular results • aggregation, ordering and limits • Mutating operations • CRUD • Easy to formulate queries based on relationships • Many features stem from improving pain points of SQL like join tables
  • 45. Cypher - A Graph Query Language
  • 48. Operations • Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT • Where clause
      start doctor=node:characters(name = 'Doctor')
      match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode)
      where actor.actor = 'Tom Baker' and episode.title =~ /.*Dalek.*/
      return episode.title
    • Ordering – order by <property> – order by <property> desc
  • 49. Graph Algorithms • Neo4j has built-in algorithms • Callable through JVM and REST APIs • Higher level of abstraction • Graph Matching – Look for patterns in a data set - retail analytics – Higher-level abstraction than raw traversers • REST API – Access the server • Binary protocol – JSON as default format
  • 50. Neo4j HA - High Availability Cluster • A scalability package known as high availability or HA that uses a master-slave cluster architecture – Full data redundancy – Service fault tolerance – Linear read scalability – Master-slave replication • Single data-centre or global zones – tolerance for high-latency
  • 51. Neo4j HA • Redundancy - improved uptime – automatic failover • In a Neo4j HA cluster the full graph is replicated to every instance in the cluster • Read operations can be done locally on each slave • Read capacity of the HA cluster increases linearly with the number of servers
  • 53. HA Cluster Architecture • Cluster performs automatic master election • Supports master-slave replication for clustering and DR across sites
  • 55. Write to a Master • All write operations are co-ordinated by the master • Writes to the master are fast • Slaves eventually catch up
  • 56. Write to a Master
  • 57. Write to a Slave • Writes to a slave cause a synchronous transaction with the master • Other slaves eventually catch up
  • 58. Write to a Slave
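The two write paths described on slides 55 to 58 can be sketched as a toy simulation. This is not how Neo4j HA is implemented; the class `HaClusterSketch`, the use of a plain list as a transaction log, and the choice to bring a slave up to date before it accepts a write are all assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a master-slave cluster where each instance keeps a transaction log.
class HaClusterSketch {
    static class Instance {
        final List<String> log = new ArrayList<>();
    }

    final Instance master = new Instance();
    final List<Instance> slaves = new ArrayList<>();

    Instance addSlave() {
        Instance slave = new Instance();
        slaves.add(slave);
        return slave;
    }

    // A write to the master commits locally; slaves are not contacted.
    void writeToMaster(String tx) {
        master.log.add(tx);
    }

    // A write to a slave is committed on the master and on that slave in the same step.
    void writeToSlave(Instance slave, String tx) {
        catchUp(slave); // assumption: the slave is brought up to date first
        master.log.add(tx);
        slave.log.add(tx);
    }

    // Other slaves pull whatever they are missing from the master at a later time.
    void catchUp(Instance slave) {
        for (int i = slave.log.size(); i < master.log.size(); i++) {
            slave.log.add(master.log.get(i));
        }
    }
}
```

After a write through one slave, the master and that slave agree immediately, while every other slave stays behind until its next `catchUp` call, which is the eventual catch-up the slides describe.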
  • 59. Server Overload Problem • Unlike other classes of NOSQL database, a graph does not have predictable lookup since it is a highly mutable structure • We want to co-locate related nodes for traversal performance, but we don’t want to place so many connected nodes on the same database that it becomes heavily loaded • The black-hole problem - popular nodes get lumped together on a single instance, but there is low point cut
  • 61. Thinly Spread Network • The opposite is also true: we do not want connected nodes spread too widely across different database instances, since traversals that cross the (relatively high-latency) network incur a substantial performance penalty at runtime • Load-leveling alone can lead to many relationships crossing instances • These are very expensive to traverse, since networks are many orders of magnitude slower than in-memory traversals
  • 63. Minimal Point Cut • The best approach is to balance a graph across database instances by creating a minimum point cut for the graph, where nodes are placed such that few relationships span shards • A good strategy is to take a local view of the graph (no global locks) and work incrementally (short bursts) • Take usage patterns into account • Unlike other NoSQL stores, graphs are not predictable, so techniques like consistent hashing cannot be used for scale-out
  • 65. Cache Sharding • A strategy for large data sets of terabyte scale • Mandates consistent request routing • For instance, requests for user A are always sent to server 1, while requests for user B are always sent to server 2 and so on • The key assumption is that requests for user A typically touch parts of the graph around user A, such as his or her friends, preferences, likes and so on
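The quality of a placement can be measured by counting the relationships that span shards. The sketch below only evaluates a given placement; it does not compute a minimum cut. The class `ShardCutSketch` and the array-based graph representation are assumptions made for the example.

```java
// Hypothetical helper that scores a node-to-shard placement.
class ShardCutSketch {
    // edges[i] = {fromNode, toNode}; shardOf[node] = index of the shard holding that node.
    static int crossShardRelationships(int[][] edges, int[] shardOf) {
        int count = 0;
        for (int[] edge : edges) {
            if (shardOf[edge[0]] != shardOf[edge[1]]) {
                count++; // traversing this relationship would cross the network
            }
        }
        return count;
    }

    // Number of nodes placed on each shard, to check that the load is balanced.
    static int[] shardSizes(int[] shardOf, int shardCount) {
        int[] sizes = new int[shardCount];
        for (int shard : shardOf) {
            sizes[shard]++;
        }
        return sizes;
    }
}
```

For two triangles joined by a single relationship, placing each triangle on its own shard cuts one relationship, while alternating nodes between the two shards cuts five, even though both placements put three nodes on each shard.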
  • 66. Cache Sharding • This means that the neighbourhood of the graph around user A will be cached on server 1, while the neighbourhood around user B will be cached on server 2 • By employing consistent routing of requests, the caches of all servers in the HA cluster can be utilized maximally • Strategy is highly effective for managing a large graph that does not fit in RAM
  • 67. Consistent Routing • Always try to route related requests to the same server to hopefully benefit from warm caches
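Consistent routing can be as simple as deriving the server from the user ID, so that repeated requests for the same user land on the same warm cache. The sketch below uses plain modulo routing over a fixed number of servers; the class `ConsistentRouter` is hypothetical, and this is not a consistent-hash ring, so changing the server count would remap most users.

```java
// Hypothetical request router: the same user ID always maps to the same server index.
class ConsistentRouter {
    private final int serverCount;

    ConsistentRouter(int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        this.serverCount = serverCount;
    }

    // floorMod keeps the result in [0, serverCount) even when hashCode() is negative.
    int serverFor(String userId) {
        return Math.floorMod(userId.hashCode(), serverCount);
    }
}
```

Every instance still holds the full graph in this scheme; only the cached working set differs from server to server.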
  • 68. Domain Specific Sharding • Graphs are not as easy to shard as documents or KV stores • High-performance graph databases are limited to the data set size that a single machine can handle • Replicas speed up reads and improve availability, but the data set is still limited to a single machine's disk/memory • No perfect algorithm exists, but an expert's domain insight helps
  • 69. Domain Specific Sharding • Some domains can be sharded easily (geo, most web apps) using the consistent routing approach and cache sharding – Geo - the connections between cities are few compared with the connections within cities, so cities or countries can be placed on different servers • Eventually, petabyte-scale data cannot practically be replicated in full • Need to shard data across machines
  • 70. References
      1. http://www.neo4j.org
      2. http://www.neo4j.org/learn/cypher
      3. Bachman, Michal (2013). GraphAware - Towards Online Analytical Processing in Graph Databases. http://graphaware.com/assets/bachman-msc-thesis.pdf
      4. Hunger, Michael (2012). Cypher and Neo4j. http://vimeo.com/83797381
      5. Mistry, Deep. Neo4j: A Developer's Perspective. http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective
      6. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs
      7. Parallel Breadth First Search on GPU Clusters
      8. DB-Engines Ranking of Graph DBMS
  • 71. Thank You • Check out my LinkedIn profile at https://in.linkedin.com/in/girishkhanzode