2. • A cheeky bit of computer science
• Database architecture from 30,000ft
• Why Neo4j is graph native, and why it matters
• Quantitative performance advantages
• Finish
Overview
4. Applied to data, native data formats or communication protocols are those
supported by a certain computer hardware or software, with maximal
consistency and minimal amount of additional components.
-- Wikipedia
Native: A Definition
5. Those who can imagine anything,
can create the impossible.
13. • Classic B-trees are a common pattern for on-disk databases
• “Index” in memory, leaf nodes in files on disk
• B+ trees for linear scans are neat! But…
Databases Usually <3 Trees
17. • It could be tables or columns or KV or documents…
• Each database is likely very good for that model
• Evolution driven by its primary workload in its
primary market
• Any add-on doesn’t benefit from this
• Unloved
• Opportunistic (e.g. “multi model”)
• Models don’t compose easily
All Databases have a native model
21. Graph Layer
• Take existing data store
• Bolt-on Graph-like API from third-
party open source
• Declare victory
Graph Operator
• Take existing data store
• Add graph features into the query
language
• Declare victory
Two Non-Native Approaches to Graph
23. Non-Native Architectures
No Cypher!
Graph Layer: a Graph API bolted on top of another DBMS (e.g. a Column Store)
Graph Operator: another QL on top of another DBMS (e.g. a Document Store)
24. Non-Native Architectures
• Requires convention at user level
• Denormalization
25. Non-Native Architectures
• Does not understand graphs
• Cannot prevent dangling relationships / logical corruption / etc.
26. • Engine and store are not designed for graphs
• Graphs are not motivating workload
• Denormalization only works to certain modest limits
• E.g. depth 3
• Operational concerns: schema rigidity, evolution
Graph Layer Drawbacks
28. • Works by convention only
• Underlying engine cannot enforce integrity
• Data structures and store formats are
designed for another job entirely
• Performance concerns
Graph Operator Drawbacks
53. • 11 million nodes
• 116 million relationships
• 20 iterations
• < 10 seconds
DBpedia
54. • Combine OLTP and OLAP in the same cluster
• Work on up-to-date data, no complex ETL,
warehousing
• Mix with graph algorithms
Neo4j is an HTAP Database
56. • Asymptotic benchmarking effort for native graph tech
• “What can Neo4j do when it’s pushed to its limits?”
• The results are impressive
Pushing Neo4j to the Limits
58. Traversals
Realistic retail dataset from Amazon
Commodity dual Xeon processor server
Social recommendation (Java procedure) equivalent to:
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco
59. • Can comfortably handle 1 trillion relationships on a single server
• 24x2TB SSDs, 33TB size on disk.
• Compiled Cypher query
• Random reads
• Sustains over 100k user transactions/sec
• Even with 99.8% page faults because of modest 512GB RAM
Read Scale
60. • Import Friendster dataset
• 1.8 billion relationships takes around 20
minutes
• That is 1M writes/second!
Write Scale
The Neo4j database fits this definition: a small number of modules each dedicated to some part of graph storage and query.
There’s no other DBMS underneath requiring translation into/out from the native world.
And that provides serious benefits to the end user.
Science time.
A reminder of algorithms and data structures
What are the properties of this list?
If you want to add something, it’s easy to just insert
If you want to find something (or find out it’s not there) it’s laborious.
It is O(1) for writes = great
It is O(N) for reads = sucky
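That trade-off is easy to see in a minimal sketch (a toy of my own, not anything from Neo4j): prepending is O(1), but lookup walks the whole chain.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class ListStore:
    """A linked list with a thin API: O(1) writes, O(N) reads."""
    def __init__(self):
        self.head = None

    def put(self, value):
        # O(1): just link a new node in front of the head.
        self.head = Node(value, self.head)

    def get(self, value):
        # O(N): walk every node until we find it (or run out).
        node = self.head
        while node is not None:
            if node.value == value:
                return node.value
            node = node.next
        return None
```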
Pop it in a box
Put an API on it
Voila, it’s a database!
Yes, it’s a crappy database, but for our purposes it suits. If you squint it could even be a blockchain.
Works great for one client, but…
Conflict Free Replicated Data Type
A CRDT is a data structure that has well known merge rules.
We can write into several concurrent copies (on different servers) and merge them all later.
Great! Because we don’t care about ordering this is easy peasy and even a CRDT library can do this for us.
But this database is still awful for reads: reads get slower the more data you add.
You could even go another route and assume that you rarely read and do something like large fast ring buffers. Lots of options. But reading what you’ve written is always expensive.
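The simplest CRDT of all is a grow-only set, sketched here in Python (a toy of my own, though any CRDT library provides the same idea): merge is set union, which is commutative, associative and idempotent, so concurrently written replicas always converge.

```python
class GSet:
    """Grow-only set CRDT: merge is set union, so replicas written
    concurrently on different servers can always be merged later,
    in any order, with the same result."""
    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        merged = GSet()
        merged.items = self.items | other.items
        return merged
```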
Let’s try again.
Binary tree
O(log n) for reads
O(log n) for writes, because you first have to do an O(log n) read
Can read anywhere in principle
Can write across the leading edge of the tree
Contention is generally federated through the structure
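The notes above can be sketched as a toy (unbalanced) binary search tree; the names are my own, and real databases balance the tree:

```python
class BST:
    """Toy binary search tree. Both find and insert walk a root-to-leaf
    path, O(log n) on average: an insert pays for a read first."""
    def __init__(self):
        self.root = None  # each node is [key, left, right]

    def find(self, key):
        node = self.root
        while node is not None:
            if key == node[0]:
                return True
            node = node[1] if key < node[0] else node[2]
        return False

    def insert(self, key):
        if self.root is None:
            self.root = [key, None, None]
            return
        node = self.root
        while True:
            if key == node[0]:
                return  # already present
            side = 1 if key < node[0] else 2
            if node[side] is None:
                node[side] = [key, None, None]
                return
            node = node[side]
```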
When you’ve got trees, you get lots of logs
That is O(log n) per lookup, and O(m log n) traversal speed for a graph of m hops - this model isn’t a good choice for graph workloads
Of course Neo4j has some indexes that are tree based, but most of the time we only use them to find starting points in the graph
Traversals in Neo4j are O(1) per hop
Native graph = far fewer O(log n) penalties
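A toy contrast (my own illustration) between hopping via an index and hopping via direct references:

```python
import bisect

# Non-native: relationships live in a sorted index; every hop pays an
# O(log n) lookup.
edge_index = sorted([(1, 2), (2, 3), (3, 4)])  # (source, target) pairs

def hop_via_index(node):
    i = bisect.bisect_left(edge_index, (node,))
    if i < len(edge_index) and edge_index[i][0] == node:
        return edge_index[i][1]
    return None

# Native: a node holds direct references to its neighbours, so each
# hop is O(1) pointer-chasing, no index in sight.
class GraphNode:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbour nodes

a, b, c = GraphNode("a"), GraphNode("b"), GraphNode("c")
a.out.append(b)
b.out.append(c)
```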
The linked list is great for writes, less good for reads.
B-trees strike a balance between reads and writes.
Your design and implementation choices empower you for your native model
Your design and implementation choices limit you for other use cases
Caveat emptor – buyer beware.
Models don’t compose easily
Can make documents from graphs conveniently, but not so much the other way
Non-native Sea Lamprey - costs $500k per year to control in NY state!
It’s not a native part of the ecosystem
The graph trend is enormous and outstripping all other models.
If you’re a vendor in one of the slower growing models, you need some graph *story*
Bandwagon jumping
Some vendors have spotted the enormous graph trend and are simply jumping on the bandwagon
Let’s take a look at their non-native architecture.
Architecture
We’ve seen two approaches in the market where a non-graph vendor has tried to stretch their data structures to graph
Today most non-native graphs have their own APIs – not Cypher, not openCypher.
That excludes them from an amazing ecosystem of tools and people, to their detriment.
I also think that Cypher is by far the best graph query language – by design, it builds on the learnings of earlier languages: SQL, Gremlin, SPARQL.
[Japanese Knotweed]
Graph API suffers because most of the data store is focussed on the existing data model
The data structures aren’t designed for graphs, nor are the store formats.
Graphs are a hobby, tick box, something to answer RFPs.
Graphs are not the motivating workload.
The motivating workload doesn’t even have relationships, so the DB engine will not optimise for them
Upper levels try to compensate, but generally can only do so for a few hops
How many hops even to traverse your data center? Or your train ride? Or your Mars mission?
Column store provides nested hashmap data structure
Hashmap-of-hashmaps
Theoretical O(1) lookup per item seems great! But it is O(n) in practice because of collisions, and pathologically O(n²) for inserting n objects!
But it is not mechanically sympathetic
Hashing distributes data to avoid clashes
But performance comes from data locality
Works at disk speed if unoptimized
Works at RAM speed if optimized
But you have to denormalize
Serious limitations (e.g. only queries up to depth 3 are optimised)
And then add in network latency for distributed hashring
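A sketch of that hashmap-of-hashmaps shape (an illustration of my own; real column stores differ in detail): every hop is another round of hash lookups against data with no locality, and the traversal depth has to be spelled out up front.

```python
# A graph faked as a hashmap-of-hashmaps, roughly the nested-map shape
# a column store exposes.
rows = {
    "alice": {"BOUGHT": ["book"]},
    "bob":   {"BOUGHT": ["book", "lamp"]},
    "book":  {"BOUGHT_BY": ["alice", "bob"]},
}

def hops(start, rel_sequence):
    """Follow a fixed sequence of relationship types: one round of hash
    lookups per hop, with the depth hard-coded by the caller."""
    frontier = {start}
    for rel in rel_sequence:
        frontier = {t for n in frontier
                    for t in rows.get(n, {}).get(rel, [])}
    return frontier
```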
[Himalayan Balsam]
Add a graph lookup operator to the query language
Use some conventions in the existing model to infer linkage that the new operator can use
But no native support for links means it’s slow.
The data structures aren’t designed for graphs, nor are the store formats.
It also means you need clever workarounds, and you reach the limits of those workarounds quickly
Again: How many hops even to traverse your data center? Or your train ride? Or your Mars mission?
And if you disobey those conventions – no graph, and there is nothing to enforce them.
Underlying model knows nothing about links, so:
Is not that good for general purpose graphs because you can’t denormalize for all possible use cases
Deleting documents leaves dangling links (the document engine doesn’t have referential constraints)
More generally, user has to ensure conventions are upheld to make graph features work.
Easy to unintentionally disable graph features when other folks have only a document view of the data.
And then add network latency for all lookups
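The dangling-link problem can be shown with a toy document store (plain dicts, my own example): the links exist only by convention, so nothing stops a delete from orphaning them.

```python
# Documents linking to each other by id, purely by convention: nothing
# in the "engine" (a dict here) knows these strings are references.
docs = {
    "order-1": {"customer": "cust-7"},
    "cust-7":  {"name": "Alice"},
}

def follow(doc_id, field):
    """Resolve a link field; returns None when the target is gone."""
    target = docs.get(doc_id, {}).get(field)
    return docs.get(target)

# Delete the customer: no referential constraint stops us, and
# "order-1" is now left holding a dangling link.
del docs["cust-7"]
```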
Poor performance at modest search depth, difficult governance (the engine does not respect the graph), poor expressivity for any reasonable graph problem
Non-native stores serve 2 (or more) domains.
They always prefer their primary domain: it’s what most of their users need.
So while there are CS and engineering considerations, there’s also the notion of doing one thing well that underpins Neo4j.
Neo4j supports graph workloads natively. From bottom to top.
It is not a document store, or a column store, it is a native graph database.
Let’s see how we do it.
For us that one thing is graphs
But graphs are useful in a variety of processing contexts.
First of all, Online Transaction Processing. OLTP. What OLTP typically means for a graph is reading or writing small part of the whole graph.
The second way we see people using graphs is for Online Analytical Processing. Analytics typically means processing much larger sections of the graph, and often, in fact, processing the whole graph. For the last few decades, the trend has been for specialist technology to handle analytic workloads - different systems, different data models maybe - and isolated from OLTP systems. Well now there’s a new trend
Recently there’s been lots of talk about something called HTAP - *Hybrid* transactional and analytical processing. The idea is that if you could have one system that serves both workloads, you can run your analytics on up-to-date or nearly up-to-date data, so that you can respond to things faster. Also, maybe it’s just not worth the complexity of two totally different systems. What are we doing about this at Neo4j?
Since Neo4j 3.1 the cluster architecture has supported dividing the cluster into different groups. Here I’m showing 5 servers on the left that handle transaction workload, updating the graph and a read-only replica which is useful for read-heavy workloads
What this gives you is a part of the cluster that is perfect for OLAP workloads. Mostly isolated from the main transactional cluster, work over here won’t impact the transactional workload.
You can also specialize the hardware for each workload - for example, use machines with more RAM or CPU cores for the OLAP workload.
How do you use this cluster?
Well, you use the Neo4j Drivers to talk to all the servers in this cluster. If you’ve got OLAP workload that you want to go just to the OLAP-specialized machines, you can do this purely through configuration.
When you create a Neo4j Driver in your application, you specify a policy.
And on the servers you say what that policy means, which groups it should send queries to, and which servers are in those groups.
So that gives us our workload directed to the right servers in the cluster. I don’t really need to have two different applications.
I can have one application doing a mixture of OLTP and OLAP and still have the work routed to the right place
Now let’s look at the work itself in more detail
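As a sketch of what that looked like in the Neo4j 3.x causal-clustering era (the group name `olap` and the policy name here are my own examples; check the operations manual for your exact version and syntax):

```properties
# neo4j.conf on each analytics-focused server: tag it with a server group
causal_clustering.server_groups=olap

# Cluster-side: a load-balancing policy that routes traffic to that group
causal_clustering.load_balancing.plugin=server_policies
causal_clustering.load_balancing.config.server_policies.olap=groups(olap)
```

The application then names the policy in the driver’s routing URI, something like `bolt+routing://cluster-host:7687?policy=olap` (again, illustrative names).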
This a model for using Neo4j:
You have an application. It sends Cypher queries. They get run by the query Engine. Which queries the graph model.
This model is great, but now we’re adding something else into the picture
Graph Algorithms. They’re firmly on the analytics side of things. They look at a whole graph. You run them and they lead to actions like “this transaction seems fraudulent, you should investigate” or to insights like “this is the type of customer we do well selling to, we should tune our business around them”
There are two broad categories of algorithms available with 3.3. Centrality algorithms identify nodes that have significant positions in the network. Clustering algorithms are about detecting groups or clusters of nodes.
So if we want to run these graph algorithms, how do they fit into the picture?
Well, we’ve packaged the algorithms as a set of procedures. This means they sit alongside the Cypher query engine, behind exactly the same Cypher interface
To run one of the algorithms, it’s just a call to the relevant procedure. It works just the same way as running a normal Cypher query.
Now let’s have a look at one of the algorithms in more detail
I’ve picked PageRank because it’s quite well known. PageRank scores the importance of each node according to the importance of the other nodes that link to it. So it’s a kind of recursive definition
Practically, what that means is that you have to iterate: consider all the nodes and all the relationships in the graph, many times over.
As the algorithm iterates, efficiency for graph operations is paramount.
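A toy power-iteration PageRank (my own sketch, not Neo4j’s implementation, and it assumes every node has at least one outgoing relationship) makes the cost model visible: every iteration touches every node and every relationship, so per-hop traversal cost dominates.

```python
def pagerank(edges, num_nodes, iterations=20, damping=0.85):
    """Toy PageRank by power iteration over an edge list of
    (source, target) pairs with node ids 0..num_nodes-1."""
    ranks = [1.0 / num_nodes] * num_nodes
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    for _ in range(iterations):
        # Each iteration walks every relationship once...
        incoming = [0.0] * num_nodes
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        # ...and every node once.
        ranks = [(1 - damping) / num_nodes + damping * r for r in incoming]
    return ranks
```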
You don’t need huge macho clusters to do this.
I think these are incredibly useful building blocks for your next-gen systems – I’m looking forward to seeing the kinds of applications that get built with this stuff
And on a scalability note: Neo4j is light enough to scale down to some really interesting edge-compute cases
– like Stefan Armbruster’s RasPi cluster!
But let’s dig down a bit further.
Cypher is at the heart of Neo4j and we’ve heard a lot about it today.
I’d like to invite Tobias Lindaaker to the stage to talk about advances in the Cypher runtime that translate into performance advantages for you.
But now let’s reflect on what it means practically to choose graph native technology
So let’s zoom in on the lowest levels: what are the performance advantages of native graph?
But what can we do when we really push the envelope – to work the machinery as hard as possible?
Lots. Our CTO Johan decided to push the machinery to its limit and see what it can do.
Tease Johan.
User transaction means real units of work that are meaningful and valuable to the application.
Lots of traversals involved.
Not an artificial to-first-byte delivery benchmark.
Random reads are the hardest for a database to optimise so this is a truly challenging benchmark.
This is soon to be outdated – our new highly parallel importer will be far faster.
For transactional updates even on my modest laptop I can get several thousand ACID tx/sec online.
You can get so much work done so quickly with numbers like those.
You don’t have to follow me on this path though.
You take the blue pill, the story ends. You wake up in your data centre, shoe-horning connected data into those same DBMS systems not designed for it.
You take the red pill, and stay in graph land. And I show you how deep traversals can go in the real world.
We’re taking the red pill
I saw this on the internet and thought it looked like a neat challenge.
We had the DBpedia dataset to hand, which is comparable in size (slightly larger, but from the real world: 11M nodes, 116M links)
Theirs was synthetic, slightly smaller.
The original experiment ran on 288 cores with 1.5TB RAM.
Neo4j ran on a single workstation with 128GB RAM for the database in total – thanks to Michael Hunger for running the experiment.
That itself is a remarkable illustration of how efficient Neo4j can be. Sure, it’s macho to run 6 large machines, but it’s more sensible not to.
*** Describe what’s going on *** then:
This is not really a fair comparison.
The work undertaken by the non-native store is far higher than the work undertaken by Neo4j.
But that’s the whole point!
Because Neo4j can optimise for graphs all the way down the stack, we can and have implemented all kinds of shortcuts that databases optimised for tables or columns or keys-and-values or documents can’t do.
If you saw a similar table a year ago: the Neo4j column is even faster now, in some cases 2x faster.
One more thing…
The Neo4j engineering team has done some fantastic stuff in the last couple of years:
That’s a 3B-node, 18B-relationship graph PageRanked with 20 iterations in less than 2 hours using the graph algos.
On commodity hardware.
Imagine what we can do with Cypher for Apache Spark too!
We also measure ourselves on the standard LDBC 100 benchmark:
Running since March 2016:
“SF100 Read” has improved *~2x* (~2800 tx/s --> ~5000 tx/s)
“SF100 Write” has improved *~4x* (~5000 tx/s --> ~20000 tx/s)
Just remains for me to invite you to join us for drinks