Is multi-model the future of NoSQL?
Max Neunhöffer
SouthBay.NET Meetup, 5 March 2015
www.arangodb.com

Recently, a new breed of "multi-model" databases has emerged. Each is a document store, a graph database and a key/value store combined in one program, so it can cover many use cases that would otherwise need several different database systems.

This approach promises a boost to the idea of "polyglot persistence", which has become very popular in recent years although it creates some friction in the form of data conversion and synchronisation between different systems. With a multi-model database, one can enjoy the benefits of polyglot persistence without these disadvantages.

In this talk I will explain the motivation behind the multi-model approach, discuss its advantages and limitations, and then venture some predictions about the NoSQL database market in five years' time, which I shall reveal only during the talk.


  1. Is multi-model the future of NoSQL?
     Max Neunhöffer, SouthBay.NET Meetup, 5 March 2015, www.arangodb.com
  2. Max Neunhöffer
     • I am a mathematician
     • “Earlier life”: research in Computer Algebra (Computational Group Theory)
     • Always juggled with big data
     • Now: working in database development, NoSQL, ArangoDB
     • I like: research, hacking, teaching, tickling the highest performance out of computer systems
  3. ArangoDB GmbH
     • triAGENS GmbH has offered consulting services since 2004: software architecture, project management, software development, business analysis
     • a lot of experience with specialised database systems
     • have done NoSQL before the term was coined at all
     • 2011/2012: an idea emerged to build the database one had wished to have all those years!
     • development of ArangoDB as open source software since 2012
     • ArangoDB GmbH: spin-off to take care of ArangoDB (2014)
  4. Document and Key/Value Stores
     Document store: a document store stores sets of documents, usually JSON data; these sets are called collections. The database has access to the contents of the documents.
     • each document in a collection has a unique key
     • secondary indexes are possible, leading to more powerful queries
     • different documents in the same collection can vary in structure
     • no schema is required for a collection
     • database normalisation can be relaxed
     Key/value store: opaque values, only key lookup without secondary indexes ⇒ high performance and perfect scalability
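
The distinction drawn on this slide can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not how any real database is implemented: the `Collection` class, its method names and the sample data are all hypothetical. The point it illustrates is that a document store can look inside its values and therefore maintain a secondary index, while a key/value store sees only opaque bytes.

```python
import json
from collections import defaultdict


class Collection:
    """Toy document collection: unique keys plus one optional secondary index."""

    def __init__(self, indexed_field=None):
        self.docs = {}                      # primary key -> document (a dict)
        self.indexed_field = indexed_field  # hypothetical: a single indexed attribute
        self.index = defaultdict(set)       # attribute value -> set of keys

    def insert(self, key, doc):
        if key in self.docs:
            raise KeyError(f"duplicate key: {key}")
        self.docs[key] = doc
        # The store can read the document, so it can index one of its fields.
        if self.indexed_field in doc:
            self.index[doc[self.indexed_field]].add(key)

    def get(self, key):
        return self.docs[key]

    def find(self, value):
        """Look up documents by the indexed field instead of by key."""
        return [self.docs[k] for k in sorted(self.index.get(value, ()))]


products = Collection(indexed_field="category")
# Documents in one collection may differ in structure; no schema is declared.
products.insert("p1", {"category": "book", "title": "Graphs", "pages": 310})
products.insert("p2", {"category": "shoe", "size": 42})
products.insert("p3", {"category": "book", "title": "JSON"})

# A key/value store holds opaque values: only lookup by key is possible.
kv_store = {"session:abc": json.dumps({"cart": ["p1", "p3"]}).encode()}
```

Calling `products.find("book")` returns both book documents without scanning the collection, whereas the only operation available on `kv_store` is fetching the bytes stored under a known key.
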
  5. Graph databases
     Graph database: a graph database stores a labelled graph. Vertices and edges can be documents. Graphs are good for modelling relations.
     • graphs often describe data very naturally (e.g. the Facebook friendship graph)
     • graphs can be stored using tables; however, graph queries notoriously lead to expensive joins
     • there are interesting and useful graph algorithms like “shortest path” or “neighbourhood”
     • need a good query language to reap the benefits
     • horizontal scalability is troublesome
     • graph databases vary widely in scope and usage, no standard
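
As an illustration of the “shortest path” algorithm mentioned above, here is a minimal Python sketch using breadth-first search on an undirected, unweighted graph. The function name and the friendship data are made up for the example; a graph database would run something comparable inside its engine rather than in application code.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected, unweighted graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}          # visited vertices and how each was reached
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the predecessor links back to the start.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for nxt in neighbours.get(vertex, []):
            if nxt not in previous:
                previous[nxt] = vertex
                queue.append(nxt)
    return None                        # goal is not reachable from start


# Hypothetical friendship graph.
friendships = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
               ("ann", "eve"), ("eve", "dan")]
```

With this data, `shortest_path(friendships, "ann", "dan")` finds the two-hop route through `eve` rather than the three-hop route through `bob` and `cat`. Expressing the same search over relational tables would need one self-join per hop, which is the cost the slide refers to.
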
  6. Column-oriented data stores
     A column-oriented database stores tables but “keeps columns together” rather than rows.
     • access to a whole column is fast
     • sparse rows are handled efficiently
     • particularly good for certain types of data analysis
     • often implemented in a key/value-like fashion
     • row access can be slow
     • columns have homogeneous data, so compression works well
     • prominent examples: C-Store and Cassandra
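
The trade-offs listed on this slide can be made concrete with a small Python sketch. The sample table and the helper names are assumptions for illustration, and run-length encoding stands in for whatever compression a real column store applies.

```python
rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 12.5},
    {"id": 3, "country": "US", "amount": 7.5},
]

# Column layout: one list per column; values at the same position form a row.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress runs of equal values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


def read_row(position):
    """Row access has to touch every column, which is why it can be slow."""
    return {name: values[position] for name, values in columns.items()}


total_amount = sum(columns["amount"])                  # scans a single column
country_runs = run_length_encode(columns["country"])  # homogeneous data compresses well
```

The aggregate reads only the `amount` list, while `read_row` has to visit all three lists to reassemble one record.
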
  7. Massively parallel: map-reduce and friends
     The massively parallel area: a massively parallel database can use thousands of servers distributed all over the world and still appear as a single service.
     • humongous data capacity and very high read/write performance
     • examples are Apache Cassandra, Apache Hadoop, Google’s Spanner, Riak and others
     • these systems have important use cases, in particular in the analytic domain
     • query capabilities are somewhat limited, for example to only “map/reduce”
     ⇒ good horizontal scalability at the cost of reduced query flexibility
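
To show what a “map/reduce”-only query model looks like, here is a single-process Python sketch of the three phases. The shard contents and function names are hypothetical; in a real system the map phase would run on the servers that hold the data, and only the emitted pairs would travel over the network.

```python
from collections import defaultdict
from functools import reduce


def map_phase(shard):
    """Runs independently on each server: emit (key, value) pairs."""
    return [(order["country"], order["amount"]) for order in shard]


def shuffle(pairs):
    """Group the emitted values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine all values for one key into a single result."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


# Two hypothetical shards, standing in for data on different servers.
shards = [
    [{"country": "DE", "amount": 10}, {"country": "US", "amount": 5}],
    [{"country": "DE", "amount": 7}],
]
emitted = [pair for shard in shards for pair in map_phase(shard)]
totals = reduce_phase(shuffle(emitted))
```

Because each `map_phase` call needs only its own shard, the work scales out easily, but every question has to be phrased as a map step followed by a reduce step, which is the reduced query flexibility noted above.
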
  8. Polyglot Persistence
     Idea: use the right data model for each part of a system. For an application, persist
     • an object or structured data as a JSON document,
     • a hash table in a key/value store,
     • relations between objects in a graph database,
     • a homogeneous array in a relational DBMS.
     If the table has many empty cells or inhomogeneous rows, use a column-oriented database.
     Take scalability needs into account!
  9. A typical Use Case: an Online Shop
     We need to hold
     • customer data: usually homogeneous, but still with variations ⇒ use a relational DB: MySQL
     • product data: even for a specialised business quite inhomogeneous ⇒ use a document store
     • shopping carts: need very fast lookup by session key ⇒ use a key/value store
     • order and sales data: relate customers and products ⇒ use a document store
     • recommendation engine data: links between different entities ⇒ use a graph database
  10. Polyglot Persistence is nice, but . . .
      Consequence: one needs multiple database systems in the persistence layer of a single project!
      Polyglot persistence introduces some friction through
      • data synchronisation,
      • data conversion,
      • increased installation and administration effort,
      • more training needs.
      Wouldn’t it be nice to enjoy the benefits without the disadvantages?
  11. The Multi-Model Approach
      Multi-model database: a multi-model database combines a document store with a graph database and is at the same time a key/value store. Vertices are documents in a vertex collection, edges are documents in an edge collection.
      • a single, common query language for all three data models
      • is able to compete with specialised products on their turf
      • allows for polyglot persistence using a single database
      • queries can mix the different data models
      • can replace an RDBMS in many cases
  12. Why is this possible at all? Document stores and key/value stores
      • Document stores have a primary key, so they are key/value stores.
      • Without secondary indexes, performance is nearly as good as with opaque data instead of JSON.
      • Good horizontal scalability can be achieved for key lookups.
  13. Why is this possible at all? Document stores and graph databases
      • A graph database would like to associate arbitrary data with vertices and edges, so JSON documents are a good choice.
      • A good edge index, giving fast access to neighbours. This can be a secondary index.
      • Graph support in the query language.
      • Implementations of graph algorithms in the DB engine.
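
The idea of building a graph on top of document collections can be sketched as follows. This is a plain Python toy, not ArangoDB's implementation: the collection contents are invented, and the edge attributes are named after the `_from`/`_to` attributes that ArangoDB uses on edge documents, here holding bare keys for brevity.

```python
from collections import defaultdict

# Vertex collection: ordinary documents addressed by key.
persons = {
    "alice": {"name": "Alice", "city": "Cologne"},
    "bob": {"name": "Bob", "city": "San Jose"},
    "carol": {"name": "Carol", "city": "Cologne"},
}

# Edge collection: documents too, naming the two vertices they connect
# and carrying arbitrary extra data (here a hypothetical `since` field).
knows = {
    "e1": {"_from": "alice", "_to": "bob", "since": 2010},
    "e2": {"_from": "alice", "_to": "carol", "since": 2014},
    "e3": {"_from": "bob", "_to": "carol", "since": 2012},
}

# Edge index: a secondary index from vertex key to incident edge keys.
outbound = defaultdict(list)
inbound = defaultdict(list)
for edge_key, edge in knows.items():
    outbound[edge["_from"]].append(edge_key)
    inbound[edge["_to"]].append(edge_key)


def neighbours(vertex_key):
    """Vertices reachable over one outbound edge, found via the index."""
    return [persons[knows[e]["_to"]] for e in outbound[vertex_key]]
```

Finding the neighbours of `alice` costs one index lookup plus one key lookup per edge, with no scan of the edge collection. That is the fast access to neighbours the slide asks of an edge index.
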
  14. A Map of the NoSQL Landscape
      [Figure: map of the NoSQL landscape. Labels: Transaction Processing DBs, Analytic processing DBs, Map/reduce, Column Stores, Extensibility, Documents, Massively distributed, Graphs, Structured Data, Key/Value, Complex queries]
  15. Use case: Aircraft fleet management
      One of our customers uses ArangoDB to
      • store each part, component, unit or aircraft as a document
      • model containment as a graph
      • thus easily find all parts of some component
      • keep track of maintenance intervals
      • perform queries orthogonal to the graph structure
      • thereby getting good efficiency for all needed queries
  16. Use case: Family tree management
      For genealogy, the natural object is a family tree.
      • data naturally comes as a (directed) graph
      • many queries are traversals or shortest path
      • but not all, for example:
        “all people with name James” in a family tree, sorted by birthday
        “all family members who studied at Berkeley”, sorted by number of children
      • quite often, queries mixing the different models are useful
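
A query that mixes the models, such as “all people with name James” in a family tree, sorted by birthday, can be sketched in Python as a graph traversal followed by a document-style filter and sort. The data, keys and function names are hypothetical, and the traversal assumes a tree (no cycles).

```python
# Vertex documents (people) and parent -> child edges of a tiny family tree.
people = {
    "j1": {"name": "James", "birthday": "1901-04-02"},
    "m1": {"name": "Mary", "birthday": "1925-09-17"},
    "j2": {"name": "James", "birthday": "1950-01-30"},
    "a1": {"name": "Anna", "birthday": "1952-06-11"},
    "j3": {"name": "James", "birthday": "1899-12-24"},   # not in j1's tree
}
children = {"j1": ["m1"], "m1": ["j2", "a1"]}


def descendants(root):
    """Graph part: depth-first traversal along parent -> child edges."""
    found, stack = [], list(children.get(root, []))
    while stack:
        key = stack.pop()
        found.append(key)
        stack.extend(children.get(key, []))
    return found


def named_in_tree(root, name):
    """Document part: filter on an attribute, then sort by birthday."""
    keys = [root] + descendants(root)
    matches = [people[k] for k in keys if people[k]["name"] == name]
    # ISO-formatted dates sort chronologically as plain strings.
    return sorted(matches, key=lambda person: person["birthday"])
```

`named_in_tree("j1", "James")` returns the two Jameses in that tree in birthday order and leaves out `j3`, who shares the name but is not connected. In a multi-model database both halves would live in one query instead of in application code.
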
  17. Use case: knowledge bases
      • encode nearly arbitrary knowledge
      • often produced by machine learning
      • queried in very complex ways by expert systems
      • often in connection with an inference engine
      • need linked data with lots of associations
      • typical queries have unpredictable path length, thus graph queries shine
      • nevertheless, queries orthogonal to the links are often needed
  18. Recently: Key/Value stores adding other models
      • Riak (by Basho), originally a key/value store, adds support for documents with its 2.0 version (late 2014)
      • Redis (sponsored by Pivotal), originally an in-memory key/value store, has over time added more data types and more complex operations
      • FoundationDB (by FoundationDB) is a key/value store, but is now marketed as a multi-model database by adding additional layers on top
      • OrientDB (by Orient Technologies) started as an object database and nowadays calls itself a multi-model database
  19. Recently: DataStax acquired Aurelius
      In February 2015, DataStax (commercialised version of Cassandra (column-oriented)) announced the acquisition of Aurelius, the company behind TitanDB (a distributed graph database on top of Cassandra). In their own words:
      • “Bringing Graph Database Technology To Cassandra.”
      • “Will deliver massively scalable, always-on graph database technology.”
      • “Will simplify the adoption of leading NoSQL technologies to support multi-model use case environments.”
  20. Recently: MongoDB 3.0 adds pluggable DB engine
      MongoDB is one of the most popular document stores. In February 2015, they announced their 3.0 version, to be released in March, featuring
      • a pluggable storage engine layer
      • transparent on-disk compression
      • etc.
      This indicates their interest in supporting more data models than “just documents”. It will be very interesting indeed to see if and how they extend their query language . . .
  21. ArangoDB
      • is a multi-model database (document store & graph database),
      • is open source and free (Apache 2 license),
      • offers convenient queries (via HTTP/REST and AQL),
      • is memory efficient through shape detection,
      • uses JavaScript throughout (Google’s V8 built into the server),
      • has an API extensible by JavaScript code in the Foxx framework,
      • offers many drivers for a wide range of languages,
      • is easy to use with a web front end and good documentation,
      • enjoys good professional as well as community support,
      • and has had sharding since Version 2.0.
  22. Configurable consistency
      ArangoDB offers
      • atomic and isolated CRUD operations for single documents,
      • transactions spanning multiple documents and multiple collections,
      • snapshot semantics for complex queries,
      • very secure durable storage using append-only writes and storing multiple revisions,
      • all this for documents as well as for graphs.
      In the near future, ArangoDB will
      • implement complete MVCC semantics to allow for lock-free concurrent transactions
      • and offer the same ACID semantics even with sharding.
  23. Replication and Sharding: horizontal scalability
      Right now, ArangoDB provides
      • easy setup of (asynchronous) replication, which allows read access parallelisation (master/slaves setup),
      • sharding with automatic data distribution to multiple servers.
      Very soon, ArangoDB will feature
      • fault tolerance by automatic failover and synchronous replication in cluster mode,
      • zero administration by a self-repairing and self-balancing cluster architecture,
      • full integration with Apache Mesos and Mesosphere.
  24. Powerful query language: AQL
      The built-in Arango Query Language AQL allows
      • complex, powerful and convenient queries,
      • with transaction semantics,
      • allowing joins,
      • with user-definable functions (in JavaScript).
      AQL is independent of the driver used and offers protection against injections by design.
      For Version 2.3, we have reengineered the AQL query engine:
      • use a C++ implementation for high performance,
      • optimise distributed queries in the cluster.
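
One way to read “protection against injections by design” is that AQL queries take bind parameters, so user input travels as data rather than as query text. The Python sketch below only builds the JSON body one would POST to ArangoDB's HTTP cursor endpoint (`/_api/cursor`) and does not contact a server. The collection name `persons`, its attributes and the helper `cursor_request` are assumptions made for the example.

```python
import json

# AQL text with a bind parameter (@name); the collection `persons`
# and its attributes are hypothetical.
QUERY = """
FOR p IN persons
  FILTER p.name == @name
  SORT p.birthday
  RETURN p
"""


def cursor_request(name):
    """Build the JSON body for a POST to /_api/cursor."""
    return json.dumps({"query": QUERY, "bindVars": {"name": name}})


# Hostile-looking input stays inside bindVars and never becomes query text.
body = cursor_request('James" || true || "')
```

The query string is a constant, so whatever value is supplied can only ever be compared against `p.name`. The same body works from any driver or plain HTTP client, which matches the slide's remark that AQL is independent of the driver used.
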
  25. Extensible through JavaScript and Foxx
      The HTTP API of ArangoDB
      • can be extended by user-defined JavaScript code,
      • which is executed in the DB server for high performance.
      This is formalised by the Foxx microservice framework, which allows implementing complex, user-defined APIs with direct access to the DB engine.
      Very flexible and secure authentication schemes can be implemented conveniently by the user in JavaScript.
      Because JavaScript runs everywhere (in the DB server as well as in the browser), one can use the same libraries in the back-end and in the front-end.
      ⇒ implement your own microservices
  26. The Future of NoSQL: My Observations
      I observe
      • two decades ago the most versatile solutions eventually dominated the relational DB market (Oracle, MySQL, PostgreSQL),
      • the rise of the polyglot persistence idea,
      • a trend towards multi-model databases,
      • specialised products broadening their scope,
      • even relational systems adding support for JSON documents,
      • DevOps gaining influence (Docker phenomenon).
  27. The Future of NoSQL: My Predictions
      In five years’ time . . .
      • the default approach will be to use a multi-model database,
      • the big vendors will all add other data models,
      • the NoSQL solutions will conquer a sizable portion of what is now dominated by the relational model,
      • specialised products will survive only if they find a niche.
  28. Links
      • https://www.arangodb.com
      • http://guesser.9hoeffer.de:8000
      • https://github.com/ArangoDB/guesser
      • https://github.com/triAGENS/ArangoDB-NET
