6. KEY VALUE STORES
Data Model:
Global key-value mapping
Big scalable HashMap
Highly fault tolerant (typically)
Examples:
Redis, Riak, Voldemort, Dynamo
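The "big scalable HashMap" idea can be sketched as a map whose keys are hashed to one of several partitions. This is only an illustration and not how any of the products above is implemented; the class name `ShardedKVStore` and the partition count are assumptions made for the example.

```python
import hashlib


class ShardedKVStore:
    """Toy key-value store: each key is hashed to one of N in-memory partitions."""

    def __init__(self, num_partitions=4):
        # Each dict stands in for a separate storage node (assumption).
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Use a stable hash so the same key always maps to the same partition.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.partitions[int(digest, 16) % len(self.partitions)]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)


store = ShardedKVStore()
store.put("user:1", {"name": "Alice"})
store.put("user:2", {"name": "Bob"})
print(store.get("user:1"))
```

Because lookups only ever go through the key, adding partitions is the main scaling lever in this model.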
7. KEY VALUE STORES: PROS AND CONS
Pros:
Simple data model
Scalable
Cons:
Create your own “foreign keys”
Poor for complex data
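Creating your own "foreign keys" means storing the key of a related record as an ordinary value and resolving it in application code. A minimal sketch, assuming a plain dict as the store and a hypothetical `user:<id>` / `order:<id>` key naming scheme:

```python
# A plain dict stands in for the key-value store (assumption).
store = {
    "user:1": {"name": "Alice"},
    "order:100": {"item": "book", "user_key": "user:1"},
    "order:101": {"item": "pen", "user_key": "user:1"},
}


def get_order_with_user(store, order_key):
    """Resolve the hand-made 'foreign key' with a second lookup."""
    order = store[order_key]
    # The store does not know that user_key refers to another record,
    # so the application must fetch it and cope with a dangling reference.
    user = store.get(order["user_key"])
    return {"item": order["item"], "user": user}


print(get_order_with_user(store, "order:100"))
```

Nothing stops the user record from being deleted while orders still point at it, which is why this counts as a con.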
8. COLUMN FAMILY
Main idea is based on BigTable: Google’s
distributed storage model for Structured Data
Data Model:
A big table, with column families
Map Reduce for querying/processing
Examples:
HBase, HyperTable, Cassandra
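The "big table with column families" model can be pictured as a nested map: row key, then column family, then column. The sketch below is an in-memory illustration only; the helper names `put` and `get_family` are assumptions and do not correspond to any of the products listed.

```python
from collections import defaultdict

# table[row_key][column_family][column] = value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Return all columns of one family for a row (empty dict if absent)."""
    if row_key not in table or family not in table[row_key]:
        return {}
    return dict(table[row_key][family])


put("user1", "profile", "name", "Alice")
put("user1", "profile", "city", "Berlin")
put("user1", "stats", "logins", 42)
put("user2", "profile", "name", "Bob")  # rows need not share the same columns

print(get_family("user1", "profile"))
```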
9. COLUMN FAMILY: PROS AND CONS
Pros:
Supports Semi-Structured Data
Naturally Indexed (columns)
Scalable
Cons:
Poor for interconnected data
10. DOCUMENT DATABASES
Data Model:
A collection of documents
A document is a key value collection
Index-centric, uses map-reduce extensively
Examples:
CouchDB, MongoDB
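A collection of key-value documents with an index on one field can be sketched as follows. This is a toy model that ignores updates and deletes; the names `collection`, `city_index`, `insert` and `find_by_city` are assumptions for illustration, not the API of either product above.

```python
from collections import defaultdict

collection = {}                # document id -> document (a dict of key-value pairs)
city_index = defaultdict(set)  # indexed field value -> set of document ids


def insert(doc_id, document):
    collection[doc_id] = document
    # Keep the index on "city" up to date; documents without the field are skipped.
    if "city" in document:
        city_index[document["city"]].add(doc_id)


def find_by_city(city):
    # The query is answered from the index, not by scanning every document.
    return [collection[doc_id] for doc_id in sorted(city_index.get(city, ()))]


insert("d1", {"name": "Alice", "city": "Berlin"})
insert("d2", {"name": "Bob", "city": "Paris", "tags": ["admin"]})
insert("d3", {"name": "Carol", "city": "Berlin"})

print([doc["name"] for doc in find_by_city("Berlin")])
```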
11. DOCUMENT DATABASES: PROS AND CONS
Pros:
Simple, powerful data model
Scalable
Cons:
Poor for interconnected data
Query model limited to keys and indexes
Map reduce for larger queries
13. GRAPH DATABASES: PROS AND CONS
Pros:
Powerful data model, as general as RDBMS
Connected data locally indexed
Easy to query
Cons:
Sharding
Requires different data modelling
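14. RDBMS LIVING IN A NOSQL WORLD
[Figure: chart with axes Size and Complexity, placing Key-Value Stores, BigTable Clones, Document Databases, Graph Databases and Relational Databases; one region is labelled "90% of Use Cases"; the figure also shows the number 9,223,372,036,854,775,807.]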
15. WHAT IS A GRAPH?
An abstract representation of a set of objects where
some pairs are connected by links.
Object (Vertex, Node)
Link (Edge, Arc, Relationship)
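In code, this definition amounts to a set of objects plus a list of linked pairs. A minimal sketch with made-up vertex names, treating links as undirected:

```python
vertices = {"A", "B", "C", "D"}
edges = [("A", "B"), ("B", "C"), ("A", "D")]  # each pair is one link


def neighbors(vertex):
    """Return the vertices linked to `vertex`, treating edges as undirected."""
    linked = set()
    for source, target in edges:
        if source == vertex:
            linked.add(target)
        elif target == vertex:
            linked.add(source)
    return linked


print(sorted(neighbors("A")))
```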
16. WHAT IS A GRAPH DATABASE?
A database with an explicit graph structure
Each node knows its adjacent nodes through edges
As the number of nodes increases, the cost of a local
step (or hop) remains the same
Plus an index for lookups
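The two points above can be sketched together: an index finds the starting node, and every hop after that follows references stored on the node itself, so its cost does not depend on how many nodes exist. The `Node` class and the sample names are assumptions for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent nodes

    def connect(self, other):
        self.out.append(other)


# The index is used once, to find the starting node by name.
index = {name: Node(name) for name in ["alice", "bob", "carol", "dave"]}
index["alice"].connect(index["bob"])
index["bob"].connect(index["carol"])
index["bob"].connect(index["dave"])


def two_hops(start_name):
    start = index[start_name]       # index lookup
    result = []
    for first in start.out:         # hop 1: follow stored references
        for second in first.out:    # hop 2: no global scan or index needed
            result.append(second.name)
    return result


print(two_hops("alice"))
```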
17. APACHE TINKERPOP: A UNIFIED API
Dealing with such complex databases requires a
well-implemented API from the vendor. But using a
vendor-specific API makes migrating to another
database very difficult.
The solution is provided by Apache Tinkerpop.
18. WHAT IS APACHE TINKERPOP?
● A Graph processing system
● Currently under Apache incubation (2015)
● Has Tinkerpop3 Structure API
    ● Graph, Element, Property
● Has Tinkerpop3 Process API
    ● TraversalSource, GraphComputer
● Gremlin query language
    ● A scripting language for graph traversal and mutation
● REST API
19. WHY APACHE TINKERPOP?
Tinkerpop is a generic API for graph databases
Think ODBC, JDBC or Hibernate for relational
databases
Integrates with:
Titan DB
Neo4j
OrientDB
And many more.
Uses Gremlin graph scripting language
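The JDBC comparison can be illustrated with a small sketch: application code is written against one interface, and the storage behind it can be swapped. Everything here is hypothetical (`GraphBackend`, `DictBackend`, `EdgeListBackend`, `friends_of`); it is not the Tinkerpop API, only the general pattern.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical vendor-neutral interface; not the Tinkerpop API."""

    @abstractmethod
    def add_edge(self, source, target): ...

    @abstractmethod
    def neighbors(self, vertex): ...


class DictBackend(GraphBackend):
    """Stores adjacency in a dict of sets."""

    def __init__(self):
        self.adjacency = {}

    def add_edge(self, source, target):
        self.adjacency.setdefault(source, set()).add(target)

    def neighbors(self, vertex):
        return sorted(self.adjacency.get(vertex, ()))


class EdgeListBackend(GraphBackend):
    """Stores the same graph as a flat list of edges."""

    def __init__(self):
        self.edges = []

    def add_edge(self, source, target):
        self.edges.append((source, target))

    def neighbors(self, vertex):
        return sorted({t for s, t in self.edges if s == vertex})


def friends_of(backend, person):
    # Application code depends only on the shared interface.
    return backend.neighbors(person)


for backend in (DictBackend(), EdgeListBackend()):
    backend.add_edge("alice", "bob")
    backend.add_edge("alice", "carol")
    print(type(backend).__name__, friends_of(backend, "alice"))
```

Switching from one backend to the other changes a single constructor call, while `friends_of` stays untouched.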
20. TITAN DATABASE
Titan is a scalable graph database using Tinkerpop
APIs, optimized for storing and querying graphs
containing hundreds of billions of vertices and edges
distributed across a multi-machine cluster.
Supports Apache Spark and Hadoop (implicitly) for
map-reduce operations.
Integrates with:
Elasticsearch, Solr, Lucene
Uses as a backend storage:
Apache Cassandra
Apache HBase
21. PUTTING IT ALL TOGETHER
Apache Tinkerpop API
Gremlin server | Graph traversal | Gremlin client | Monitoring
Titan DB
Storage specific (Cassandra, HBase, BerkeleyDB)
22. TITAN: EXAMPLE
Download the Titan server and console here:
https://github.com/thinkaurelius/titan/wiki/Downloads
$ cd titan-1.0.0-hadoop1
$ bin/gremlin.sh
gremlin> graph = TitanFactory.open('conf/titan-berkeleyje-es.properties')
gremlin> GraphOfTheGodsFactory.load(graph)
gremlin> g = graph.traversal()
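With the Graph of the Gods loaded, a typical first query walks `father` edges backwards, for example from Saturn to his grandchild. The sketch below mimics that two-step traversal in plain Python over a hand-copied fragment of the data set (two edges); it does not talk to Titan, and the tuple layout and function names are assumptions for illustration.

```python
# Fragment of the Graph of the Gods: (child, label, parent) edges.
edges = [
    ("jupiter", "father", "saturn"),
    ("hercules", "father", "jupiter"),
]


def incoming(vertex, label):
    """Vertices with an edge of the given label pointing at `vertex`."""
    return [source for source, edge_label, target in edges
            if target == vertex and edge_label == label]


def grandchildren(name):
    # Roughly: g.V().has('name', name).in('father').in('father').values('name')
    result = []
    for child in incoming(name, "father"):
        result.extend(incoming(child, "father"))
    return result


print(grandchildren("saturn"))
```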
25. SUMMARY
Graph databases are a solution for semi-structured,
highly connected data at scale.
Apache Tinkerpop is a generic API for graph databases
that keeps business logic free of vendor-specific code.
Titan DB is a scalable distributed graph database on
top of several other databases. It uses Cassandra,
HBase or BerkeleyDB as backend storage. This lets the
database be as scalable as you want it to be.