GRAPH DATABASES: THE SOLUTION FOR STORING SEMI-STRUCTURED BIG DATA
Mohamed Taher Alrefaie
DATA IS GETTING BIGGER
“Every two days, we create as much information as we did up to 2003.” Eric Schmidt, former Google CEO, 2010.
DATA IS MORE CONNECTED
The following examples show it:
- Facebook Graph
- LinkedIn Graph
- Linked Data
- Blogs/Tagging
DATA IS LESS STRUCTURED
Modelling FB Graph?
Persons, friendships, photos, locations, apps, pages, ads, interests, age range, etc.
NOSQL DATABASES
Four types of databases that alleviate the performance issues of relational databases
KEY VALUE STORES
Data Model:
- Global key-value mapping
- Big scalable HashMap
- Highly fault tolerant (typically)
Examples:
- Redis, Riak, Voldemort, Dynamo
KEY VALUE STORES: PROS AND CONS
Pros:
- Simple data model
- Scalable
Cons:
- Create your own “foreign keys”
- Poor for complex data
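To make the “create your own foreign keys” point concrete, here is a minimal sketch that models a key-value store as a plain Java HashMap. The key naming scheme (`user:1:knows`), the comma-separated reference list and the class name `KeyValueSketch` are assumptions made for illustration; no real key-value product is being reproduced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a key-value store modelled as one global HashMap.
final class KeyValueSketch {
    static final Map<String, String> store = new HashMap<>();

    // Fill the store with three users and one hand-made relationship entry.
    static void load() {
        store.clear();
        store.put("user:1:name", "marko");
        store.put("user:2:name", "vadas");
        store.put("user:4:name", "josh");
        // Hand-made "foreign keys": the store sees an opaque string;
        // only the application knows it is a list of user ids.
        store.put("user:1:knows", "2,4");
    }

    // Resolving the relationship costs one extra lookup per referenced key.
    static List<String> namesKnownBy(String userId) {
        List<String> names = new ArrayList<>();
        String refs = store.get("user:" + userId + ":knows");
        if (refs == null || refs.isEmpty()) {
            return names;
        }
        for (String id : refs.split(",")) {
            names.add(store.get("user:" + id + ":name"));
        }
        return names;
    }

    public static void main(String[] args) {
        load();
        System.out.println(namesKnownBy("1")); // prints [vadas, josh]
    }
}
```

The store itself never checks that the referenced ids exist, which is why relationship handling ends up in application code.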
COLUMN FAMILY
Main idea is based on BigTable, Google’s distributed storage model for structured data
Data Model:
- A big table, with column families
- Map Reduce for querying/processing
Examples:
- HBase, HyperTable, Cassandra
COLUMN FAMILY: PROS AND CONS
Pros:
- Supports semi-structured data
- Naturally indexed (columns)
- Scalable
Cons:
- Poor for interconnected data
DOCUMENT DATABASES
Data Model:
- A collection of documents
- A document is a key-value collection
- Index-centric, uses map-reduce extensively
Examples:
- CouchDB, MongoDB
DOCUMENT DATABASES: PROS AND CONS
Pros:
- Simple, powerful data model
- Scalable
Cons:
- Poor for interconnected data
- Query model limited to keys and indexes
- Map reduce for larger queries
GRAPH DATABASES
Data Model:
- Nodes and Relationships
Examples:
- Titan, Neo4j, OrientDB, etc.
GRAPH DATABASES: PROS AND CONS
Pros:
- Powerful data model, as general as RDBMS
- Connected data locally indexed
- Easy to query
Cons:
- Sharding is difficult
- Requires different data modelling
RDBMS LIVING IN A NOSQL WORLD
[Figure: chart with axes Complexity and Size, positioning Key-Value Stores, BigTable Clones, Document Databases, Graph Databases and Relational Databases; annotations: "90% of Use Cases" and "9,223,372,036,854,775,807"]
WHAT IS A GRAPH?
An abstract representation of a set of objects where some pairs are connected by links.
- Object (Vertex, Node)
- Link (Edge, Arc, Relationship)
WHAT IS A GRAPH DATABASE?
- A database with an explicit graph structure
- Each node knows its adjacent nodes through edges
- As the number of nodes increases, the cost of a local step (or hop) remains the same
- An index is kept as well, for lookups
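The two properties on this slide, a constant-cost local hop and a separate index for lookups, can be sketched with plain Java collections. This is an illustrative toy under stated assumptions, not how any particular graph database stores data; the names `AdjacencySketch`, `Node` and `hop` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "each node knows its adjacent nodes".
final class AdjacencySketch {
    static final class Node {
        final String name;
        // Direct references to neighbours: following them never scans the whole graph.
        final List<Node> neighbours = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // The index is used only to find the node where a traversal starts.
    private final Map<String, Node> indexByName = new HashMap<>();

    Node addNode(String name) {
        Node node = new Node(name);
        indexByName.put(name, node);
        return node;
    }

    void connect(Node from, Node to) {
        from.neighbours.add(to);
    }

    // One hop reads only the start node's own neighbour list, so its cost
    // depends on that node's degree, not on the total number of nodes.
    List<String> hop(String startName) {
        List<String> names = new ArrayList<>();
        Node start = indexByName.get(startName);
        if (start != null) {
            for (Node neighbour : start.neighbours) {
                names.add(neighbour.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        AdjacencySketch graph = new AdjacencySketch();
        Node marko = graph.addNode("marko");
        Node vadas = graph.addNode("vadas");
        graph.connect(marko, vadas);
        System.out.println(graph.hop("marko")); // prints [vadas]
    }
}
```

Adding unrelated nodes to the sketch changes nothing about what `hop("marko")` has to read, which is the point the slide makes.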
APACHE TINKERPOP: A UNIFIED API
Dealing with such complex databases requires a well-implemented API from the vendor. But using a vendor-specific API makes migrating to another database very difficult.
The solution is provided by Apache Tinkerpop.
WHAT IS APACHE TINKERPOP?
● A graph processing system
● Currently under Apache incubation (2015)
● Has Tinkerpop3 Structure API
    ● Graph, Element, Property
● Has Tinkerpop3 Process API
    ● TraversalSource, GraphComputer
● Gremlin query language
    ● A scripting language for graph traversal and mutation
● REST API
WHY APACHE TINKERPOP?
Tinkerpop is a generic API for graph databases
Think ODBC, JDBC or Hibernate for relational databases
Integrates with:
- Titan DB
- Neo4j
- Orient DB
- And many more.
Uses Gremlin graph scripting language
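The JDBC analogy can be sketched in a few lines: business logic is written once against an interface, and the storage engine behind it is swapped freely. Everything below is hypothetical and deliberately tiny; `GraphStore`, `AdjacencyListStore`, `EdgeListStore` and `BusinessLogic` are invented names and are not part of the real TinkerPop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vendor-neutral interface (NOT the real TinkerPop API).
interface GraphStore {
    void addEdge(String from, String label, String to);
    List<String> out(String from, String label);
}

// Hypothetical "vendor A": adjacency lists keyed by vertex and edge label.
final class AdjacencyListStore implements GraphStore {
    private final Map<String, List<String>> adjacency = new HashMap<>();

    public void addEdge(String from, String label, String to) {
        adjacency.computeIfAbsent(from + "|" + label, k -> new ArrayList<>()).add(to);
    }

    public List<String> out(String from, String label) {
        return new ArrayList<>(adjacency.getOrDefault(from + "|" + label, new ArrayList<>()));
    }
}

// Hypothetical "vendor B": a flat list of (from, label, to) triples.
final class EdgeListStore implements GraphStore {
    private final List<String[]> edges = new ArrayList<>();

    public void addEdge(String from, String label, String to) {
        edges.add(new String[] {from, label, to});
    }

    public List<String> out(String from, String label) {
        List<String> targets = new ArrayList<>();
        for (String[] edge : edges) {
            if (edge[0].equals(from) && edge[1].equals(label)) {
                targets.add(edge[2]);
            }
        }
        return targets;
    }
}

// Business logic depends only on the interface, never on a vendor class.
final class BusinessLogic {
    static void loadSample(GraphStore store) {
        store.addEdge("marko", "knows", "vadas");
        store.addEdge("marko", "knows", "josh");
        store.addEdge("marko", "created", "lop");
    }

    static List<String> friendsOf(GraphStore store, String person) {
        return store.out(person, "knows");
    }

    public static void main(String[] args) {
        for (GraphStore store : new GraphStore[] {new AdjacencyListStore(), new EdgeListStore()}) {
            loadSample(store);
            System.out.println(friendsOf(store, "marko")); // prints [vadas, josh] for both backends
        }
    }
}
```

Switching backends touches only the line that constructs the store, which is the migration benefit the slide attributes to a generic API.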
TITAN DATABASE
Titan is a scalable graph database using Tinkerpop APIs, optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
Supports Apache Spark and Hadoop (implicitly) for map-reduce operations.
Integrates with:
- Elasticsearch, Solr, Lucene
Uses as a backend storage:
- Apache Cassandra
- Apache HBase
PUTTING IT ALL TOGETHER
[Figure: layered architecture]
- Apache Tinkerpop API: Gremlin server, graph traversal, Gremlin client, monitoring
- Titan DB
- Storage specific (Cassandra, HBase, BerkeleyDB)
TITAN: EXAMPLE
Download the Titan server and console here:
- https://github.com/thinkaurelius/titan/wiki/Downloads
$ cd titan-1.0.0-hadoop1
$ bin/gremlin.sh
gremlin> graph = TitanFactory.open('conf/titan-berkeleyje-es.properties')
gremlin> GraphOfTheGodsFactory.load(graph)
gremlin> g = graph.traversal()
TINKERPOP: EXAMPLE
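// Requires the TinkerPop 3 tinkergraph-gremlin library on the classpath.
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

// Create an empty in-memory graph.
Graph graph = TinkerGraph.open();
// Add vertices with an explicit id and key/value properties.
Vertex marko = graph.addVertex(T.id, 1, "name", "marko", "age", 29);
Vertex vadas = graph.addVertex(T.id, 2, "name", "vadas", "age", 27);
Vertex lop = graph.addVertex(T.id, 3, "name", "lop", "lang", "java");
Vertex josh = graph.addVertex(T.id, 4, "name", "josh", "age", 32);
Vertex ripple = graph.addVertex(T.id, 5, "name", "ripple", "lang", "java");
Vertex peter = graph.addVertex(T.id, 6, "name", "peter", "age", 35);
// Add directed, labelled edges, each with its own id and a weight property.
marko.addEdge("knows", vadas, T.id, 7, "weight", 0.5f);
marko.addEdge("knows", josh, T.id, 8, "weight", 1.0f);
marko.addEdge("created", lop, T.id, 9, "weight", 0.4f);
josh.addEdge("created", ripple, T.id, 10, "weight", 1.0f);
josh.addEdge("created", lop, T.id, 11, "weight", 0.4f);
peter.addEdge("created", lop, T.id, 12, "weight", 0.2f);
// Traversal source used to run Gremlin queries against the graph.
GraphTraversalSource g = graph.traversal();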
TINKERPOP: EXAMPLE (CONT.)
gremlin> g.V().has('name','marko').out('knows').values('name')
==>vadas
==>josh
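To show what the three steps of that traversal do, here is a rough plain-Java equivalent over a hand-built toy graph. It is only a sketch of the semantics, assuming a simplified in-memory model; `TraversalSketch`, its `Vertex` and `Edge` classes and `namesKnownBy` are hypothetical names and do not reflect how TinkerPop executes a traversal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical toy model, used only to mirror the has/out/values steps.
final class TraversalSketch {
    static final class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> outEdges = new ArrayList<>();
    }

    static final class Edge {
        final String label;
        final Vertex target;
        Edge(String label, Vertex target) {
            this.label = label;
            this.target = target;
        }
    }

    // Create a vertex with a single "name" property and register it in the graph.
    private static Vertex named(List<Vertex> graph, String name) {
        Vertex vertex = new Vertex();
        vertex.properties.put("name", name);
        graph.add(vertex);
        return vertex;
    }

    // A small part of the slide's example graph.
    static List<Vertex> sampleGraph() {
        List<Vertex> graph = new ArrayList<>();
        Vertex marko = named(graph, "marko");
        Vertex vadas = named(graph, "vadas");
        Vertex josh = named(graph, "josh");
        Vertex lop = named(graph, "lop");
        marko.outEdges.add(new Edge("knows", vadas));
        marko.outEdges.add(new Edge("knows", josh));
        marko.outEdges.add(new Edge("created", lop));
        return graph;
    }

    static List<String> namesKnownBy(List<Vertex> graph, String name) {
        return graph.stream()
                .filter(v -> name.equals(v.properties.get("name")))   // has('name', name)
                .flatMap(v -> v.outEdges.stream())
                .filter(e -> e.label.equals("knows"))
                .map(e -> e.target)                                    // out('knows')
                .map(v -> String.valueOf(v.properties.get("name")))   // values('name')
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesKnownBy(sampleGraph(), "marko")); // prints [vadas, josh]
    }
}
```

The "created" edge to lop is filtered out by the label check, which is why only vadas and josh appear.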
SUMMARY
Graph databases are the solution for highly scalable, semi-structured, connected data.
Apache Tinkerpop is a generic API for graph databases that avoids vendor-specific business logic code.
Titan DB is a scalable distributed graph database built on top of several other databases. It uses Cassandra, HBase or BerkeleyDB as its storage backend, so you can choose how scalable the database should be.
MOHAMED TAHER
ALREFAIE
07/12/2015
