A Journey From
Relational to Graph
Trials and Tribulations on the Path to Graph
Introduction
● Nakul Jeirath
● Senior security engineer at WellAware (wellaware.us)
● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform
Wikipedia List of Graph DBs
https://en.wikipedia.org/wiki/Graph_database
We use Titan+Cassandra
Transitioned ~2 years ago
Why Switch?
Graph model allowed modeling of well pad and derived calculations
Visualization built with http://js.cytoscape.org/
Overview
● Quick graph overview + toy example
● Our journey
○ Episode I: Development
○ Episode II: Migration
○ Episode III: Operation
Property Graph
● Vertex (label: employee)
○ name: Nakul
● Vertex (label: company)
○ name: WellAware
● Edge employee --> company (label: works for)
○ hired: 9/13
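The property graph on this slide can be sketched as plain data structures. This is a minimal illustration, not Titan's storage format; the class and field names (`Vertex`, `Edge`, `out_vertex`, `in_vertex`) are assumptions chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str                                      # e.g. "employee" or "company"
    properties: dict = field(default_factory=dict)  # key/value pairs on the vertex


@dataclass
class Edge:
    label: str                                      # e.g. "works for"
    out_vertex: Vertex                              # vertex the edge leaves
    in_vertex: Vertex                               # vertex the edge points to
    properties: dict = field(default_factory=dict)  # edges carry properties too


# The example from the slide: Nakul works for WellAware, hired 9/13.
nakul = Vertex("employee", {"name": "Nakul"})
wellaware = Vertex("company", {"name": "WellAware"})
works_for = Edge("works for", nakul, wellaware, {"hired": "9/13"})
```

The point of the model is that both vertices and edges have a label and arbitrary properties, so the relationship itself ("hired: 9/13") is first-class data.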
A Toy Example
http://coachesbythenumbers.com/sportsource-college-football-data-packages/
2005 College Football Data
● Team names & conferences
● Game record with dates and scores
● Interesting questions:
○ Records for all teams in conference X
○ Top 25 ranking using record + strength of opponents
○ Three team loop (A beat B beat C beat A)
● Source code: https://github.com/njeirath/titan-perf-tester
Toy Models
SQL
● Teams: team_id, conference, name
● Beat: winner, loser, win_score, lose_score
Graph
● Vertex (label: team): name: Purdue, conf: Big 10
● Vertex (label: team): name: IU, conf: Big 10
● Edge Purdue --> IU (label: beat): date: 11/19/05, score: 41-14
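The relational side of the toy model can be sketched with an in-memory SQLite database. The column types and constraints are assumptions (the slides give only column names); the team IDs 559 and 306 are the ones shown later in the migration example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (
    team_id    INTEGER PRIMARY KEY,
    conference TEXT NOT NULL,
    name       TEXT NOT NULL
);
CREATE TABLE beat (
    winner     INTEGER NOT NULL REFERENCES teams(team_id),
    loser      INTEGER NOT NULL REFERENCES teams(team_id),
    win_score  INTEGER NOT NULL,
    lose_score INTEGER NOT NULL
);
""")

conn.executemany(
    "INSERT INTO teams VALUES (?, ?, ?)",
    [(559, "Big 10", "Purdue"), (306, "Big 10", "Indiana")],
)
conn.execute("INSERT INTO beat VALUES (?, ?, ?, ?)", (559, 306, 41, 14))

# Resolving one game back to team names takes two joins against teams.
row = conn.execute("""
    SELECT w.name, l.name, b.win_score, b.lose_score
    FROM beat AS b
    JOIN teams AS w ON w.team_id = b.winner
    JOIN teams AS l ON l.team_id = b.loser
""").fetchone()
```

In the graph model the same game is a single `beat` edge between two `team` vertices, so no join table is needed.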
Episode I: Development
SQL vs Gremlin
Developer Opinion
Example: Get Big 10 Records
SQL
SELECT win_record.name,
       win_record.wins,
       Count(l) AS losses
FROM   (SELECT teams.team_id,
               teams.name AS name,
               Count(w) AS wins
        FROM   teams
               JOIN beat AS w
                 ON teams.team_id = w.winner
        WHERE  conference = 'Big Ten Conference'
        GROUP  BY teams.name,
                  teams.team_id) AS win_record
       JOIN beat AS l
         ON win_record.team_id = l.loser
GROUP  BY win_record.name,
          win_record.wins
ORDER  BY win_record.wins DESC;
Gremlin
g.V().has('conference', 'Big Ten Conference')   // filter first so only Big Ten teams are sorted
     .order().by(__.outE().count(), decr)
     .as('team', 'wins', 'losses')
     .select('team', 'wins', 'losses')
         .by('name')
         .by(__.outE().count())
         .by(__.inE().count())
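What the Gremlin query computes can be sketched in plain Python: a team's wins are its outgoing `beat` edges and its losses are its incoming ones. The game list below is made up for illustration (only Purdue over Indiana comes from the slides), and the variable names are assumptions.

```python
from collections import Counter

# team name -> conference (hypothetical sample data)
teams = {
    "Purdue": "Big Ten Conference",
    "Indiana": "Big Ten Conference",
    "Ohio State": "Big Ten Conference",
    "Texas": "Big 12 Conference",
}

# Each tuple is one "beat" edge: (winner, loser).
beat = [
    ("Purdue", "Indiana"),
    ("Ohio State", "Indiana"),
    ("Ohio State", "Purdue"),
    ("Texas", "Ohio State"),
]

wins = Counter(winner for winner, _ in beat)    # out-degree per team
losses = Counter(loser for _, loser in beat)    # in-degree per team

big_ten = [name for name, conf in teams.items() if conf == "Big Ten Conference"]
records = sorted(
    ((name, wins[name], losses[name]) for name in big_ten),
    key=lambda record: record[1],
    reverse=True,
)
```

Like the Gremlin traversal, this keeps teams with zero wins or zero losses. The SQL version on the slide uses inner joins, so a winless or undefeated team would drop out of its result.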
Example: Top 25 Ranking
SQL
SELECT teams.name,
       ranks.rank
FROM   (SELECT beat.winner,
               Sum(rec.wins) AS rank
        FROM   (SELECT teams.team_id,
                       Count(w) AS wins
                FROM   teams
                       JOIN beat AS w
                         ON w.winner = teams.team_id
                GROUP  BY teams.team_id) AS rec
               JOIN beat
                 ON beat.loser = rec.team_id
        GROUP  BY beat.winner
        ORDER  BY rank DESC
        LIMIT  25) AS ranks
       JOIN teams
         ON teams.team_id = ranks.winner
ORDER  BY ranks.rank DESC;
Gremlin
g.V().order().by(__.out().out().count(), decr)
     .as('team', 'score', 'wins', 'losses')
     .select('team', 'score', 'wins', 'losses')
         .by('name')
         .by(__.out().out().count())
         .by(__.outE().count())
         .by(__.inE().count())
     .limit(25)
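The ranking score used in both queries is the number of two-hop paths out of a team, which equals the sum of the wins of every opponent that team beat. A small Python sketch of that calculation, using made-up teams A to D and hypothetical helper names:

```python
# Each tuple is one "beat" edge: (winner, loser). Sample data is made up.
beat = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
teams = ["A", "B", "C", "D"]

# Adjacency list: team -> list of teams it beat.
beaten_by = {}
for winner, loser in beat:
    beaten_by.setdefault(winner, []).append(loser)


def score(team):
    """Count two-hop paths: for each beaten opponent, add that opponent's wins."""
    return sum(len(beaten_by.get(opponent, [])) for opponent in beaten_by.get(team, []))


# Highest score first, keeping at most 25 teams as in the slide's query.
ranking = sorted(teams, key=score, reverse=True)[:25]
```

Here A beat B (2 wins) and C (1 win), so A scores 3; B beat C (1 win) and D (0 wins), so B scores 1.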
/r/mildlyinteresting/
1. Texas
2. USC
3. Penn State
4. Ohio State
5. Virginia Tech
6. TCU
7. West Virginia
8. Louisiana State
9. Alabama
10. Oregon
11. Louisville
12. Georgia
13. UCLA
14. Miami (FL)
1. Texas
2. USC
3. Penn State
4. Virginia Tech
5. LSU
6. Ohio State
7. Georgia
8. TCU
9. West Virginia
10. Alabama
11. Boston College
12. Oklahoma
13. Florida
14. UCLA
http://www.collegefootballpoll.com/2005_archive_computer_rankings.html
2005 End of Season Computer Rankings
Our Query Results
Developer Opinion
● ORMs
○ Moving to graph meant losing the Django ORM
○ The ORM/OGM option at the time was Totorom
● Query Language
○ Gremlin seems more intuitive
Episode II: Migration
Essentially an ETL operation:
1. Export tables (table name --> vertex label, columns --> vertex properties)
2. Export FK/Join tables (FK/Join table name --> edge label)
Teams:
team_id  conference  name
559      Big 10      Purdue
306      Big 10      Indiana
...
Beat:
winner  loser  win_score  lose_score
559     306    41         14
...
Challenges:
● Dealing with indices
● Migrating a production DB
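The two ETL steps on this slide can be sketched as follows. This is an illustration of the mapping only, not the migration code used at WellAware; the function names and the dictionary layout of vertices and edges are assumptions.

```python
def export_vertices(label, rows):
    """Step 1: table name -> vertex label, columns -> vertex properties."""
    return [{"label": label, "properties": dict(row)} for row in rows]


def export_edges(label, rows, out_col, in_col, vertex_by_id):
    """Step 2: FK/join table name -> edge label; FK columns pick the endpoints."""
    edges = []
    for row in rows:
        # Columns that are not foreign keys become edge properties.
        properties = {k: v for k, v in row.items() if k not in (out_col, in_col)}
        edges.append({
            "label": label,
            "out": vertex_by_id[row[out_col]],
            "in": vertex_by_id[row[in_col]],
            "properties": properties,
        })
    return edges


# Rows as shown on the slide.
teams = [
    {"team_id": 559, "conference": "Big 10", "name": "Purdue"},
    {"team_id": 306, "conference": "Big 10", "name": "Indiana"},
]
beat = [{"winner": 559, "loser": 306, "win_score": 41, "lose_score": 14}]

vertices = export_vertices("team", teams)
vertex_by_id = {v["properties"]["team_id"]: v for v in vertices}
edges = export_edges("beat", beat, "winner", "loser", vertex_by_id)
```

The lookup from relational ID to vertex (`vertex_by_id`) is the piece that needs care once more than one table is involved, since IDs from different tables can collide.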
Challenges with Index
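Relational IDs are local to each table, while graph vertex IDs are global
student table:
ID  Name   Teacher
1   Kyle   1
2   Stan   1
3   Kenny  1
...
teacher table:
ID  Teacher
1   Garrison
...
Vertices after migration:
● student, pg_id: 1
● teacher, pg_id: 1
Student 1 and teacher 1 share the same relational ID, so the original ID (kept as pg_id) is only unique together with the label
Unique key is vertex label + pg_id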
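A sketch of the composite key described on this slide: vertices get a new global graph ID, while lookups by the old relational ID go through the pair (label, pg_id). The `VertexIndex` class is hypothetical and stands in for whatever index the graph database provides.

```python
class VertexIndex:
    """Maps (vertex label, original relational ID) to a migrated vertex."""

    def __init__(self):
        self._by_key = {}
        self._next_graph_id = 1   # stand-in for the graph's global ID allocator

    def add(self, label, pg_id, **properties):
        key = (label, pg_id)
        if key in self._by_key:
            raise ValueError(f"duplicate key {key}")
        vertex = {"id": self._next_graph_id, "label": label, "pg_id": pg_id, **properties}
        self._next_graph_id += 1
        self._by_key[key] = vertex
        return vertex

    def get(self, label, pg_id):
        return self._by_key[(label, pg_id)]


index = VertexIndex()
kyle = index.add("student", 1, name="Kyle")
garrison = index.add("teacher", 1, name="Garrison")
```

Both vertices carry `pg_id` 1, yet they have different graph IDs and can still be found unambiguously through the label.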
Migrating a Production DB
Potentially large amounts of data - batch loading optimizations
Two kinds of data: static and time series (TS)
Step 1: Move static data
Step 2: Reroute requests and data
Step 3: Move old time series data
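One common form of batch loading is to write rows in fixed-size chunks rather than one at a time. The sketch below shows the idea only; `write_batch` is a hypothetical callback (for example, one graph transaction per chunk), and the batch size of 1000 is an arbitrary assumption, not a figure from the talk.

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load(rows, write_batch, size=1000):
    """Send rows to write_batch one chunk at a time; return the row count."""
    total = 0
    for chunk in batches(rows, size):
        write_batch(chunk)   # hypothetical: one commit per chunk
        total += len(chunk)
    return total


# Usage: collect the chunks in a list instead of writing to a real database.
written = []
loaded = load(range(2500), written.append, size=1000)
```

Because `batches` consumes an iterator lazily, the full export never has to fit in memory at once.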
Episode III: Operating Graph
Usual benefits of NoSQL
● Designed for scalability: built-in sharding, redundancy, etc.
○ Ex: Titan is pluggable with Cassandra/HBase
● Usually allows on-the-fly schema changes
○ Flexible migrations avoid DB downtime
The underlying DB technology still requires expertise, tuning, monitoring, etc.
Performance
If not considered early, OLTP performance can potentially be an issue
Consider Titan architecture:
● Server
● Titan JVM (Gremlin evaluated here)
● Storage Backend
g.V().has('name', 'Purdue')   // index retrieval
     .out('beat')             // edge traversal
     .values('name')          // vertex property retrieval
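Because the traversal is evaluated in the Titan JVM, each step may turn into reads against the storage backend. The toy model below counts those reads for the query on this slide. It assumes one backend read per index lookup, per edge list, and per vertex property fetch; real Titan batches and caches reads, so treat the numbers as illustrative only. `CountingBackend` and its methods are hypothetical.

```python
class CountingBackend:
    """Fake storage backend that counts how often it is read."""

    def __init__(self, names, beat_edges):
        self.reads = 0
        self._names = names                                   # vertex id -> name
        self._name_index = {name: vid for vid, name in names.items()}
        self._beat_edges = beat_edges                         # vertex id -> beaten ids

    def index_lookup(self, name):
        self.reads += 1
        return self._name_index[name]

    def out_edges(self, vertex_id):
        self.reads += 1
        return self._beat_edges.get(vertex_id, [])

    def vertex_name(self, vertex_id):
        self.reads += 1
        return self._names[vertex_id]


def teams_beaten_by(backend, name):
    start = backend.index_lookup(name)                        # has('name', ...)
    losers = backend.out_edges(start)                         # out('beat')
    return [backend.vertex_name(v) for v in losers]           # values('name')


# Vertex 1 is a made-up second opponent to show the fan-out.
backend = CountingBackend(
    names={559: "Purdue", 306: "Indiana", 1: "Hypothetical U"},
    beat_edges={559: [306, 1]},
)
result = teams_beaten_by(backend, "Purdue")
```

With two beaten teams the query costs four reads in this model: one index lookup, one edge read, and one property read per neighbor. The last term grows with the fan-out of the traversal, which is why OLTP latency needs attention early.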
Dealing with Performance
● Understand storage structures
● Understand Cassandra characteristics
○ Ex: Deletes are generally expensive (Cassandra records them as tombstones)
● Talks on Titan+Cassandra tuning:
○ Ted Wilmes - Cassandra Summit 2015:
■ Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra
■ Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770
○ Nakul Jeirath - Graph Day TX:
http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
Our Approach
Lots of real-time data, tiny bit of relatively static data
● Static: some optimization, mostly caching of static data (code optimization + caching)
● Time series: heavily optimized real-time (model changes + code optimization)
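Caching the small, relatively static part of the data can be as simple as memoizing the lookup function. The sketch below uses Python's `functools.lru_cache`; `fetch_from_graph`, the asset ID, and the sample record are hypothetical stand-ins for a real graph query.

```python
from functools import lru_cache

graph_queries = {"count": 0}
STATIC_DATA = {"pad-1": {"name": "Hypothetical Pad 1"}}   # made-up static record


def fetch_from_graph(asset_id):
    """Stand-in for a graph query that returns static asset data."""
    graph_queries["count"] += 1
    return STATIC_DATA[asset_id]


@lru_cache(maxsize=1024)
def static_asset(asset_id):
    # Only the first call per asset_id reaches the graph.
    return fetch_from_graph(asset_id)


first = static_asset("pad-1")
second = static_asset("pad-1")   # served from the cache
```

When the static data does change, `static_asset.cache_clear()` drops the cached entries so the next call reads from the graph again.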
Maturity of Graph
● Query languages
○ SQL makes it relatively easy to switch relational DB vendors
○ TinkerPop plays that role for graph but is not universally supported today
● Version upgrades
○ Currently on Titan 0.4.4
○ 0.4.4 --> 0.5.*: not storage compatible (requires ETL to upgrade)
○ 0.4.4 --> 1.*: not storage compatible, query code rewrite
Summary
● Development
○ Gremlin easier to work with than SQL (opinion)
○ Tools for SQL more mature and varied but graph is catching up
● Migration
○ Relational --> Graph generally requires ETL
● Operation
○ NoSQL benefits of distributed, scalable, schemaless DBs
○ Performance can be an issue if not considered early
○ Graph vendor/version coupling but will improve with maturity
Thanks For Watching
Questions
Nakul Jeirath
@njeirath
Senior Security Engineer - WellAware