A Journey from Relational to Graph

Trials and Tribulations on the Path to Graph

Introduction
● Nakul Jeirath
● Senior security engineer at WellAware (wellaware.us)
● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform

Wikipedia List of Graph DBs
https://en.wikipedia.org/wiki/Graph_database
We use Titan+Cassandra

Why Switch?
The graph model let us represent a well pad and its derived calculations
Visualization built with http://js.cytoscape.org/

Overview
● Quick graph overview + toy example
● Our journey
○ Episode I: Development
○ Episode II: Migration
○ Episode III: Operation

Property Graph
Vertex (label: employee) name: Nakul
  --[label: works for, hired: 9/13]-->
Vertex (label: company) name: WellAware
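A property graph like the one above can be sketched as two small record types: vertices with a label and a property map, and directed edges that also carry a label and their own properties. The class and field names below are illustrative assumptions, not Titan's API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int  # arbitrary illustrative id
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: Vertex  # vertex the edge starts from (the employee)
    in_v: Vertex   # vertex the edge points to (the company)
    props: dict = field(default_factory=dict)


nakul = Vertex(1, "employee", {"name": "Nakul"})
wellaware = Vertex(2, "company", {"name": "WellAware"})
works_for = Edge("works for", nakul, wellaware, {"hired": "9/13"})

# Walk the edge: employee --works for--> company
print(works_for.out_v.props["name"], works_for.label, works_for.in_v.props["name"])
```

Properties live on both vertices and edges, which is what lets the hire date sit on the relationship itself rather than on either endpoint.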

A Toy Example
http://coachesbythenumbers.com/sportsource-college-football-data-packages/
2005 College Football Data
● Team names & conferences
● Game records with dates and scores
● Interesting questions:
○ Records for all teams in conference X
○ Top 25 ranking using record + strength of opponents
○ Three-team loop (A beat B beat C beat A)
● Source code: https://github.com/njeirath/titan-perf-tester
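The three-team loop question is a cycle of length three over "beat" edges. A minimal Python sketch, assuming games arrive as hypothetical (winner, loser) pairs, walks three hops and checks that it lands back on the starting team; each loop is rotated so it is reported only once.

```python
from collections import defaultdict


def three_team_loops(games):
    """Return each A->B->C->A cycle once, where X->Y means X beat Y."""
    beat = defaultdict(set)
    for winner, loser in games:
        beat[winner].add(loser)

    loops = set()
    for a in list(beat):
        for b in beat[a]:
            for c in beat.get(b, ()):
                # c != a rules out two-team loops (A beat B, B beat A)
                if c != a and a in beat.get(c, ()):
                    cycle = (a, b, c)
                    # rotate so the smallest team name comes first (dedupe)
                    i = cycle.index(min(cycle))
                    loops.add(cycle[i:] + cycle[:i])
    return sorted(loops)


# Hypothetical games, not real 2005 results
print(three_team_loops([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]))
```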

Toy Models
Graph:
Vertex (label: team) name: Purdue, conf: Big 10
  --[label: beat, date: 11/19/05, score: 41-14]-->
Vertex (label: team) name: IU, conf: Big 10
SQL:
Teams (team_id, conference, name)
Beat (winner, loser, win_score, lose_score)
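The relational side of the toy model can be sketched with Python's built-in sqlite3 module. Column names follow the Teams/Beat tables above; the sample rows reuse the Purdue/Indiana IDs and the 41-14 score from elsewhere in this deck, while the column types are assumptions. Unlike the graph edge, this Beat table has no date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (
    team_id    INTEGER PRIMARY KEY,
    conference TEXT,
    name       TEXT
);
CREATE TABLE beat (
    winner     INTEGER REFERENCES teams (team_id),  -- FK to the winning team
    loser      INTEGER REFERENCES teams (team_id),  -- FK to the losing team
    win_score  INTEGER,
    lose_score INTEGER
);
""")
conn.executemany("INSERT INTO teams VALUES (?, ?, ?)",
                 [(559, "Big 10", "Purdue"), (306, "Big 10", "Indiana")])
conn.execute("INSERT INTO beat VALUES (559, 306, 41, 14)")

# Reading one game back needs two joins against teams, one per FK column.
game = conn.execute("""
    SELECT w.name, l.name, b.win_score, b.lose_score
    FROM beat AS b
    JOIN teams AS w ON w.team_id = b.winner
    JOIN teams AS l ON l.team_id = b.loser
""").fetchone()
print(game)
```

In the graph model the same read is a single hop along the "beat" edge, with no joins.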

Episode I: Development
SQL vs Gremlin
Developer Opinion

Example: Get Big 10 Records
SQL
SELECT win_record.name,
       win_record.wins,
       COUNT(l.loser) AS losses
FROM   (SELECT teams.team_id,
               teams.name AS name,
               COUNT(w.winner) AS wins
        FROM   teams
               LEFT JOIN beat AS w
                      ON teams.team_id = w.winner
        WHERE  teams.conference = 'Big Ten Conference'
        GROUP  BY teams.team_id,
                  teams.name) AS win_record
       LEFT JOIN beat AS l
              ON win_record.team_id = l.loser
GROUP  BY win_record.team_id,
          win_record.name,
          win_record.wins
ORDER  BY win_record.wins DESC;
Gremlin
g.V().has('conference', 'Big Ten Conference')
     .order().by(__.outE('beat').count(), decr)
     .as('team', 'wins', 'losses')
     .select('team', 'wins', 'losses')
       .by('name')
       .by(__.outE('beat').count())
       .by(__.inE('beat').count())
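Both queries can be sanity-checked against each other on a tiny, made-up dataset. The sketch below runs a SQLite version of the SQL query above, written with LEFT JOINs and COUNT over a column, alongside a Python traversal that counts each team's outgoing (wins) and incoming (losses) beat edges. Team names and games are hypothetical placeholders, and the beat table omits the score columns. The LEFT JOINs matter: with inner joins, the undefeated Team A and the winless Team C would silently drop out of the SQL result.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data (not real results): team_id -> (name, conference)
TEAMS = {1: ("Team A", "Big Ten Conference"), 2: ("Team B", "Big Ten Conference"),
         3: ("Team C", "Big Ten Conference"), 4: ("Team D", "Other")}
GAMES = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 2)]  # (winner, loser)

RECORDS_SQL = """
SELECT win_record.name, win_record.wins, COUNT(l.loser) AS losses
FROM (SELECT teams.team_id, teams.name AS name, COUNT(w.winner) AS wins
      FROM teams
      LEFT JOIN beat AS w ON teams.team_id = w.winner
      WHERE teams.conference = 'Big Ten Conference'
      GROUP BY teams.team_id, teams.name) AS win_record
LEFT JOIN beat AS l ON win_record.team_id = l.loser
GROUP BY win_record.team_id, win_record.name, win_record.wins
ORDER BY win_record.wins DESC
"""


def records_sql():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY, conference TEXT, name TEXT)")
    conn.execute("CREATE TABLE beat (winner INTEGER, loser INTEGER)")
    conn.executemany("INSERT INTO teams VALUES (?, ?, ?)",
                     [(tid, conf, name) for tid, (name, conf) in TEAMS.items()])
    conn.executemany("INSERT INTO beat VALUES (?, ?)", GAMES)
    return conn.execute(RECORDS_SQL).fetchall()


def records_graph():
    out_e, in_e = defaultdict(int), defaultdict(int)
    for winner, loser in GAMES:
        out_e[winner] += 1  # like outE('beat') on the winner
        in_e[loser] += 1    # like inE('beat') on the loser
    rows = [(name, out_e[tid], in_e[tid])
            for tid, (name, conf) in TEAMS.items() if conf == "Big Ten Conference"]
    return sorted(rows, key=lambda r: r[1], reverse=True)


print(records_sql())
print(records_graph())
```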

Example: Top 25 Ranking
SQL
SELECT teams.name,
       ranks.rank
FROM   (SELECT beat.winner,
               SUM(rec.wins) AS rank
        FROM   (SELECT teams.team_id,
                       COUNT(w.winner) AS wins
                FROM   teams
                       JOIN beat AS w
                         ON w.winner = teams.team_id
                GROUP  BY teams.team_id) AS rec
               JOIN beat
                 ON beat.loser = rec.team_id
        GROUP  BY beat.winner
        ORDER  BY rank DESC
        LIMIT  25) AS ranks
       JOIN teams
         ON teams.team_id = ranks.winner
ORDER  BY ranks.rank DESC;
Gremlin
g.V().hasLabel('team')
     .order().by(__.out('beat').out('beat').count(), decr)
     .limit(25)
     .as('team', 'score', 'wins', 'losses')
     .select('team', 'score', 'wins', 'losses')
       .by('name')
       .by(__.out('beat').out('beat').count())
       .by(__.outE('beat').count())
       .by(__.inE('beat').count())
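The ranking both queries compute is a strength-of-schedule score: for each team, add up the win totals of every opponent it beat, which is the same as counting out('beat').out('beat') paths. A minimal sketch over hypothetical (winner, loser) pairs:

```python
from collections import defaultdict


def strength_scores(games):
    """Score each team that won at least once by summing the win totals
    of every opponent it beat (one out('beat').out('beat') path per win)."""
    beat = defaultdict(list)
    for winner, loser in games:
        beat[winner].append(loser)
    wins = {team: len(losers) for team, losers in beat.items()}
    return {team: sum(wins.get(opp, 0) for opp in losers)
            for team, losers in beat.items()}


def top_n(games, n=25):
    """Highest scores first, truncated to n teams (the Top 25)."""
    scores = strength_scores(games)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical games, not real results
print(top_n([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")], n=2))
```

Like the SQL version, which groups by beat.winner, teams with no wins never appear in the ranking.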

/r/mildlyinteresting/
Rank | 2005 End of Season Computer Rankings | Our Query Results
1 | Texas | Texas
2 | USC | USC
3 | Penn State | Penn State
4 | Ohio State | Virginia Tech
5 | Virginia Tech | LSU
6 | TCU | Ohio State
7 | West Virginia | Georgia
8 | Louisiana State | TCU
9 | Alabama | West Virginia
10 | Oregon | Alabama
11 | Louisville | Boston College
12 | Georgia | Oklahoma
13 | UCLA | Florida
14 | Miami (FL) | UCLA
Computer rankings source: http://www.collegefootballpoll.com/2005_archive_computer_rankings.html

Developer Opinion
● ORMs
○ Moving to graph meant losing the Django ORM
○ The ORM/OGM option available at the time was Totorom
● Query Language
○ Gremlin seems more intuitive

Episode II: Migration
Essentially an ETL operation:
1. Export each table as vertices (table name --> vertex label, columns --> vertex properties)
2. Export each FK/join table as edges (FK/join table name --> edge label)
Teams:
team_id conference name
559 Big 10 Purdue
306 Big 10 Indiana
...
Beat:
winner loser win_score lose_score
559 306 41 14
...
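The two export rules can be sketched as plain Python transformations over rows already read from the relational DB; reading from Postgres and writing to the graph are left out. The function names and the dict layout of vertices and edges are assumptions for illustration. Vertices are keyed by the table's primary key, which is only safe within a single table; that is the ID problem discussed next.

```python
def table_to_vertices(label, rows, pk):
    """Rule 1: each row becomes a vertex (table -> vertex label,
    columns -> vertex properties), keyed here by the relational PK."""
    return {row[pk]: {"label": label, "props": dict(row)} for row in rows}


def join_table_to_edges(label, rows, out_col, in_col):
    """Rule 2: each FK/join-table row becomes an edge (table -> edge label);
    the two FK columns give the endpoints, other columns become edge properties."""
    edges = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in (out_col, in_col)}
        edges.append({"label": label, "out": row[out_col], "in": row[in_col],
                      "props": props})
    return edges


teams = [{"team_id": 559, "conference": "Big 10", "name": "Purdue"},
         {"team_id": 306, "conference": "Big 10", "name": "Indiana"}]
beat = [{"winner": 559, "loser": 306, "win_score": 41, "lose_score": 14}]

# "team" matches the vertex label used in the toy graph model
vertices = table_to_vertices("team", teams, pk="team_id")
edges = join_table_to_edges("beat", beat, out_col="winner", in_col="loser")
print(edges)
```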
Challenges:
● Dealing with indices
● Migrating a production DB

Challenges with Index
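Relational IDs are local per table, but graph IDs are global: student 1 and teacher 1 are different rows, yet their IDs collide once both become vertices
student table:
ID Name Teacher
1 Kyle 1
2 Stan 1
3 Kenny 1
...
teacher table:
ID Teacher
1 Garrison
...
Graph: student vertex (pg_id: 1), teacher vertex (pg_id: 1)
Unique key is vertex label + pg_id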
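One way to apply the composite key during ETL is to keep a map from (vertex label, pg_id) to a newly assigned global ID, so student 1 and teacher 1 become distinct vertices. The VertexIdMap helper below is hypothetical; in a real load the graph DB would typically assign the IDs, and the (label, pg_id) pair would be backed by an index for lookups.

```python
import itertools


class VertexIdMap:
    """Hypothetical ETL helper: assigns one global ID per (label, pg_id)."""

    def __init__(self):
        self._next = itertools.count(1)
        self._ids = {}

    def get_or_create(self, label, pg_id):
        key = (label, pg_id)  # composite key: label + pg_id
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]


students = [(1, "Kyle", 1), (2, "Stan", 1), (3, "Kenny", 1)]  # (ID, Name, Teacher)
teachers = [(1, "Garrison")]                                  # (ID, Teacher)

ids = VertexIdMap()
for tid, _ in teachers:
    ids.get_or_create("teacher", tid)

# The student.Teacher FK becomes an edge: (student vertex id, teacher vertex id)
edges = [(ids.get_or_create("student", sid), ids.get_or_create("teacher", t))
         for sid, _, t in students]
print(edges)
```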

Migrating a Production DB
Potentially large amounts of data, so batch-loading optimizations matter
Data splits into static and time series
Step 1: Move static data
Step 2: Reroute requests and incoming data to the graph
Step 3: Move old time-series data
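For the batch-loading side, a common pattern is to group rows into fixed-size chunks and commit once per chunk rather than once per row. A minimal sketch, where load_batch is a hypothetical callback standing in for whatever writes and commits one transaction against the graph:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_load(rows, load_batch, batch_size=1000):
    """Send rows to `load_batch` one chunk at a time and return how many
    chunks were sent. `load_batch` is hypothetical: it should write one
    chunk and commit it as a single transaction."""
    batches = 0
    for chunk in chunked(rows, batch_size):
        load_batch(chunk)
        batches += 1
    return batches


sent = []
print(bulk_load(range(2500), sent.append, batch_size=1000))
```

The batch size is a tuning knob: larger chunks mean fewer commits, at the cost of bigger transactions to retry on failure.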

Episode III: Operating Graph
Usual benefits of NoSQL
● Designed for scalability: built-in sharding, redundancy, etc.
○ Ex: Titan is pluggable with Cassandra/HBase
● Usually allows on-the-fly schema changes
○ Flexible migrations avoid DB downtime
However, the underlying DB technology requires expertise, tuning, monitoring, etc.

Performance
If not considered early, OLTP performance can become an issue
Consider the Titan architecture:
Server: Titan JVM (Gremlin evaluated here) <--> Storage Backend
g.V().has('name', 'Purdue')   // Index retrieval
     .out('beat')             // Edge traversal
     .values('name')          // Vertex property retrieval
Each step can require its own trip from the Titan JVM to the storage backend.
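To see why each step matters, here is a toy cost model, not Titan's actual implementation: it assumes an index lookup, one vertex's adjacency list, and one vertex's properties each cost a single backend read. Under that assumption the query above costs 2 reads plus one per opponent, so a vertex with many edges quickly turns into many round trips. The class and team names are hypothetical.

```python
class CountingBackend:
    """Toy storage backend; every method call counts as one read (assumption)."""

    def __init__(self, names, beat):
        self.names, self.beat, self.reads = names, beat, 0
        self.by_name = {n: v for v, n in names.items()}

    def index_lookup(self, name):
        self.reads += 1
        return self.by_name[name]

    def out_edges(self, v):
        self.reads += 1
        return self.beat.get(v, [])

    def property(self, v):
        self.reads += 1
        return self.names[v]


def beaten_by(backend, team):
    # Mirrors g.V().has('name', team).out('beat').values('name')
    v = backend.index_lookup(team)                   # index retrieval
    opponents = backend.out_edges(v)                 # edge traversal
    return [backend.property(o) for o in opponents]  # one read per opponent


backend = CountingBackend(names={1: "Team A", 2: "Team B", 3: "Team C", 4: "Team D"},
                          beat={1: [2, 3, 4]})
result = beaten_by(backend, "Team A")
print(result, backend.reads)
```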

Dealing with Performance
● Understand storage structures
● Understand Cassandra characteristics
○ Ex: deletes are generally costly (Cassandra records them as tombstones rather than removing data immediately)
● Talks on Titan+Cassandra tuning:
○ Ted Wilmes - Cassandra Summit 2015:
■ Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra
■ Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770
○ Nakul Jeirath - Graph Day TX:
Titan data model documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html

Our Approach
Lots of real-time (time-series) data, a small amount of relatively static data
Static data: some code optimization, mostly caching
Time-series data: heavily optimized through model changes + code optimization

Maturity of Graph
● Query languages
○ SQL makes it relatively easy to switch relational DB vendors
○ TinkerPop plays that role for graph, but is not universally supported today
● Version upgrades
○ Currently on Titan 0.4.4
○ 0.4.4 --> 0.5.*: not storage compatible (requires ETL to upgrade)
○ 0.4.4 --> 1.*: not storage compatible, and requires a query code rewrite

Summary
● Development
○ Gremlin easier to work with than SQL (opinion)
○ Tools for SQL more mature and varied but graph is catching up
● Migration
○ Relational --> Graph generally requires ETL
● Operation
○ NoSQL benefits of distributed, scalable, schemaless DBs
○ Performance can be an issue if not considered early
○ Graph vendor/version coupling, which should improve as the ecosystem matures

Thanks For Watching
Questions
Nakul Jeirath
@njeirath
Senior Security Engineer - WellAware
