Powering a Graph Data System with Scylla + JanusGraph
Ryan Stauffer, Founder & CEO
Presenter
Ryan founded Enharmonic to change the way we interact with data. He has experience building modern data solutions for fast-moving companies, both as a consultant and as the leader of Data Strategy and Analytics at Private Equity-backed Driven Brands. He received his MBA from Washington University in St. Louis, and has additional experience in Investment Banking and as a U.S. Army Infantry Officer. In his free time, he makes music and tries to set PRs running up Potrero Hill.
Graph Data System?
What?
Graph Data System
We can break down the concept of a “Graph Data System” into 2 pieces:
■ Graph - we’re modelling our data as a property graph
● Vertices model logical entities (Customer, Product, Order)
● Edges model logical relationships between entities (PURCHASED, IN_ORDER)
● Properties model attributes of entities/relationships (name, purchaseDate)
■ Data System - we use several components (Scylla for storage, Elasticsearch for indexing, JanusGraph as the graph layer) together as a single system to store and retrieve our data
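To make the three property-graph building blocks concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the data model, not how JanusGraph stores data in Scylla; the class names, the `out` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vid: int
    label: str  # logical entity type, e.g. "Customer"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str  # logical relationship, e.g. "PURCHASED"
    out_v: int  # id of the source vertex
    in_v: int   # id of the target vertex
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vid: int, label: str, **props: Any) -> Vertex:
        v = Vertex(vid, label, dict(props))
        self.vertices[vid] = v
        return v

    def add_edge(self, label: str, out_v: int, in_v: int, **props: Any) -> Edge:
        e = Edge(label, out_v, in_v, dict(props))
        self.edges.append(e)
        return e

    def out(self, vid: int, label: str) -> List[Vertex]:
        """Follow outgoing edges with the given label from vertex `vid`."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vid and e.label == label]


g = PropertyGraph()
g.add_vertex(1, "Customer", name="Alice")
g.add_vertex(2, "Product", name="Guitar")
g.add_vertex(3, "Order")  # properties are optional
g.add_edge("PURCHASED", 1, 2, purchaseDate="2019-10-01")  # edges carry properties too
g.add_edge("IN_ORDER", 2, 3)

print([p.properties["name"] for p in g.out(1, "PURCHASED")])  # ['Guitar']
```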
JanusGraph & Scylla Overview
Why?
3 Core Benefits
■ Flexibility
■ Schema support
■ OLTP & OLAP support (Distinct from Scylla Workload Prioritization)
Flexibility
The “killer feature” of a graph data model is flexibility
■ Changing database schemas to support new business logic and data
sources is tough!
■ Because new entities and relationships are simply new vertex and edge labels, a graph's data model is easier to evolve over time
■ Iterate on our model to match our understanding as we learn,
without having to start from scratch
■ In practice
● Incorporate fresh data sources without breaking existing workloads
● Write query results directly to the graph as new vertices & edges
● Share production-quality data between teams
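The first two "in practice" points can be sketched with a toy edge store: an existing query keeps returning the same answer after a new data source (reviews) is added and after a query result is written back as new edges. The vertex names, the `ALSO_BOUGHT`, `WROTE` and `ABOUT` labels, and the helper functions are all hypothetical; this is plain Python, not Gremlin.

```python
from collections import defaultdict

edges = defaultdict(list)  # (out_vertex, edge_label) -> list of in_vertices
labels = {}                # vertex id -> vertex label


def add_vertex(vid, label):
    labels[vid] = label


def add_edge(out_v, label, in_v):
    edges[(out_v, label)].append(in_v)


def purchased_products(customer):
    """An 'existing workload': which products did this customer buy?"""
    return sorted(edges[(customer, "PURCHASED")])


for vid, label in [("alice", "Customer"), ("bob", "Customer"),
                   ("guitar", "Product"), ("amp", "Product")]:
    add_vertex(vid, label)
add_edge("alice", "PURCHASED", "guitar")
add_edge("alice", "PURCHASED", "amp")
add_edge("bob", "PURCHASED", "guitar")
before = purchased_products("alice")

# 1) A fresh data source arrives: new vertex and edge labels,
#    with no migration of the existing data.
add_vertex("review-1", "Review")
add_edge("bob", "WROTE", "review-1")
add_edge("review-1", "ABOUT", "guitar")

# 2) Write a query result back into the graph as new edges:
#    products bought by the same customer become ALSO_BOUGHT neighbors.
for c in [v for v, lbl in labels.items() if lbl == "Customer"]:
    bought = purchased_products(c)
    for p in bought:
        for q in bought:
            if p != q and q not in edges[(p, "ALSO_BOUGHT")]:
                add_edge(p, "ALSO_BOUGHT", q)

after = purchased_products("alice")
print(before == after)                            # True: existing query unaffected
print(sorted(edges[("guitar", "ALSO_BOUGHT")]))   # ['amp']
```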
Schema Support
By supporting a defined schema, our data system can enforce business logic and minimize duplicated application code
■ Flexible schema support out-of-the-box
■ We can pre-define the properties and datatypes that are possible for
a given vertex or edge, without requiring that each vertex/edge
contain every property
■ We can pre-define which edge types are allowed to connect a pair of
vertices, without requiring every pair of vertices to have this edge
■ Simplifies testing on new use cases
■ Separates data integrity maintenance from business logic
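The two schema ideas above (optional-but-typed properties per vertex label, and allowed vertex-label pairs per edge label) can be sketched as a small validator. This is a hypothetical registry written to illustrate the concept, not JanusGraph's API; the labels and type choices are assumptions.

```python
from typing import Any, Dict, Set, Tuple, Type

# Which properties each vertex label MAY have, and their datatypes.
VERTEX_PROPERTIES: Dict[str, Dict[str, Type]] = {
    "Customer": {"name": str},
    "Product": {"name": str, "productId": int},
}
# Which (out_label, in_label) pairs each edge label may connect.
EDGE_CONNECTIONS: Dict[str, Set[Tuple[str, str]]] = {
    "PURCHASED": {("Customer", "Product")},
}


class SchemaViolation(ValueError):
    pass


def check_vertex(label: str, props: Dict[str, Any]) -> None:
    allowed = VERTEX_PROPERTIES.get(label)
    if allowed is None:
        raise SchemaViolation(f"unknown vertex label {label!r}")
    # Only the keys that are present are checked: a vertex need not
    # carry every declared property.
    for key, value in props.items():
        if key not in allowed:
            raise SchemaViolation(f"{label} may not have property {key!r}")
        if not isinstance(value, allowed[key]):
            raise SchemaViolation(f"{label}.{key} must be {allowed[key].__name__}")


def check_edge(label: str, out_label: str, in_label: str) -> None:
    if (out_label, in_label) not in EDGE_CONNECTIONS.get(label, set()):
        raise SchemaViolation(f"{label} cannot connect {out_label} -> {in_label}")


check_vertex("Product", {"name": "Guitar"})     # OK: productId omitted
check_edge("PURCHASED", "Customer", "Product")   # OK
try:
    check_vertex("Product", {"productId": "abc"})  # wrong datatype
except SchemaViolation as err:
    print(err)  # Product.productId must be int
```

In JanusGraph itself, the corresponding management calls are `mgmt.addProperties(...)` (used in the console session later in this deck) and `mgmt.addConnection(...)` for edges, enforced when `schema.constraints=true`; check the JanusGraph schema documentation for your version.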
OLTP + OLAP
■ Transactional (graph-local) workloads
● Begin with a small number of vertices (found with the help of an index)
● Traverse across a reasonably small number of edges and vertices
● Goal is to minimize latency
● With Scylla as the storage backend, we can achieve scalable, single-digit-millisecond response times
■ Analytical (graph-global) workloads
● Travel to all (or a substantial portion) of the vertices and edges
● Includes many classic graph algorithms
● Goal is to maximize throughput (might leverage Spark)
■ The same traversal language (Gremlin) can be used to write both
types of workloads
■ This OLTP/OLAP split happens at the graph level and is distinct from Scylla Workload Prioritization
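The difference between the two workload shapes is about how much of the graph a query touches. The toy sketch below contrasts a graph-local traversal (index lookup, then at most two hops) with a graph-global one (connected components, a classic graph algorithm that must visit every vertex). It is plain Python under assumed data, not Gremlin and not a Spark-based graph computer; `name_index` stands in for a real index lookup.

```python
from collections import defaultdict, deque

adj = defaultdict(set)  # undirected adjacency lists
name_index = {}         # property value -> vertex id (stand-in for an index)


def add_edge(a, b):
    adj[a].add(b)
    adj[b].add(a)


for vid, name in [(1, "Alice"), (2, "Guitar"), (3, "Amp"),
                  (4, "Bob"), (5, "Drums"), (6, "Carol")]:
    name_index[name] = vid
add_edge(1, 2)
add_edge(1, 3)
add_edge(4, 2)
add_edge(6, 5)


def oltp_two_hops(start_name):
    """Graph-local: start from one indexed vertex and touch only nearby vertices."""
    start = name_index[start_name]
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        v, depth = frontier.popleft()
        if depth == 2:
            continue  # bounded traversal keeps latency low
        for n in adj[v]:
            if n not in seen:
                seen.add(n)
                result.add(n)
                frontier.append((n, depth + 1))
    return result


def olap_components():
    """Graph-global: visit every vertex to label its connected component."""
    component = {}
    for root in name_index.values():
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            v = stack.pop()
            for n in adj[v]:
                if n not in component:
                    component[n] = root
                    stack.append(n)
    return component


print(sorted(oltp_two_hops("Alice")))          # [2, 3, 4]
print(len(set(olap_components().values())))   # 2
```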
Deployment
Where to Deploy?
VMs
Bare Metal
Kubernetes
■ Open-source system for managing containerized applications
■ Groups application containers into logical units
■ Builds abstractions on top of the basic resources
● Compute
● Memory
● Disk
● Network
Deployment Overview
[Diagram: Client, Load Balancer, Deployment, Stateful Set, Headless Service, Storage Class]
■ The “stateful” components of our system are Scylla & Elasticsearch
■ JanusGraph is deployed as a stateless server that stores and
retrieves data to and from the stateful systems
Scylla
■ Use your existing deployment == Zero lift!
■ New keyspace for JanusGraph data
Elasticsearch
[Diagram: Stateful Set, Storage Class, Headless Service]
Elasticsearch - Manifest Summary
# Storage Class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: elasticsearch-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
# Headless Service
apiVersion: v1
kind: Service
metadata:
  name: es
  labels: { app: es }
spec:
  clusterIP: None        # headless: each pod gets a stable DNS name
  ports:
  - name: http           # port names are required when a Service exposes more than one port
    port: 9200
  - name: transport
    port: 9300
  selector:
    app: es
---
# Stateful Set
apiVersion: apps/v1
kind: StatefulSet
metadata: ...
spec:
  serviceName: es
  replicas: 3
  selector: { matchLabels: { app: es }}
  template:
    metadata: { labels: { app: es }}
    spec:
      containers:
      - name: elasticsearch
        image: .../elasticsearch-oss:6.6.0
        env:
        - name: discovery.zen.ping.unicast.hosts
          value: "es-0.es.default.svc.cluster.local,..."
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata: { name: data }
    spec:
      accessModes: [ ReadWriteOnce ]
      storageClassName: elasticsearch-ssd
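The host list passed to `discovery.zen.ping.unicast.hosts` relies on the stable DNS names that a headless Service gives StatefulSet pods: pod `i` is named `<statefulset-name>-<i>`, and the headless Service named in `spec.serviceName` supplies the `<service>.<namespace>.svc.<cluster-domain>` suffix. A small Python helper that builds that list (the function name and defaults are illustrative; `cluster.local` is the usual default cluster domain):

```python
from typing import List


def statefulset_pod_fqdns(statefulset: str, service: str, replicas: int,
                          namespace: str = "default",
                          cluster_domain: str = "cluster.local") -> List[str]:
    """Stable per-pod DNS names for a StatefulSet governed by a headless Service.

    The prefix comes from the StatefulSet's own metadata.name; the middle part
    is the headless Service named in spec.serviceName.
    """
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]


# A candidate value for discovery.zen.ping.unicast.hosts with 3 replicas,
# assuming the StatefulSet itself is named "es".
hosts = ",".join(statefulset_pod_fqdns("es", "es", 3))
print(hosts)
```

Because the pod prefix follows the StatefulSet's name rather than the Service's, the same host list (also used for `janusgraph.index.search.hostname`) has to match whatever the StatefulSet is actually called.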
Elasticsearch - Deploy
$ kubectl apply -f elasticsearch.yaml
storageclass.storage.k8s.io/elasticsearch-ssd created
service/es created
statefulset.apps/elasticsearch created
$ kubectl get all -l app=elasticsearch
NAME READY AGE
statefulset.apps/elasticsearch 3/3 2m10s
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-0 1/1 Running 0 2m9s
pod/elasticsearch-1 1/1 Running 0 87s
pod/elasticsearch-2 1/1 Running 0 44s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/es ClusterIP None <none> 9200/TCP,9300/TCP 2m9s
JanusGraph
JanusGraph Image
■ There are already official JanusGraph images on Docker Hub
$ docker pull janusgraph/janusgraph:0.4.0
■ You can also build your own image with the JanusGraph project's build scripts and push it to a private image repository (e.g., Google Container Registry on GCP)
$ git clone https://github.com/JanusGraph/janusgraph-docker.git
$ cd janusgraph-docker
$ sudo ./build-images.sh 0.4
# Push the image to your private project repository
$ docker tag janusgraph/janusgraph:0.4.0 gcr.io/$PROJECT/janusgraph:0.4.0
$ gcloud auth configure-docker
$ docker push gcr.io/$PROJECT/janusgraph:0.4.0
JanusGraph Console
(Just a Pod…)
JanusGraph Console - Manifest Summary
■ Run JanusGraph in a Pod and connect to it directly
● The graph is only accessible through this console connection, but all actions are persisted in Scylla and Elasticsearch
apiVersion: v1
kind: Pod
metadata:
  name: janusgraph-gremlin-console
spec:
  containers:
  - name: janusgraph
    image: .../janusgraph:0.4.0
    env:
    - name: JANUS_PROPS_TEMPLATE
      value: cql-es
    - name: janusgraph.storage.hostname       # Scylla contact point (CQL storage backend)
      value: 10.138.0.3
    - name: janusgraph.storage.cql.keyspace   # new keyspace for JanusGraph data
      value: graphdev
    - name: janusgraph.index.search.hostname  # Elasticsearch pods behind the headless Service
      value: "es-0.es.default.svc.cluster.local,..."
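The environment variables above are how the container is configured. A rough Python sketch of the idea, assuming the janusgraph-docker convention in which `JANUS_PROPS_TEMPLATE` picks a base properties template and any variable prefixed with `janusgraph.` sets the same key with the prefix stripped (e.g. `janusgraph.storage.hostname` becomes `storage.hostname`). The template contents and the `render_properties` function are hypothetical stand-ins, not the image's actual entrypoint.

```python
from typing import Dict, Mapping

PREFIX = "janusgraph."


def render_properties(template: Dict[str, str], env: Mapping[str, str]) -> str:
    """Overlay `janusgraph.`-prefixed environment variables onto a template.

    Assumption: mirrors the janusgraph-docker convention described above;
    variables without the prefix (e.g. JANUS_PROPS_TEMPLATE) are not copied.
    """
    props = dict(template)
    for key, value in env.items():
        if key.startswith(PREFIX):
            props[key[len(PREFIX):]] = value
    return "\n".join(f"{k}={v}" for k, v in sorted(props.items()))


# Hypothetical stand-in for the "cql-es" template selected by JANUS_PROPS_TEMPLATE.
cql_es_template = {
    "storage.backend": "cql",
    "index.search.backend": "elasticsearch",
}
env = {
    "JANUS_PROPS_TEMPLATE": "cql-es",
    "janusgraph.storage.hostname": "10.138.0.3",
    "janusgraph.storage.cql.keyspace": "graphdev",
}
print(render_properties(cql_es_template, env))
```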
JanusGraph Console - Deploy & Define Schema
$ kubectl create -f janusgraph-gremlin-console.yaml
$ kubectl exec -it janusgraph-gremlin-console -- bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
...
gremlin>
// Open the graph using the properties file generated in the container
graph = JanusGraphFactory.open('/etc/opt/janusgraph/janusgraph.properties')
mgmt = graph.openManagement()
// Define schema for a Product vertex and its properties
Product = mgmt.makeVertexLabel("Product").make()
name = mgmt.makePropertyKey("name").
  dataType(String.class).cardinality(Cardinality.SINGLE).make()
productId = mgmt.makePropertyKey("productId").
  dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
// Declare which keys a Product may carry (enforced when schema.constraints=true)
mgmt.addProperties(Product, name, productId)
// Persist the schema changes
mgmt.commit()
JanusGraph Server
[Diagram: Deployment, Load Balancer]
JanusGraph Server - Manifest Summary
■ Deploy JanusGraph as a standalone server
● Uses TinkerPop Gremlin Server
● The graph will be accessible to a wide range of client languages (Python, Java, JS, etc.)
# Service (Load Balancer)
apiVersion: v1
kind: Service
metadata:
  name: janusgraph-service-lb
  labels:
    app: janusgraph
spec:
  type: LoadBalancer
  selector:
    app: janusgraph
  ports:
  - name: gremlin-server-websocket
    protocol: TCP
    port: 8182
    targetPort: 8182
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: janusgraph-server
  labels:
    app: janusgraph
spec:
  replicas: 1
  selector:              # required by apps/v1; must match the template labels
    matchLabels:
      app: janusgraph
  template:
    metadata:
      labels:
        app: janusgraph  # the label the Service selects on
    spec:
      containers:
      - name: janusgraph
        image: .../janusgraph:0.4.0
        env:
        - name: JANUS_PROPS_TEMPLATE
          value: cql-es
        - name: janusgraph.storage.hostname
          value: 10.138.0.3
        - name: janusgraph.storage.cql.keyspace
          value: graphdev
        - name: janusgraph.index.search.hostname
          value: "es-0.es.default.svc.cluster.local,..."
JanusGraph Server - Deploy
$ kubectl apply -f janusgraph.yaml
service/janusgraph-service-lb created
deployment.apps/janusgraph-server created
$ kubectl get all -l app=janusgraph
NAME READY STATUS RESTARTS AGE
pod/janusgraph-server-5d77dd9ddf-nc87p 1/1 Running 0 1m2s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/janusgraph-service-lb LoadBalancer 10.0.12.109 35.121.171.101 8182/TCP 1m3s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/janusgraph-server 1/1 1 1 1m3s
NAME DESIRED CURRENT READY AGE
replicaset.apps/janusgraph-server-5d77dd9ddf 1 1 1 1m2s
A Better Way - Helm Charts
■ Nobody has time to manage all of these individual manifest files!
■ Use Helm (https://helm.sh) - the “package manager” for k8s
■ Makes it easy to define, deploy & upgrade Kubernetes applications
■ You can find our opinionated take on deploying JanusGraph with
Helm at https://github.com/EnharmonicAI/janusgraph-helm
With Kubernetes, it’s easy to deploy JanusGraph on top of Scylla
Flexible, scalable graph data system for building applications
Thank you
Stay in touch
Any questions?
Ryan Stauffer
ryan@enharmonic.ai
@RyantheStauffer
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

Powering a Graph Data System with Scylla + JanusGraph

  • 1. Powering a Graph Data System with Scylla + JanusGraph Ryan Stauffer, Founder & CEO
  • 2. Presenter Ryan Stauffer, Founder & CEO Ryan founded Enharmonic to change the way we interact with data. He has experience building modern data solutions for fast- moving companies, both as a consultant and as the leader of Data Strategy and Analytics at Private Equity-backed Driven Brands. He received his MBA from Washington University in St. Louis, and has additional experience in Investment Banking and as a U.S. Army Infantry Officer. In his free time, he makes music and tries to set PRs running up Potrero Hill.
  • 3.
  • 5. Graph Data System We can break down the concept of a “Graph Data System” into 2 pieces: ■ Graph - we’re modelling our data as a property graph ● Vertices model logical entities (Customer, Product, Order) ● Edges model logical relationships between entities (PURCHASED, IN_ORDER) ● Properties model attributes of entities/relationships (name, purchaseDate) ■ Data System - we use several components in a single system to store and retrieve our data
  • 8. 3 Core Benefits ■ Flexibility ■ Schema support ■ OLTP & OLAP support (Distinct from Scylla Workload Prioritization)
  • 9. Flexibility The “killer feature” of a graph data model is flexibility ■ Changing database schemas to support new business logic and data sources is tough! ■ The nature of a graph’s data model makes it easier to evolve the data model over time ■ Iterate on our model to match our understanding as we learn, without having to start from scratch ■ In practice ● Incorporate fresh data sources without breaking existing workloads ● Write query results directly to the graph as new vertices & edges ● Share production-quality data between teams
  • 10. Schema Support By supporting a defined schema, our data system can enforce business logic, and minimize duplicative application code ■ Flexible schema support out-of-the-box ■ We can pre-define the properties and datatypes that are possible for a given vertex or edge, without requiring that each vertex/edge contain every property ■ We can pre-define which edge types are allowed to connect a pair of vertices, without requiring every pair of vertices to have this edge ■ Simplifies testing on new use cases ■ Separates data integrity maintenance from business logic
  • 11. OLTP + OLAP ■ Transactional (graph-local) workloads ● Begin with a small number of vertices (found with the help of an index) ● Traverse across a reasonably small number of edges and vertices ● Goal is to minimize latency ● With Scylla, we can achieve scalable, single-digit millisecond response ■ Analytical (graph-global) workloads ● Travel to all (or a substantial portion) of the vertices and edges ● Includes many classic graph algorithms ● Goal is to maximize throughput (might leverage Spark) ■ The same traversal language (Gremlin) can be used to write both types of workloads ■ At the graph level -> distinct from Scylla workload prioritization
  • 14. Kubernetes ■ Open-source system for managing containerized applications ■ Groups application containers into logical units ■ Builds abstractions on top of the basic resources ● Compute ● Memory ● Disk ● Network
  • 15. Deployment Overview [Diagram: Client, Load Balancer, Deployment, Stateful Set, Headless Service, Storage Class] ■ The “stateful” components of our system are Scylla & Elasticsearch ■ JanusGraph is deployed as a stateless server that stores and retrieves data to and from the stateful systems
  • 16. Scylla ■ Use your existing deployment == Zero lift! ■ New keyspace for JanusGraph data
  • 17. Elasticsearch [Diagram: Headless Service, Stateful Set, Storage Class]
  • 18. Elasticsearch - Manifest Summary
    Storage Class:
      kind: StorageClass
      apiVersion: storage.k8s.io/v1
      metadata:
        name: elasticsearch-ssd
      provisioner: kubernetes.io/gce-pd
      parameters:
        type: pd-ssd
    Headless Service:
      kind: Service
      metadata:
        name: es
        labels: { app: es }
      spec:
        clusterIP: None
        ports:
        - port: 9200
        - port: 9300
        selector:
          app: es
    Stateful Set:
      kind: StatefulSet
      metadata: ...
      spec:
        serviceName: es
        replicas: 3
        selector: { matchLabels: { app: es }}
        template:
          metadata: { labels: { app: es }}
          spec:
            containers:
            - name: elasticsearch
              image: .../elasticsearch-oss:6.6.0
              env:
              - name: discovery.zen.ping.unicast.hosts
                value: "es-0.es.default.svc.cluster.local,..."
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
        volumeClaimTemplates:
        - metadata: { name: data }
          spec:
            accessModes: [ ReadWriteOnce ]
            storageClassName: elasticsearch-ssd
  • 19. Elasticsearch - Deploy
    $ kubectl apply -f elasticsearch.yaml
    storageclass.storage.k8s.io/elasticsearch-ssd created
    service/es created
    statefulset.apps/elasticsearch created
    $ kubectl get all -l app=elasticsearch
    NAME                            READY   AGE
    statefulset.apps/elasticsearch  3/3     2m10s
    NAME                 READY   STATUS    RESTARTS   AGE
    pod/elasticsearch-0  1/1     Running   0          2m9s
    pod/elasticsearch-1  1/1     Running   0          87s
    pod/elasticsearch-2  1/1     Running   0          44s
    NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
    service/es   ClusterIP   None         <none>        9200/TCP,9300/TCP   2m9s
  • 21. JanusGraph Image
    ■ There are already official JanusGraph images on Docker Hub
      $ docker pull janusgraph/janusgraph:0.4.0
    ■ You can also build your own using the JanusGraph project build scripts and push it to a private image repository (ex: GCP)
      $ git clone https://github.com/JanusGraph/janusgraph-docker.git
      $ cd janusgraph-docker
      $ sudo ./build-images.sh 0.4
      # Push the image to your private project repository
      $ docker tag janusgraph/janusgraph:0.4.0 gcr.io/$PROJECT/janusgraph:0.4.0
      $ gcloud auth configure-docker
      $ docker push gcr.io/$PROJECT/janusgraph:0.4.0
  • 23. JanusGraph Console - Manifest Summary
    ■ Run JanusGraph in a Pod, and connect to it directly
      ● Graph is only accessible through this console connection, but actions are persisted in Scylla and Elasticsearch
    kind: Pod
    spec:
      containers:
      - name: janusgraph
        image: .../janusgraph:0.4.0
        env:
        - name: JANUS_PROPS_TEMPLATE
          value: cql-es
        - name: janusgraph.storage.hostname
          value: 10.138.0.3
        - name: janusgraph.storage.cql.keyspace
          value: graphdev
        - name: janusgraph.index.search.hostname
          value: "es-0.es.default.svc.cluster.local,..."
  • 24. JanusGraph Console - Deploy & Define Schema
    $ kubectl create -f janusgraph-gremlin-console.yaml
    $ kubectl exec -it janusgraph-gremlin-console -- bin/gremlin.sh
             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    ...
    gremlin> graph = JanusGraphFactory.open('/etc/opt/janusgraph/janusgraph.properties')
    gremlin> mgmt = graph.openManagement()
    gremlin> // Define Schema for a Product Vertex and Properties
    gremlin> Product = mgmt.makeVertexLabel("Product").make()
    gremlin> name = mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make()
    gremlin> productId = mgmt.makePropertyKey("productId").dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
    gremlin> mgmt.addProperties(Product, name, productId)
    gremlin> mgmt.commit()
  • 26. JanusGraph Server - Manifest Summary
    ■ Deploy JanusGraph as a standalone server
      ● Uses TinkerPop Gremlin Server
      ● Graph will be accessible to a wide range of client languages (Python, Java, JS, etc.)
    Deployment:
      kind: Deployment
      labels:
        app: janusgraph
      spec:
        replicas: 1
        template:
          spec:
            containers:
            - name: janusgraph
              image: .../janusgraph:0.4.0
              env:
              - name: JANUS_PROPS_TEMPLATE
                value: cql-es
              - name: janusgraph.storage.hostname
                value: 10.138.0.3
              - name: janusgraph.storage.cql.keyspace
                value: graphdev
              - name: janusgraph.index.search.hostname
                value: "es-0.es.default.svc.cluster.local,..."
    Service:
      kind: Service
      metadata:
        name: janusgraph-service-lb
      spec:
        type: LoadBalancer
        selector:
          app: janusgraph
        ports:
        - name: gremlin-server-websocket
          protocol: TCP
          port: 8182
          targetPort: 8182
  • 27. JanusGraph Server - Deploy
    $ kubectl apply -f janusgraph.yaml
    service/janusgraph-service-lb created
    deployment.apps/janusgraph-server created
    $ kubectl get all -l app=janusgraph
    NAME                                     READY   STATUS    RESTARTS   AGE
    pod/janusgraph-server-5d77dd9ddf-nc87p   1/1     Running   0          1m2s
    NAME                            TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)    AGE
    service/janusgraph-service-lb   LoadBalancer   10.0.12.109   35.121.171.101   8182/TCP   1m3s
    NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/janusgraph-server   1/1     1            1           1m3s
    NAME                                           DESIRED   CURRENT   READY   AGE
    replicaset.apps/janusgraph-server-5d77dd9ddf   1         1         1       1m2s
  • 28. A Better Way - Helm Charts ■ Nobody has time to manage all of these individual manifest files! ■ Use Helm (https://helm.sh) - the “package manager” for k8s ■ Makes it easy to define, deploy & upgrade Kubernetes applications ■ You can find our opinionated take on deploying JanusGraph with Helm at https://github.com/EnharmonicAI/janusgraph-helm
  • 29. With Kubernetes, it’s easy to deploy JanusGraph on top of Scylla
  • 30. Flexible, scalable graph data system for building applications
  • 31. Thank you Stay in touch Any questions? Ryan Stauffer ryan@enharmonic.ai @RyantheStauffer
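Slides 5 and 11 describe the property graph model and the split between graph-local (transactional) and graph-global (analytical) work. The toy Python sketch below is not JanusGraph code and says nothing about how JanusGraph lays data out in Scylla: it keeps vertices, edges, and properties in plain in-memory structures (all class and method names, and the sample data, are hypothetical) to show that a transactional traversal starts from an index-style lookup and touches only a few nearby vertices, while an analytical one scans every edge.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int
    label: str  # logical entity type, e.g. "Customer"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str  # relationship type, e.g. "PURCHASED"
    out_v: int  # source vertex id
    in_v: int   # target vertex id
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.vertices = {}
        self.out_edges = defaultdict(list)  # vertex id -> outgoing edges

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)

    def add_edge(self, label, out_v, in_v, **props):
        self.out_edges[out_v].append(Edge(label, out_v, in_v, props))

    def lookup(self, label, key, value):
        # Linear scan standing in for an index lookup.
        return [v.vid for v in self.vertices.values()
                if v.label == label and v.props.get(key) == value]

    def local(self, start, max_hops):
        """Graph-local (OLTP-style): only vertices within max_hops of start."""
        seen, frontier = {start}, deque([(start, 0)])
        while frontier:
            vid, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for edge in self.out_edges.get(vid, []):
                if edge.in_v not in seen:
                    seen.add(edge.in_v)
                    frontier.append((edge.in_v, depth + 1))
        return seen

    def edge_label_counts(self):
        """Graph-global (OLAP-style): visits every edge in the graph."""
        counts = defaultdict(int)
        for edges in self.out_edges.values():
            for edge in edges:
                counts[edge.label] += 1
        return dict(counts)


graph = PropertyGraph()
graph.add_vertex(1, "Customer", name="Ada")
graph.add_vertex(2, "Product", name="Brake Pads", productId=42)
graph.add_vertex(3, "Order")
graph.add_edge("PURCHASED", 1, 2, purchaseDate="2019-11-05")
graph.add_edge("IN_ORDER", 2, 3)

start = graph.lookup("Customer", "name", "Ada")[0]
nearby = graph.local(start, max_hops=1)  # {1, 2}
totals = graph.edge_label_counts()       # {"PURCHASED": 1, "IN_ORDER": 1}
```

In Gremlin, the graph-local case corresponds roughly to `g.V().has('Customer','name','Ada').out('PURCHASED')`, while the global count would typically be handed to an OLAP engine such as Spark.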
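Slide 10's schema rules (properties and datatypes allowed per vertex label, edge types allowed between label pairs, and neither one mandatory) can be sketched as a small validator. This is a hypothetical Python illustration of the idea, not JanusGraph's management API; in JanusGraph the equivalent constraints are declared through the management system, as in the slide 24 console session. `GraphSchema`, `SchemaViolation`, and `violation` are made-up names.

```python
from dataclasses import dataclass, field


class SchemaViolation(ValueError):
    """Raised when a write would break a schema rule."""


@dataclass
class GraphSchema:
    # vertex label -> {property key -> required Python type}
    vertex_props: dict = field(default_factory=dict)
    # (out-vertex label, edge label, in-vertex label) triples that may exist
    connections: set = field(default_factory=set)

    def check_vertex(self, label, props):
        allowed = self.vertex_props.get(label)
        if allowed is None:
            raise SchemaViolation(f"unknown vertex label {label!r}")
        for key, value in props.items():
            if key not in allowed:
                raise SchemaViolation(f"{key!r} not allowed on {label}")
            # Strict type match, so True is not accepted where int is expected.
            if type(value) is not allowed[key]:
                raise SchemaViolation(
                    f"{key!r} on {label} must be {allowed[key].__name__}")
        # Keys absent from props are fine: properties are allowed, not required.

    def check_edge(self, out_label, edge_label, in_label):
        # Only the combination is constrained; no vertex is forced to have the edge.
        if (out_label, edge_label, in_label) not in self.connections:
            raise SchemaViolation(
                f"{out_label} -{edge_label}-> {in_label} not allowed")


def violation(check, *args):
    """Return the violation message, or None if the write is allowed."""
    try:
        check(*args)
    except SchemaViolation as err:
        return str(err)
    return None


schema = GraphSchema(
    vertex_props={
        "Customer": {"name": str, "age": int},
        "Product": {"name": str, "productId": int},
    },
    connections={("Customer", "HAS_PURCHASED", "Product")},
)

ok_customer = violation(schema.check_vertex, "Customer", {"name": "Ada"})
bad_key = violation(schema.check_vertex, "Customer", {"productId": 7})
bad_type = violation(schema.check_vertex, "Customer", {"age": "forty"})
bad_edge = violation(schema.check_edge, "Product", "HAS_PURCHASED", "Customer")
```

The Customer without an `age` passes, which mirrors the "allowed, not required" behavior the slide describes.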
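The Elasticsearch hostnames in slides 18, 23 and 26 rely on the stable DNS names that a StatefulSet's pods get through its headless Service: each pod is named `<statefulset>-<ordinal>` and resolves as `<pod>.<service>.<namespace>.svc.<cluster domain>`. A small Python helper (hypothetical name) can generate the comma-separated list these settings expect, assuming the default namespace and `cluster.local` domain used on the slides. The slides' `es-0.es...` names imply a StatefulSet called `es`, while the `kubectl` output on slide 19 shows pods named `elasticsearch-N`; the first label of each hostname has to match whatever the StatefulSet is actually named.

```python
def statefulset_hosts(statefulset, service, replicas,
                      namespace="default", cluster_domain="cluster.local"):
    """Per-pod DNS names for a StatefulSet governed by a headless Service.

    Pods are named <statefulset>-<ordinal> (ordinals start at 0), and each
    resolves as <pod>.<service>.<namespace>.svc.<cluster_domain>.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


# Names as they appear on the slides (StatefulSet "es", Service "es", 3 replicas).
slide_hosts = ",".join(statefulset_hosts("es", "es", replicas=3))

# Names if the StatefulSet is called "elasticsearch", as in the slide 19 output.
deployed_hosts = ",".join(statefulset_hosts("elasticsearch", "es", replicas=3))
```

Either string is the shape of value that `discovery.zen.ping.unicast.hosts` and `janusgraph.index.search.hostname` take in the manifests.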

Editor's Notes

  1. Let's give another round of applause to Brian.  Everything he said applies here – now we'll just dig into the technical pieces a bit more.
  2. I'm Ryan Stauffer, the founder and CEO of a Bay Area startup called Enharmonic. I first got excited about graph databases several years back when I was leading data analytics and strategy for a large automotive aftermarket company. We were trying to build a unified model of data for the automotive aftermarket that combined data from across our different verticals. Using the source data in its existing form (hundreds of tables, and hundreds of millions of rows & columns) was leading us down a really bad path. It became clear that insights would be much easier to reach if we used a graph data model, where we can explicitly model our data as real-world business concepts. Ever since then, I’ve viewed graph data systems as a core part of the solution for how to ask and answer better questions about our businesses.
  3. For a little backdrop to what we'll be talking about: what do we do at Enharmonic? Well, we're working to solve the problem of how companies interact with their data. We provide a clean, visual interface that lets business decision makers directly access their data with free-text search and point-click-and-drag actions. Data is modeled and retrieved as logical business concepts like Customers, Products, and Orders. Our system recommends analyses that make sense based on the data, and then goes ahead and executes those with just a few clicks. To make this possible, we use lots of automation on the backend, and sitting behind everything, we use a graph data system.
  4. Brian discussed graphs in the last session, so I'm not going to rehash everything, but I do want to do a brief level-set.  So what do I mean when I say "Graph Data System"?
  5. We can break that into 2 parts: "Graph" & "Data System". By "graph" we mean that we're modelling our data as a property graph, using Vertices, Edges & Properties. Vertices model entities like Customers or Products. Edges model relationships between entities, like how one Customer KNOWS another Customer, or a Customer HAS PURCHASED a Product. Properties model attributes of entities and relationships, like the name and age of a Customer. By "Data System" we mean that several distinct components combine to form a single, logical system.
  6. There are several options for graph databases out there on the market, but when we need a combination of scalability, flexibility, and performance, we can look to a system built of JanusGraph, Scylla, and Elasticsearch. This single logical data system is structured into 3 parts: - In the center we have JanusGraph, a Java application that clients communicate with directly. - It serves as the abstraction layer that lets us interact with our data as a graph. - JG will write to and read from Scylla, where our data is ultimately persisted. - We can optionally add Elasticsearch to help us with advanced indexing and text search capabilities.
  7. So that sounds interesting, but why do we want to do this at all?
  8. I think there are 3 core benefits of this graph data system. - Flexibility - Schema support - Support for both transactional & analytical workloads
  9. The killer feature of using a graph is its flexibility - Business logic changes, application requirements change, and it can often be a real problem trying to support that with traditional databases - Using a graph means our data model isn't set in stone. - We can iterate and evolve the data model by adding additional vertices and edges to meet our new needs, without throwing out everything that already works. - We can also write analytics results directly back to the graph, explicitly connecting to our primary data. - This simplifies the ways that teams can collaborate and share insights, while allowing for powerful data provenance capabilities.
  10. Schema support is a real "nice-to-have" when it comes to separating business logic from lower-level database integrity issues. JanusGraph, unlike some other graph databases out there, supports defining a schema for data, but doesn't require that we do this. Basically, we can apply useful constraints to what is allowed and disallowed on our graph. For example, we can ensure that name and age properties are only allowed to be written to a Customer vertex, but we don't require that every Customer vertex have all of these properties (minimizing the need for pointless null field values!). We can also specify that a Product and Customer vertex are allowed to be related with a HAS_PURCHASED edge, but we don't require that each Product vertex must have that edge. This sort of clear schema flexibility is difficult to replicate outside of a graph environment. It separates data integrity maintenance from our business logic, letting our DB take over DB tasks instead of offloading them onto the application layer.
  11. - Finally, with this graph data system, we can execute both transactional and analytical workloads with the same data system and the same query language: Gremlin. - We access data by “traversing” our graph, travelling from vertex to vertex by means of connecting edges. - We can think of a transactional workload as one where we travel to a small number of vertices and edges, and where our goal is to minimize latency. - An analytical workload, on the other hand, is one where we travel to all, or a substantial portion, of our vertices and edges. Our goal here is to maximize throughput. - Backed by the high-IO performance of Scylla, we can achieve scalable, single-digit millisecond response for transactional workloads. We can also leverage Spark to handle large-scale analytical workloads.
  12. It's easy to talk about all of this in theory, but how do we go about actually deploying it?
  13. 1st of all, WHERE are we going to deploy this? In a production environment, it makes sense to deploy Scylla on either VMs or bare metal. For JanusGraph & ES, there are many advantages to deploying on Kubernetes Q – Quick show of hands, who is using Kubernetes today? Q – Who has tried deploying Scylla on top of Kubernetes? (Yannis Zarkadas gave a great talk earlier today on using the Scylla Operator to manage Scylla on K8s – if you missed it I highly recommend checking out the talk online.)
  14. Kubernetes is an open source system for managing containerized applications. It allows you to group and manage application containers as logical units. Fundamentally, it's about building and interacting with abstractions on top of basic resources (compute, memory, disk, network). I'm not going to touch every last detail of the k8s manifests, but I do want to dive into the low-level fundamentals of the k8s resources you'll be using. Now, even when setting up our pieces on k8s seems pedantic, remember that this greatly simplifies the process of installing and managing a complex application. As many of you probably know, it's significantly easier to do it this way than to install and upgrade each app and its dependencies manually at the VM level.
  15. Let's walk through the details of deploying the whole system. Big picture, we have 2 types of components: stateful and stateless. Stateful components are Scylla and Elasticsearch, where we'll actually persist our data. Everything else is stateless and ephemeral. Our actual JanusGraph app pods, for instance, are stateless, and if one dies, we simply spin up a new one in its place. So what does this look like? A client (maybe an app, maybe our little Scylla monster up here) issues queries to JanusGraph. Those queries hit a load balancer and are passed to 1 or more pods managed as part of a JanusGraph deployment. The JanusGraph app is what presents the "graph" view of data, and it does so by intermediating between the client and the stateful apps. Most data is put in Scylla, over here on the left. For more advanced indexing, we use Elasticsearch, which we deploy as a Stateful Set and Headless Service.
  16. Diving into more detail, we start with Scylla. We can actually use your existing Scylla cluster, meaning there's 0 lift! The one thing we'll do is create a new keyspace to hold graph data.
  17. To give us more advanced indexing capabilities, we'll deploy Elasticsearch as well. We deploy it on Kubernetes in 3 parts: - Headless Service - Stateful Set - Storage Class. ES is stateful, so it needs to persist data, which we'll accomplish by means of a stateful set. Now, a stateful set is just used to manage 1 or more replica pods, which are the nodes in our ES cluster. But it does this in a unique way. It assigns numbers to each pod and the disks that are mounted to it. This way, we consistently mount the same disk to the same pod #. This gives us a reliably stateful system, where even if individual pods fail, they're safely recreated automatically by Kubernetes.
  18. We define a storage class: what type of disks do we want to mount to our Elasticsearch nodes? In this case, we'll choose SSDs. We'll define a headless service. We set clusterIP to None, specify our standard ES ports, and provide a selector to target our stateful set pods. The last step is to define our stateful set. This references the Storage Class and Headless Service we just defined, so I color-coded the important bits. For storage, shown in blue, our goal is to define a disk from our elasticsearch-ssd storage class for each ES node, and mount it to that node. To do this, we'll define a Volume Claim Template, and define a volume mount that mounts the disk at our ES data path. For networking, shown in red, we specify the Headless Service name. We'll also define 1 environment variable that allows for ES node discovery. Q – I THINK THERE'S A TYPO HERE ON THE SELECTOR FOR THE HEADLESS SERVICE.
  19. Assuming we put all of this into a single manifest file, we can deploy Elasticsearch to our Kubernetes cluster with a single "apply" command. After a little bit of initialization, we can see the Ready status of our stateful set, the 3 pods it controls, and the service that routes network traffic to these pods.
  20. Now, for the last and most important piece of the puzzle – JanusGraph. We'll deploy this on Kubernetes as well.
  21. There are already official JanusGraph images available on Docker Hub, and for these examples we'll be using version 0.4.0. You could also build your own using the JanusGraph project build scripts, and push that image to a private image repository (for example, on Google Cloud Platform).
  22. Now how do we use JanusGraph? Let's start with a minimal example. It's not for production use, but it illustrates how this all works. We'll deploy a single pod to get console access to our system.
  23. We'll run JanusGraph in a single pod, and connect to it directly. That means that the graph is only accessible through the console connection, but all of our actions are still persisted in Scylla and Elasticsearch. Now, the standard JanusGraph docker image includes some great templating and presets, which allow us to configure our connection to our storage and indexing backends with just a few environment variables. We're using Scylla + Elasticsearch, so we set cql-es as our JanusGraph properties template. We set the hostname as 1 or more of the Scylla cluster hostnames. We set the keyspace as a new, clean Scylla keyspace where we'll store all of our graph data. Finally, we supply the K8s cluster hostnames for our Elasticsearch nodes.
  24. With that manifest file, we can create a pod, then connect to it with an interactive terminal. This will bring up a Gremlin Console. The JG Docker image will prepopulate a standard janusgraph.properties file that reflects the env var configuration we just set up. We use a factory to create a graph instance, and then we can do whatever we'd like! For example, we can start by defining a schema for a Product vertex with name and productId properties.
  25. If we want to actually move to a real environment, we need to support multiple users and applications, probably written in different languages. To handle this we deploy JanusGraph server. On Kubernetes, we'll do this as a Deployment, which manages 1 or more stateless replica pods. We put a load balancer in front of it, exposed on an external or internal IP depending on the use case.
  26. When we deploy JanusGraph as a standalone server, we're actually using the Apache TinkerPop Gremlin Server under the hood, which will accept Gremlin language queries issued from applications written in multiple languages (Python, Java, JS, etc.). The Service is pretty simple: just a LoadBalancer that will route network requests to our pods. We're using port 8182 because that's the standard Gremlin websocket port. We manage those pods as a single deployment. We specify the number of replicas and the image, and set up the environment variables just like we did before.
  27. We apply our manifest, and check that everything is running. The key parts are the Load Balancer and Deployment. Once our LB has its IP assigned, we're able to connect to our JG pods with a client application. Now we can issue queries, store data, do whatever we want! Now, some of that description of K8s manifests got pretty pedantic. There's got to be a better way, right?
  28. There is: Helm Charts! Q – With a show of hands, who uses Helm Charts? Awesome. We can think of Helm as a package manager for k8s. It lets us template out and group related manifest files into logical packages called Charts. This makes it easy to define, deploy and upgrade Kubernetes applications with single commands. We just released our own opinionated take on how to deploy JanusGraph as a Helm Chart on GitHub. If you like saving time and energy, please check it out and use it.
  29. Kubernetes gives us tremendous power, and makes it easy to deploy JanusGraph on top of Scylla.
  30. With our deployment up and running, we have a flexible, scalable graph data system that we can use as the bedrock for an exciting new generation of applications.
  31. Thank you for your time. If you'd like to stay in touch, you can follow me on Twitter or connect with me on LinkedIn.  You can also contact me directly via email. I think we have a few more minutes, so what questions do you have?
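Note 9's point about evolving the model and writing analytics results back to the graph can be shown with a few lines of plain Python. In this hypothetical sketch (the `Segment` label, `IN_SEGMENT` edge, helper names, and data are all made up, and nothing here is JanusGraph code), an analysis result is stored as a new vertex plus new edges, and an existing query over `HAS_PURCHASED` edges returns exactly what it did before.

```python
# Hypothetical in-memory graph: vertex id -> (label, properties)
vertices = {
    1: ("Customer", {"name": "Ada"}),
    2: ("Customer", {"name": "Grace"}),
    3: ("Product", {"name": "Brake Pads"}),
}
# Edges as (out-vertex id, edge label, in-vertex id) triples
edges = [(1, "HAS_PURCHASED", 3), (2, "HAS_PURCHASED", 3)]


def out_neighbors(vid, edge_label):
    """Ids reachable from vid over edges with the given label."""
    return sorted(dst for src, lbl, dst in edges
                  if src == vid and lbl == edge_label)


def purchases(customer_id):
    """An existing workload: which products did this customer buy?"""
    return out_neighbors(customer_id, "HAS_PURCHASED")


before = {cid: purchases(cid) for cid in (1, 2)}

# A (hypothetical) analysis groups product buyers into a segment. Its result
# is written back as a new vertex and new edges; nothing existing changes.
segment_id = max(vertices) + 1
vertices[segment_id] = ("Segment", {"name": "brake-pad-buyers"})
for cid, (label, _) in list(vertices.items()):
    if label == "Customer" and 3 in purchases(cid):
        edges.append((cid, "IN_SEGMENT", segment_id))

after = {cid: purchases(cid) for cid in (1, 2)}
```

Because the new facts live on new labels, the old query never sees them, while a new query can follow `IN_SEGMENT` edges straight back to the primary data.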
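Notes 23 and 24 say the JanusGraph Docker image turns environment variables into a `janusgraph.properties` file. As we understand the janusgraph-docker entrypoint, variables prefixed with `janusgraph.` have that prefix stripped and are written over (or appended to) the preset selected by `JANUS_PROPS_TEMPLATE`. The Python sketch below approximates that behavior for illustration only: the real entrypoint is a shell script and may differ in details, `render_properties` is a hypothetical name, and the template lines are a made-up excerpt rather than the image's actual `cql-es` preset.

```python
PREFIX = "janusgraph."


def render_properties(template_lines, env):
    """Overlay janusgraph.* environment variables onto properties-file lines.

    Approximation: strip the "janusgraph." prefix, replace a key that already
    exists in the template, and append any key that does not. Variables
    without the prefix (e.g. JANUS_PROPS_TEMPLATE) are ignored here.
    """
    overrides = {
        name[len(PREFIX):]: value
        for name, value in env.items()
        if name.startswith(PREFIX) and value
    }
    rendered = []
    for line in template_lines:
        key = line.split("=", 1)[0].strip()
        if key in overrides:
            rendered.append(f"{key}={overrides.pop(key)}")
        else:
            rendered.append(line)
    # Keys that were not already in the template go at the end.
    rendered.extend(f"{key}={value}" for key, value in overrides.items())
    return rendered


# Hypothetical excerpt of a CQL + Elasticsearch preset (not the real file).
template = [
    "gremlin.graph=org.janusgraph.core.JanusGraphFactory",
    "storage.backend=cql",
    "storage.hostname=127.0.0.1",
    "index.search.backend=elasticsearch",
]
# Environment from the slide 23 Pod manifest (Elasticsearch list shortened).
env = {
    "JANUS_PROPS_TEMPLATE": "cql-es",
    "janusgraph.storage.hostname": "10.138.0.3",
    "janusgraph.storage.cql.keyspace": "graphdev",
    "janusgraph.index.search.hostname": "es-0.es.default.svc.cluster.local",
}
properties = render_properties(template, env)
```

The point of the design is that one image serves every backend combination: the preset supplies defaults, and the manifest only has to state what differs.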
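Note 27 says clients can connect once the LoadBalancer has its IP. One way to script that check, sketched in Python under the assumption that you feed it the JSON printed by `kubectl get svc janusgraph-service-lb -o json`, is to read `status.loadBalancer.ingress` and build a websocket URL on port 8182. The `/gremlin` path follows the usual Gremlin Server client convention, and `gremlin_url` is a hypothetical helper name.

```python
import json

GREMLIN_PORT = 8182  # port exposed by janusgraph-service-lb on slide 26


def gremlin_url(service_json, path="/gremlin"):
    """Websocket URL for Gremlin Server, from `kubectl get svc ... -o json`.

    Returns None while the cloud provider is still assigning an address.
    """
    svc = json.loads(service_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None
    # Some providers report a hostname instead of an IP.
    host = ingress[0].get("ip") or ingress[0].get("hostname")
    return f"ws://{host}:{GREMLIN_PORT}{path}"


# Trimmed examples of the Service JSON before and after the IP is assigned;
# the address is the EXTERNAL-IP shown on slide 27.
pending = '{"status": {"loadBalancer": {}}}'
ready = '{"status": {"loadBalancer": {"ingress": [{"ip": "35.121.171.101"}]}}}'

print(gremlin_url(pending))  # None
print(gremlin_url(ready))    # ws://35.121.171.101:8182/gremlin
```

With the URL in hand, a client library such as gremlinpython can open a remote traversal, for example `traversal().withRemote(DriverRemoteConnection(url, "g"))` in the TinkerPop 3.4 line that JanusGraph 0.4 builds on, assuming the server binds a traversal source named `g`.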