Cassandra's Odyssey @ Netflix

Roopa Tangirala
Engineering Manager

Contents
● Brief History
● Cassandra Use Cases
● Supporting Infrastructure
● What’s Next?
● Q & A

REQUIREMENTS
▪ HIGHLY AVAILABLE
▪ MULTI DATACENTER SUPPORT
▪ PREDICTABLE PERFORMANCE AT SCALE

WHY CASSANDRA?
▪ Massively scalable architecture
▪ Multi-datacenter, multi-directional replication
▪ Linear scale performance
▪ Transparent fault detection and recovery
▪ Flexible, dynamic schema data
▪ Guaranteed data safety
▪ Tunable data consistency

us-west-2 (Oregon)
us-east-1 (Northern Virginia)
eu-west-1 (Ireland)

Current Membership data model
Current [soon to be old] model
1. Thrift based
2. Schemaless data model
3. Info for one account is spread out, so it takes multiple reads [from multiple nodes]

Membership data model - evolved
New data model
1. CQL based
2. Well defined schema
3. A few UDTs for schema flexibility
4. One account with multiple profiles - just one call

Membership data model - evolved
New data model
1. Primary key definition
a. account_id -> partition key
b. profile_id -> clustering column
2. Partition size <= 64k for all the profiles of an account
CQL - Table definition
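
The key layout above can be sketched as a toy in-memory Python model; this is an illustration, not driver code and not the table from the slide. Only account_id as partition key and profile_id as clustering column come from the deck. The keyspace, table name, column types, and the profile_name column are assumptions.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical CQL equivalent (names and types other than the two key
# columns are assumptions made for this sketch):
#   CREATE TABLE membership.account_profiles (
#       account_id   bigint,
#       profile_id   int,
#       profile_name text,
#       PRIMARY KEY ((account_id), profile_id)
#   );


class AccountProfiles:
    """Toy model: partition key -> rows kept sorted by clustering column."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def upsert(self, account_id, profile_id, profile_name):
        rows = self._partitions[account_id]
        # Writing the same primary key again replaces the previous row.
        rows[:] = [row for row in rows if row[0] != profile_id]
        insort(rows, (profile_id, profile_name))

    def read_account(self, account_id):
        # A single partition lookup returns every profile of the account.
        return list(self._partitions.get(account_id, []))


table = AccountProfiles()
table.upsert(42, 2, "kids")
table.upsert(42, 1, "main")
table.upsert(7, 1, "solo")
profiles = table.read_account(42)
print(profiles)
```

Because all profiles of an account share one partition, reading an account is one call to one replica set, which is the "just one call" property the new model is after.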

Global Ratings - I
Two main usages:
1. Full sweep of advisories and ratings countrywide
2. Get advisories and rating for a given movie_id, country_code

Global Ratings - II
● Full sweep countrywide

Global Ratings - III
● Get advisories and rating for a given movie_id, country_code

Observability
● Analysis of logs
● Traces
● Metrics

Elasticsearch Issues
● Slow rate of ingestion
● Slow reads
● On ingestion, each message has to be parsed and an index built
● Data for each request is not contiguous on disk

CDE Service
“Empowering CDE to provide datastores as a service”

Motivation
So. Many. Tools.
...and more

A Central Hub for Persistence
Self Service
Management
Insights

Elasticsearch Cluster Creation
Before:
1. Users sent requests via email, Slack, or JIRA tickets.
2. CDE on-call translated their requirements into appropriate resource requirements, and kicked off automation to create clusters.
3. Once created, CDE on-call notified users via email, including information about how to access the cluster, set up security group rules, useful links, etc.

* Cost shown is not real, for illustration only.

Cassandra Repairs
Goals:
1. Allow Repairs to be enabled/disabled quickly for a cluster.
2. Allow fine-tuning of table-specific settings for subrange repairs, parallelism, etc.
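
To make "subrange repairs" concrete, here is a minimal sketch of splitting a token range into contiguous subranges. It assumes the Murmur3 partitioner's token space (-2^63 to 2^63 - 1). The function name and the choice of four pieces are hypothetical, and this is not the CDE Service's implementation.

```python
# Token bounds of the Murmur3 partitioner (assumed for this sketch).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start, end, pieces):
    """Split (start, end] into `pieces` contiguous subranges of near-equal width."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit wide ranges.
    bounds = [start + (width * i) // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


subranges = split_range(MIN_TOKEN, MAX_TOKEN, 4)
for start_token, end_token in subranges:
    # Each pair could be handed to a repair tool that accepts start and end
    # tokens (for example, nodetool repair -st <start> -et <end>).
    print(start_token, end_token)
```

Smaller subranges mean each repair session covers less data, which is one reason the number of pieces is a useful per-table setting.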

Cluster Backups
Goals:
1. Allow Backups to be enabled/disabled quickly for a cluster.
2. Allow Backup schedules, retention periods, S3 buckets/locations, etc. to be customized.

Cluster Costs
Goal:
Provide cluster owners insight into the costs of running their clusters so that they can make more informed choices/tradeoffs with respect to efficiency in resource usage.

* Costs shown are not real, for illustration only.

Node Inventory
Goal:
Aggregate and display on-instance stats/metadata for all nodes in a cluster, including:
○ Datastore versions
○ Sidecar versions
○ OS versions
○ AWS instance types
○ etc.

[Architecture diagram. Labels: Web UI; CDE Service API; C* Store (multiregion); CDE Service Backend in eu-west-1, us-east-1, and us-west-2; Bolt and Datastore Nodes (three sets).]

CDE Service Data Model
CDE service -
● Hierarchical model
● Manages hundreds of clusters
● Manages tens of thousands of nodes

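Sample inserts -
INSERT INTO "eunomia"."eunomia_cass_app_node_info" ("appname", "env", "region", "nodeuid", "jason_val")
VALUES ('cass_app', 'NA', 'NA', 'NA', 'app_level_info');
INSERT INTO "eunomia"."eunomia_cass_app_node_info" ("appname", "env", "region", "nodeuid", "jason_val")
VALUES ('cass_app', 'PROD', 'NA', 'NA', 'app_env_level_info');
INSERT INTO "eunomia"."eunomia_cass_app_node_info" ("appname", "env", "region", "nodeuid", "jason_val")
VALUES ('cass_app', 'PROD', 'US-EAST-1', 'NA', 'app_env_region_level_info');
INSERT INTO "eunomia"."eunomia_cass_app_node_info" ("appname", "env", "region", "nodeuid", "jason_val")
VALUES ('cass_app', 'PROD', 'US-EAST-1', 'NODE1', 'app_env_region_node_level_info');
PRIMARY KEY Definition -
PRIMARY KEY (appname, env, region, nodeuid)
) WITH CLUSTERING ORDER BY (env ASC, region ASC);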

Sample Select - Get all the data for an app

Get all the data at app and ENV level

Environment-level data (PROD vs TEST)

Get data only at app and ENV level

Region-level data (within an environment)
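
The selects named on these slides can be sketched as a small Python helper that builds CQL text for each level of the hierarchy. The table name, key columns, and the 'NA' sentinel come from the sample inserts; the helper names are hypothetical. Reading 'NA' as "this level does not apply" is an inference from those inserts, not something the deck states.

```python
# Table name and key columns are taken from the sample inserts.
TABLE = '"eunomia"."eunomia_cass_app_node_info"'
KEY_COLUMNS = ("appname", "env", "region", "nodeuid")
NOT_APPLICABLE = "NA"  # sentinel seen in the sample inserts for unused levels


def build_select(appname, env=None, region=None, nodeuid=None):
    """Return (cql, params) that restrict a prefix of the primary key."""
    if appname is None:
        raise ValueError("appname (the partition key) is required")
    values = (appname, env, region, nodeuid)
    given = [value is not None for value in values]
    # Once a key column is left out, no later key column may be restricted.
    if False in given and any(given[given.index(False):]):
        raise ValueError("key columns must be restricted as a prefix")
    clauses = [f'"{col}" = ?' for col, val in zip(KEY_COLUMNS, values) if val is not None]
    params = tuple(value for value in values if value is not None)
    return f"SELECT * FROM {TABLE} WHERE " + " AND ".join(clauses), params


def build_level_only_select(appname, env=NOT_APPLICABLE, region=NOT_APPLICABLE):
    """Pin the deeper levels to 'NA' so only the row for one level matches."""
    return build_select(appname, env, region, NOT_APPLICABLE)


all_for_app = build_select("cass_app")                     # every level of the app
env_and_below = build_select("cass_app", "PROD")           # PROD plus its regions and nodes
env_only = build_level_only_select("cass_app", "PROD")     # just the app+env row
for cql, params in (all_for_app, env_and_below, env_only):
    print(cql, params)
```

The ? placeholders follow the prepared-statement style; a driver that uses a different placeholder would need the clause template adjusted.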

Netflix Data Explorer
▪ Unify access to our customers’ persisted data
▪ Codify the CDE team’s best practices
▪ Provide cluster-level ACLs to avoid unnecessary access
▪ Provide a paved path for common database operations
▪ Work at Netflix scale

Single Entrypoint for Clusters

How do we expand beyond three regions without increasing the replicas or cost dramatically?

Cassandra CDC to keep data in sync between Cassandra and derived stores

Production-ready materialized views
