© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How GumGum Migrated
from Cassandra to Amazon
DynamoDB
Anirban Roy
Lead Engineer
GumGum
DAT345
Agenda
Introduction
Background
Alternatives and comparison
About the data
Migration strategy
Observations and benefits
Q&A
Introduction: Background
High traffic with surges
90% of our traffic involves our programmatic partners
Low response time
Maintaining low latency is key to revenue
Introduction: The problem
Apache Cassandra
We used to run 106 nodes of i3.2xlarge instances on AWS
Scaling
Required adding nodes manually to the cluster
Introduction: The problem
Data center outages
Revenue loss
Engineering fatigue
Alternatives
Alternatives
More than 225 available
(source: nosql-database.org)
Alternatives
GumGum’s blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
Benchmarking
DynamoDB
• YCSB benchmarked
• Loaded ~20 million items (~22 GB)
Apache Cassandra
• Achieved ~125,000 reads per second and ~40,000 writes per second
• ~3-5ms read latency
GumGum’s blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
YCSB: https://github.com/brianfrankcooper/YCSB
Behavioral targeting data
DMP partners DSP partners
Cookie syncing
30 days TTL
Contextual targeting data
GumGum Metadata Store (replicated across all four GumGum data centers)
Image URL: 30 days to one year TTL for images
Page URL: seven days to one year TTL for pages
[Diagram: GumGum TaPas (NLP), GumGum Vertex (CV); ECS spot (x3), Vertex spot node (x3); images_metadata, pages_metadata]
Behavioral targeting data migration
Migration involved the following
• Data volume is considerably bigger
• No ETL operation required for migration
• WRITE -> WAIT -> READ approach
• Exploit the fact that TTL is short (30 days - WAIT phase)
[Diagram: Ad server (x3); Visitors keyspace; visitors]
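The slide names the WRITE -> WAIT -> READ approach without spelling out the mechanics. The following is a minimal sketch of one plausible reading: new writes go to both stores, reads stay on the old store until a full 30-day TTL window has passed, and then reads switch to the new store. The class name `MigratingStore`, the `Phase` enum, and the use of plain dicts in place of Cassandra and DynamoDB clients are all hypothetical and are not taken from the talk.

```python
from enum import Enum

TTL_DAYS = 30  # from the slide: behavioral data has a 30-day TTL


class Phase(Enum):
    WRITE = "write"  # dual-write starts; reads still served by the old store
    WAIT = "wait"    # same routing, while items that exist only in the old store expire
    READ = "read"    # reads switch to the new store


class MigratingStore:
    """Routes reads and writes between an old and a new key-value store.

    Assumption: both stores are dict-like stand-ins for real database clients.
    """

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.phase = Phase.WRITE
        self.days_dual_written = 0

    def put(self, key, value):
        # Assumption: every write goes to both stores for the whole migration.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        source = self.new if self.phase is Phase.READ else self.old
        return source.get(key)

    def advance_day(self):
        # After one full TTL window, anything not yet in the new store
        # would have expired in the old store anyway, so reads can move.
        self.days_dual_written += 1
        self.phase = Phase.READ if self.days_dual_written >= TTL_DAYS else Phase.WAIT


old = {"visitor-1": "segments-a"}  # written before the migration started
new = {}
store = MigratingStore(old, new)
store.put("visitor-2", "segments-b")  # dual-written
for _ in range(TTL_DAYS):
    store.advance_day()
print(store.phase, store.get("visitor-2"), store.get("visitor-1"))
```

Under this reading no bulk copy is needed, which would be consistent with the "No ETL operation required" bullet: the only data missing from the new store at cutover is data that the TTL has already retired.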
Contextual targeting data migration
[Diagram: Cassandra keyspace (images_metadata, pages_metadata) -> Extract data -> Transform data -> Load data -> images_metadata, pages_metadata]
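The talk does not show what the Transform step did. One transformation that such a migration typically needs is the TTL: Cassandra reports a remaining TTL in seconds, while DynamoDB TTL expects a numeric attribute holding an absolute expiry time in epoch seconds. The sketch below illustrates only that conversion. The function name `to_dynamodb_item`, the `ttl_seconds` input field, and the `expires_at` attribute name are hypothetical.

```python
import time


def to_dynamodb_item(row, now=None, ttl_attribute="expires_at"):
    """Convert an extracted row (a dict) into a dict shaped for loading.

    Assumptions: the extract step stored the remaining Cassandra TTL under
    "ttl_seconds", and null columns are simply omitted from the item.
    """
    now = int(time.time()) if now is None else int(now)
    item = {
        key: value
        for key, value in row.items()
        if key != "ttl_seconds" and value is not None
    }
    remaining = row.get("ttl_seconds")
    if remaining is not None:
        # Relative seconds remaining -> absolute epoch-seconds expiry.
        item[ttl_attribute] = now + int(remaining)
    return item


row = {"url": "https://example.com/a.jpg", "labels": ["cat"], "ttl_seconds": 86400}
print(to_dynamodb_item(row, now=1_000_000))
```

Passing `now` explicitly keeps the conversion deterministic, which makes a batch job easier to test and to re-run.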
Caching: DAX or Memcached
When using DAX (only with DynamoDB)
[Diagram: GumGum ad servers (Ad Server x3); AWS DAX (DAX node x2)]
When using Memcached
[Diagram: GumGum ad servers (Ad server x3); Memcached node (x2); NoSQL store]
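The two diagrams differ mainly in who handles a cache miss. With a side cache such as Memcached, the application checks the cache and falls back to the store itself (cache-aside). DAX is a read-through cache, so the application talks to the cache endpoint, which fetches from DynamoDB on a miss. The sketch below contrasts the two patterns using plain dicts. `CacheAsideReader` and `ReadThroughCache` are hypothetical names and do not represent the Memcached or DAX client APIs.

```python
class CacheAsideReader:
    """Memcached-style: the application owns the miss handling."""

    def __init__(self, cache, store):
        self.cache = cache  # dict standing in for a Memcached client
        self.store = store  # dict standing in for the NoSQL store
        self.store_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value  # the application populates the cache
        return value


class ReadThroughCache:
    """DAX-style: the cache itself fetches from the store on a miss."""

    def __init__(self, store):
        self._store = store
        self._items = {}
        self.store_reads = 0

    def get(self, key):
        if key not in self._items:
            self.store_reads += 1
            value = self._store.get(key)
            if value is None:
                return None
            self._items[key] = value
        return self._items[key]


store = {"page-1": {"category": "sports"}}
aside = CacheAsideReader(cache={}, store=store)
through = ReadThroughCache(store)
for reader in (aside, through):
    reader.get("page-1")
    reader.get("page-1")
print(aside.store_reads, through.store_reads)
```

In both sketches the second read is served from the cache. The difference is where the fallback logic lives, which is why the DAX variant needs no separate store client in the ad server code.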
Data replication requirements
• Behavioral targeting
  • Data must be replicated between the US East and US West data centers
  • Global replication is not required
• Contextual targeting
  • Data must be replicated across all four GumGum data centers
  • Global tables were used to achieve replication
During development for behavioral targeting data, replication was not yet supported by DynamoDB
Data replication architecture: Master-Master
Modified dynamodb-cross-region-library to perform Master-Master replication. Changes can be found at https://github.com/awslabs/dynamodb-cross-region-library/pull/53
[Diagram: AWS Cloud with AWS Region US East 1 and AWS Region US West 2; each region has a VPC with Auto scaling and two replicators]
Benefits: performance
• 4-5ms read latency
• No throttles
• Zero outages so far
• Fewer timeouts than Cassandra
Benefits: Cost
• Cassandra hosting cost
  • 80 i3.2xlarge instances
  • Total hosting cost: 0.624 x 24 x 365 x 80 = $437,299.20 USD
• DynamoDB running cost
  • Per month = ~450 x 30 = ~13,500 USD
  • Estimated annual cost = 13,500 x 12 = $162,000 USD
• % Saving
  • {(437,299.2 - 162,000) x 100} / 437,299.2 = 62.95%
65-70%
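The arithmetic on this slide can be checked with a few lines of Python. This is a sketch under two assumptions that the slide does not state explicitly: 0.624 is read as a USD price per instance-hour, and ~450 is read as a USD cost per day. The annual DynamoDB figure of 162,000 follows from the ~13,500 monthly estimate (13,500 x 12).

```python
def cost_comparison(hourly_rate_usd=0.624, instances=80, dynamodb_daily_usd=450):
    """Reproduce the slide's cost figures; returns rounded USD values and a percentage."""
    cassandra_annual = hourly_rate_usd * 24 * 365 * instances
    dynamodb_monthly = dynamodb_daily_usd * 30
    dynamodb_annual = dynamodb_monthly * 12
    saving_pct = (cassandra_annual - dynamodb_annual) * 100 / cassandra_annual
    return {
        "cassandra_annual": round(cassandra_annual, 2),
        "dynamodb_monthly": dynamodb_monthly,
        "dynamodb_annual": dynamodb_annual,
        "saving_pct": round(saving_pct, 2),
    }


print(cost_comparison())
```

The computed saving is about 63%, matching the 62.95% on the slide. The separate 65-70% figure is not derived by this calculation.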
Operational stats
2 TB data
16.2 billion items
~8 million reads per minute
All at <3ms read latency
But wait - There’s more about DynamoDB
A list of all DynamoDB sessions, workshops, and chalk talks
• Migrating Apache Cassandra to DynamoDB
• What’s new with DynamoDB
• Purpose-built databases in AWS
• DynamoDB service level agreement
• Adaptive capacity
• Point-in-time recovery (PITR)
• Global tables
Thank you!
Anirban Roy
LinkedIn: anirban51roy


How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Invent 2018

  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How GumGum Migrated from Cassandra to Amazon DynamoDB. Anirban Roy, Lead Engineer, GumGum. DAT345
  • 3. Agenda: Introduction; Background; Alternatives and comparison; About the data; Migration strategy; Observations and benefits; Q&A
  • 11. Introduction: Background. High traffic with surges: 90% of our traffic involves our programmatic partners. Low response time: maintaining low latency is key to revenue.
  • 12. Introduction: The problem. Apache Cassandra: we used to run 106 nodes of i3.2xlarge instances on AWS. Scaling: required adding nodes manually to the cluster.
  • 13. Introduction: The problem. Data center outages: revenue loss, engineering fatigue.
  • 14. Alternatives
  • 15. Alternatives: more than 225 available (source: nosql-database.org)
  • 16. Alternatives. GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
  • 17. Benchmarking
      • DynamoDB: YCSB benchmarked; loaded ~20 million items (~22 GB)
      • Apache Cassandra: achieved ~125,000 reads per second and ~40,000 writes per second; ~3-5 ms read latency
      • GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
      • YCSB: https://github.com/brianfrankcooper/YCSB
  • 19. Behavioral targeting data: DMP partners, DSP partners, cookie syncing. 30-day TTL.
  • 20. Contextual targeting data: GumGum Metadata Store (replicated across all four GumGum data centers)
      • Image URL: 30 days to one year TTL for images
      • Page URL: seven days to one year TTL for pages
      • [Diagram: GumGum TaPas (NLP), GumGum Vertex (CV), ECS spot nodes, Vertex spot nodes, and the images_metadata and pages_metadata tables]
  • 22. Behavioral targeting data migration. Migration involved the following:
      • Data volume is considerably bigger
      • No ETL operation required for migration
      • WRITE -> WAIT -> READ approach
      • Exploit the fact that TTL is short (30 days - WAIT phase)
      • [Diagram: ad servers, the Visitors keyspace, and the visitors table]
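The WRITE -> WAIT -> READ bullets can be read as a three-phase cutover. The slide gives no implementation details, so the following is only a minimal sketch of one way such a cutover could work, under stated assumptions: writes go to both stores during the wait, reads stay on the old store until one full TTL has elapsed, and both stores are modeled as in-memory dictionaries. `Store` and `MigratingClient` are hypothetical names, not GumGum code.

```python
from dataclasses import dataclass, field

TTL_SECONDS = 30 * 24 * 60 * 60  # the 30-day TTL mentioned on the slide


@dataclass
class Store:
    """In-memory stand-in for a key-value store with per-item expiry."""
    items: dict = field(default_factory=dict)

    def put(self, key, value, now):
        # Every write refreshes the item's expiry to now + TTL.
        self.items[key] = (value, now + TTL_SECONDS)

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None or entry[1] <= now:
            return None  # missing or expired
        return entry[0]


class MigratingClient:
    """Hypothetical client: dual-writes, and moves reads after one TTL."""

    def __init__(self, old_store, new_store, write_started_at):
        self.old = old_store
        self.new = new_store
        self.write_started_at = write_started_at

    def write(self, key, value, now):
        # WRITE phase (assumption: both stores receive every write).
        self.old.put(key, value, now)
        self.new.put(key, value, now)

    def read(self, key, now):
        # WAIT phase: the old store is still the only complete one.
        if now - self.write_started_at < TTL_SECONDS:
            return self.old.get(key, now)
        # READ phase: anything older than the TTL has expired anyway,
        # so the new store now holds every live item.
        return self.new.get(key, now)
```

The point of the sketch is that nothing is copied: once the wait equals the TTL, any item that exists only in the old store has already expired, which is why the short TTL makes an ETL step unnecessary.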
  • 23. Contextual targeting data migration: extract data, transform data, load data. [Diagram: images_metadata and pages_metadata in the Cassandra keyspace are extracted, transformed, and loaded into images_metadata and pages_metadata]
  • 25. Caching: DAX or Memcached
      • When using DAX (only with DynamoDB): [Diagram: GumGum ad servers, DAX nodes (AWS DAX)]
      • When using Memcached: [Diagram: GumGum ad servers, Memcached nodes, NoSQL store]
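The practical difference between the two layouts is where the caching logic lives. With Memcached the application typically implements a cache-aside pattern itself, whereas DAX is a read-through/write-through cache placed in front of DynamoDB, so the application talks to one endpoint. The sketch below illustrates only the cache-aside side, with plain dictionaries standing in for Memcached and the NoSQL store; `CacheAsideReader` is a hypothetical name and this is not GumGum's code.

```python
class CacheAsideReader:
    """Cache-aside reads: the application checks the cache, then the store."""

    def __init__(self, cache, store):
        self.cache = cache      # stand-in for a Memcached client
        self.store = store      # stand-in for the backing NoSQL store
        self.store_reads = 0    # counts how often the store is hit

    def get(self, key):
        # 1. Try the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. On a miss, fall back to the store.
        self.store_reads += 1
        value = self.store.get(key)
        # 3. Populate the cache so the next read is served from memory.
        if value is not None:
            self.cache[key] = value
        return value


reader = CacheAsideReader(cache={}, store={"visitor:1": {"segment": "sports"}})
first = reader.get("visitor:1")   # miss: goes to the store
second = reader.get("visitor:1")  # hit: served from the cache
```

With a read-through cache, steps 2 and 3 happen inside the cache service rather than in the ad server code.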
  • 27. Data replication requirements
      • Behavioral targeting: data must be replicated between the US East and US West data centers; global replication is not required
      • Contextual targeting: data must be replicated across all four GumGum data centers; Global Tables was used to achieve replication
      • Note: during development for behavioral targeting data, replication was not yet supported by DynamoDB
  • 28. Data replication architecture: Master-Master. Modified dynamodb-cross-region-library to perform Master-Master replication. Changes can be found at https://github.com/awslabs/dynamodb-cross-region-library/pull/53 [Diagram: AWS Cloud with a VPC in AWS Region US East 1 and a VPC in AWS Region US West 2, each running an auto scaling group of replicators]
  • 30. Benefits: Performance
      • 4-5 ms read latency
      • No throttles
      • Zero outages so far
      • Fewer timeouts than Cassandra
  • 31. Benefits: Cost
      • Cassandra hosting cost: 80 i3.2xlarge instances; total hosting cost = 0.624 x 24 x 365 x 80 = $437,299.20
      • DynamoDB running cost: per month = ~450 x 30 = ~$13,500; estimated annual cost = 13,500 x 12 = $162,000
      • % saving: {(437,299.20 - 162,000) x 100} / 437,299.20 = 62.95%
      • Slide callout: 65-70%
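The cost arithmetic on this slide can be checked in a few lines. The figures are taken from the slide; reading ~450 as a daily DynamoDB cost in USD is an assumption based on the "x 30" per-month step, and the variable names are illustrative.

```python
# Cassandra side: hourly i3.2xlarge rate from the slide, 80 instances, one year.
HOURLY_RATE_USD = 0.624
INSTANCES = 80
HOURS_PER_YEAR = 24 * 365

cassandra_annual = HOURLY_RATE_USD * HOURS_PER_YEAR * INSTANCES

# DynamoDB side: ~450 USD per day (assumed unit), 30 days, 12 months.
dynamodb_monthly = 450 * 30
dynamodb_annual = dynamodb_monthly * 12

saving_pct = (cassandra_annual - dynamodb_annual) * 100 / cassandra_annual

print(f"Cassandra per year: ${cassandra_annual:,.2f}")
print(f"DynamoDB per year:  ${dynamodb_annual:,.2f}")
print(f"Saving:             {saving_pct:.2f}%")
```

The computed saving matches the 62.95% on the slide when the annual DynamoDB figure is taken as 13,500 x 12.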
  • 32. Operational stats: 2 TB of data; 16.2 billion items; ~8 million reads per minute; all at <3 ms read latency
  • 34. But wait - There's more about DynamoDB. A list of all DynamoDB sessions, workshops, and chalk talks:
      • Migrating Apache Cassandra to DynamoDB
      • What's new with DynamoDB
      • Purpose-built databases in AWS
      • DynamoDB service level agreement
      • Adaptive capacity
      • Point-in-time recovery (PITR)
      • Global tables
  • 35. Thank you! Anirban Roy, LinkedIn: anirban51roy