Redis Tames The Caching Herd: Jon Hyman


Redis Labs

RedisConf19

Presenting Today
Jon Hyman 
CTO & Cofounder, Braze
@jon_hyman

It started with an Apdex page against our API at 2:22 AM.

We saw high CPU utilization on an API layer for one of our clusters

Throughput sampled at ~33%, response time was ~5x normal
This computation was taking up most of the API call

Triage
• Our on-call engineer increased the API autoscale group server count per runbook
• Starting with 57 c4.4xlarge servers, we added capacity to try to resolve the Apdex alert
• Despite 123 more servers ($71,669/mo additional cost!), the Apdex alert did not clear
• Adding more servers made things worse
• API continued to throw errors
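
As a rough check on the cost figure in the triage list, the arithmetic can be sketched as follows. The hourly rate ($0.796 per c4.4xlarge) and the 732-hour month are assumptions, not numbers from the slides; they are chosen because they reproduce the quoted $71,669/mo.

```python
# Assumed values (not stated in the talk): on-demand hourly price and hours per month.
ASSUMED_HOURLY_RATE_USD = 0.796
ASSUMED_HOURS_PER_MONTH = 732


def extra_monthly_cost(extra_servers: int) -> float:
    """Monthly cost in USD of running the given number of additional servers."""
    return extra_servers * ASSUMED_HOURLY_RATE_USD * ASSUMED_HOURS_PER_MONTH


# 123 extra servers is the figure from the slide.
print(round(extra_monthly_cost(123)))
```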

14 seconds to compute?!?!
Happening ~6k times every 90 seconds?!?!

What was going on?
• High volume of API requests (~20,000/second)
• The customer had added a lot of new IAMs with sophisticated targeting rules
• Every 90 seconds, ~6,000 API calls took 14 seconds to complete
• Cache stampede (thundering herd) issue: once the cache expired, ~6,000 requests immediately attempted to repopulate it
• Computation is CPU-intensive
• Of course this won’t scale!
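
The stampede described above can be reproduced in miniature. This is a minimal sketch, not Braze's code: the dict stands in for Memcached, the counter stands in for the CPU-intensive targeting computation, and the function name `simulate_stampede` and the cache key are hypothetical. A barrier forces the worst case in which every request checks the cache before any recomputation has finished.

```python
import threading


def simulate_stampede(num_requests: int) -> int:
    """Return how many requests recompute when all of them miss the cache at once."""
    cache = {}  # stand-in for Memcached; empty means the entry has just expired
    computations = 0
    counter_lock = threading.Lock()
    all_checked = threading.Barrier(num_requests)

    def handle_request() -> None:
        nonlocal computations
        hit = "iam_targets" in cache  # every request observes a miss
        all_checked.wait()            # nobody has repopulated the cache yet
        if not hit:
            with counter_lock:
                computations += 1     # stand-in for the expensive computation
            cache["iam_targets"] = "computed"

    threads = [threading.Thread(target=handle_request) for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return computations


print(simulate_stampede(50))
```

Every request that saw the miss does the full computation, so the work grows with the number of concurrent requests rather than staying at one computation per expiry.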

Redis cache control
• We used Redis to control a refresh of the cache using SETNX locks
• We extended Memcached TTL to 180 seconds, with 1 process refreshing the cache every 90 seconds
Full code available at https://github.com/jonhyman/redisconf2019
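
The idea on this slide can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked repository: `FakeStore` is a hypothetical in-memory stand-in for both Redis and Memcached, `load_targets` and the key names are made up, and the lock is taken with a set-if-absent write that also carries an expiry (the shape of redis-py's `set(name, value, ex=..., nx=True)`), since the plain SETNX command does not set a TTL by itself.

```python
import time

REFRESH_INTERVAL = 90  # seconds between refreshes (from the slide)
CACHE_TTL = 180        # extended cache TTL in seconds (from the slide)


class FakeStore:
    """In-memory stand-in for Redis and Memcached, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, name):
        entry = self._data.get(name)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]

    def set(self, name, value, ex, nx=False):
        # With nx=True the write only happens if the key is absent or expired.
        if nx and self.get(name) is not None:
            return None
        self._data[name] = (value, self._clock() + ex)
        return True


def load_targets(locks, cache, compute, key="iam_targets"):
    """Return cached targets; at most one caller per interval recomputes them."""
    won_lock = locks.set("refresh_lock:" + key, "1", ex=REFRESH_INTERVAL, nx=True)
    cached = cache.get(key)
    if won_lock or cached is None:
        # Lock winner refreshes; the None branch is an assumed cold-start fallback.
        cached = compute()
        cache.set(key, cached, ex=CACHE_TTL)
    return cached
```

Because the cached value lives for 180 seconds while the lock expires every 90, callers that lose the lock still find a valid entry and return it without recomputing. A real Memcached client would use its own method names and arguments for the cache side.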

Success!
API requests loading IAMs dropped to 1 per 90 seconds

Computation now only took 3–4 seconds instead of 14 due to decreased concurrency
Success!

With latency stabilized, we were able to drop back down to 57 API servers
Success!

Thank you! We're hiring!
braze.com/careers
Code available at https://github.com/jonhyman/redisconf2019

Redis Tames The Caching Herd: Jon Hyman

1. Presenting Today Jon Hyman  CTO & Cofounder, Braze @jon_hyman

2. Redis Tames the Caching Herd

3. Braze empowers you to humanize your brand’s relationships with your customers at scale. 1 trillion data points processed per quarter; 1B+ messages sent daily; 1.6B monthly active users

4. It started with an Apdex page against our API at 2:22 AM.

5. We saw high CPU utilization on an API layer for one of our clusters

6. Throughput sampled at ~33%, response time was ~5x normal This computation was taking up most of the API call

7. Triage • Our on-call engineer increased the API autoscale group server count per runbook • Starting with 57 c4.4xlarge servers, we added capacity to try to resolve Apdex • Despite 123 more servers ($71,669/mo additional cost!), Apdex did not go away • Adding more servers made things worse • API continued to throw errors

8. Braze in-app messaging architecture • Braze SDKs have a business rule engine for when to show in-app messages (“IAM”s) • The client requests IAMs from the API on the app open for that session • The API reads possible IAMs from the database or Memcached • The API computes IAM target criteria against user profile and stores calculated target criteria in Memcached with a TTL of 90 seconds • The API returns a set of possible IAMs to the client device [Diagram labels: Client Device, User 123, IAMs, API Servers, Database, Cache]

9. 14 seconds to compute?!?! Happening ~6k times every 90 seconds?!?!

10. What was going on? • High volume of API requests (~20,000/second) • The customer had added a lot of new IAMs with sophisticated targeting rules • Every 90 seconds, ~6,000 API calls took 14 seconds to complete • Cache stampeding herd issue: once the cache expired, ~6,000 requests immediately attempted to populate it back • Computation is CPU-intensive • Of course this won’t scale!

11. How to fix this? Redis.

12. Redis cache control • We used Redis to control a refresh of the cache using SETNX locks • We extended Memcached TTL to 180 seconds, with 1 process refreshing the cache every 90 seconds Full code available at https://github.com/jonhyman/redisconf2019

13. Success! API requests loading IAMs dropped to 1 per 90 seconds

14. Computation now only took 3–4 seconds instead of 14 due to decreased concurrency Success!

15. With latency stabilized, we were able to drop back down to 57 API servers Success!

16. Thank you! We're hiring! braze.com/careers Code available at https://github.com/jonhyman/redisconf2019
