"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

Fwdays
FwdaysFwdays
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
https://nbomber.com
https://github.com/stereodb
AGENDA
- intro
- why stateless is slow and less reliable
- tools for building stateful services
PART I
intro to sportsbook domain
and
how we come to stateful
Dynamo Kyiv vs Chelsea
2 : 1
Red card
Score
changed
Odds
changed
PUSH
PULL
Dynamo Kyiv vs Chelsea
2 : 1
Red card
Score
changed
Odds
changed
PUSH
PULL
- quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: no much sense to cache it
- data should be queryable: simple KV is not enough
- we need secondary indexes
20 KB payload for concurrent read and write
Redis, single node: 4vcpu - 8gb
redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842
redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
API API API
DB
API
Cache
DB
but Cache is not queryable
API + DB
Stateful Service
state
state
state
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
CDC (Debezium)
DB
How to handle a case when
your data is larger than RAM?
10 GB 30 GB
Solution 1: use memory DB that supports data larger than RAM
10 GB
20 GB
UA PL FR
Solution 2: use partition by tenant
Solution 3: use range-based sharding
users
(1-500)
users
(501-1000)
shard A shard B
PART II
why stateless is slow
API + DB
Stateful Service
API
Cache
DB
network latency
network latency
Latency Numbers
Latency
2010 2020
Compress 1KB with Zippy 2μs 2μs
Read 1 MB sequentially from RAM 30μs 3μs
Read 1 MB sequentially from SSD 494μs 49μs
Read 1 MB sequentially from disk 3ms 825μs
Round trip within same datacenter 500μs 500μs
Send packet CA -> Netherlands -> CA 150ms 150ms
https://colin-scott.github.io/personal_website/research/interactive_latency.html
API
Cache
DB
CPU: for serialize/deserialize
CPU: serialize/deserialize
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: serialize/deserialize
CPU: ASYNC request handling
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: managing sockets
CPU: serialize/deserialize
CPU: ASYNC request handling
CPU: managing sockets
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
CPU for managing sockets (only clients sockets )
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: managing sockets
CPU: serialize/deserialize
CPU: ASYNC request handling
CPU: managing sockets
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
CPU for managing sockets (only clients sockets )
CPU for handling query (very cheap compared to
serialization)
API
Cache
DB
CPU: for serialize/deserialize
CPU for ASYNC request handling
CPU: managing sockets
Overreads
CPU: serialize/deserialize
CPU: ASYNC request handling
CPU: managing sockets
API + DB
Stateful Service
CPU for serialize (we don’t need to deserialize)
CPU for managing sockets (only clients sockets )
CPU for handling query (very cheap compared to
serialization)
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
Object hit rate / Transactional hit rate
A B
C
API
In order to fulfill our transactional flow we need to
fetch records: A, B, C
Record A and B will not impact our latency
Overall Latency = Latency of record C
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
Most existing cache eviction algorithms focus on maximizing
object hit rate, or the fraction of single object requests served
from cache. However, this approach fails to capture the
inter-object dependencies within transactions.
async / await
async / await
Imagine that we run Redis on localhost. Even with such setup we
usually use async request handling.
public void SimpleMethod()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = Add(i, i);
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
private int Add(int a, int b) => a + b;
public async Task SimpleMethodAsync()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = await AddAsync(i, i);
}
}
private Task<int> AddAsync(int a, int b)
{
return Task.FromResult(a + b);
}
public async Task SimpleMethodAsyncYield()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = await AddAsync(i, i);
}
}
private async Task<int> AddAsync(int a, int b)
{
await Task.Yield();
return a + b;
}
public async Task SimpleMethodAsyncYield()
{
var k = 0;
for (int i = 0; i < Iterations; i++)
{
k = await AddAsync(i, i);
}
}
private async Task<int> AddAsync(int a, int b)
{
await Task.Yield();
return await Task.Run(() => a + b);
}
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
PART III
why stateless is less reliable
API
Cache
DB
API + DB
Stateful Service
We have a higher probability of failure
API
Cache
DB
circuit breaker
retry
fallback
timeout
bulkhead isolation
circuit breaker
retry
fallback
timeout
bulkhead isolation
API + DB
Stateful Service
API
Cache
DB
What about cache invalidation
and data consistency?
API + DB
Stateful Service
API
Cache
DB
What about the predictable scale-out?
Will your RPS increase if you add an
additional API or Cache node?
API + DB
Stateful Service
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
- Metastable failures occur in open systems with an uncontrolled source of
load where a trigger causes the system to enter a bad state that persists
even when the trigger is removed.
- Paradoxically, the root cause of these failures is often features that
improve the efficiency or reliability of the system.
- The characteristic of a metastable failure is that the sustaining effect keeps
the system in the metastable failure state even after the trigger is
removed.
At least 4 out of 15 major outages in the
last decade at Amazon Web Services
were caused by metastable failures.
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
PART IV
tools for building stateful services
distributed log with sync replication
In-process memory DB
SQL OLAP
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan
Dynamo Kyiv vs Chelsea
2 : 1
Red card
Score
changed
Odds
changed
PUSH
PULL
- quite big payloads: 30 KB compressed data (1.5 MB uncompressed)
- update rate: 2K RPS (per tenant)
- user query rate: 3-4K RPS (per tenant)
- live data is very dynamic: no much sense to cache it
- data should be queryable: simple KV is not enough
- we need secondary indexes
At pick to handle big load for 1 tenant we have:
5-10 nodes, 0.5-2 CPU, 6GB RAM
THANKS
always benchmark
https://twitter.com/antyadev
1 von 60

Recomendados

Load Balancing MySQL with HAProxy - Slides von
Load Balancing MySQL with HAProxy - SlidesLoad Balancing MySQL with HAProxy - Slides
Load Balancing MySQL with HAProxy - SlidesSeveralnines
11.3K views25 Folien
Synapse 2018 Guarding against failure in a hundred step pipeline von
Synapse 2018 Guarding against failure in a hundred step pipelineSynapse 2018 Guarding against failure in a hundred step pipeline
Synapse 2018 Guarding against failure in a hundred step pipelineCalvin French-Owen
120 views112 Folien
Anton Moldovan "Building an efficient replication system for thousands of ter... von
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
150 views114 Folien
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent... von
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent...A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent...
A Day in the Life of a Cloud Network Engineer at Netflix - NET303 - re:Invent...Amazon Web Services
9.4K views89 Folien
Fighting Against Chaotically Separated Values with Embulk von
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
2.1K views45 Folien
Microservice bus tutorial von
Microservice bus tutorialMicroservice bus tutorial
Microservice bus tutorialHuabing Zhao
708 views19 Folien

Más contenido relacionado

Similar a "Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

L’odyssée d’une requête HTTP chez Scaleway von
L’odyssée d’une requête HTTP chez ScalewayL’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez ScalewayScaleway
293 views41 Folien
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF von
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFWebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFAlexandre Gouaillard
760 views19 Folien
DPC2007 PHP And Oracle (Kuassi Mensah) von
DPC2007 PHP And Oracle (Kuassi Mensah)DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)dpc
849 views34 Folien
Cassandra at teads von
Cassandra at teadsCassandra at teads
Cassandra at teadsRomain Hardouin
6.2K views86 Folien
StrongLoop Overview von
StrongLoop OverviewStrongLoop Overview
StrongLoop OverviewShubhra Kar
2.3K views44 Folien
Getting Started with Amazon Redshift von
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
1K views59 Folien

Similar a "Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan (20)

L’odyssée d’une requête HTTP chez Scaleway von Scaleway
L’odyssée d’une requête HTTP chez ScalewayL’odyssée d’une requête HTTP chez Scaleway
L’odyssée d’une requête HTTP chez Scaleway
Scaleway293 views
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF von Alexandre Gouaillard
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SFWebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
WebRTC Infrastructure scalability notes - Geek'n Kranky - June 2014 @ Google SF
DPC2007 PHP And Oracle (Kuassi Mensah) von dpc
DPC2007 PHP And Oracle (Kuassi Mensah)DPC2007 PHP And Oracle (Kuassi Mensah)
DPC2007 PHP And Oracle (Kuassi Mensah)
dpc849 views
StrongLoop Overview von Shubhra Kar
StrongLoop OverviewStrongLoop Overview
StrongLoop Overview
Shubhra Kar2.3K views
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai... von François-Guillaume Ribreau
Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...Choisir entre une API  RPC, SOAP, REST, GraphQL?  
Et si le problème était ai...
Choisir entre une API RPC, SOAP, REST, GraphQL? 
Et si le problème était ai...
EEDC 2010. Scaling Web Applications von Expertos en TI
EEDC 2010. Scaling Web ApplicationsEEDC 2010. Scaling Web Applications
EEDC 2010. Scaling Web Applications
Expertos en TI593 views
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland von Fuenteovejuna
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandShared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Fuenteovejuna 551 views
Kafka elastic search meetup 09242018 von Ying Xu
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
Ying Xu175 views
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark von Michael Stack
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
Michael Stack742 views
초보자를 위한 분산 캐시 이야기 von OnGameServer
초보자를 위한 분산 캐시 이야기초보자를 위한 분산 캐시 이야기
초보자를 위한 분산 캐시 이야기
OnGameServer12K views
Sql sever engine batch mode and cpu architectures von Chris Adkin
Sql sever engine batch mode and cpu architecturesSql sever engine batch mode and cpu architectures
Sql sever engine batch mode and cpu architectures
Chris Adkin1K views
How To Set Up SQL Load Balancing with HAProxy - Slides von Severalnines
How To Set Up SQL Load Balancing with HAProxy - SlidesHow To Set Up SQL Load Balancing with HAProxy - Slides
How To Set Up SQL Load Balancing with HAProxy - Slides
Severalnines21.3K views
Oracle Client Failover - Under The Hood von Ludovico Caldara
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
Ludovico Caldara1.9K views
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D... von Vadym Kazulkin
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D..."Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
"Production-ready Serverless Java Applications in 3 weeks" at AWS Community D...
Vadym Kazulkin84 views
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호 von Amazon Web Services Korea
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели... von Ontico
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Tarantool как платформа для микросервисов / Антон Резников, Владимир Перепели...
Ontico3.2K views
Extending Piwik At R7.com von Leo Lorieri
Extending Piwik At R7.comExtending Piwik At R7.com
Extending Piwik At R7.com
Leo Lorieri2.8K views

Más de Fwdays

"The role of CTO in a classical early-stage startup", Eugene Gusarov von
"The role of CTO in a classical early-stage startup", Eugene Gusarov"The role of CTO in a classical early-stage startup", Eugene Gusarov
"The role of CTO in a classical early-stage startup", Eugene GusarovFwdays
32 views43 Folien
"Cross-functional teams: what to do when a new hire doesn’t solve the busines... von
"Cross-functional teams: what to do when a new hire doesn’t solve the busines..."Cross-functional teams: what to do when a new hire doesn’t solve the busines...
"Cross-functional teams: what to do when a new hire doesn’t solve the busines...Fwdays
32 views29 Folien
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... von
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...Fwdays
43 views30 Folien
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur von
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr TsukurFwdays
43 views31 Folien
"Fast Start to Building on AWS", Igor Ivaniuk von
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor IvaniukFwdays
46 views76 Folien
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ... von
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ..."Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...Fwdays
37 views39 Folien

Más de Fwdays(20)

"The role of CTO in a classical early-stage startup", Eugene Gusarov von Fwdays
"The role of CTO in a classical early-stage startup", Eugene Gusarov"The role of CTO in a classical early-stage startup", Eugene Gusarov
"The role of CTO in a classical early-stage startup", Eugene Gusarov
Fwdays32 views
"Cross-functional teams: what to do when a new hire doesn’t solve the busines... von Fwdays
"Cross-functional teams: what to do when a new hire doesn’t solve the busines..."Cross-functional teams: what to do when a new hire doesn’t solve the busines...
"Cross-functional teams: what to do when a new hire doesn’t solve the busines...
Fwdays32 views
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad... von Fwdays
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad..."Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
"Ukrainian Mobile Banking Scaling in Practice. From 0 to 100 and beyond", Vad...
Fwdays43 views
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur von Fwdays
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
"Thriving Culture in a Product Company — Practical Story", Volodymyr Tsukur
Fwdays43 views
"Fast Start to Building on AWS", Igor Ivaniuk von Fwdays
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk
Fwdays46 views
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ... von Fwdays
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ..."Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
"Quality Assurance: Achieving Excellence in startup without a Dedicated QA", ...
Fwdays37 views
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi von Fwdays
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
"AI Startup Growth from Idea to 1M ARR", Oleksandr Uspenskyi
Fwdays28 views
"How we switched to Kanban and how it integrates with product planning", Vady... von Fwdays
"How we switched to Kanban and how it integrates with product planning", Vady..."How we switched to Kanban and how it integrates with product planning", Vady...
"How we switched to Kanban and how it integrates with product planning", Vady...
Fwdays67 views
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ... von Fwdays
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ..."Bringing Flutter to Tide: a case study of a leading fintech platform in the ...
"Bringing Flutter to Tide: a case study of a leading fintech platform in the ...
Fwdays24 views
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov von Fwdays
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov
"Shape Up: How to Develop Quickly and Avoid Burnout", Dmytro Popov
Fwdays64 views
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy von Fwdays
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
"Role of a CTO in software outsourcing company", Yuriy Nakonechnyy
Fwdays46 views
From “T” to “E”, Dmytro Gryn von Fwdays
From “T” to “E”, Dmytro GrynFrom “T” to “E”, Dmytro Gryn
From “T” to “E”, Dmytro Gryn
Fwdays36 views
"Why I left React in my TypeScript projects and where ", Illya Klymov von Fwdays
"Why I left React in my TypeScript projects and where ",  Illya Klymov"Why I left React in my TypeScript projects and where ",  Illya Klymov
"Why I left React in my TypeScript projects and where ", Illya Klymov
Fwdays249 views
"KillTech project: through innovation to a winning capability", Yelyzaveta B... von Fwdays
"KillTech project: through innovation to a winning capability",  Yelyzaveta B..."KillTech project: through innovation to a winning capability",  Yelyzaveta B...
"KillTech project: through innovation to a winning capability", Yelyzaveta B...
Fwdays230 views
"Dude, where’s my boilerplate? ", Oleksii Makodzeba von Fwdays
"Dude, where’s my boilerplate? ", Oleksii Makodzeba"Dude, where’s my boilerplate? ", Oleksii Makodzeba
"Dude, where’s my boilerplate? ", Oleksii Makodzeba
Fwdays116 views
"Pixel-Pushing Pundit Challenges in 2023, or Non-functional Requirements for ... von Fwdays
"Pixel-Pushing Pundit Challenges in 2023, or Non-functional Requirements for ..."Pixel-Pushing Pundit Challenges in 2023, or Non-functional Requirements for ...
"Pixel-Pushing Pundit Challenges in 2023, or Non-functional Requirements for ...
Fwdays90 views
"Do you really need your test environment?", Vlad Kampov von Fwdays
"Do you really need your test environment?", Vlad Kampov "Do you really need your test environment?", Vlad Kampov
"Do you really need your test environment?", Vlad Kampov
Fwdays213 views
"Crafting a Third-Party Banking Library with Web Components and React", Germa... von Fwdays
"Crafting a Third-Party Banking Library with Web Components and React", Germa..."Crafting a Third-Party Banking Library with Web Components and React", Germa...
"Crafting a Third-Party Banking Library with Web Components and React", Germa...
Fwdays180 views
"Generating Types without climbing a tree", Matteo Collina von Fwdays
"Generating Types without climbing a tree", Matteo Collina "Generating Types without climbing a tree", Matteo Collina
"Generating Types without climbing a tree", Matteo Collina
Fwdays89 views
"You Keep Using That Word", Sam Newman von Fwdays
"You Keep Using That Word", Sam Newman"You Keep Using That Word", Sam Newman
"You Keep Using That Word", Sam Newman
Fwdays38 views

Último

Top 10 Strategic Technologies in 2024: AI and Automation von
Top 10 Strategic Technologies in 2024: AI and AutomationTop 10 Strategic Technologies in 2024: AI and Automation
Top 10 Strategic Technologies in 2024: AI and AutomationAutomationEdge Technologies
18 views14 Folien
virtual reality.pptx von
virtual reality.pptxvirtual reality.pptx
virtual reality.pptxG036GaikwadSnehal
11 views15 Folien
Black and White Modern Science Presentation.pptx von
Black and White Modern Science Presentation.pptxBlack and White Modern Science Presentation.pptx
Black and White Modern Science Presentation.pptxmaryamkhalid2916
16 views21 Folien
Attacking IoT Devices from a Web Perspective - Linux Day von
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day Simone Onofri
15 views68 Folien
Igniting Next Level Productivity with AI-Infused Data Integration Workflows von
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Safe Software
257 views86 Folien
Vertical User Stories von
Vertical User StoriesVertical User Stories
Vertical User StoriesMoisés Armani Ramírez
12 views16 Folien

Último(20)

Black and White Modern Science Presentation.pptx von maryamkhalid2916
Black and White Modern Science Presentation.pptxBlack and White Modern Science Presentation.pptx
Black and White Modern Science Presentation.pptx
maryamkhalid291616 views
Attacking IoT Devices from a Web Perspective - Linux Day von Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri15 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows von Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software257 views
Spesifikasi Lengkap ASUS Vivobook Go 14 von Dot Semarang
Spesifikasi Lengkap ASUS Vivobook Go 14Spesifikasi Lengkap ASUS Vivobook Go 14
Spesifikasi Lengkap ASUS Vivobook Go 14
Dot Semarang37 views
Five Things You SHOULD Know About Postman von Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman30 views
Case Study Copenhagen Energy and Business Central.pdf von Aitana
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdf
Aitana16 views
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... von James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson66 views
Special_edition_innovator_2023.pdf von WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2217 views
HTTP headers that make your website go faster - devs.gent November 2023 von Thijs Feryn
HTTP headers that make your website go faster - devs.gent November 2023HTTP headers that make your website go faster - devs.gent November 2023
HTTP headers that make your website go faster - devs.gent November 2023
Thijs Feryn21 views
handbook for web 3 adoption.pdf von Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex22 views
The details of description: Techniques, tips, and tangents on alternative tex... von BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada126 views
6g - REPORT.pdf von Liveplex
6g - REPORT.pdf6g - REPORT.pdf
6g - REPORT.pdf
Liveplex10 views
DALI Basics Course 2023 von Ivory Egg
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023
Ivory Egg16 views

"Intro to Stateful Services or How to get 1 million RPS from a single node", Anton Moldovan

  • 4. AGENDA - intro - why stateless is slow and less reliable - tools for building stateful services
  • 5. PART I intro to sportsbook domain and how we come to stateful
  • 6. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL
  • 7. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL - quite big payloads: 30 KB compressed data (1.5 MB uncompressed) - update rate: 2K RPS (per tenant) - user query rate: 3-4K RPS (per tenant) - live data is very dynamic: no much sense to cache it - data should be queryable: simple KV is not enough - we need secondary indexes
  • 8. 20 KB payload for concurrent read and write Redis, single node: 4vcpu - 8gb redis_write: 4K RPS, p75 = 543, p95 = 688, p99 = 842 redis read: 7K RPS, p75 = 970, p95 = 1278, p99 = 1597
  • 10. API Cache DB but Cache is not queryable
  • 11. API + DB Stateful Service
  • 15. How to handle a case when your data is larger than RAM? 10 GB 30 GB
  • 16. Solution 1: use memory DB that supports data larger than RAM 10 GB 20 GB
  • 17. UA PL FR Solution 2: use partition by tenant
  • 18. Solution 3: use range-based sharding users (1-500) users (501-1000) shard A shard B
  • 20. API + DB Stateful Service API Cache DB network latency network latency
  • 21. Latency Numbers Latency 2010 2020 Compress 1KB with Zippy 2μs 2μs Read 1 MB sequentially from RAM 30μs 3μs Read 1 MB sequentially from SSD 494μs 49μs Read 1 MB sequentially from disk 3ms 825μs Round trip within same datacenter 500μs 500μs Send packet CA -> Netherlands -> CA 150ms 150ms https://colin-scott.github.io/personal_website/research/interactive_latency.html
  • 22. API Cache DB CPU: for serialize/deserialize CPU: serialize/deserialize API + DB Stateful Service CPU for serialize (we don’t need to deserialize)
  • 23. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: serialize/deserialize CPU: ASYNC request handling API + DB Stateful Service CPU for serialize (we don’t need to deserialize)
  • 24. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: managing sockets CPU: serialize/deserialize CPU: ASYNC request handling CPU: managing sockets API + DB Stateful Service CPU for serialize (we don’t need to deserialize) CPU for managing sockets (only clients sockets )
  • 25. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: managing sockets CPU: serialize/deserialize CPU: ASYNC request handling CPU: managing sockets API + DB Stateful Service CPU for serialize (we don’t need to deserialize) CPU for managing sockets (only clients sockets ) CPU for handling query (very cheap compared to serialization)
  • 26. API Cache DB CPU: for serialize/deserialize CPU for ASYNC request handling CPU: managing sockets Overreads CPU: serialize/deserialize CPU: ASYNC request handling CPU: managing sockets API + DB Stateful Service CPU for serialize (we don’t need to deserialize) CPU for managing sockets (only clients sockets ) CPU for handling query (very cheap compared to serialization)
  • 32. Object hit rate / Transactional hit rate
  • 33. A B C API In order to fulfill our transactional flow we need to fetch records: A, B, C Record A and B will not impact our latency Overall Latency = Latency of record C
  • 36. Most existing cache eviction algorithms focus on maximizing object hit rate, or the fraction of single object requests served from cache. However, this approach fails to capture the inter-object dependencies within transactions.
  • 38. async / await Imagine that we run Redis on localhost. Even with such setup we usually use async request handling.
  • 39. public void SimpleMethod() { var k = 0; for (int i = 0; i < Iterations; i++) { k = Add(i, i); } } [MethodImpl(MethodImplOptions.NoInlining)] private int Add(int a, int b) => a + b;
  • 40. public async Task SimpleMethodAsync() { var k = 0; for (int i = 0; i < Iterations; i++) { k = await AddAsync(i, i); } } private Task<int> AddAsync(int a, int b) { return Task.FromResult(a + b); }
  • 41. public async Task SimpleMethodAsyncYield() { var k = 0; for (int i = 0; i < Iterations; i++) { k = await AddAsync(i, i); } } private async Task<int> AddAsync(int a, int b) { await Task.Yield(); return a + b; }
  • 42. public async Task SimpleMethodAsyncYield() { var k = 0; for (int i = 0; i < Iterations; i++) { k = await AddAsync(i, i); } } private async Task<int> AddAsync(int a, int b) { await Task.Yield(); return await Task.Run(() => a + b); }
  • 45. PART III why stateless is less reliable
  • 46. API Cache DB API + DB Stateful Service We have a higher probability of failure
  • 47. API Cache DB circuit breaker retry fallback timeout bulkhead isolation circuit breaker retry fallback timeout bulkhead isolation API + DB Stateful Service
  • 48. API Cache DB What about cache invalidation and data consistency? API + DB Stateful Service
  • 49. API Cache DB What about the predictable scale-out? Will your RPS increase if you add an additional API or Cache node? API + DB Stateful Service
  • 51. - Metastable failures occur in open systems with an uncontrolled source of load where a trigger causes the system to enter a bad state that persists even when the trigger is removed. - Paradoxically, the root cause of these failures is often features that improve the efficiency or reliability of the system. - The characteristic of a metastable failure is that the sustaining effect keeps the system in the metastable failure state even after the trigger is removed.
  • 52. At least 4 out of 15 major outages in the last decade at Amazon Web Services were caused by metastable failures.
  • 54. PART IV tools for building stateful services
  • 55. distributed log with sync replication
  • 59. Dynamo Kyiv vs Chelsea 2 : 1 Red card Score changed Odds changed PUSH PULL - quite big payloads: 30 KB compressed data (1.5 MB uncompressed) - update rate: 2K RPS (per tenant) - user query rate: 3-4K RPS (per tenant) - live data is very dynamic: no much sense to cache it - data should be queryable: simple KV is not enough - we need secondary indexes At pick to handle big load for 1 tenant we have: 5-10 nodes, 0.5-2 CPU, 6GB RAM