Scaling Slack during explosive growth
Javier Turegano (he/him)
@setoide
Agenda
● Intro
● Explosive growth
● Investing in scalability
● When things go wrong
Javier Turegano (he/him)
@setoide
Sr. Engineering Manager, Cloud Engineering, APAC
It all started as a game
ORIGIN
Today: Slack
Categories: IT and security, Marketing, Design, HR, Finance, File sharing, Dev tools, Communications, Analytics, Support, Productivity, Sales
Slack integrates with 1,800+ tools that teams use daily.
Some stats
● 130K+ paid customers
● 150+ countries
● 65 Fortune 100 customers
● 2,000+ employees
Explosive growth
A good portion of the planet starts working from home (WFH) due to COVID-19.
Of course, we have a bot to track this in Slack.
Investing in scalability
Scalability investments over time*
● Hack/HHVM
● Load testing frameworks
● Onboarding bigger enterprises
● Data store improvements
● Disaster-theatre pieces
● Client re-architecture
● Cloud infra advancements
● New network architecture
* This is a made-up timeline as an example.
See: Scaling Slack - The Good, the Unexpected, and the Road Ahead
Reacting to explosive growth 👩🏿‍💻
#help-cloud-econ
Basic layered architecture: Edge POPs → Load Balancing → API → Data Stores
Engineering stats as of April 2020
At the edge
Client to edge
Diagram: Apps / Chatbots → Edge POP → API, over HTTPS and WebSocket (WS)
See: Achieving massive scale in a brave new (front-end) world
Message handling and caching
Diagram: Edge POP running an Edge Cache (Golang); clients connect over WebSocket
See: Flannel: An Application-Level Edge Cache to Make Slack Scale
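Flannel itself is a Go service; the Python sketch below only illustrates the general idea of an application-level edge cache under stated assumptions: lookups are served locally, a miss falls back to the origin (the API tier), and entries already held are kept fresh by update events pushed over the WebSocket stream. `EdgeCache`, `fetch_from_origin`, and the event shape are hypothetical, not Flannel's API.

```python
from collections import OrderedDict
from typing import Callable, Optional


class EdgeCache:
    """Hypothetical edge cache: answer lookups locally, fall back to the
    origin on a miss, and apply pushed update events to cached entries."""

    def __init__(self, fetch_from_origin: Callable[[str], dict], capacity: int = 1000) -> None:
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._entries: OrderedDict = OrderedDict()
        self.origin_calls = 0  # how often we had to go back to the API tier

    def get(self, key: str) -> dict:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.origin_calls += 1
        value = self._fetch(key)
        self._put(key, value)
        return value

    def apply_event(self, key: str, value: Optional[dict]) -> None:
        """Apply an update pushed from the event stream; None means deleted.
        Keys we do not hold are ignored (they will be fetched on demand)."""
        if key not in self._entries:
            return
        if value is None:
            del self._entries[key]
        else:
            self._entries[key] = value

    def _put(self, key: str, value: dict) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

Keeping the cache warm from the event stream, rather than expiring entries on a timer, is what lets repeated lookups avoid the origin entirely in this sketch.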
Processing file uploads at the edge
Diagram: File upload → Edge POP → S3
Rate limiting & degraded mode
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Date: Tue, 29 Jan 2019 18:41:22 GMT
Retry-After: 3

{
  "ok": false,
  "error": "ratelimited"
}
🚧 See: Handling Rate Limits with Slack APIs
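A minimal sketch of both sides of the 429 exchange above, assuming a token-bucket policy at the edge (the slide does not say which algorithm Slack uses). `TokenBucket`, `handle`, and `call_with_retry` are hypothetical names; the clock and sleep functions are injected so the example runs deterministically.

```python
import math
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], dict]


class TokenBucket:
    """Token bucket: `rate` tokens refill per second, up to `capacity`.
    Each request spends one token; an empty bucket means "rate limited"."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float]) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injected so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> Tuple[bool, float]:
        """Return (allowed, seconds until the next token is available)."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.rate


def handle(bucket: TokenBucket) -> Response:
    """Edge handler: serve the request or answer 429 with Retry-After."""
    allowed, wait = bucket.try_acquire()
    headers = {"Content-Type": "application/json"}
    if allowed:
        return 200, headers, {"ok": True}
    # Retry-After takes whole seconds, so round the wait up (minimum 1).
    headers["Retry-After"] = str(max(1, math.ceil(wait)))
    return 429, headers, {"ok": False, "error": "ratelimited"}


def call_with_retry(send: Callable[[], Response], sleep: Callable[[float], None],
                    max_attempts: int = 5) -> Tuple[int, dict]:
    """Client side: on a 429, wait for Retry-After seconds, then try again."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        sleep(int(headers.get("Retry-After", "1")))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Honouring `Retry-After` on the client is what keeps a burst of rejected calls from turning into a retry storm against the edge.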
Architecture so far: Edge POPs → Load Balancing → API → Cache → Data Stores
At the edge:
● 150M messages sent per minute during peak hours
● 13M simultaneous WebSocket connections at daily peak
Scaling compute
High performance
● Hacklang / HHVM
See: Hacklang at Slack: A Better PHP
Horizontal autoscaling
● Stateless
● Predictive scaling
● Dynamic scaling
See: https://aws.amazon.com/blogs/aws/new-predictive-scaling-for-ec2-powered-by-machine-learning/
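The arithmetic behind combining the two scaling modes can be sketched as follows. This is an illustration under assumptions, not Slack's or AWS's implementation: the dynamic rule is a simple proportional rule in the spirit of target tracking, and the "prediction" is a naive same-hour average standing in for the machine-learning forecast described in the linked AWS post. All function names are hypothetical.

```python
import math
from typing import Sequence


def dynamic_capacity(current_instances: int, metric: float, target: float,
                     min_instances: int = 1, max_instances: int = 1000) -> int:
    """Proportional rule: resize so the per-instance metric (e.g. average
    CPU %) moves back towards `target`."""
    desired = math.ceil(current_instances * metric / target)
    return max(min_instances, min(max_instances, desired))


def predictive_capacity(same_hour_history: Sequence[float],
                        per_instance_capacity: float, headroom: float = 1.2) -> int:
    """Naive forecast: mean load seen at this hour on previous days, plus
    headroom, converted into an instance count."""
    forecast = sum(same_hour_history) / len(same_hour_history)
    return math.ceil(forecast * headroom / per_instance_capacity)


def plan_capacity(current_instances: int, metric: float, target: float,
                  same_hour_history: Sequence[float], per_instance_capacity: float) -> int:
    # Provision ahead of the predicted peak, but let dynamic scaling win
    # whenever live load exceeds the forecast: take the larger of the two.
    return max(predictive_capacity(same_hour_history, per_instance_capacity),
               dynamic_capacity(current_instances, metric, target))
```

Both rules only work because the compute tier is stateless: any new instance can take any request as soon as it is added.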
API farms
● Different API groups depending on the characteristics of the work (API - A, API - B, API - C)
Diagram: Edge POPs → Load Balancing → API - A / API - B / API - C → Data Stores
Dedicated API farms: 14B HTTP requests per day
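One way a load balancer could send requests to dedicated farms is a lookup from API method to farm. The sketch below assumes routing by Web API method name; the method-to-farm table and the `api-a`/`api-b`/`api-c` pool names are hypothetical, since the slide names the farms but not what runs on each.

```python
from typing import Dict

# Hypothetical assignment of API methods to farms by workload shape.
FARM_BY_METHOD: Dict[str, str] = {
    "chat.postMessage": "api-a",       # assumed: short, latency-sensitive
    "conversations.history": "api-a",
    "files.upload": "api-b",           # assumed: large payloads, long requests
    "admin.users.list": "api-c",       # assumed: heavy enterprise admin calls
}
DEFAULT_FARM = "api-a"


def route(path: str) -> str:
    """Pick a farm from a request path such as /api/chat.postMessage?x=1."""
    method = path.split("?", 1)[0].rsplit("/", 1)[-1]
    return FARM_BY_METHOD.get(method, DEFAULT_FARM)
```

Isolating heavy or slow methods in their own farm means a spike in one kind of work cannot exhaust the workers serving everything else.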
Async processing + decoupling
Diagram: API - B → Queue(s) → Job Executors
Async processing at Slack: https://slack.engineering/scaling-slacks-job-queue/
Architecture so far: Edge POPs → Load Balancing → API - A / API - B / API - C → Cache → Data Stores; Queue(s) → Job Executors
Async jobs: 5B background jobs enqueued per day
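The linked post describes Slack's production job queue; the in-process sketch below only illustrates the decoupling idea. API handlers enqueue a job and return immediately, while a pool of executor threads drains the queue, retrying failures a bounded number of times. `JobQueue` and its handler registry are hypothetical names.

```python
import queue
import threading
from typing import Callable, Dict, List


class JobQueue:
    """Hypothetical job queue: enqueue returns at once; worker threads run
    the registered handler for each job type, with bounded retries."""

    def __init__(self, handlers: Dict[str, Callable[[dict], None]],
                 workers: int = 4, max_attempts: int = 3) -> None:
        self._q: queue.Queue = queue.Queue()
        self._handlers = handlers
        self._max_attempts = max_attempts
        self._lock = threading.Lock()
        self.failed: List[dict] = []  # dead-lettered jobs, kept for inspection
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, job_type: str, payload: dict) -> None:
        self._q.put({"type": job_type, "payload": payload, "attempts": 0})

    def join(self) -> None:
        self._q.join()  # block until every enqueued job has finished

    def _run(self) -> None:
        while True:
            job = self._q.get()
            try:
                self._handlers[job["type"]](job["payload"])
            except Exception:
                job["attempts"] += 1
                if job["attempts"] < self._max_attempts:
                    self._q.put(job)  # re-enqueue before task_done so join() keeps waiting
                else:
                    with self._lock:
                        self.failed.append(job)
            finally:
                self._q.task_done()
```

A real system would persist the queue outside the web process so jobs survive restarts; the in-memory `queue.Queue` here is only a stand-in.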
Data engines
Specialized data engines
● Search
See: Rebuilding Message Search at Slack
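The core data structure that makes a dedicated search engine worthwhile is the inverted index. The toy sketch below (hypothetical `InvertedIndex`, naive word tokenizer, AND semantics) shows why: a query intersects small per-term posting sets instead of scanning every message in the primary data store.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


class InvertedIndex:
    """Map each term to the set of message ids that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, message_id: int, text: str) -> None:
        for term in TOKEN.findall(text.lower()):
            self._postings[term].add(message_id)

    def search(self, query: str) -> List[int]:
        """Return ids of messages containing every query term."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return []
        # .get() avoids inserting empty sets for unknown terms.
        sets = [self._postings.get(t, set()) for t in terms]
        return sorted(set.intersection(*sets))
```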
Scaling your stateful tier
● Separate reads and writes
● Caching
Diagram: API - B reads through Caching from Data Stores (reads) and writes to Data Stores (writes)
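Separating reads from writes and adding a cache can be sketched as a cache-aside data-access layer. This is a minimal illustration, assuming plain dicts stand in for the primary database, a read replica, and the cache; `ReadWriteStore` is a hypothetical name.

```python
from typing import Dict, Optional


class ReadWriteStore:
    """Hypothetical data-access layer: writes go to the primary; reads go
    through the cache and fall back to a replica on a miss (cache-aside)."""

    def __init__(self, primary: Dict[str, str], replica: Dict[str, str],
                 cache: Dict[str, str]) -> None:
        self.primary, self.replica, self.cache = primary, replica, cache

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        # Invalidate rather than update; the next read repopulates the cache.
        self.cache.pop(key, None)

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.replica.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Replicas lag behind the primary, so a read straight after a write can miss or, worse, repopulate the cache with a stale value; paths that need read-your-writes may have to read from the primary.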
Sharding databases with Vitess
See: A Journey into Slack’s Database Service
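Vitess routes each row by mapping a sharding key to a keyspace ID and then to the shard that owns that key range (shards are conventionally named by range, such as `-80` and `80-`). The sketch below mimics that lookup in plain Python; it is not the Vitess API, `ShardMap` is a hypothetical name, and MD5 is chosen arbitrarily as the hash.

```python
import hashlib
from bisect import bisect_right
from typing import List, Tuple


class ShardMap:
    """Route a sharding key (e.g. a team id) to the shard owning its hash range."""

    def __init__(self, shard_ranges: List[Tuple[int, str]]) -> None:
        # (exclusive upper bound of the 64-bit range, shard name), sorted;
        # the last bound must be 2**64 so every keyspace ID has an owner.
        self._bounds = [upper for upper, _ in shard_ranges]
        self._names = [name for _, name in shard_ranges]

    @staticmethod
    def keyspace_id(key: str) -> int:
        # First 8 bytes of an MD5 digest as an unsigned 64-bit integer.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, key: str) -> str:
        # bisect_right finds the first range whose upper bound exceeds the id.
        return self._names[bisect_right(self._bounds, self.keyspace_id(key))]
```

Because routing depends only on the key's hash range, a shard can later be split (for example `-80` into `-40` and `40-80`) by changing the range table rather than the application code.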
Architecture so far: Edge POPs → Load Balancing → API - A / API - B / API - C → Search Engine, Cache, Data Shards; Job Queue → Job Executors
Scaling our data engines:
● 65B database queries per day
● 9 PB of database storage
When things go wrong
Incident management at Slack (in Slack)
The incident bot and incident channels are at the centre of our response.
Screenshots: incident channel #incd-53716 ("Adam Pretti 3:47PM joined #incd-53716") and an incident thread
But we also have a couple of backup strategies if/when Slack is not available.
Users are unable to connect to Slack
How it felt that day
All hands on deck
Diagram: Edge POPs → Load Balancing → API - A / API - B / API - C → Search Engine, Cache, Data Shards; Queue(s) → Job Executors
A problem in any of your tiers has the potential to affect your whole service.
Houston, we’ve got a problem.
Technical details
Diagram: Load Balancing
See: A Terrible, Horrible, No-Good, Very Bad Day at Slack
Incident review process
Thank you
@setoide

 

Scaling Slack during explosive growth

Editor's Notes

  1. Thanks for having me! It’s a pleasure to be here, and I hope everybody is doing alright in these difficult days we are living through. This is probably going to be the least security-focused talk of the conference, but I hope it is still relevant to you.
  2. I’m a Spaniard. I moved to Melbourne 8 years ago after 1 year in freezing Cambridge, UK. I am an open source, IT leadership and DevOps enthusiast. I’ve been with Slack for a year, running the Cloud Engineering team, which is split between Melbourne and San Francisco. I’m a co-organizer of the Devopsgirls community.
  3. Maybe move to the other section
  4. This is a made-up timeline to use as a talking point. Our capacity to scale comes from the improvements and investments we have made over the years, and the items on the timeline are some examples. One thing that has helped is growing with our customers: as we took on bigger orgs, we had to scale our systems to meet their needs (number of users, messages, etc.). Examples: the @channel example, capacity planning and load testing, and the disaster-theatre pieces (chaos engineering). I recommend the in-depth talk by Demmer, a principal engineer, on how we’ve implemented some of these improvements (“Scaling Slack - The Good, the Unexpected, and the Road Ahead”).
  5. Guidance from leadership: look after ourselves, keep Slack up, and keep delivering value to customers. Engineering teams’ focus: look for ceilings, bottlenecks and known problems in our architecture, infrastructure and services, and figure out where we had to scale our systems.
  6. As an example, some systems had to scale from supporting 100,000 rpm to, at times, double that capacity at peak. Many different techniques gave us the flexibility we needed to react; we will cover some of them today.
  7. Another important component was managing the increased cost of our infrastructure and working out how to optimize it (#help-cloud-econ).
  8. Even though Lego pieces are quite cool, I’ll be using a heavily simplified view of the system architecture to walk through the scalability improvements we applied over time. Due to time constraints, many of the slides include a link that expands on the topic for those interested. All engineering stats presented are based on figures from April 2020.
  9. We have different types of clients: apps and chatbots (our own and third-party), Slack clients for laptops and mobile, and web clients. We deploy our own Edge POPs (points of presence) to terminate those connections closer to our customers, and we have Edge POPs in most AWS regions around the world. Clients connect in two ways: WebSockets and HTTPS.
  10. We also do a lot of processing at the edge, which makes Slack faster for users and offloads a lot of work from the main APIs. For example, we do much of the message massaging, processing and enrichment at the edge. A previous version of our edge stack was a series of Java processes handling real-time messaging. To scale, we created Flannel, a Go process that initially handled WebSocket connections and caching. A good example: when a user first connects to a particular team in the morning, we preload a lot of information. Some information required for almost every message transaction (for example, team_ids) is loaded from the cache close to 100% of the time.
  11. This slide shows how one very common action, typing in the quick switcher, benefits from having channels and users cached.
  12. Another example is file upload processing. Instead of doing it centrally, we delegate it to our Edge POPs. Uploads are faster because the POP is closer to the user, and we can do all the image processing there (metadata additions, security checks, setting up permissions, etc.).
  13. Apps, chatbots and sometimes even normal clients can overload your API. Two techniques help control this behaviour: rate limiting and putting the client in degraded mode. We can also implement rate limiting between services, using native language capabilities or, more recently, Envoy.
  14. A good portion of the Slack APIs were originally written in PHP. At some point they were migrated to a combination of Hack, a strongly typed version of PHP, and HHVM, a high-performance execution engine for Hack. Both were originally developed by Facebook and then open-sourced. One of the motivations for moving to Hack was developer productivity.
  15. We can divide our API into different farms depending on the tasks they perform, which lets us adapt the underlying infrastructure to each one. For example: a farm for collecting statistics and one for background jobs.
  16. This is the current implementation. We are working on simplifying some of the components and on using SQS.
  17. Vitess was originally developed by YouTube and then open-sourced. It provides a sharding and topology management solution that sits on top of MySQL. You don’t need to modify your application, because it connects to VTGates as if it were accessing a single database. In the background, Vitess lets us shard data not only by team but also per table (user, channel, etc.), regardless of which database servers serve it. We moved from master-master to master-slave replication and manage failover. ---- Kubernetes
  18. In a nutshell: anyone can trigger an incident using the /assemble command. We have a 24x7 rotation of Incident Commanders and Customer Experience liaisons. When paged by /assemble, they join the channel where someone called it, and a video call is set up for rapid categorization. Depending on severity, for high-severity incidents we use an incident bot to create a new incident channel and notify relevant parties. Required responders and SMEs are added to the channel (/escalate command). Much of the context of how we responded lives in the channel, which is extremely valuable for extracting lessons from the incident. Automated processes create CAN reports (Conditions, Actions, Needs) and a Jira ticket to coordinate. Once we are All Clear, the Incident Commander makes sure someone is assigned to run an Incident Review. If Slack is not available for the response, we fall back to alternatives (a separate Slack workspace, Zoom, a Google group).
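Note 10 describes Flannel preloading team data when a user first connects, so that data needed by almost every message transaction is served from the edge. The Go sketch below shows that lazy per-team preload in its simplest form. It is not Flannel’s code: `TeamSnapshot`, `Loader`, `EdgeCache` and `OnConnect` are hypothetical names, and the loader stands in for a call to the central API.

```go
package main

import (
	"fmt"
	"sync"
)

// TeamSnapshot is a hypothetical bundle of data that almost every message
// transaction needs: the team ID plus channel and user names.
type TeamSnapshot struct {
	TeamID   string
	Channels map[string]string // channel ID -> channel name
	Users    map[string]string // user ID -> display name
}

// Loader stands in for a call to the central API (an assumption).
type Loader func(teamID string) TeamSnapshot

// EdgeCache keeps one snapshot per team in the edge POP's memory.
type EdgeCache struct {
	mu     sync.Mutex
	teams  map[string]TeamSnapshot
	load   Loader
	Misses int // how many times the central API was called
}

// NewEdgeCache returns an empty cache backed by load.
func NewEdgeCache(load Loader) *EdgeCache {
	return &EdgeCache{teams: make(map[string]TeamSnapshot), load: load}
}

// OnConnect runs when a client opens a WebSocket. The first connection
// for a team preloads its snapshot; later connections reuse it.
func (c *EdgeCache) OnConnect(teamID string) TeamSnapshot {
	c.mu.Lock()
	defer c.mu.Unlock()
	if snap, ok := c.teams[teamID]; ok {
		return snap // cache hit: no call to the central API
	}
	c.Misses++
	snap := c.load(teamID)
	c.teams[teamID] = snap
	return snap
}

func main() {
	origin := func(teamID string) TeamSnapshot {
		return TeamSnapshot{
			TeamID:   teamID,
			Channels: map[string]string{"C1": "general"},
			Users:    map[string]string{"U1": "javier"},
		}
	}
	cache := NewEdgeCache(origin)
	cache.OnConnect("T1") // first connection of the morning: preload
	cache.OnConnect("T1") // later connections: served from the edge
	fmt.Println("central API calls:", cache.Misses) // central API calls: 1
}
```

Holding the mutex while loading keeps the sketch short, but it blocks lookups for other teams while a load is in progress. A production cache would also need to invalidate entries when channels or users change.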
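Note 11 mentions the quick switcher benefiting from cached channels and users. As an illustration only, assume the edge already holds the team’s channel and user names in memory. In that case, prefix matching can be answered without a round trip to the main API. `QuickSwitch` and its parameters are hypothetical, and Slack’s real matching and ranking are not described in the talk.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// QuickSwitch returns cached channel and user names that start with the
// typed prefix (case-insensitive), sorted and capped at limit. All data
// comes from memory, so no call to the main API is needed.
func QuickSwitch(channels, users []string, prefix string, limit int) []string {
	p := strings.ToLower(prefix)
	var matches []string
	for _, name := range channels {
		if strings.HasPrefix(strings.ToLower(name), p) {
			matches = append(matches, "#"+name)
		}
	}
	for _, name := range users {
		if strings.HasPrefix(strings.ToLower(name), p) {
			matches = append(matches, "@"+name)
		}
	}
	sort.Strings(matches)
	if len(matches) > limit {
		matches = matches[:limit]
	}
	return matches
}

func main() {
	channels := []string{"help-cloud-econ", "general", "help-infra"}
	users := []string{"helen", "javier"}
	fmt.Println(QuickSwitch(channels, users, "hel", 5)) // [#help-cloud-econ #help-infra @helen]
}
```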
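Note 12 lists the kinds of work done on file uploads at the edge: metadata additions, security checks and setting up permissions. A minimal way to structure that work is an ordered pipeline of steps that stops at the first failure, before the file is handed to storage (S3 in the slide). Everything in this sketch is hypothetical: the `Upload` type, the step functions, and the `.exe` rule, which is only a placeholder for a real security scan.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Upload is a hypothetical representation of a file received by an edge POP.
type Upload struct {
	Name     string
	Data     []byte
	Metadata map[string]string
	Shared   []string // channel IDs allowed to see the file
}

// Step is one processing stage run at the edge.
type Step func(*Upload) error

// addMetadata records simple facts about the file.
func addMetadata(u *Upload) error {
	u.Metadata["size"] = fmt.Sprint(len(u.Data))
	return nil
}

// securityCheck is a placeholder: a real check would inspect the content.
func securityCheck(u *Upload) error {
	if strings.HasSuffix(strings.ToLower(u.Name), ".exe") {
		return errors.New("blocked file type")
	}
	return nil
}

// setPermissions returns a step that shares the file with one channel.
func setPermissions(channel string) Step {
	return func(u *Upload) error {
		u.Shared = append(u.Shared, channel)
		return nil
	}
}

// Process runs the steps in order and stops at the first error, so a
// rejected file never gets permissions or reaches central storage.
func Process(u *Upload, steps ...Step) error {
	for _, s := range steps {
		if err := s(u); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	u := &Upload{Name: "diagram.png", Data: []byte("fake-png"), Metadata: map[string]string{}}
	err := Process(u, addMetadata, securityCheck, setPermissions("C123"))
	fmt.Println(err, u.Metadata["size"], u.Shared)
}
```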
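Note 15 describes splitting the API into farms by task so that each farm’s infrastructure can be adapted separately. One simple way to express such a split is a prefix table consulted by whatever routes requests to host pools. The prefixes and farm names below are hypothetical, not Slack’s configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// farmRoutes maps API path prefixes to named farms (pools of hosts that
// can be sized independently). Prefixes and names are hypothetical.
var farmRoutes = []struct {
	prefix string
	farm   string
}{
	{"/api/stats.", "stats"},
	{"/api/jobs.", "jobs"},
}

// FarmFor returns the farm that should serve path; unmatched requests go
// to the general-purpose "main" farm.
func FarmFor(path string) string {
	for _, r := range farmRoutes {
		if strings.HasPrefix(path, r.prefix) {
			return r.farm
		}
	}
	return "main"
}

func main() {
	for _, p := range []string{"/api/stats.record", "/api/jobs.enqueue", "/api/chat.postMessage"} {
		fmt.Println(p, "->", FarmFor(p))
	}
}
```

Because statistics and job traffic land on their own pools, each pool can be scaled or given different instance types. A spike in one of them is also less likely to starve the farm serving interactive requests.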
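Note 17 explains that with Vitess the application talks to VTGate as if it were a single database while the data is sharded behind it, with different tables able to use different sharding keys (team, user, channel). The core routing idea works in two steps. First, map a sharding key to a 64-bit keyspace ID. Then send the query to the shard whose key range contains that ID. This is a simplified sketch under stated assumptions: FNV-1a stands in for Vitess’s own vindex functions, and `Shard`, `keyspaceID` and `Route` are hypothetical names. The shard names `-80` and `80-` follow Vitess’s convention of naming shards by hexadecimal key range.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Shard owns keyspace IDs in [Start, End); End == 0 means "up to the
// maximum uint64".
type Shard struct {
	Name       string
	Start, End uint64
}

// keyspaceID maps a sharding key such as a team, user or channel ID to a
// 64-bit value. FNV-1a is a stand-in for Vitess's vindex functions.
func keyspaceID(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return h.Sum64()
}

// Route picks the shard whose key range contains the key's keyspace ID.
// In Vitess this lookup happens inside VTGate, so the application still
// sees what looks like a single database.
func Route(shards []Shard, key string) (Shard, error) {
	id := keyspaceID(key)
	for _, s := range shards {
		if id >= s.Start && (s.End == 0 || id < s.End) {
			return s, nil
		}
	}
	return Shard{}, fmt.Errorf("no shard covers keyspace ID %x", id)
}

func main() {
	// Two shards split at the midpoint of the keyspace.
	shards := []Shard{
		{Name: "-80", Start: 0, End: 1 << 63},
		{Name: "80-", Start: 1 << 63, End: 0},
	}
	for _, key := range []string{"T1", "U7", "C42"} {
		s, err := Route(shards, key)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> shard %s\n", key, s.Name)
	}
}
```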