Data observability
Agenda
Here’s our next 30 minutes
- Intro
- Why?
- How?
- Problems?
Who am I?
Anastasia Khlebnikova
Senior backend/data engineer at Spotify.
What do I do?
Part of the Data and Insights tribe at Spotify
Our team owns one of the biggest services at Spotify (~1M requests per second) and one of its biggest pipelines: the anonymization of Event Delivery data
Data observability. WHY?
Where is the data coming from?
[Diagram: Event Delivery System (8 million events per second) → Pseudonymization (pseudonymization pipelines for every event type, run every hour) → Cloud Storage]
A bit of scale
● 8,000,000 events per second
● The largest event types reach around 8 billion events per hour
● Over 400 unique event types, published in separate datasets
● 500 TB of data a day
● We used to own the largest Hadoop cluster in Europe
How does it feel to be on-call?
If you need to hotfix something in production, it is like changing a flat tire on a car going 200 km/h on a highway without stopping it! The longer your system is stopped, the longer it takes to catch up, and the catch-up time for downstream consumers grows even faster, because each of their jobs can only catch up once the data it depends on has been delivered.
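Why a longer stop means a longer catch-up can be seen with a simplified back-of-the-envelope model. This is a sketch under stated assumptions, not how Spotify measures it: events keep arriving at a constant rate during the outage, and after the restart the system runs at a constant capacity, so only the spare capacity drains the backlog. The 8M events/s rate comes from the talk; the 9M and 10M events/s capacities are hypothetical.

```python
def catch_up_hours(downtime_h: float, ingest_rate: float, capacity: float) -> float:
    """Hours needed to drain the backlog built up while the pipeline was stopped.

    Simplified model (assumption): events keep arriving at a constant
    ingest_rate (events/s) and, after the restart, the system processes
    at a constant capacity (events/s).
    """
    if capacity <= ingest_rate:
        raise ValueError("capacity must exceed the ingest rate, or the backlog never drains")
    backlog_events = downtime_h * 3600 * ingest_rate           # events queued during the outage
    drain_seconds = backlog_events / (capacity - ingest_rate)  # only the spare capacity drains it
    return drain_seconds / 3600


# 8M events/s is the figure from the talk; both capacities are hypothetical.
print(catch_up_hours(2, 8_000_000, 10_000_000))  # 8.0
print(catch_up_hours(2, 8_000_000, 9_000_000))   # 16.0
```

In this model a two-hour stop already costs eight hours of catch-up with 25% headroom, and halving the headroom doubles it again; every downstream job then adds its own delay on top.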
Who needs that much data?
Once delivered, events are processed by numerous data jobs running at Spotify. The delivered data has many different use cases: producing music recommendations, analysing our A/B tests or analysing our client crashes. Most importantly, delivered data is used to calculate the royalties paid to artists based on generated streams.
Data observability. HOW?
Make sure your data is discoverable
Annotating your data is the key to avoiding piles of mess!
Upsides:
➢ Other people can find and use your data
➢ Sensitive data in the dataset? Encrypt it based on the annotations
➢ Easy mapping in your code, e.g. schema <-> case class
➢ Easier to find which key to join on
Downsides:
➢ You have to do it. Once.
Schema Example
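As an illustration of how annotations can drive both sensitive-data handling and join-key discovery, here is a minimal sketch that models a schema as a Python dataclass and stores annotations in field metadata. Everything here is hypothetical: the `PlaybackEvent` schema, the `personal` and `join_key` metadata keys, and the helpers. Where the talk mentions encryption, this sketch swaps in keyed hashing (HMAC-SHA256) as a simpler pseudonymization technique.

```python
import hashlib
import hmac
from dataclasses import dataclass, field, fields, replace


# Hypothetical schema: the "personal" and "join_key" metadata keys are
# illustrative annotations, not an actual Spotify convention.
@dataclass(frozen=True)
class PlaybackEvent:
    user_id: str = field(metadata={"personal": True, "join_key": True})
    track_id: str = field(metadata={"personal": False, "join_key": True})
    ms_played: int = field(metadata={"personal": False})


def annotated(cls, key: str) -> list[str]:
    """Names of the fields whose metadata sets `key` to True."""
    return [f.name for f in fields(cls) if f.metadata.get(key, False)]


def pseudonymize(event, secret: bytes):
    """Replace every field annotated as personal with an HMAC-SHA256 digest."""
    changes = {
        name: hmac.new(secret, str(getattr(event, name)).encode(), hashlib.sha256).hexdigest()
        for name in annotated(type(event), "personal")
    }
    return replace(event, **changes)


event = PlaybackEvent(user_id="u123", track_id="t42", ms_played=31_000)
print(annotated(PlaybackEvent, "join_key"))  # ['user_id', 'track_id']
print(pseudonymize(event, b"demo-secret").track_id)  # t42
```

Because the pipeline code only reads the annotations, adding a new sensitive field to the schema is enough for it to be handled, with no change to the processing logic.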
Monitor your pipelines. Execution time
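The talk does not say how execution time is alerted on. One simple approach, sketched here as an assumption, is to compare the latest run against the median of recent runs; the 1.5× factor and the minimum history of five runs are hypothetical defaults.

```python
from statistics import median


def is_slow_run(history_minutes: list[float], latest_minutes: float,
                factor: float = 1.5, min_history: int = 5) -> bool:
    """Flag the latest run if it took more than `factor` times the median of recent runs."""
    if len(history_minutes) < min_history:
        return False  # not enough history to judge yet
    return latest_minutes > factor * median(history_minutes)


recent = [40, 42, 38, 41, 39]   # hypothetical hourly run durations in minutes
print(is_slow_run(recent, 61))  # True: above 1.5 x the 40-minute median
```

The median is used rather than the mean so that a single earlier slow run does not raise the baseline and hide the next one.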
Monitor your pipelines. Count it!
Never produce corrupt data! Implement as many sanity checks as possible.
Example: your pipeline encrypts each row in the dataset with a key based on the user_id, and uses a random key otherwise (making the row impossible to decrypt).
Count it: count the percentage of rows where the user_id was not found or could not be parsed, and alert if it increases by more than… 10 percent?
Sanity check your data and alert!
Monitor your pipelines. Money
GCP Dataflow
GCP BigQuery
GCP Cloud Storage
Monitor your pipelines. Money. Real incident
[Chart: cost ($) over time, with the incident marked]
Monitor your pipelines. Money. Per system
Taking GDPR as an example: how much does it cost to “Download your data”? How much data should you put inside?
What to monitor:
● How much do we pay for every request?
● Breaking that cost down: what does every pipeline that contributes to it cost?
● How many of the requested downloads do people actually open?
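The three questions above can be combined into one per-system report. This is a minimal sketch: the pipeline names, the monthly cost figures and where they come from (for example a billing export) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PipelineCost:
    name: str
    monthly_cost: float  # e.g. taken from a billing export (assumed data source)


def download_cost_report(pipelines: list[PipelineCost], requests: int, opened: int) -> dict[str, float]:
    """Answer the three monitoring questions for one system, e.g. "Download your data"."""
    if requests <= 0:
        raise ValueError("need at least one request")
    total = sum(p.monthly_cost for p in pipelines)
    report = {
        "cost_per_request": total / requests,
        "cost_per_opened": total / opened if opened else float("inf"),
        "open_rate": opened / requests,
    }
    for p in pipelines:
        # which pipeline contributes most to the cost of this system?
        report[f"share:{p.name}"] = p.monthly_cost / total if total else 0.0
    return report


# Hypothetical pipelines and numbers.
pipelines = [PipelineCost("collect", 600.0), PipelineCost("package", 300.0), PipelineCost("deliver", 100.0)]
print(download_cost_report(pipelines, requests=500, opened=125)["cost_per_opened"])  # 8.0
```

Cost per opened download, rather than per request, is one way to decide how much data is worth putting inside: if most packages are never opened, every extra dataset in them is paid for mostly by nobody.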
Set up retention! Storage is ⅓ of the cost
● Set up a default retention: remove partitions after their expiration date
● Profile the storage. Can cold storage be used (cheap to store, expensive to access)?
● It adds up: multi-regional vs regional buckets. Where is the data accessed from?
● How is the data used? BigQuery or pipelines?
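A default retention boils down to selecting the partitions older than a cutoff. The sketch below assumes hourly partitions named `YYYY-MM-DDTHH` (a hypothetical naming scheme) and leaves the actual deletion to whatever storage API is in use.

```python
from datetime import datetime, timedelta


def expired_partitions(partitions: list[str], now: datetime, retention_days: int) -> list[str]:
    """Return the hourly partitions that are older than the retention period.

    Partition names are assumed to follow the hypothetical 'YYYY-MM-DDTHH' format.
    """
    cutoff = now - timedelta(days=retention_days)
    return [p for p in partitions if datetime.strptime(p, "%Y-%m-%dT%H") < cutoff]


parts = ["2023-12-31T23", "2024-01-01T00", "2024-01-15T12"]
print(expired_partitions(parts, datetime(2024, 1, 31, 0), retention_days=30))  # ['2023-12-31T23']
```

Making this the default, rather than something each dataset owner opts into, is what keeps storage from growing unbounded.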
Monitor your pipelines. Alerts on failures
SLAs for your partitions
We have a concept of low, normal and high priority events, which gives different events different SLAs depending on their importance (6h, 24h or 72h). Thanks to that, we know which events recover first when shit hits the fan. It also made our lives better: normal priority events do not alert during nights, and low priority events do not alert during weekends.
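The priority-based alerting described above can be sketched as a single paging decision. Several details are assumptions: the pairing of high, normal and low with 6h, 24h and 72h, the 22:00 to 08:00 night window, measuring lateness from the start of the partition hour, and keeping low priority quiet at night as well as at weekends.

```python
from datetime import datetime, timedelta

# Assumption: the talk lists 6h, 24h and 72h SLAs; pairing them with
# high, normal and low priority (in that order) is our reading.
SLA = {"high": timedelta(hours=6), "normal": timedelta(hours=24), "low": timedelta(hours=72)}


def is_night(t: datetime) -> bool:
    return t.hour >= 22 or t.hour < 8  # hypothetical quiet hours


def is_weekend(t: datetime) -> bool:
    return t.weekday() >= 5  # Saturday = 5, Sunday = 6


def should_page(priority: str, partition_hour: datetime, now: datetime, delivered: bool) -> bool:
    """Page on-call only for late partitions whose priority allows alerting right now."""
    if delivered or now - partition_hour <= SLA[priority]:
        return False  # on time, or still within its SLA
    if priority == "normal" and is_night(now):
        return False  # normal priority waits until morning
    if priority == "low" and (is_night(now) or is_weekend(now)):
        return False  # low priority waits for a weekday (quiet at night too: assumption)
    return True
```

A late partition is never forgotten by this check: once the quiet period ends, the same partition pages on the next evaluation.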
Dashboards!
Does your infrastructure lose BCD?
Business-critical data (BCD): royalty calculations, user accounts, ads, etc.
➔ High SLO
➔ “Special treatment” when recovering from an incident
➔ …and special observability, since the number of BCD events is limited
How can we prove that no data is lost?
[Diagram: SDK, Service → Receiver Service → Pub/Sub → make hourly partitions, dedup, anonymize → hourly partitions]
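The partitioning and dedup step, together with a check that nothing was lost, can be sketched in memory. Assumptions: every event carries a unique `id` and a Unix timestamp `ts` in seconds (hypothetical field names), dedup happens within an hour, and the whole set of ids fits in memory, which at 8 million events per second it would not, so a real system would compare counts instead.

```python
from collections import defaultdict


def make_hourly_partitions(events: list[dict]) -> dict[int, list[dict]]:
    """Bucket events by hour and drop duplicate ids within each hour (first copy wins)."""
    partitions: dict[int, dict[str, dict]] = defaultdict(dict)
    for event in events:
        hour = event["ts"] // 3600 * 3600  # start of the hour, Unix seconds
        partitions[hour].setdefault(event["id"], event)
    return {hour: list(rows.values()) for hour, rows in partitions.items()}


def lost_ids(received: list[dict], partitions: dict[int, list[dict]]) -> set[str]:
    """Ids that entered the pipeline but are absent from every output partition."""
    written = {event["id"] for rows in partitions.values() for event in rows}
    return {event["id"] for event in received} - written


events = [{"id": "a", "ts": 10}, {"id": "a", "ts": 20}, {"id": "b", "ts": 3700}]
print(lost_ids(events, make_hourly_partitions(events)))  # set()
```

The point of the check is that dedup legitimately lowers the row count, so "fewer rows out than in" proves nothing; what has to hold is that every distinct id that came in also went out.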
Who is watching the watcher?
[Diagram: the same pipeline (SDK, Service → Receiver Service → Pub/Sub → make hourly partitions, dedup, anonymize → hourly partitions), extended with a streaming job, NACK and Rejected events, a Counting Service, and a Compare step]
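The Compare step can be sketched as a reconciliation of independent counts per hour and event type. This is an assumption about how the pieces fit, not a description of Spotify's Counting Service: the independently counted total should equal the rows written plus the events rejected on purpose. NACKed messages are assumed to be redelivered and eventually counted, so they are not subtracted. The keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HourCounts:
    counted: int   # events seen by the independent streaming/counting job
    written: int   # rows found in the hourly partition
    rejected: int  # events rejected on purpose, so expected to be absent


def compare(counts: dict[str, HourCounts], tolerance: float = 0.0) -> dict[str, int]:
    """Return the keys whose counts do not reconcile, with the signed difference.

    Positive: events went missing. Negative: the partition has more rows than
    were counted, which points at duplicates or at a broken counter (the watcher).
    """
    mismatches = {}
    for key, c in counts.items():
        diff = c.counted - c.written - c.rejected
        if abs(diff) > tolerance * c.counted:
            mismatches[key] = diff
    return mismatches


counts = {
    "2024-01-15T10/EventA": HourCounts(1000, 990, 10),  # reconciles
    "2024-01-15T10/EventB": HourCounts(1000, 980, 5),   # 15 events unaccounted for
}
print(compare(counts))  # {'2024-01-15T10/EventB': 15}
```

Flagging mismatches in both directions is what makes this watch the watcher: an undercounting counter shows up as a negative difference instead of silently passing.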
Data observability. Bottom line
Why bother?
We use a lot of data, and processing and storage are EXPENSIVE. How much profit does it bring, though?
Thank you!

Observability at Spotify
