What’s an SRE at Criteo?
Clément Michaud
c.michaud@criteo.com
June 6th, 2018
About me
Clément Michaud
SRE building the PaaS at Criteo for 9 months.
Previously a C++ software engineer in finance for 3 years.
clems4ever / @clementmichaud1 / clement.michaud
Agenda
1. What is Criteo?
2. Embracing the DevOps philosophy
3. Conclusion
About Criteo
What’s Criteo?
- Criteo is a global tech company.
- Leader in online advertising.
- Manages its own data centers.
- 1.4B shoppers per month.
- 600 TB of shopper data per day.
Criteo’s partners
- Publishers & Exchanges: Criteo bids for their ad spaces to display advertisers' products, with 100 ms max to bid.
- Advertisers: Criteo manages their e-commerce campaigns.
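The 100 ms limit means a bidder must answer, or give up, within a fixed deadline. Here is a minimal Java sketch of that constraint. It assumes a hypothetical `computeBid` pricing step and an "empty result means no bid" convention; neither comes from the talk. The work is bounded with `CompletableFuture.completeOnTimeout` (Java 9+):

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class DeadlineBidder {
    // Budget from the slide: at most 100 ms to answer a bid request.
    static final long BID_BUDGET_MS = 100;

    // Hypothetical pricing step; simulatedWorkMs stands in for real scoring work.
    static double computeBid(long simulatedWorkMs) {
        try {
            Thread.sleep(simulatedWorkMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 1.25;
    }

    // Returns the bid if it is ready within the budget, otherwise an empty result ("no bid").
    static Optional<Double> bidWithinBudget(long simulatedWorkMs) {
        return CompletableFuture
                .supplyAsync(() -> Optional.of(computeBid(simulatedWorkMs)))
                .completeOnTimeout(Optional.empty(), BID_BUDGET_MS, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        // Fast path: the bid is ready well before the deadline and is returned.
        System.out.println(bidWithinBudget(1));
        // Slow path: the deadline fires first, so the caller gets "no bid".
        System.out.println(bidWithinBudget(500));
    }
}
```

Note that `completeOnTimeout` only stops waiting. The slow computation keeps running in the background, so a production bidder would also need to cancel or cap that work.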
Why did Criteo choose the SRE model?
See the great presentation given in 2017 by Nicolas Helleringer at Devoxx: https://www.youtube.com/watch?v=ncf80_ZvBpo
Embracing the DevOps philosophy
Contention between agility and stability
Developers push for agility, operators for stability, and THE WALL stands between them.
A real-life example before DevOps
Laura decides to release a performance patch on Friday.
Skills scale (Dev 50% / Ops 50%): Laura is on the Dev side, Alex on the Ops side.
Alex gets paged during the night because the CPUs are burning...
BOOOM!
Reorganize and break the wall
Developers (agility) and operators (stability) are no longer separated by a wall.
What does it solve?
- Alex could have helped Laura with the patch using his expertise.
- Laura and Alex could have evaluated the risk together.
- Alex could have reacted faster had he known what Laura intended.
DevOps is a philosophy and a set of practices designed to break down organizational barriers:
1. Promote collaboration
2. Failure is normal
3. Make gradual changes
4. Leverage tooling & automation
5. Measure everything
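Principles 3 and 5 combine naturally: roll a change out in small steps and let measurements decide whether to continue. Below is a minimal sketch of such a canary rollout. The rollout steps (1%, 10%, 50%, 100%), the 1% error-rate threshold and the `errorRateAt` measurement callback are all hypothetical values chosen for illustration, not Criteo's process:

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

class GradualRollout {
    // Hypothetical rollout steps, as a percentage of traffic on the new version.
    static final List<Integer> STEPS = List.of(1, 10, 50, 100);
    // Hypothetical maximum acceptable error rate (1%).
    static final double MAX_ERROR_RATE = 0.01;

    /**
     * Walks through the rollout steps. errorRateAt returns the error rate measured
     * while the given percentage of traffic runs the new version.
     * Returns the percentage finally kept (0 means the change was rolled back).
     */
    static int rollOut(IntToDoubleFunction errorRateAt) {
        int kept = 0;
        for (int percent : STEPS) {
            double errorRate = errorRateAt.applyAsDouble(percent);
            if (errorRate > MAX_ERROR_RATE) {
                System.out.printf("%d%%: error rate %.3f too high, rolling back%n", percent, errorRate);
                return 0;
            }
            System.out.printf("%d%%: error rate %.3f OK%n", percent, errorRate);
            kept = percent;
        }
        return kept;
    }

    public static void main(String[] args) {
        // A healthy change goes all the way to 100%.
        System.out.println(rollOut(p -> 0.001));
        // A change that degrades under load is caught at 50% and rolled back.
        System.out.println(rollOut(p -> p >= 50 ? 0.05 : 0.002));
    }
}
```

The design choice is that failure is expected (principle 2): the rollout treats a bad measurement as a normal outcome and reverts, instead of exposing the whole fleet at once.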
class SRE implements DevOps
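The slide title is a pun on Java syntax. Taken literally, it could look like the sketch below. A hypothetical `DevOps` interface declares one method per principle, and `SRE` implements each with practices mentioned in this talk. The interface and method names are illustrative only:

```java
import java.util.List;

// Hypothetical interface: one method per DevOps principle from the previous slide.
interface DevOps {
    String promoteCollaboration();
    String acceptFailureAsNormal();
    String makeGradualChanges();
    String leverageToolingAndAutomation();
    String measureEverything();
}

// SRE as one concrete implementation, using practices described in this talk.
class SRE implements DevOps {
    public String promoteCollaboration() { return "Close connections and escalation paths with dev teams"; }
    public String acceptFailureAsNormal() { return "On-call rotations and post-mortems"; }
    public String makeGradualChanges() { return "Deploy, then make sure the deployment is going well"; }
    public String leverageToolingAndAutomation() { return "Chef, infrastructure as code, Terraform, automatic failover"; }
    public String measureEverything() { return "An Observability team and SLAs"; }

    public static void main(String[] args) {
        DevOps devops = new SRE();
        List.of(devops.promoteCollaboration(), devops.acceptFailureAsNormal(),
                devops.makeGradualChanges(), devops.leverageToolingAndAutomation(),
                devops.measureEverything())
            .forEach(System.out::println);
    }
}
```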
Plenty of technical challenges, handled by more than 100 people:
1. Host and power up more than 20k servers: Hosting teams.
2. Build and maintain a datacenter network: Network teams.
3. Build and maintain the platform running apps: Infra & Core teams, Observability team.
4. Build and deliver efficiently (CI/CD): DevTool team, Deployment team.
5. Ingest and process large amounts of data: NoSQL team, DBA team.
Organization of SRE teams
InfraTools, Network, PaaS, Observability, DBA, NoSQL, Infra, LB, IDM, Lake, Rivers
- Small teams
- Service providers
- Reduced scope
- Expertise
Meta vision provided by EPMs (Engineering Product Managers)
Four EPMs, each covering a group of SRE teams (InfraTools, Network, PaaS, Observability, DBA, NoSQL, Infra, LB, IDM, Lake, Rivers), ensure the coherence of the whole.
Close connections with dev teams
Dev teams (Prediction, Reco, Creator, RTB) work closely with the SRE teams (Network, Observability, DBA, NoSQL, Infra, LB, IDM, PaaS, InfraTools), with escalation paths between dev and SRE teams.
What’s the role of an SRE team?
What SRE means at Criteo
- Maintenance & Evolution: maintain and evolve the platform to make our users' lives easier.
- Dev & Tech: promote and use production-grade technologies; keep up a technology watch to stay competitive.
- Ownership & Responsibility: as a service provider, the team takes ownership of the services it provides.
- Automation: installation of 20k+ servers managed with Chef; infrastructure as code; automatic failover.
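Automatic failover can be pictured as a periodic health check on the active instance that promotes a standby when checks keep failing. The talk does not describe Criteo's actual mechanism. This is a minimal sketch that assumes hypothetical `Node` objects with a boolean health flag (standing in for a real probe) and a hypothetical threshold of 3 consecutive failures:

```java
import java.util.ArrayList;
import java.util.List;

class Failover {
    // Hypothetical node; "healthy" stands in for a real health check (TCP, HTTP, ...).
    static final class Node {
        final String name;
        boolean healthy = true;
        Node(String name) { this.name = name; }
    }

    // Hypothetical threshold: consecutive failed checks before failing over.
    static final int MAX_FAILURES = 3;

    private final List<Node> nodes;
    private int active = 0;
    private int consecutiveFailures = 0;

    Failover(List<Node> nodes) { this.nodes = new ArrayList<>(nodes); }

    Node active() { return nodes.get(active); }

    // Called periodically; promotes the next healthy node after MAX_FAILURES failed checks.
    void check() {
        if (active().healthy) {
            consecutiveFailures = 0;
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures < MAX_FAILURES) return;
        for (int i = 1; i < nodes.size(); i++) {
            int candidate = (active + i) % nodes.size();
            if (nodes.get(candidate).healthy) {
                System.out.println("Failing over from " + active().name + " to " + nodes.get(candidate).name);
                active = candidate;
                consecutiveFailures = 0;
                return;
            }
        }
        System.out.println("No healthy standby available, paging the on-call engineer");
    }

    public static void main(String[] args) {
        Node primary = new Node("db-primary");
        Node standby = new Node("db-standby");
        Failover failover = new Failover(List.of(primary, standby));
        primary.healthy = false;
        for (int i = 0; i < MAX_FAILURES; i++) failover.check();
        System.out.println("Active: " + failover.active().name); // Active: db-standby
    }
}
```

Requiring several consecutive failures before acting is a common way to avoid flapping on a single lost probe. The right threshold depends on the service.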
What SRE means at Criteo
- Support: provide the right level of documentation and answer user requests about the services we provide.
- Testability: ensure new deployments will not break the platform, made easy with Terraform.
- On-call: participate in level-2 on-call rotations covering entire days, including weekends.
- Consulting: help colleagues build resilient and performant systems by working alongside them.
- 8 data centers around the world.
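A rotation that covers entire days, weekends included, reduces to a round-robin over the team with one shift per day. This is a minimal sketch that assumes a hypothetical team list and rotation start date. It also assumes a daily handover at 18:00, borrowed from the on-call example later in this talk:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;
import java.util.List;

class OnCallRotation {
    // Hypothetical daily handover time (the example on-call shift below starts at 18:00).
    static final LocalTime HANDOVER = LocalTime.of(18, 0);

    private final List<String> engineers;
    private final LocalDate rotationStart;

    OnCallRotation(List<String> engineers, LocalDate rotationStart) {
        this.engineers = List.copyOf(engineers);
        this.rotationStart = rotationStart;
    }

    // Returns who is on call at the given moment; every day, weekends included, has a shift.
    String onCallAt(LocalDateTime when) {
        // Before the handover, the shift that started the previous evening is still running.
        LocalDate shiftDay = when.toLocalTime().isBefore(HANDOVER)
                ? when.toLocalDate().minusDays(1)
                : when.toLocalDate();
        long days = ChronoUnit.DAYS.between(rotationStart, shiftDay);
        int index = (int) Math.floorMod(days, (long) engineers.size());
        return engineers.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical team and start date (Monday, June 4th, 2018).
        OnCallRotation rotation = new OnCallRotation(
                List.of("alice", "bob", "carol"), LocalDate.of(2018, 6, 4));
        System.out.println(rotation.onCallAt(LocalDateTime.of(2018, 6, 4, 18, 0)));  // alice
        System.out.println(rotation.onCallAt(LocalDateTime.of(2018, 6, 5, 9, 0)));   // alice (same shift)
        System.out.println(rotation.onCallAt(LocalDateTime.of(2018, 6, 9, 20, 0)));  // carol (Saturday)
    }
}
```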
Typical journey in my team
Day 1: standard work day
9:00
- Do code reviews
- Check tasks on the Jira board
- Read emails
10:00
- Write code to upgrade Consul
- Test the setup in AWS
- Send code reviews
12:00
- Lunch break
13:00
- Deploy code in production
- Make sure the deployment is going well
- Do code reviews
16:00
- Meeting with the deployment team to define SLAs
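SLA discussions usually start from an availability target and the downtime it allows over a window. As a worked example (not figures from the talk), a 99.9% target over 30 days allows (1 - 0.999) × 30 × 24 × 60 = 43.2 minutes of downtime. The same arithmetic as a small Java helper, with a hypothetical name:

```java
import java.time.Duration;

class SlaBudget {
    // Allowed downtime for an availability target (e.g. 0.999) over a measurement window.
    static Duration allowedDowntime(double availabilityTarget, Duration window) {
        if (availabilityTarget <= 0 || availabilityTarget > 1) {
            throw new IllegalArgumentException("target must be in (0, 1]");
        }
        // Round to the millisecond to absorb floating-point noise (1 - 0.999 is not exactly 0.001).
        long millis = Math.round((1 - availabilityTarget) * window.toMillis());
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        Duration month = Duration.ofDays(30);
        System.out.println(allowedDowntime(0.999, month));  // PT43M12S (43.2 minutes)
        System.out.println(allowedDowntime(0.99, month));   // PT7H12M (7.2 hours)
    }
}
```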
Day 2: work day while on call
9:00
- Check emails
- Do code reviews
- Fix a few server failures
12:00
- Lunch break
13:00
- Write code to install new servers
- Write code to install new apps
- Write some documentation
- Deploy to production
17:00
- Fix bugs in a tool we provide
- Send a code review
18:00
- Start of the on-call shift
Night of Day 2, then Day 3: work day as interrupt guy
23:00
- Get paged because of an incident with Mesos
- Investigate and find that the issue is related to the LBs
- Call the on-call engineer from the LB team
- Write down the timeline
Morning: end of the on-call shift, start of the interrupt shift
9:00
- Report the incidents to my team
- Write a post-mortem and create tickets to address improvements
10:00
- Provide some support to users on Slack
- Do code reviews
12:00
- Go to the gym and have lunch!
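Writing down the timeline during the night is what makes the next morning's post-mortem possible. The following is a minimal sketch of such a timeline. The class, the event times and the "Service recovered" entry are hypothetical; only the Mesos, LB and on-call events mirror the slide:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IncidentTimeline {
    // One timestamped fact, written down while handling the incident.
    static final class Entry {
        final LocalDateTime at;
        final String what;
        Entry(LocalDateTime at, String what) { this.at = at; this.what = what; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void log(LocalDateTime at, String what) {
        entries.add(new Entry(at, what));
        // Keep chronological order even if an entry is written down after the fact.
        entries.sort(Comparator.comparing((Entry e) -> e.at));
    }

    // Elapsed time from the first to the last entry (e.g. page to recovery).
    Duration elapsed() {
        if (entries.isEmpty()) return Duration.ZERO;
        return Duration.between(entries.get(0).at, entries.get(entries.size() - 1).at);
    }

    // Plain-text timeline, ready to paste into a post-mortem.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            sb.append(e.at.toLocalTime()).append("  ").append(e.what).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        IncidentTimeline t = new IncidentTimeline();
        LocalDateTime night = LocalDateTime.of(2018, 6, 5, 23, 0);  // hypothetical date
        t.log(night, "Paged: incident with Mesos");
        t.log(night.plusMinutes(25), "Called the LB team's on-call engineer");
        t.log(night.plusMinutes(20), "Issue narrowed down to the LBs");  // logged late, sorted in place
        t.log(night.plusMinutes(50), "Service recovered");
        System.out.print(t.render());
        System.out.println("Elapsed: " + t.elapsed()); // Elapsed: PT50M
    }
}
```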
Day 3: work day as interrupt guy (afternoon)
13:00
- Improve our documentation around Mesos
- Handle a Jira ticket and send reviews
15:00
- Fix a bug in a library reported by a user
- Send reviews
- Deploy the fix in production
17:30 to 18:30
- Do code reviews
- Prepare a wheel of misfortune (a role-play of a past incident used to train on-call engineers)
And the sprint goes on... (1 sprint == 2 weeks)
Conclusion
Summary
● DevOps is a philosophy.
● SRE is an implementation of DevOps.
● SRE comes after breaking down the silos between Dev and Ops.
● This model has allowed Criteo to scale well over the years.
Conclusion
● Complex problems need a variety of skills to be solved.
● Devs and Ops WILL solve those problems by working together.
So, take the plunge: implement the DevOps interface!
Thank you - Questions?
https://www.youtube.com/watch?v=uTEL8Ff1Zvk
https://github.com/apache/mesos
https://github.com/clems4ever

What's an SRE at Criteo - Meetup SRE Paris
