Operations Driven Web Services
A Case Study of Service Evolution at Rent the Runway
Camille Fournier, Head of Engineering (@skamille)
Carlo Barbara, Senior Systems Engineer (@CarloBarbara)
In The Beginning, There Was Drupal
There were also all of these folks…
Can't Just Burn the World Down
Hollow It Out!
Complexity
[Chart: Number of Services in Production, Dec-11 through Jul-13 (y-axis 0 to 14)]
Operations first…
• Availability and performance of our services are critical to running our business
• The software we develop has to make delivering on our SLAs possible
• How (besides sane design):
  • Healthchecks + Nagios
  • Measurements
  • Historical data with graphs
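Nagios is typically pointed at each service's healthcheck. Below is a minimal Java sketch of such a probe; the class name, the 2-second timeout and the default URL (Dropwizard's admin connector on its default port 8081) are assumptions for illustration. The only contract Nagios imposes on a plugin is to print a status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical Nagios-style probe: GET a healthcheck URL and translate the HTTP
// status into Nagios plugin exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
class HealthcheckProbe {
    static final int OK = 0;
    static final int CRITICAL = 2;
    static final int UNKNOWN = 3;

    // Assumption: any 2xx means every check passed; any other real status is critical.
    static int exitCodeFor(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) return OK;
        if (httpStatus >= 100 && httpStatus < 600) return CRITICAL;
        return UNKNOWN; // not a valid HTTP status (e.g. -1 for a garbled response)
    }

    static int probe(String url, int timeoutMillis) {
        try {
            URLConnection c = URI.create(url).toURL().openConnection();
            if (!(c instanceof HttpURLConnection)) return UNKNOWN; // not http(s)
            HttpURLConnection conn = (HttpURLConnection) c;
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                return exitCodeFor(conn.getResponseCode());
            } finally {
                conn.disconnect();
            }
        } catch (IllegalArgumentException e) {
            return UNKNOWN;  // malformed URL: the check itself is misconfigured
        } catch (IOException e) {
            return CRITICAL; // connection refused, timeout, ...: treat as down
        }
    }

    public static void main(String[] args) {
        // Default URL is an assumption: Dropwizard's admin connector on port 8081.
        String url = args.length > 0 ? args[0] : "http://localhost:8081/healthcheck";
        int code = probe(url, 2000);
        String label = code == OK ? "OK" : code == CRITICAL ? "CRITICAL" : "UNKNOWN";
        System.out.println("HEALTHCHECK " + label + " - " + url);
        System.exit(code);
    }
}
```

Treating an unreachable service as CRITICAL but a malformed URL as UNKNOWN keeps "the service is down" separate from "the check is broken", which matters once alerts page a human.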
Metrics
• Gauges – an instantaneous value
• Counters – a value that can be incremented or decremented (+/-)
• Meters – the rate of events over time (mean, plus 1-, 5-, & 15-minute moving averages)
• Histograms – the distribution of values (mean, median, max, std. dev., and the 75th, 90th, 95th, 98th, 99th, & 99.9th percentiles)
• Timers – a Meter of requests plus a Histogram of their durations (frequency & latency)
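The moving averages behind a Meter are exponentially weighted. The sketch below shows the idea for the 1-minute rate, assuming a 5-second tick (the interval the Metrics library uses, as far as we know). `EwmaRate` is a hypothetical, simplified class, not part of Metrics, and it is not thread-safe.

```java
// Hypothetical, simplified 1-minute rate in the style of a Metrics Meter:
// events are counted between ticks, and every tick folds that interval's
// instantaneous rate into an exponentially weighted moving average (EWMA).
// Not thread-safe; a real meter would use atomic counters.
class EwmaRate {
    private final double intervalSeconds; // how often tick() is called
    private final double alpha;           // weight given to the newest interval
    private long uncounted;               // events since the last tick
    private double ratePerSecond;
    private boolean initialized;

    EwmaRate(double intervalSeconds, double windowSeconds) {
        this.intervalSeconds = intervalSeconds;
        this.alpha = 1 - Math.exp(-intervalSeconds / windowSeconds);
    }

    static EwmaRate oneMinute() {
        return new EwmaRate(5, 60); // 5 s tick, 60 s averaging window
    }

    void mark(long events) {
        uncounted += events;
    }

    // Expected to be called by a scheduler once every intervalSeconds.
    void tick() {
        double instantRate = uncounted / intervalSeconds;
        uncounted = 0;
        if (initialized) {
            ratePerSecond += alpha * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate; // seed with the first observation
            initialized = true;
        }
    }

    double rate() {
        return ratePerSecond;
    }
}
```

With 300 events in one 5-second tick the rate reads 60/s; after one idle tick it decays to 60·e^(-5/60), about 55.2/s, instead of dropping straight to zero. That smoothing is what makes the 1-, 5- and 15-minute rates useful for spotting trends rather than noise.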
Metrics - Healthchecks
• Verify that your service is running correctly
Metrics - Reporting
• HTTP
• JMX
• Graphite
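For the Graphite reporter the wire format is simple: Carbon's plaintext protocol accepts one `metric.path value timestamp` line per data point over TCP, on port 2003 by default. The sketch below formats and sends a single point; `GraphiteLine` and the metric names are hypothetical, and in practice the Metrics library's Graphite reporter does this for you on a schedule.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for Graphite's plaintext protocol:
// "<metric.path> <value> <epoch-seconds>\n", one line per data point.
class GraphiteLine {
    static String format(String path, double value, long epochSeconds) {
        // Whitespace would split the line into extra fields, so reject it.
        if (path.isEmpty() || path.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("bad metric path: '" + path + "'");
        }
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Sends one line and closes the connection; a real reporter would batch
    // points and reuse the connection.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }
}
```

Usage would look like `GraphiteLine.send("graphite.example.com", 2003, GraphiteLine.format("prod.catalog.requests", 42, System.currentTimeMillis() / 1000))`, where the host name is a placeholder.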
Dropwizard: What is it?
• Quality open-source Java web service components glued together in a modular way
• Eliminates the need to pick a platform stack: it's all there
• It's opinionated. If you don't like a Dropwizard core component, that's too bad; don't use Dropwizard
• Developers focus on business logic, not the framework
• It's easy, maintainable, and it works!
A Few Words from Coda…
“I had no one I had to toss a WAR to. I had no one to stand up a Tomcat server and fiddle with it until their eyes bled. I had no one who didn't trust me to spin up my own threads or connection pools. So I wrote something which worked as simply and in as straight-forward a manner as possible because my own ass was on the line if it didn't work.”
Dropwizard: The Ingredients
• Jersey for REST
• Jackson for JSON
• Jetty for a web server
• Metrics for measuring
• YAML for configuration
• Dropwizard for weaving everything together
Dropwizard – Healthchecks
• Register hooks that check the health of your app
• An HTTP endpoint that iterates over all the hooks
• “The meaning of healthy” is decided by you (e.g. database connections, client connections, deadlock count)
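In Dropwizard itself you subclass `HealthCheck`, implement `check()` to return `Result.healthy()` or `Result.unhealthy(...)`, and register the check with the environment; the admin `/healthcheck` endpoint then runs them all. The dependency-free sketch below mimics that pattern so the mechanics are visible. `HealthHook`, `HealthResult` and `HealthRegistry` are hypothetical names; the deadlock hook uses the JDK's real `ThreadMXBean.findDeadlockedThreads()`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical result type, loosely modelled on healthy/unhealthy + message.
final class HealthResult {
    final boolean healthy;
    final String message; // null when healthy

    private HealthResult(boolean healthy, String message) {
        this.healthy = healthy;
        this.message = message;
    }

    static HealthResult healthy() { return new HealthResult(true, null); }
    static HealthResult unhealthy(String message) { return new HealthResult(false, message); }
}

// A hook decides what "healthy" means for one dependency.
interface HealthHook {
    HealthResult check() throws Exception;
}

class HealthRegistry {
    private final Map<String, HealthHook> hooks = new LinkedHashMap<>();

    void register(String name, HealthHook hook) { hooks.put(name, hook); }

    // What the HTTP endpoint does: run every hook, and never let one
    // failing (or throwing) check hide the results of the others.
    Map<String, HealthResult> runAll() {
        Map<String, HealthResult> results = new LinkedHashMap<>();
        for (Map.Entry<String, HealthHook> e : hooks.entrySet()) {
            HealthResult r;
            try {
                r = e.getValue().check();
                if (r == null) r = HealthResult.unhealthy("check returned no result");
            } catch (Exception ex) {
                r = HealthResult.unhealthy(ex.toString());
            }
            results.put(e.getKey(), r);
        }
        return results;
    }

    static boolean allHealthy(Map<String, HealthResult> results) {
        return results.values().stream().allMatch(r -> r.healthy);
    }

    // The "deadlock count" example from the slide.
    static HealthHook deadlockCheck() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return () -> {
            long[] ids = threads.findDeadlockedThreads(); // null when there are none
            return ids == null ? HealthResult.healthy()
                               : HealthResult.unhealthy(ids.length + " deadlocked threads");
        };
    }
}
```

A database hook would typically run a trivial query such as `SELECT 1` against the pool and return unhealthy on failure; the endpoint then answers with a non-2xx status if `allHealthy` is false.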
Dropwizard + Metrics
• Dropwizard has lots of platform instrumentation baked in using Metrics, and it comes for free! (e.g. Jetty, JVM, log counts, etc.)
• Ability to add Timers to your endpoints with @Timed
• Ability to add arbitrary metrics as you see fit
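Annotating a resource method with `@Timed` gives that endpoint a Timer. The sketch below shows roughly what a Timer records, a call count plus a distribution of durations, without the Metrics dependency. `SimpleTimer` is hypothetical: it keeps every sample (Metrics uses a bounded, sampling reservoir instead) and computes nearest-rank percentiles, so its numbers will not match Metrics exactly. The clock is injectable so the example can be tested deterministically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical timer: count of calls (the "meter" half) plus the
// distribution of their durations in nanoseconds (the "histogram" half).
class SimpleTimer {
    private final LongSupplier clock; // nanosecond clock, injectable for tests
    private final List<Long> durations = new ArrayList<>();

    SimpleTimer() { this(System::nanoTime); }
    SimpleTimer(LongSupplier clock) { this.clock = clock; }

    // Times one call, recording its duration even if it throws.
    <T> T time(Supplier<T> call) {
        long start = clock.getAsLong();
        try {
            return call.get();
        } finally {
            long elapsed = clock.getAsLong() - start;
            synchronized (durations) { durations.add(elapsed); }
        }
    }

    long count() {
        synchronized (durations) { return durations.size(); }
    }

    // Nearest-rank percentile for p in (0, 100].
    long percentile(double p) {
        List<Long> sorted;
        synchronized (durations) { sorted = new ArrayList<>(durations); }
        if (sorted.isEmpty()) throw new IllegalStateException("no samples yet");
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        rank = Math.min(Math.max(rank, 1), sorted.size());
        return sorted.get(rank - 1);
    }
}
```

In an endpoint you would wrap the handler body, e.g. `timer.time(() -> lookupProduct(id))`, where `lookupProduct` is a placeholder; `@Timed` does the equivalent wrapping for you.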
Other Frameworks
• Play 1.X
  • Abandoned in favor of Play 2.X, which was still in beta
  • Magic
• Glassfish
  • OSGi hell
  • "Standards"
• Spring
  • Everything and the kitchen sink
  • Also, I hate XML
What do I get out of it? Dev agenda
• Storytelling: causation & correlation
• An integral piece of the operational excellence puzzle
• State of the world: dashboards
• Developers focus on features; operations is mostly a free lunch
• Code review & demo
Disclaimer: You need Graphite to really harness the value
Storytelling
• The grid is slow. Why?
  • Is it load?
  • Is it dependent-service latency?
  • How does that compare to yesterday?
• The JVM throws an OutOfMemoryError. What's the problem?
  • What does the GC sawtooth look like?
  • When did it change?
  • Is it correlated with increased load?
• How is that new "performance" tweak doing?
  • If you never measured, then you didn't tune. True story!
• What does my 5XX graph look like?
Operational Excellence: The ingredients
• Application instrumentation (Dropwizard)
• Time-series data & graphing (Graphite, D3)
• Centralized logging & log parsing (Rsyslog, Logstash, Nagios)
• Automated alerting & escalation (PagerDuty)
DW & Graphite will get you very far, but if you want total control & visibility you need the rest. This is the stack RTR is moving towards, rather than relying on basic Java logging with SMTP appenders.
OMG, we're on GMA. Are we OK?
• 10+ services
• Each service runs in a cluster behind a load balancer
• What "OK" means is somewhat service-specific
Basically, you need a lot of info at your fingertips. A picture is worth a thousand words. Get yourself some dashboards!
Graphite Dashboard
Tasseo dashboard (D3)
• Red, Yellow, & Green Lights
• Realtime
• Endless cool things: Graphite + D3
If we see yellow or red, start diagnosing
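Tasseo, as far as we know, lets each dashboard metric carry its own warning and critical thresholds, which is where the red/yellow/green lights come from. The sketch below reproduces that idea in Java; it is not Tasseo's code, the threshold values in the usage are made up, and the `higherIsWorse` flag is our own addition for metrics where a low reading is the bad sign.

```java
// Hypothetical traffic-light classification with per-metric thresholds.
enum Light { GREEN, YELLOW, RED }

final class Threshold {
    private final double warning;
    private final double critical;
    private final boolean higherIsWorse; // false for e.g. "idle connections left"

    Threshold(double warning, double critical, boolean higherIsWorse) {
        boolean ordered = higherIsWorse ? warning <= critical : warning >= critical;
        if (!ordered) {
            throw new IllegalArgumentException("warning must be reached before critical");
        }
        this.warning = warning;
        this.critical = critical;
        this.higherIsWorse = higherIsWorse;
    }

    Light classify(double value) {
        boolean isCritical = higherIsWorse ? value >= critical : value <= critical;
        boolean isWarning = higherIsWorse ? value >= warning : value <= warning;
        if (isCritical) return Light.RED;
        if (isWarning) return Light.YELLOW;
        return Light.GREEN;
    }
}
```

For example, `new Threshold(100, 250, true)` on a p99 latency in milliseconds turns yellow at 100 ms and red at 250 ms, while `new Threshold(5, 1, false)` on free pool connections turns yellow at 5 left and red at 1.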
Free Lunch? Not really
• DB connection pool monitoring
• HTTP client connection pool monitoring
• JVM heap & GC info
• HTTP server response counts
• HTTP server connection info
• Endpoint duration & throughput stats
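Most of that list is gauges: values sampled at report time. As a dependency-free sketch, a gauge can be modelled as a named `Supplier<Number>`. Here a `ThreadPoolExecutor` stands in for a DB or HTTP client connection pool, since the same questions apply (how busy, how big, how many waiting), and heap usage comes from the JDK's `MemoryMXBean`. `GaugeRegistry` and the metric names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.Supplier;

// Hypothetical registry: a gauge is just a named function sampled on demand.
class GaugeRegistry {
    private final Map<String, Supplier<Number>> gauges = new TreeMap<>();

    void register(String name, Supplier<Number> gauge) {
        gauges.put(name, gauge);
    }

    // Sample every gauge once, e.g. right before reporting to Graphite.
    Map<String, Number> snapshot() {
        Map<String, Number> values = new TreeMap<>();
        gauges.forEach((name, g) -> values.put(name, g.get()));
        return values;
    }

    // Pool monitoring: busy workers, current size, and queued work.
    void registerPool(String prefix, ThreadPoolExecutor pool) {
        register(prefix + ".active", pool::getActiveCount);
        register(prefix + ".size", pool::getPoolSize);
        register(prefix + ".waiting", () -> pool.getQueue().size());
    }

    // JVM heap, read fresh on every sample.
    void registerHeap() {
        register("jvm.heap.used",
                 () -> ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        register("jvm.heap.max", // -1 if the JVM reports no defined maximum
                 () -> ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax());
    }
}
```

Because gauges are evaluated only when sampled, they cost nothing between reports; the trade-off is that a slow supplier slows down every report.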
Where do I sign up?
• You install Graphite: a one-time hit plus some TLC. Medium difficulty
• You annotate your endpoints and maybe add finer-grained telemetry. Easy
• You configure your service to feed into Graphite, hopefully consistently across services via a "Bundle". Easy
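A Dropwizard bundle is a reusable module added during bootstrap, which makes it a natural home for a shared reporting convention. One small but important piece of such a convention is metric naming, so every service lands in the same place in Graphite's tree. The sketch below assumes a hypothetical `<environment>.<service>.<host>` prefix scheme; the scheme and the `MetricPrefix` name are illustrative, not RTR's actual layout.

```java
import java.util.Locale;

// Hypothetical naming convention a shared bundle could enforce:
// <environment>.<service>.<host>.<metric>
final class MetricPrefix {
    private MetricPrefix() {}

    // Graphite treats '.' as a path separator, so dots inside a hostname would
    // create extra tree levels; replace them (and other unusual characters).
    static String sanitize(String part) {
        return part.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]", "_");
    }

    static String of(String environment, String service, String host) {
        return sanitize(environment) + "." + sanitize(service) + "." + sanitize(host);
    }
}
```

`MetricPrefix.of("prod", "Catalog", "web01.nyc.example.com")` yields `prod.catalog.web01_nyc_example_com`, so wildcards like `prod.*.*.jvm.heap.used` line up across every service.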
Demo
• Show a simple Dropwizard codebase
• Do some curls
• Show the admin endpoints
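The admin endpoints are plain HTTP, so `curl` is all the demo needs. To show the mechanics end to end without a Dropwizard app, the sketch below starts a throwaway server with the JDK's built-in `com.sun.net.httpserver` package, exposes a `/ping` endpoint that answers `pong` (mirroring the admin ping endpoint), and fetches it the way `curl` would. `MiniAdmin` is hypothetical and binds to a random free port on 127.0.0.1.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical toy admin server with a single /ping endpoint.
class MiniAdmin {
    static HttpServer start() {
        try {
            // Port 0 asks the OS for any free port.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/ping", exchange -> {
                byte[] body = "pong\n".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body); // closing the stream completes the exchange
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Java equivalent of `curl http://127.0.0.1:<port><path>`.
    static String get(int port, String path) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + path).toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start();
        try {
            System.out.print(get(server.getAddress().getPort(), "/ping"));
        } finally {
            server.stop(0);
        }
    }
}
```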
Presenters
• @CarloBarbara (www.cabkata.com)
• @Skamille (whilefalse.blogspot.com)
• Rent the Runway is hiring! (renttherunway.com/careers)
