Rent The Runway: Transitioning to Operations Driven Webservices
1. Operations Driven Web Services
A Case Study of Service Evolution at Rent the Runway
Camille Fournier, Head of Engineering @skamille
Carlo Barbara, Senior Engineer @CarloBarbara
11. Operations first…
Availability and performance of our services are critical to
running our business
The software we develop has to make delivering on our SLAs
possible
How (besides sane design):
Healthchecks + Nagios
Measurements
Historical Data with Graphs
12. Metrics
Gauges – instantaneous value
Counters – counter with +/-
Meters – rate over time (mean, 1-, 5-, & 15-minute moving avg.)
Histograms – distribution of data (mean, median, max, std.
dev., 75th, 90th, 95th, 98th, 99th, & 99.9th percentiles)
Timers – Meter of requests & Histogram of duration (frequency
& latency)
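The five metric types on this slide can be illustrated with a small JDK-only sketch. This is not the Metrics library's API: the class and method names (`MetricTypesSketch`, `meanRate`, `percentile`) are hypothetical, the meter computes only a plain mean rate rather than moving averages, and the histogram keeps every sample in memory, which a real implementation would avoid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical JDK-only sketch of the metric types; not the Metrics library API.
class MetricTypesSketch {

    // Gauge: reads an instantaneous value on demand.
    static <T> T readGauge(Supplier<T> gauge) {
        return gauge.get();
    }

    // Counter: a value that can go up or down.
    static class Counter {
        private final AtomicLong count = new AtomicLong();
        void inc() { count.incrementAndGet(); }
        void dec() { count.decrementAndGet(); }
        long getCount() { return count.get(); }
    }

    // Meter: counts events and reports a mean rate for a given elapsed time.
    static class Meter {
        private final AtomicLong events = new AtomicLong();
        void mark() { events.incrementAndGet(); }
        long getCount() { return events.get(); }
        // Events per second; the caller supplies the elapsed time to keep this testable.
        double meanRate(double elapsedSeconds) {
            return elapsedSeconds <= 0 ? 0.0 : events.get() / elapsedSeconds;
        }
    }

    // Histogram: stores samples and answers distribution queries.
    // max() and percentile() assume at least one sample has been recorded.
    static class Histogram {
        private final List<Long> samples = new ArrayList<>();
        synchronized void update(long value) { samples.add(value); }
        synchronized long max() { return Collections.max(samples); }
        synchronized double mean() {
            return samples.stream().mapToLong(Long::longValue).average().orElse(0.0);
        }
        // Nearest-rank percentile, p in (0, 100].
        synchronized long percentile(double p) {
            List<Long> sorted = new ArrayList<>(samples);
            Collections.sort(sorted);
            int rank = (int) Math.ceil(p * sorted.size() / 100.0);
            return sorted.get(Math.max(rank, 1) - 1);
        }
    }

    // Timer: a Meter of calls plus a Histogram of durations.
    static class Timer {
        final Meter calls = new Meter();
        final Histogram durationsNanos = new Histogram();
        <T> T time(Supplier<T> work) {
            long start = System.nanoTime();
            try {
                return work.get();
            } finally {
                calls.mark();
                durationsNanos.update(System.nanoTime() - start);
            }
        }
    }
}
```

Wrapping a request handler in `Timer.time(...)` gives both frequency (the meter) and latency (the histogram) from a single call site, which is the point of the Timer type.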
14. Dropwizard: What is it?
Quality open source Java webservice components glued
together in a modular way
Eliminates the need for picking a platform stack, it's all there
It's opinionated. If you don't like a Dropwizard core
component, that's too bad, don't use Dropwizard
Developers focus on business logic, not framework
It's easy, maintainable, and it works!
15. A Few Words from Coda…
“I had no one I had to toss a WAR to. I had no one to
stand up a Tomcat server and fiddle with it until their
eyes bled. I had no one who didn't trust me to spin up
my own threads or connection pools. So I wrote
something which worked as simply and in as straightforward a manner as possible because my own ass
was on the line if it didn't work.”
16. Dropwizard: The Ingredients
Jersey for REST
Jackson for JSON
Jetty for a webserver
Metrics for measuring
YAML for configuring
Dropwizard for weaving everything together
17. Dropwizard – Healthchecks
Register hooks that check the health of your app
An HTTP endpoint that iterates over all the hooks
“The meaning of healthy” is decided by you (e.g. database
connections, client connections, deadlock count)
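The hook-and-endpoint idea can be sketched in plain Java as follows. This is an illustration only, not Dropwizard's actual health check classes: `HealthCheckSketch`, `Result`, `register`, and `runAll` are hypothetical names, and `runAll` stands in for what the HTTP endpoint does when it iterates over the registered hooks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the health check idea; not Dropwizard's own classes.
class HealthCheckSketch {

    // Outcome of one hook: a healthy flag plus a short message.
    static class Result {
        final boolean isHealthy;
        final String message;
        private Result(boolean isHealthy, String message) {
            this.isHealthy = isHealthy;
            this.message = message;
        }
        static Result healthy() { return new Result(true, "OK"); }
        static Result unhealthy(String message) { return new Result(false, message); }
    }

    // Hooks are kept in registration order.
    private final Map<String, Supplier<Result>> checks = new LinkedHashMap<>();

    // Register a named hook; "healthy" means whatever the hook decides.
    void register(String name, Supplier<Result> check) {
        checks.put(name, check);
    }

    // What a health endpoint would do: run every hook and collect the results.
    Map<String, Result> runAll() {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Result>> e : checks.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get());
            } catch (RuntimeException ex) {
                // A hook that throws is reported as unhealthy instead of failing the run.
                results.put(e.getKey(), Result.unhealthy(String.valueOf(ex.getMessage())));
            }
        }
        return results;
    }

    // Overall status: healthy only if every hook reported healthy.
    static boolean allHealthy(Map<String, Result> results) {
        return results.values().stream().allMatch(r -> r.isHealthy);
    }
}
```

A monitoring system such as Nagios can then poll the endpoint and alert on any unhealthy result, without knowing what each hook checks.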
18. Dropwizard + Metrics
Dropwizard has lots of platform instrumentation baked in using
Metrics, and it happens for free (e.g. Jetty, JVM, log counts)
Ability to add Timers to your endpoints with @Timed
Ability to add arbitrary metrics as you see fit
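To show how an annotation can turn into timing data, here is a JDK-only sketch using a dynamic proxy. Everything in it is an assumption for illustration: the `Timed` annotation below is a local stand-in rather than the real one from the Metrics library, `GreetingResource` is a made-up endpoint, and Dropwizard's own wiring of `@Timed` into Jersey works differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical JDK-only sketch; the real @Timed lives in the Metrics library.
class TimedSketch {

    // Local stand-in annotation, visible at runtime so the proxy can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Timed {}

    // Made-up endpoint: only hello() asks to be timed.
    interface GreetingResource {
        @Timed String hello();
        String ping();
    }

    static class SimpleGreeting implements GreetingResource {
        public String hello() { return "hello"; }
        public String ping() { return "pong"; }
    }

    // Call counts and total elapsed nanoseconds, keyed by method name.
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    // Wraps target so that interface methods marked @Timed are measured.
    @SuppressWarnings("unchecked")
    static <T> T instrument(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] {iface},
            (proxy, method, args) -> {
                boolean timed = method.isAnnotationPresent(Timed.class);
                long start = System.nanoTime();
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the endpoint's own exception
                } finally {
                    if (timed) {
                        String name = method.getName();
                        CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
                        TOTAL_NANOS.computeIfAbsent(name, k -> new LongAdder())
                                   .add(System.nanoTime() - start);
                    }
                }
            });
    }
}
```

The endpoint code itself stays free of measurement logic; the annotation is the only thing the developer adds.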
19. Other Frameworks
Play 1.X
Abandoned in favor of Play 2.X, which was still in beta
Magic
Glassfish
OSGi hell
“standards”
Spring
Everything and the kitchen sink
Also I hate XML
20. What do I get out of it? Dev agenda
Storytelling: causation & correlation
Integral piece of the operational excellence puzzle
State of the world – Dashboards
Developers focus on features, operations is mostly a free lunch
Code review & demo
Disclaimer: You need Graphite to really harness the value
21. Storytelling
The grid is slow, why?
Is it load?
Is it dependent service latency?
How does that compare to yesterday?
JVM throws out of memory, what's the problem?
What does the GC jigsaw look like?
When did it change?
Is it correlated with increased load?
How is that new 'performance' tweak?
If you never measured, then you didn't tune. True story!
What does my 5XX graph look like?
22. Operational Excellence: The ingredients
Application Instrumentation (Dropwizard)
Time Series Data & Graphing (Graphite, D3)
Centralized logging & log parsing (Rsyslog, Logstash, Nagios)
Automated alerting & escalation (PagerDuty)
DW & Graphite will get you very far, but if you want total control &
visibility you need the rest. This is the stack that RTR is moving
towards, rather than relying on basic Java logging SMTP appenders.
23. OMG, we are on GMA, are we OK?
10+ services
Each service runs in a cluster behind an LB
'OK' is somewhat service specific
Basically you need a lot of info at your fingertips. Pictures are
worth a thousand words. Get yourself some dashboards!
25. Tasseo dashboard (D3)
• Red, Yellow, & Green Lights
• Realtime
• Endless cool things: graphite + D3
If we see yellow or red, start diagnosing
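The red, yellow, and green lights boil down to comparing a metric against two thresholds. A minimal sketch, assuming higher values are worse; the class name, the `warning` and `critical` parameters, and the numbers in the usage note are all made up for illustration and are not taken from the RTR dashboards.

```java
// Hypothetical traffic-light classification for a single dashboard metric.
class TrafficLightSketch {

    enum Light { GREEN, YELLOW, RED }

    // Assumes higher values are worse and warning <= critical.
    static Light classify(double value, double warning, double critical) {
        if (value >= critical) return Light.RED;
        if (value >= warning) return Light.YELLOW;
        return Light.GREEN;
    }
}
```

For example, with a warning threshold of 250 ms and a critical threshold of 500 ms on endpoint latency, a reading of 300 ms would show yellow and prompt a closer look.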
26. Free Lunch? Really
DB connection pool monitoring
HTTP client connection pool monitoring
JVM Heap & GC info
HTTP Server response counts
HTTP Server connection info
Endpoint duration & throughput stats
27. Where do I sign up?
You install Graphite: a one-time hit plus some TLC. Medium
difficulty
You annotate your endpoints and maybe add finer telemetry.
Easy
You configure your service to feed into Graphite,
hopefully consistently across services, via a 'Bundle'. Easy
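In practice a reporter does the feeding for you, but it can help to see what goes over the wire. Graphite's plaintext protocol accepts one sample per line in the form `<metric path> <value> <epoch seconds>`, by default on TCP port 2003. The sketch below formats and sends such a line; the class name, the metric path, and the two-decimal formatting are assumptions for illustration, not how RTR's bundle works.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch of pushing one sample to Graphite's plaintext listener.
class GraphiteLineSketch {

    // Formats "<path> <value> <epoch-seconds>\n"; two decimals is an arbitrary choice.
    static String formatLine(String path, double value, long epochSeconds) {
        return String.format(Locale.US, "%s %.2f %d\n", path, value, epochSeconds);
    }

    // Opens a TCP connection, writes the line, and closes the connection.
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }
}
```

Agreeing on one metric path prefix per service, in one shared place, is what keeps the resulting graphs comparable across services.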
28. Demo
Show a simple Dropwizard codebase
0.6.2 Slim Example: https://github.com/cab222/choco
0.7.0-SNAPSHOT Complete: https://github.com/dropwizard/dropwizard/tree/master/dropwizard-example
Do some curls
Show the admin endpoints