What is the right balance between moving fast, innovating, and experimenting with new technology, and protecting the personal data of our customers and the interests of our stakeholders? How can we safely try new ideas in production without risking costly downtime? Does the utopia where developers are free from lock-in and operators enjoy the calm of a steadily running system exist in the real world? Is it possible to have open platforms with better security? At Kroger Digital we are still working through these questions every day, but we are redesigning our systems with the goals of true operational maturity and security. Discover how we are building capabilities for monitoring, A/B testing, and continuous delivery with Docker Datacenter, plugins, and open source building blocks such as NGINX, Elasticsearch, and more.
9. Contentious Delivery
* Placeholder graphics
“Let’s write functional tests so this never happens again.”
“What are all these red things?”
42 TESTS FAILED
*coding* code code code
10. Collaborate!
Easier said than done, but also:
• Microservices
  • Docker!
• Monitoring
  • Docker!
• Communication
  • Not yet Dockerized
and…
Lessons learned
13. Pull requests!
All commits subject to approval.
• Applications
• Base Docker images
• Deploy scripts
• Reverse proxy configuration
Open source model
Faster
Safer
14. Regression’s a dirty job but someone’s got to do it
What test cases can’t be covered with a lower-level unit or contract test?
Automation is required for true Continuous Delivery.
Reduce manual regression tests
Faster
Safer
18. Docker Trusted Registry and Universal Control Plane
Docker with additional manageability
including:
• Trusted CI build and push to DTR
• RBAC for running containers on UCP
• Auditing
Docker Datacenter
Faster
Safer
19. Colorful Delivery
• Two versions of app deployed
• Docker!
• Dark test non-prod color
• If (failed):
• Remove non-prod color
• Else:
• Swap non-prod color to prod
• Profit
Blue/Green deploys
Faster
Safer
20. Blue/green demo
Deploy new version to UCP
Run tests
Switch color
Code walkthrough
See new version in prod
Rollback
21. • Consul vs. Interlock
• Registration by apps vs. Registrator
• Docker open source vs. Datacenter
• Admin UI vs. Authz plugins
Tech stack considerations
22. Continuous Delivery and Experimentation (CDE)
A/B testing and Canary deploys
Faster
Safer
[Diagram: User1 and User2 routed by NGINX to versions A and B, with metrics flowing into Echo]
23. • Marriages and relationships?
• Stock trades?
• Startups?
• Job candidates?
• Conference presentations?
• Anything and everything?
Real life A/B
What if we could A/B test…
24. A/B demo
Show button clicking site
Monitor button click metrics
25. What color do you think
will generate the most
clicks?
26. A/B demo
Configure and deploy app with new color as B route
Code walkthrough
Demonstrate UCP RBAC with routing and app stacks
Audience vote
Monitor performance metrics in real time
27. Go to [demo URL] on
your mobile device and
click away!
29. • A/B, always better
• CD, cautious distribution
• CDE, carefully doing experiments
• DTR, docked trusty register
• UCP, ultimate container platform
• RBAC, read books about cats
Your new alphabet
There will be a quiz
30. • Open source all code
• https://github.com/timperman
• Automated scaling
• Automated rollback
• Blog posts
Next steps
I’m Brett Timperman, I work at Kroger Technology as a developer lead for the Core Engineering team. We are responsible for building and maintaining Kroger’s customer-facing web applications, including ClickList, an order-online-pick-up-at-store application. Right now we are re-architecting and re-building everything that makes up kroger.com into a new experience.
You may not engage with us online (yet), but you’re probably familiar with at least one brick and mortar store in our large family. Here in Seattle, that’s probably QFC or Fred Meyer, but we have a presence throughout most of the country under one name or another. Kroger is the 3rd largest retailer in the world even though we only operate here in the US.
Kroger and its family of companies are known mostly for those physical grocery and convenience stores, but we are investing heavily in the “digital” experience.
To illustrate the extent of that investment: when I started on the web team in early 2013, I was a member of one of two teams of roughly 10 developers each. Today, about three and a half years later, the Core Engineering team and I work with over 200 fellow developers, divided into vertical product teams.
Our job on Core Engineering is to curate and promote best practices between the teams and to deliver reusable solutions that solve common problems. In practice, this makes us somewhat of a DevOps team. One of our top team goals, and the one closest to my “core,” is to provide platforms for Continuous Delivery of a variety of applications to production.
While we are currently undergoing a revolution of collaboration, things weren’t always so cozy.
I would like to start with a quick tale I will call “Contentious Delivery.”
This is a familiar tale to the enterprise world, a tale of two teams with fundamentally different goals who must work together despite barely concealed tension.
Unsurprisingly, this is largely due to competing incentives. Each team gets bonus checks based on how well they accomplish seemingly incompatible goals.
Production issues will often necessitate a conference bridge that every single team in the company calls in to, where someone is likely to ask “what changed?!” This issue didn’t happen yesterday/last week/last quarter/last year! Which oh-so-carefully-balanced spinning plate fell and brought them all crashing down?
Sometimes there can be a bit of denial and buck-passing until the front-end team is finally able to come up with mathematical proof that a downstream service is returning 500s. Someone restarts a server and the system is restored to glory.
QA may try to prevent production issues with more functional tests, but they add too much overhead to the build and developers may become dangerously accustomed to red build pipelines. Broken tests may be ignored to get out a critical fix for the next production issue, which seems to happen entirely too often. The QA team eventually relies solely upon week-long manual regression cycles to catch breaking issues. Features are still heavily prioritized so more and more developers are on-boarded while dev teams focus on bug fixes and tech debt. The code bases quickly grow into monoliths.
I wish I could say my team is nothing like this anymore, that everything changed and became perfect overnight, but striving toward operational maturity is a daily effort. That said, we have tons of talented and smart people who are working together better than ever. I’d like to take this opportunity to thank every member of Kroger Technology for their work and willingness to evolve.
It’s obvious in retrospect that as we grew, our teams needed to break the monoliths apart into microservices. Our apps were not designed to accommodate this, and we built up too much technical debt too quickly to effectively reuse the monolith code. Today we are completely rewriting large portions of the codebase and, like many other companies, have adopted Docker to package and share work between development teams.
Monitoring was a big pain point for us - with large distributed applications and no centralized logging, we struggled to identify issues with just log files and out-of-the-box tools. We built our own streaming data pipeline for log and performance messages with dashboards and visualizations. Guess what? It’s fully Dockerized and runs on Docker in production! You’ll get more detail and even see pieces of this system in action today.
This system also helps facilitate communication with other teams as we are much more easily able to articulate the state of our applications – which services have high response times, and so on.
All these things are making huge differences for us, but the question still remains…
“How can we deploy faster AND safer?” - the theme of today’s talk.
This chapter of our story is a work in progress. New features are coming very fast and bringing change for us all. We have rapidly evolving teams and practices, and with the introduction of tools like Docker, finding the right balance between speed and stability seems within our grasp.
I’ll take you through the signposts on our roadmap to operational maturity, each with its own set of advantages and drawbacks.
I am a fan of the open source model, even for enterprise applications. Security teams will love the audit data and comfort they get from requiring someone else to review and approve a commit before it goes live in production. We are implementing this everywhere, from application source code, to Docker builds and deploys, to sometimes overlooked but critical items like the reverse proxy configuration.
Opponents of using pull requests will argue that it makes a team slower, but the added safety of a good approval workflow may help you sleep at night.
I am not a fan of manual regression testing, and have yet to meet a fan of it in real life. There are still some who think we will always need it. I believe a reliance on manual regression should be a red flag that the code base is too big. If an application’s developers cannot make a change without breaking something unrelated, the code generally needs to be smashed into pieces with a sledgehammer and put back together again. To achieve our goal of moving as fast as possible, testing must be automated.
Most of you will have seen the Test Automation Pyramid before – it stresses the value of faster unit tests over slower UI tests. This is not a controversial position to take these days, and unit tests are undoubtedly faster and safer. They are more reliable because they run without the dependency of a web browser or dev/test environments. We are now able to replace the vast majority of our slow UI tests because we are rebuilding our apps with a component-based UI architecture that allows for deep unit testing of UI components.
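As a small sketch of why base-of-the-pyramid tests are faster and more reliable: they exercise component logic directly, with no browser or environment in the loop. The `format_price` helper below is hypothetical, not real Kroger code, and the tests are pytest-style plain assertions.

```python
def format_price(cents):
    # Hypothetical UI component helper: render an integer number of
    # cents as a display string, e.g. 199 -> "$1.99".
    if cents < 0:
        raise ValueError("price cannot be negative")
    return "${}.{:02d}".format(cents // 100, cents % 100)

# Each test runs in microseconds: no browser, no dev/test environment,
# no flaky network dependency.
def test_whole_dollars():
    assert format_price(200) == "$2.00"

def test_cents_are_zero_padded():
    assert format_price(5) == "$0.05"

def test_boundary():
    assert format_price(199) == "$1.99"
```

Hundreds of tests like these run in the time a single browser-driven UI test takes to start up, which is what makes keeping the build green practical.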
I mentioned our custom monitoring system earlier – it is a real-time streaming data collection system that aggregates log messages and other key metrics and correlates events across systems. Kafka is the backbone of the system, which collects and filters messages over the Apache Thrift messaging protocol and pipes free-form data into Elasticsearch for querying and visualization via Kibana. It doesn’t make us much faster, but it does make us much safer with unprecedented visibility into production events and errors. It also runs on Docker and we’ll be open sourcing it soon!
Show components of Echo in docker-compose file. Start system with docker compose, view data format and Kibana dashboard.
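Echo itself isn’t open sourced yet, but its core shape, ingest events, filter out noise, index the rest for dashboards, can be sketched in a few lines. Everything here is illustrative: the field names, the drop rule, and the in-memory queue standing in for Kafka and the list standing in for an Elasticsearch index are not Echo’s actual schema or components.

```python
from collections import deque

class Pipeline:
    # Toy log pipeline: ingest -> filter -> index, standing in for
    # Kafka -> filter stage -> Elasticsearch. Purely illustrative.
    def __init__(self):
        self.queue = deque()   # stands in for a Kafka topic
        self.index = []        # stands in for an Elasticsearch index

    def ingest(self, event):
        self.queue.append(event)

    def drain(self):
        while self.queue:
            event = self.queue.popleft()
            # Filter stage: drop debug noise, keep warnings and errors.
            if event.get("level") in ("WARN", "ERROR"):
                self.index.append(event)

    def errors_by_service(self):
        # The kind of aggregation a Kibana dashboard would visualize.
        counts = {}
        for e in self.index:
            if e["level"] == "ERROR":
                counts[e["service"]] = counts.get(e["service"], 0) + 1
        return counts
```

The point of the real system is the same as this toy: once events from every service land in one queryable place, “which service has high response times?” becomes a dashboard panel instead of a grep across machines.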
Docker Datacenter has proven necessary for our Docker push-to-prod. It has allowed our developers and operators to take advantage of the awesome Docker experience while providing all of the visibility and control Security teams crave. With DTR, we can limit push rights to app teams or even just a CI server. We can show who pushed an image and when, and with UCP role-based access control, ensure that only the proper groups can deploy or administer a team’s containers. Authorization integrates with our existing enterprise systems such as LDAP. Security teams get detailed auditing and increased peace of mind – I’ll demo more of how we use these features for separation of duties later in the talk.
Blue/green deploys are a common technique for continuous delivery. With it, there is the concept of a live, or production, color – either blue or green. Traffic should be freely switchable to either color at the load balancer, so a team can update the dark, non-prod, version without disrupting production users. Once the dark version has been tested and verified, the team updates the load balancer so that the new version becomes the live version. The team can optionally leave the old version running for some time to enable quick rollbacks if problems are detected via monitoring or user complaints. The Blue/green technique allows teams to deploy much faster and safer with downtime-free rollout and rollbacks.
Demo rollout and rollback of a new version with B/G
A/B testing is another familiar term to many of you, but to level-set us here I’ll define it as distributing a portion of your production user traffic to a different version or instance of the application, or the B route. Proper A/B testing should always also include monitoring the key performance indicators of B in real time next to A, with the intent of confirming a hypothesis. Traffic can be distributed any number of ways – by a strict percentage, by location/IP, by participation in a beta, and so on. As the evidence for the hypothesis mounts, the team can ramp up the number of users directed to route B, increasing the sample size. Likewise, the team can quickly ramp down if the data doesn’t support the hypothesis. A team can A/B test any hypothesis, including cosmetic criteria such as colors. Canary deploys are a specific flavor of A/B testing that revolves around deploying a new version of an application. Like the traditional use of canaries to see if a mine is safe, a canary version is deployed as a B route to a portion of customer traffic, and traffic only ramps up if performance criteria are met. Because of the scientific nature of these tests, I call the use of A/B or canary techniques in deployment CDE (Continuous Delivery and Experimentation).
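The routing half of this can be sketched with a deterministic hash split, so a given user always lands on the same route while the team ramps the B percentage up or down. In our setup the split actually happens at NGINX; this Python version only illustrates the bucketing idea, and the function name is mine, not from any real system.

```python
import hashlib

def route(user_id, percent_b):
    # Deterministically assign a user to route "A" or "B".
    # Hashing the user id (rather than picking randomly per request)
    # keeps each user's experience stable: ramping percent_b from
    # 5 to 50 only moves new users to B, it never flips B users back.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100   # stable bucket in 0..99
    return "B" if bucket < percent_b else "A"
```

With buckets derived from the user id, ramping up the canary is just changing one number, and ramping down is equally instant, which is what makes the quick retreat on bad data cheap.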
A/B testing an application should be easy compared to the powers one would need to pull it off in the physical world, but it’s still fun to imagine the possibilities.
As a man who has been divorced and is remarried, wouldn’t it have been great to test the happiness of Marriage A vs. Marriage B? My wife tells me that is what dating should have been, but my counter is that what works in Test doesn’t always work in Production… unless it’s in a Docker container. I’m not even sure how I would define marital bliss in terms of key performance indicators.
There are many scenarios I could imagine, but my science-fiction-trained brain gets stuck on the time-travel logistics of rollbacks in reality. Does each A/B test create an alternate dimension that lives on to the end of time? Is there a multiverse of failed B routes?
This demo uses an app that simply asks users to click a button. We will demonstrate the power of A/B testing in production by forming a hypothesis that X button color will receive more clicks, and test that live during the talk!
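Behind the demo, confirming the hypothesis comes down to comparing two click-through rates and deciding whether the difference is more than noise. A crude sketch of that comparison, a two-proportion z-score in plain Python, no stats library, with made-up numbers:

```python
import math

def click_rate_comparison(clicks_a, views_a, clicks_b, views_b):
    # Compare click-through rates of routes A and B with a
    # two-proportion z-score. A larger |z| means stronger evidence
    # the difference is real (|z| > 1.96 is roughly 95% confidence).
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    return p_a, p_b, z
```

For example, 50 clicks out of 1,000 views on A against 80 out of 1,000 on B gives z of roughly 2.7, enough to start ramping B up with some confidence. Watching this number move in real time is the “E” in CDE.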
Final comments about the power of live feedback with the ability to safely rollback
Kind of lame jokes, we’ll see how I feel about these later.