One year has passed since my Devops laboratory talk at Devopsdays Melbourne, and we haven't stopped experimenting. After all the buzz and great conversations at Devopsdays, I decided to extend the talk with a few more experiments on top of the previous presentation. This talk was first presented at Last.conf Melbourne in June 2016. The message is that no matter where your company is in adopting a Devops culture/mindset, there are always opportunities to try something new.
The experiments covered include:
E0. At the beginning, there were devs and ops
E1. Placements
E2. The tooling team (code name Gandalf)
E3. Secondments
E4. Ops as an attribute of Business areas
E5. The era of Guilds
E6. The rise of the Delivery Engineering teams
E7. Sec + DevOps
E8. Leverage vs Autonomy
E9. Finance + DevOps
E10. ????
6. At the beginning...
[Diagram: Delivery Team 1, Delivery Team 2 ... Delivery Team N, each made up of Devs, and a separate Site Operations team made up of Ops.]
13. Placements
[Diagram: delivery teams of Devs and a Site Operations team of Ops, with an Ops engineer placed in a delivery team and a Dev placed in Site Operations.]
24. And what about pager?
[Diagram: Site Operations (Ops) and the delivery teams (Devs, one with an Ops engineer), with pager coverage labels "Day", "Day" and "Day/Nights".]
32. [Diagram: LoB “A” with four teams: Team 1 – Midsize initiative X, Team 2 – Small Initiative Y, Team 3 – Big Initiative G, Team 4 – Midsize initiative Z. The teams are staffed from IM, BA, UX, TechL, Dev, QA and Ops roles; an Ops Lead and a Tech Lead are also shown. Legend: IM = Iteration Manager, BA = Business Analyst, UX = User Experience, TechL = Tech lead, Dev = Developer, QA = Quality Assurance, Ops = Operations.]
43. [Diagram: LoB “TOO MANY STREAMS” with Team 1 – Midsize initiative X, Team 2 – Small Initiative Y, Team 3 – Big Initiative G, Team 4 – Midsize initiative Z, Team 6 – Small Initiative Y ... Team N – Small Initiative Y. The teams are staffed from IM, BA, UX, TechL, Dev, QA and Ops roles. Legend: IM = Iteration Manager, BA = Business Analyst, UX = User Experience, TechL = Tech lead, Dev = Developer, QA = Quality Assurance, Ops = Operations.]
45. [Diagram: LoB “A” with Team 1 – Midsize initiative X, Team 2 – Small Initiative Y, Team 3 – Big Initiative G, Team 4 – Midsize initiative Z and Team 5 – Delivery Engineering (Ops, Ops, Dev, QA). The teams are staffed from IM, BA, UX, TechL, Dev, QA and Ops roles; an Ops Lead and a Tech Lead are also shown. Legend: IM = Iteration Manager, BA = Business Analyst, UX = User Experience, TechL = Tech lead, Dev = Developer, QA = Quality Assurance, Ops = Operations.]
62. TL;DR: Which one worked?
“There are only a few problems that can't be solved by cake”
QUESTIONS?
FEEDBACK?
THANKS!
@setoide
Editor's Notes
This is me and my passions.
For the last four and a half years I've been working for REA.
We operate heavy traffic sites around the world.
Some of the things that make REA special are:
- Innovation
- Thought leadership in areas like agile, lean and devops
The only constant is change, always looking to improve.
This talk is about the different experiments we've run to try to create a devops culture at REA.
As Nigel could probably explain better:
“Complex systems are complex” and organizations like REA are complex in many dimensions: business, engineering, IT systems, etc...
The approach
Change something and observe. Be brave. Repeat.
Delivery vs Site Operations
Ops:
- To modify the code
- To help understand how the application works
Devs:
- To help us deploy to prod
- To help us with some non functional requirements
The night is dark and full of incidents.
Days since a full night's sleep counter
3-4 alerts per night
Happy engineer getting off pager.
Ops had to understand and troubleshoot a large, complex set of systems
Storage/Networks/Systems/Apps/Monitoring/Data/Security etc...
That made hiring difficult because:
Heroes don't scale
Short, temporary placements of engineers in a different functional area, normally lasting a few weeks.
Allocated capacity
Working closer to where the action is
Knowledge of full stack
You would never stop learning
Handovers and ramp-up for a new area were difficult
Still there were conflicting priorities
Alerts and incidents still being managed by the central team
Meet ADO, one of our first Devs to be fully knighted by the SiteOps team
Ops in Delivery
Devs in Site Ops
Am I going to fit in there?
As many companies have done
Create a centralized team to drive automation, continuous delivery, cloud adoption, etc...
PROBLEMS:
Painful manual deployments
QA blessing to go to prod
Coordination wall
1 staging fits all
The approach
Centralized team
Build tools ( #cloud + #chef + #git )
Solution that fits all needs
Influence teams to adopt
This is a simplified version of an E2E environment. It was one of the achievements of the Gandalf team, and for a long time it gave us better opportunities to develop and test changes that affected multiple components.
Just an example of some of the tech challenges the team was going through as they tried to provide stable infrastructure for EVERYONE.
Send your champions to contaminate other areas with their passion
Longer term allocations to a team
Ops still reported/belonged to the SiteOps team
Different approach
- Champions in each team to build the needed capabilities: automation, monitoring, performance
Some pluses
Priorities dictated by your function area
Engagement with the team
Better understanding of pain points
Early input in the project
Longer term allocations to a team
Ops still reported/belonged to the SiteOps team
Example of optimization from within a team instead of tackling the full-company problem.
The Autobots team was part of one of the Delivery areas and was focused on automating some parts of their delivery process.
They managed to automate some really complex processes:
- Schemabot: Database schema changes in an automated manner.
- Deploybot: Managed the deployment. One of its components, the netscaler gem, was afterward used by multiple teams.
The idea of copying the open source model, with teams looking at what other teams have come up with, has repeated over time and become one of the most successful patterns at REA.
Different business areas highly independent
Develop + Operation
A very lean layer of Global Infrastructure to support
Thin layers of shared services and vendor management
The principle was to promote TMI: Team Managed Infrastructure.
Cloud – Many accounts
Cons: Does everybody need to know about infrastructure/networks/etc...?
Negative
Priorities dictated by your business area
New Silos
Lost sense of community
Positive
Focus - Get Shit Done
Engagement with the team +++
Input into the roadmap
We give autonomy to the business areas to choose the best tools/practices for their areas.
They will have to support and maintain what they create, which drives accountability.
Can you spot the Ops engineer?
Devs step up (Pager, deployments, metrics, performance, etc...)
Day pager going to devs
Escalate if needed after troubleshooting
Proxy knowledge
Pick up BAU
Deploy something that hasn't been deployed
Tom, our Ops engineer, can focus on general improvements to operations, like:
Exploring a new CDN
Regression testing in Operations
Automating Security patches
Etc...
If a problem was beyond the engineers' knowledge, they could escalate it to the Ops representative, and the good thing is that they would cache the knowledge.
The role of the ops in LoBs has evolved:
Their role (boost operations capacity in their area)
Enable previously disabled people
Early input into the projects
War rooms become the exception. For example, this all-hands-on-deck collaboration to tackle Heartbleed as soon as possible.
2 challenges so far:
- We need to increase our Ops capability across the organisation
- We need to minimise the walls of the new Silos.
What are guilds?
- Communities of interest around different topics
- Opt in model
- They are horizontal
The previous model was quite successful, but as we became faster the business areas tried to run more streams in parallel, and the Ops capability sometimes wasn't readjusted to match...
How many ops are too many ops?
With areas running so many concurrent projects
Push to regroup again
But how is this different?
Previous investments paying off. Devs++
Focus on areas that can boost the full group
Sometimes called Devops (arrrgggggg) or BAU teams.
Focus: go fast from idea to prod
Examples: MaD walking skeleton, Group Delivery Engineering
Danger: BAU and operations brought back to this group, undoing the previous benefits
Night pager improved over time.
And finally we had our first grad on Pager.
Kudos to Angus.
Security as consultants/coaches/experts
Teams are accountable for security
Lean technique: A3s (find out american sizes)
Story telling
(PIC) A3s
The experiments presented here are just examples of what we have tried at some point in time. They had different levels of success, and the results are based on the state of our own business and our own journey.
Run your own experiments. Try new things. Monitor the results.