
DevDay 2016: Artur Speth - DevOps - Microsoft Developer Division's Path into the Next Agile Era

6,294 views

Published on

Markets are more dynamic than ever, and business models are changing. Engineering often no longer keeps pace with this dynamism, which can lead to significant competitive disadvantages. Shorter cycles and an agile culture are key elements for better value creation, but they are not trivial to achieve in large organizations. Using Visual Studio Team Services as an example, the talk described the agile transformation of the Microsoft Developer Division toward a DevOps culture and offered a look behind the scenes at how the Developer Division works today.

Published in: Technology

  1. 4,307
  2. 467 spread out across 35 feature teams
  3. Production, Development, Backlog, Requirements
  4. Visual Studio & TFS Update 1, Visual Studio & TFS Update 2, Visual Studio & TFS Update n, VS Team Services
  5. Code Test & Stabilize, Code Test & Stabilize, Beta, RTM. 2 years
  6. Planning. Customer feedback: we should change the way a feature works. We didn’t get it quite right… but we’re booked solid already. 2 years
  7. S1 S2 S3 S4 S5 Stabilization S6 A B S7 S8
  8. 2 years, 3 weeks
  9. https://flic.kr/p/arXUyP
  10. Alignment, Autonomy. “Let’s try to give our teams three things… Autonomy, Mastery, Purpose”
  11. Scenarios, Features, Stories, Tasks
  12. Sprint (3 weeks), Plan (3 sprints), Season (6 months), Scenario (18 months). Spring, Fall, Spring, Fall. Aspirational: 60%
  13. Sprint (3 weeks), Plan (3 sprints), Season (6 months), Scenario (18 months). Spring, Fall, Spring, Fall. Hopeful: 80%. What Epics are we lighting up?
  14. Sprint (3 weeks), Plan (3 sprints), Season (6 months), Scenario (18 months). Spring, Fall, Spring, Fall. Thoughtful: 90%. What features are planned?
  15. Sprint (3 weeks), Plan (3 sprints), Season (6 months), Scenario (18 months). Spring, Fall, Spring, Fall. Confident: 95%. What stories are we complete? What features are shipping?
  16. Sprint 97, Sprint 98, Sprint 99, each divided into Week 1, Week 2, Week 3. The sprint plan. What we accomplished
  17. Updates were large, months apart, lots of problems! Timeline from 4/1/2010 to 4/23/2012: 5/3/2010 TFS 2010 RTM; 4/23/2011 Service Deployment; 8/5/2011 Service Update; 9/26/2011 //BUILD 2011; 12/7/2011 Service Update; 1/30/2012 Service Update; 2/20/2012 Service Update; 3/12/2012 Service Update; 4/2/2012 Service Update
  18. Program Management, Development, Testing, Operations
  19. Program Management, Engineering, Operations, Engineering
  20. Program Management, Engineering
  21. Sprint 97, Sprint 98, Sprint 99, each divided into Week 1, Week 2, Week 3. Deployment, Sprint Planning, Done
  22. Week 1, Week 2, Week 3
  23. Week 1, Week 2, Week 3
  24. Week 1, Week 2, Week 3
  25. Week 1, Week 2, Week 3
  26. ONE
  27. Code Test & Stabilize, Code Test & Stabilize, Beta, RTM, Planning, Code Complete
  28. ON / OFF
  29. ON / OFF
  30. ON / OFF
  31. ON / OFF
  32. ON / OFF
  33. ON / OFF
  34. VSO SU1 (Chicago), VSO SU0 (San Antonio), VSO SU4 (Amsterdam), Shared Platform Services (San Antonio)
  35. Existing experience. Baseline: 36% conversion to project. 50% to 100% customers. Conversion to project (+18%)
  36. There’s no place like production!
  37. Telemetry everywhere: Customer Intelligence, Business Intelligence, Operational Intelligence. Dashboard, DevOps, Debug, Experiments
  38. Getting the availability model right. [Chart: Sept 25th 2013 LSI, 2:24 PM to 10:48 PM. Series: FailedExecutionCount, SlowExecutionCount, Start, End, Availability (ID4 - Activity Only), Availability (Current)]
  39. Alerting is key to fast detection. Every alert must be actionable and represent a real issue with the system. Alerts should create a sense of urgency; false alerts dilute that. Redundant alerts for the same issue. Needed to set the right thresholds and tune often. Stateless alerts contributed to further noise
  40. Health model in action: 3 errors for memory and performance; all 3 related to the same code defect; APM component mapped to feature team; auto-dialer engaged Global DRI. Eliminated alert noise (~928 alerts per week to ~22) and reduced DRI escalations by ~56%
  41. Live Site Issues (LSIs)
  42. Service Availability & Health Metrics. [Charts: Time to Detect and Time to Mitigate (% of incidents); Error by Source (incident count); Incidents by Severity (incident count); User Impact Minutes During Incidents, TFS only (user minutes)] (1) TFS availability is on an improving trend. No Sev0/Sev1 LSIs for July. (2) App Insights switched from synthetic availability to real-user experience in the Ibiza portal. A high volume of SEV-2 LSIs (72) contributed to customer impact, in addition to intermittent UX errors. (UX fixes applied on 8/11 improve availability.) (3) App Insights was impacted by 3 long-running LSIs related to ES maintenance, Ibiza updates and an Azure Storage outage. (4) TFS Service attainment (SLO) improved significantly MoM, with a focus on minimizing failed/slow commands and reviewing them in weekly LiveSite reviews.
  43. Service status
  44. © 2015 Microsoft Corporation. All rights reserved.
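
The alerting slide lists redundant alerts for the same issue, thresholds that needed tuning, and stateless alerts as sources of noise. The deck does not show how the team addressed this in code, so the following is only a minimal sketch of one stateful approach. The class name `AlertTracker`, its methods, the issue keys, and the threshold value are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlertTracker:
    """Hypothetical stateful alerter: one alert per open issue, not per signal."""

    threshold: int = 3  # assumed number of failures required before alerting
    counts: dict = field(default_factory=dict)
    open_issues: set = field(default_factory=set)

    def record_failure(self, issue_key: str) -> bool:
        """Return True only when this failure should raise a new alert."""
        self.counts[issue_key] = self.counts.get(issue_key, 0) + 1
        if issue_key in self.open_issues:
            return False  # redundant: an alert for this issue is already open
        if self.counts[issue_key] >= self.threshold:
            self.open_issues.add(issue_key)
            return True
        return False

    def resolve(self, issue_key: str) -> None:
        """Close the issue so that a later recurrence can alert again."""
        self.open_issues.discard(issue_key)
        self.counts.pop(issue_key, None)


tracker = AlertTracker(threshold=3)
signals = ["memory"] * 5 + ["latency"] * 2
alerts = [key for key in signals if tracker.record_failure(key)]
print(alerts)  # ['memory']
```

In this sketch, five memory failures produce a single alert, and two latency failures stay below the assumed threshold and produce none. Because the tracker remembers open issues, repeated signals for the same defect do not page again until `resolve` is called.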