Kim Moir (kmoir), Mozilla Release Engineering
Ship Happens:
A better Firefox build & release pipeline
“I am notorious for making impassioned speeches
about things nobody cares about.”
― Mindy Kaling, Why Not Me?
Today’s agenda
● Faster pipelines and what they mean for you
● How to try it yourself!
● Lessons learned and what’s next
Mozilla Releng live here
Release times
● 2013 - 11 hours
● 2017 - 4-5 hours
Continuous integration
● Land code
● Unit tests
● Decision graph
● Builds x N platforms
● Performance tests
● Sign Builds
Nightlies
● Land code
● Unit tests
● Decision graph
● Builds x N platforms
● Performance tests
● Sign Builds
● Generate updates
● L10n
Release process using release promotion
● Use existing build artifacts
● Generate updates
● L10n
● Unit tests
● Decision graph
● Sign Builds
● Performance tests
● Repackage Builds + Move artifacts
● Refresh update db rules
● Update websites with release
About:Taskcluster
● Taskcluster is a task execution framework that supports Mozilla’s continuous integration farm and release pipeline.
● It is a set of components that manage task queuing, scheduling, execution and provisioning of resources.
Why: In-tree and Decision Graph
● Build and test configs are all in tree
○ Good news: Developer autonomy
○ Bad news: Developer autonomy
● Decision graph upon push identifies failures more quickly
● Changes can be tested locally and on try
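A rough illustration of what a decision graph does on each push: select the wanted tasks plus their dependencies, then order them so that builds run before the tests and signing that consume them. This is a minimal sketch, not Taskcluster's implementation; the task labels and the helper names `target_graph` and `schedule` are hypothetical. A reference to an unknown task fails while the graph is being generated, which is one way problems surface early.

```python
from graphlib import TopologicalSorter

# Hypothetical task labels and dependencies, loosely modelled on the
# pipeline stages named in the slides; not real Taskcluster task names.
TASKS = {
    "build-linux64": set(),
    "build-win64": set(),
    "test-linux64-mochitest": {"build-linux64"},
    "test-win64-mochitest": {"build-win64"},
    "sign-linux64": {"build-linux64"},
    "sign-win64": {"build-win64"},
}


def target_graph(tasks, wanted):
    """Return the wanted tasks plus everything they depend on."""
    selected = {}
    stack = list(wanted)
    while stack:
        label = stack.pop()
        if label in selected:
            continue
        if label not in tasks:
            # A broken reference is caught here, before anything runs.
            raise KeyError(f"unknown task: {label}")
        selected[label] = tasks[label]
        stack.extend(tasks[label])
    return selected


def schedule(tasks):
    """Order tasks so that each one comes after its dependencies."""
    return list(TopologicalSorter(tasks).static_order())


if __name__ == "__main__":
    graph = target_graph(TASKS, ["sign-linux64", "test-linux64-mochitest"])
    for label in schedule(graph):
        print(label)
```

Asking only for the Linux signing and test tasks pulls in the Linux build and leaves the Windows tasks out of the graph.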
Testing the graph locally
● Generate the full taskgraph
○ ./mach taskgraph full > full.txt
● Generate an optimized taskgraph
○ ./mach taskgraph optimized > optimized.txt
● Generate a target taskgraph
○ ./mach taskgraph target -p parameters.yml > target.txt
● Generate a target taskgraph as JSON to inspect the content of the graph
○ ./mach taskgraph target --json -p parameters.yml > target.json
● Taskcluster config files are under taskcluster/ in the tree
○ Example: taskcluster/ci/build/macosx.yml defines Mac builds (which actually run on Linux)
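Once the graph has been written out with `./mach taskgraph target --json`, a few lines of Python can summarise it. The sketch below assumes the JSON is an object keyed by task label, that each task carries a `kind` field, and that `dependencies` maps a name to another task's label. Treat that layout as an assumption and check it against your own output. The helper names and the sample data are illustrative only.

```python
import json
from collections import Counter


def count_by_kind(graph):
    """Count tasks per kind in a taskgraph loaded from JSON."""
    return Counter(task.get("kind", "unknown") for task in graph.values())


def dependents_of(graph, label):
    """List the labels of tasks that depend directly on `label`."""
    return sorted(
        name
        for name, task in graph.items()
        if label in task.get("dependencies", {}).values()
    )


# Tiny stand-in for the real output; labels and fields are illustrative.
SAMPLE = json.loads("""
{
  "build-linux64/opt": {"kind": "build", "dependencies": {}},
  "test-linux64/opt-mochitest-1": {
    "kind": "test",
    "dependencies": {"build": "build-linux64/opt"}
  }
}
""")

if __name__ == "__main__":
    print(count_by_kind(SAMPLE))
    print(dependents_of(SAMPLE, "build-linux64/opt"))
```

To run it against a real graph, load the file you redirected the command's output to with `json.load` and pass the result to the same helpers.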
Changing tests
● YAML files in taskcluster/ci/test/ define test groups by suite name, e.g. mochitest, reftest, talos
Why: Docker Containers
● Docker containers for test and build images (not all platforms)
○ Consistent environment to debug build and test failures via one-click loaners
○ More self-serve developer loaners
Why: More autoscaling
● Moved more platforms to AWS to enable autoscaling in response to bursty load
○ Moved macOS builds to Linux cross-compile on AWS
○ Moved many Windows builds/tests to AWS
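Autoscaling for bursty load comes down to sizing the worker pool from the current backlog. The function below is a minimal sketch of that idea under stated assumptions: the name `desired_workers`, its parameters and the bounds are all hypothetical, and it is not the algorithm the Taskcluster provisioner uses.

```python
import math


def desired_workers(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Pick a worker count for the current backlog, within fixed bounds.

    All parameters are illustrative: pending_tasks is the queue depth,
    tasks_per_worker is how many queued tasks one worker should absorb.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(pending_tasks / tasks_per_worker)
    # Clamp so the pool never drops below the floor or exceeds the budget.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for backlog in (0, 250, 10_000):
        print(backlog, desired_workers(backlog, 4, 1, 100))
```

The floor keeps a few workers warm for quiet periods and the ceiling caps spend during a burst.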
Why: More security
● Better security - Chain of Trust (CoT) between artifacts as they are built,
signed and moved to AWS S3/CDNs for download on releases/nightlies
● CoT is the security model for releases
● Task execution is restricted by Taskcluster scopes, but that is only one type of authentication
● CoT allows us to trace requests back to the tree and verify each previous task
in the chain.
● If CoT fails, the task is marked as invalid
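The idea of tracing each artifact back through the tasks that produced it can be sketched with plain hashes. This is a heavily simplified illustration, not the real Chain of Trust implementation, which does more (for example signature checks) that is omitted here. `TaskRecord`, `verify_chain` and the data layout are hypothetical names chosen for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    task_id: str
    upstream: list  # task_ids this task consumed artifacts from
    artifacts: dict = field(default_factory=dict)  # name -> sha256 hex digest
    trusted_root: bool = False  # True for the task generated from the tree


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chain(task_id, records, blobs):
    """Return True if task_id and every upstream task check out.

    records: task_id -> TaskRecord
    blobs:   (task_id, artifact name) -> artifact bytes as downloaded
    """
    verified = set()

    def check(tid, path):
        if tid in verified:
            return True
        if tid in path:
            return False  # a cycle can never lead back to the tree
        record = records.get(tid)
        if record is None:
            return False  # unknown task: cannot be traced
        for name, expected in record.artifacts.items():
            data = blobs.get((tid, name))
            if data is None or sha256_hex(data) != expected:
                return False  # missing or modified artifact
        if record.upstream:
            ok = all(check(up, path | {tid}) for up in record.upstream)
        else:
            ok = record.trusted_root  # the chain must end at a trusted root
        if ok:
            verified.add(tid)
        return ok

    return check(task_id, frozenset())
```

A signing task would call something like `verify_chain` on its inputs before signing, and refuse to proceed when it returns False, which mirrors the "task is marked as invalid" outcome on the slide.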
Why+?
● Team learned new things - Docker, transforms, migration strategies,
microservices, monitoring
● Future efficiencies - allow us to continue to scale
● Migrate off technologies that did not scale to our needs
● Re-evaluate existing jobs: Are they still needed? Could they be improved?
Timeline for migration (2017)
● Jan 20 - Linux Desktop and Android Firefox nightly builds from Taskcluster
● Mar 13 - Mobile beta in Taskcluster
● July 2 - Mac Nightlies in Taskcluster
● Aug 30 - Windows nightlies in Taskcluster
● Nov 14 - Shipped Firefox Quantum in Taskcluster
Approach to migration
● Incremental portions of pool
● Communication
● Checklist
● Monitor capacity and wait times
● Monitor state after migration
● Rollback plan
● Decommission old
● Migrate more
Strangler Application - Martin Fowler
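Migrating "incremental portions of pool" with a rollback plan can be pictured as a percentage dial in front of the old and new systems. The sketch below is one hypothetical way to do that, with made-up function and job names. It hashes each job name into a stable bucket so that raising the percentage only ever moves jobs from old to new, and setting it back to 0 is the rollback.

```python
import hashlib


def bucket(job_name: str) -> int:
    """Map a job name to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(job_name: str, percent_on_new: int) -> str:
    """Send a fixed share of jobs to the new system, the rest to the old."""
    return "new" if bucket(job_name) < percent_on_new else "old"


if __name__ == "__main__":
    jobs = [f"linux64-build-{n}" for n in range(10)]  # illustrative names
    for percent in (0, 25, 100):
        on_new = sum(route(job, percent) == "new" for job in jobs)
        print(f"{percent}% dial: {on_new} of {len(jobs)} jobs on the new system")
```

Because the bucket depends only on the job name, the same jobs stay on the new system between runs, which makes it easier to monitor their state after each step.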
56 was a rough release
● We had many automation changes
○ New compression format for updates
○ Watersheds for win32->win64 migration for people on 64-bit hardware
○ Win32/Win64 on Taskcluster
Operation: Don’t F*ck up 57
● Implement missing release automation
● Fix our staging environment
● Smooth our merge day process
● Train team members on merges and staging releases
● Run staging releases and merges to iron out any issues
before 57 releases
● Write tests to validate update rules for 57
● Spreadsheet to coordinate update rules with relman
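Tests for update rules can be small: describe which release a given client should be offered, then assert it. The sketch below uses a deliberately simplified, hypothetical rule model (`Rule`, `pick_update`, and the mapping names are invented for illustration and are not the schema of Mozilla's update server). It shows a watershed, an intermediate release that older clients must pass through before moving on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int
    mapping: str  # release the client is offered (hypothetical names)
    max_version: Optional[int] = None  # match clients strictly below this major version
    build_target: Optional[str] = None  # match only this platform; None matches any


def pick_update(rules, version, build_target):
    """Return the mapping of the highest-priority rule that matches."""
    matching = [
        r
        for r in rules
        if (r.max_version is None or version < r.max_version)
        and (r.build_target is None or r.build_target == build_target)
    ]
    if not matching:
        return None
    return max(matching, key=lambda r: r.priority).mapping


# Hypothetical rule set: win32 clients older than 56 stop at 56 first.
RULES = [
    Rule(priority=100, mapping="Firefox-56.0", max_version=56, build_target="win32"),
    Rule(priority=90, mapping="Firefox-57.0"),
]


def test_watershed_then_latest():
    assert pick_update(RULES, 55, "win32") == "Firefox-56.0"
    assert pick_update(RULES, 56, "win32") == "Firefox-57.0"
    assert pick_update(RULES, 55, "linux64") == "Firefox-57.0"


if __name__ == "__main__":
    test_watershed_then_latest()
    print("update rule checks passed")
```

Keeping the expected outcomes in a test makes it harder for a rule edit to skip the watershed by accident.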
What have we learned?
● Incrementalism - change one thing, evaluate, then change
another
● Expectations change. The faster we build, the faster other
groups expect to be able to ship
● Staging environment is important to test new automation
● Communication
● Organizational changes
● Consider the operational side, not just landing code
Upcoming work
● In-tree release promotion for beta and release builds
● Release process optimizations: measure our end-to-end release times and common failure points, with the aim of providing more predictable and stable releases
● Staging releases on try
● More incremental fixes to make things faster
I embrace mistakes, they make you who you are
―Beyoncé
Questions?
Additional Reading
● Justin Wood’s (Callek’s) talks on transforms https://gitpitch.com/Callek/slideshows/transforms_2017
● All your nightlies are belong to Taskcluster https://atlee.ca/blog/posts/migration-status.html
● Nightly builds from Taskcluster https://atlee.ca/blog/posts/nightly-builds-from-taskcluster.html
● 2016 retrospective https://atlee.ca/blog/posts/2016-releng-retrospective.html
● What's So Special About "In-Tree?" http://code.v.igoro.us/posts/2016/08/whats-so-special-about-in-tree.html
● Chris Cooper, Nightlies in Taskcluster http://coopcoopbware.tumblr.com/post/156133487075/nightlies-in-taskcluster-go-team
● Chris Cooper, Mobile Betas in TC http://coopcoopbware.tumblr.com/post/158362146735/shameless-self-release-promotion-firefox-530b1
● So you want to rewrite that, Camille Fournier, GOTO conference, Chicago, 2014 https://www.youtube.com/watch?v=PhYUvtifJXk

Editor's notes

  1. Hi, my name is Kim Moir and I work in Mozilla Release Engineering. I’m also one of the unapologetic Canadians here in Austin this week. Today I’m going to tell a story. Last month, we shipped Firefox Quantum. We released a beautiful new and much faster browser. So far the reviews have been stellar and we are all looking forward to seeing the impact that it has in the marketplace. But there is another story. While the platform teams were transforming the browser, engineering ops teams were transforming the pipelines that deliver our products to the world. This work was ongoing while we continued to deliver betas every week, and releases on schedule. How did we do this? Why did we do this? How does this help you? This is the story I’m going to tell today. As a side note, this picture was taken near Stanley Park in Vancouver. I took it during a work week almost two years ago. At this point we were starting a lot of the work to transform our build and release pipeline. Today the bulk of that work is now done.
  2. I am also notorious for talking a lot about release engineering. I promise that this talk will be interesting, informative and relevant, no matter what your role at Mozilla. So let’s get started!
  3. Faster pipelines -> feedback -> shipping How to try it yourself! (loaners, mach commands, overview of tasks, transforms) Lessons learned and what’s next I’ll publish the slides online after the talk
  4. Photo by Taylor Leopold on Unsplash Before I start talking about the work we did the past year, I’m going to ask you why are you here. Not in the why are we here in the universe sense, but why are you here at Mozilla? Would anyone like to share why they are here at Mozilla? I’m here because: I care about the open web I like release engineering at scale I enjoy working with an amazing team who like to constantly improve things I like to ship!
  5. I’d also like to introduce the cast of characters that did a lot of the work I’m going to be talking about. This is the Mozilla release engineering team. We also didn’t do all the work ourselves - there was a lot of work from the Taskcluster Platform team, Release Engineering Operations, Developer Productivity, Developer Services, Release Management, Sheriffs, Buildduty, QA and more. As I go through this presentation, I’m going to have a series of trivia questions. If you get the right answer, I’ll have stickers for you. Trivia time - how many countries do we live in? The answer is 7.
  6. One other thing to note is that we are a very distributed team as you can see from the map. We are in New Zealand, Canada, the US, UK, France, Germany and Romania.
  7. Photo by Quinten de Graaf on Unsplash How many of you have "pushed a patch to Try"? "landed an uplift to mozilla-beta or mozilla-release"? "received a notification that there is a new update available"? If you have done any of these, you’ve used some of the systems that release engineering builds and maintains. What does releng do? Transform code into shippable product. Develop and maintain a build and release pipeline. Build: compile, package, sign, run tests, create updates, verify that various update scenarios work. Optimize! Make things faster! From Wikipedia: “Release engineering, is a sub-discipline in software engineering concerned with the compilation, assembly, and delivery of source code into finished products or other software components. Associated with the software release life cycle, it was said by Boris Debic of Google Inc. that release engineering is to software engineering as manufacturing is to an industrial process.”
  8. Photo by Garett Mizunaka on Unsplash At a previous job, I worked with someone who said that everything in life is a constraint optimization problem. Building a build and release pipeline is the same. We are bounded by constraints: money, time, machines, people. How can we optimize within these constraints most effectively, so that we have happy developers and are able to deliver product?
  9. Photo by Uroš Jovičić on Unsplash What are end-to-end times? The time from when a developer lands a commit until we are able to ship the finished product. Why are end-to-end times important? Developers love to ship. In order to ship, they need feedback on their patches. Can I ship this? Or is there a regression that needs to be backed out? It improves happiness if they can see the results of their work more quickly. Landing small incremental patches reduces risk. Years ago, many software teams ran builds only once a day, at night, and had to bisect to figure out what broke everything. We don’t do that anymore. It’s too difficult to figure out what went wrong on a high-velocity team with a huge number of commits. Zero-days: we need to be able to get security patches to our users quickly. Trivia time - how long does a release take today?
  10. References 2013 - https://oduinn.com/2013/12/11/on-leaving-mozilla/ What changed? Release promotion More parallelization of tasks Faster machines Moved more platforms to AWS so we can scale for bursty load (mac builds now run on Linux in AWS, windows on AWS machines) Fastci work
  11. Very simplified diagram. We sign with a signing key specific to CI builds. It’s important that CI, nightly and release builds have different signing keys. Trivia time - How much does it cost to run all the jobs associated with a push to m-c? $134. This doesn’t account for the costs of machines we have in data centers, like mac test machines or machines for performance tests on windows and linux.
  12. For nightlies, the signing key is different than releases. Also, we generate language packs for different locales. And generate updates for all of that.
  13. Very simplified diagram again. We use a process called release promotion to take the existing artifacts from a CI build and repackage them for release builds. With CI builds on release and beta builds, we use the release key for nightlies because these builds are promoted. In the future, we plan to sign with the release key only when the builds are promoted. Trivia time - how much did we pay AWS for a push to m-r that we used for 57? It’s about $56.
  14. Photo by ARTHUR YAO on Unsplash A lot of the work I’m going to talk about today is regarding our migration from buildbot to taskcluster. So I’m going to talk a little about what that is. Release Engineering + other eng ops teams recently finished migrating builds, tests and much of release automation to Taskcluster.
  15. What does taskcluster provide? It has a lot of features as you can see from this page. The most important features for the releng team were scheduling flexibility and platform support.
  16. Before this migration, we had several repos full of Python code that defined how builds and tests ran in CI. Releng plus a small group of developers knew how to make changes, and this was a bottleneck to enabling new builds and tests or disabling old ones. Now these configs are managed in tree and any developer can make changes. The drawback is that any developer can make changes. Sometimes mistakes are made. E.g. I had to write patches to back out changes that enabled tests on branches where they did not need to run and just added to costs. We also need to find ways of checking that developers are not touching configs for releases riding the trains. With every push to a repo, such as autoland, a decision graph is generated automatically. Basically it contains the list of tasks that need to run for that push, along with all their dependencies. If it fails, the builds aren’t run, which saves resources. Developers can also test these changes locally or on try. Photo by ARTHUR YAO on Unsplash
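As a rough illustration of the decision graph idea in note 16, the tasks for a push can be modeled as a mapping from each task to the tasks it depends on, then ordered so that nothing runs before its dependencies. This is only a sketch using Python's standard `graphlib` module; the task names are hypothetical and this is not Taskcluster's actual implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for one push: each task maps to the set of tasks it depends on.
push_tasks = {
    "build-linux64": set(),
    "build-macosx64": set(),
    "test-linux64-mochitest": {"build-linux64"},
    "sign-linux64": {"build-linux64"},
    "sign-macosx64": {"build-macosx64"},
}


def schedule(tasks):
    """Return the tasks in an order that respects their dependencies.

    An invalid graph (for example, a dependency cycle) raises an error
    before anything is scheduled, which loosely mirrors the point that a
    failed decision step means no builds are started.
    """
    return list(TopologicalSorter(tasks).static_order())


order = schedule(push_tasks)
```

Every build appears in `order` before the tests and signing tasks that depend on it.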
  17. Photo by ARTHUR YAO on Unsplash Trivia time - How many tasks are generated with a full graph? 7945 tasks and 16874 dependencies. We don’t really use the full task graph very often. It’s filtered to select the tasks we actually need, so an optimized task graph is smaller: with the filter_target_tasks filter, 2748 tasks. You can also specify a target. A regular push to autoland or m-c has a target of default. There are other targets that we use; for instance, when we are running releases, the promote_firefox targets filter the tasks needed to promote existing CI builds to beta or release builds. When you have your changes working locally, you can push to try. You can also export the task graph in JSON format.
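The task and dependency counts quoted in note 17 are the kind of numbers that can be derived from the JSON export. The sketch below assumes the export is a JSON object keyed by task label, where each entry carries a `dependencies` mapping; that layout and the sample labels are assumptions for illustration, so check the real output of `./mach taskgraph target --json` before relying on it.

```python
import json


def graph_stats(graph):
    """Count tasks and dependency edges in a label -> task mapping."""
    task_count = len(graph)
    edge_count = sum(len(task.get("dependencies", {})) for task in graph.values())
    return task_count, edge_count


# Tiny stand-in for an exported task graph (assumed layout, made-up labels).
sample = json.loads("""
{
  "build-linux64/opt": {"kind": "build", "dependencies": {}},
  "test-linux64/opt-xpcshell": {
    "kind": "test",
    "dependencies": {"build": "build-linux64/opt"}
  },
  "build-signing-linux64/opt": {
    "kind": "build-signing",
    "dependencies": {"build": "build-linux64/opt"}
  }
}
""")

tasks, edges = graph_stats(sample)
print(f"{tasks} tasks and {edges} dependencies")
```

For a real graph, the same function could be applied to `json.load(open("target.txt"))`, assuming the file holds the JSON export.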
  18. Every decision task generates a parameters.yml so you can download that to use in the target task
  19. Photo by michael podger on Unsplash I wrote some code to generate a graph of our dependencies, but it was too large, so this is a replica. Trivia time: 7945 tasks and 16874 dependencies.
  20. Defines the various flavours of mac builds: the toolchains, scripts and config used to run the build. Anyone with commit rights can change these, but don’t unless you know what you’re doing!
  21. This is the start of the talos.yml file. You can see that for talos-chrome, we can specify that they only run on linux-qr on m-c and try. By default they only run on selected branches
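To make the branch restriction in note 21 concrete, here is a minimal sketch of how a scheduler could decide whether a test runs for a given push. The dictionary layout, the `default` key and the helper name `runs_on` are all hypothetical; the real YAML schema under taskcluster/ci/test/ may differ.

```python
def runs_on(task, project, platform):
    """Decide whether a test task should be scheduled for a push.

    `task["run-on-projects"]` is either a list of project names or a
    mapping from platform to such a list (a hypothetical layout).
    """
    projects = task.get("run-on-projects", ["all"])
    if isinstance(projects, dict):
        # Per-platform override: use the platform's list, else the default.
        projects = projects.get(platform, projects.get("default", []))
    return "all" in projects or project in projects


# Hypothetical stand-in for the talos-chrome entry described above.
talos_chrome = {
    "run-on-projects": {
        "linux64-qr": ["mozilla-central", "try"],
        "default": ["mozilla-central", "autoland", "try"],
    }
}

print(runs_on(talos_chrome, "try", "linux64-qr"))
print(runs_on(talos_chrome, "autoland", "linux64-qr"))
```

Keeping the rule in data rather than code is what lets any developer adjust it in tree, for better or worse.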
  22. You can also specify built-projects and the tests will only be scheduled for projects where their upstream dependencies are built. taskcluster/taskgraph/transforms/ transforms the taskgraph. This is code to transform the graphs for different purposes and to reduce code duplication. See Callek’s talk - it’s an entire talk of its own and explains taskcluster transforms very clearly https://gitpitch.com/Callek/slideshows/transforms_2017
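The idea behind transforms, as described in note 22, can be sketched as a chain of generator functions that each take task definitions and yield modified or expanded ones. This is a self-contained illustration of the pattern, not the in-tree API; the function names, the `max-run-time` default and the field names are hypothetical.

```python
def set_defaults(tasks):
    """Fill in fields every task needs so the config doesn't repeat them."""
    for task in tasks:
        task.setdefault("max-run-time", 3600)
        yield task


def split_by_platform(tasks):
    """Expand one abstract test definition into one task per platform."""
    for task in tasks:
        for platform in task.pop("platforms"):
            yield {
                **task,
                "platform": platform,
                "label": f"{task['name']}-{platform}",
            }


def run_transforms(tasks, transforms):
    """Chain transforms so each one consumes the previous one's output."""
    for transform in transforms:
        tasks = transform(tasks)
    return list(tasks)


definitions = [{"name": "mochitest", "platforms": ["linux64", "macosx64"]}]
result = run_transforms(definitions, [set_defaults, split_by_platform])
```

One short definition becomes two concrete tasks, which is how this style reduces duplication in the config files.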
  23. Photo by ARTHUR YAO on Unsplash
  24. The taskcluster team implemented one-click loaners, which are a super easy way to get a short-term loan of a machine that has a docker environment set up and configured with the job that you want to debug. https://docs.taskcluster.net/tutorial/debug-task#content Reproduce errors on the same environment that runs in CI. One click, get an interactive terminal with that job running in it. List platforms available. Demo?
  25. You’ll have an interactive task created for you. (Note you have to be logged into taskcluster to create one.).
  26. You’ll be redirected to a page with several options. I chose option 2, to set up the tasks but not run them.
  27. The end result is a page like this, with a Docker environment with the tests set up. Some caveats from https://docs.taskcluster.net/tutorial/debug-task#content The original task command executes anyway. You can, of course, kill it manually. The shell stays open until there are no active connections, but only until the task's maxRunTime expires, at which time it will be forcibly terminated. Tasks generally run on EC2 spot instances which can be killed at any time.
  28. Photo by ARTHUR YAO on Unsplash Mozilla has a lot of bursty load on their CI farm. When Europe and North America are online, there is a lot of load; overnight it decreases. It’s expensive to have all these machines in data centers to accommodate bursty load. But you can’t autoscale macs in AWS. This is not an offering. We wanted to cross-compile Mac on Linux. It took a lot of work from Ted, Mshal, and wcosta to get the toolchain correctly configured. When we did get the builds working, a performance issue was identified. The performance of the browser built on Linux was not as good as the one built on native mac hardware. So we couldn’t ship it. It turns out that on mac we were building in a different directory than on Linux, and this was the root cause https://bugzilla.mozilla.org/show_bug.cgi?id=1338651 “Taskcluster OS X builds are cross-compiled from a Linux Docker image. The build is done in a path under '/home/', which shows up in the symbol table of the resulting binary as STAB entries referencing the object files. Some system libraries on OS X will attempt to stat the files in those entries, which can cause noticeable performance issues. This has shown up as a result of the sandboxing system causing this behavior while reporting violations, which caused Talos regressions, as well as timeouts in GTest death tests due to the system crash reporting system causing this behavior.” Trivia time: How many comments were on the bug to address this performance issue? 200. How many people cc’ed? 48. It is a small novel. Add it to your reading list.
  29. Photo by ARTHUR YAO on Unsplash Aki’s talk on CoT https://vreplay.mozilla.com/replay/showRecordingExternal.html?key=mHlTiJ4RZZSVRPc Blog post on CoT https://escapewindow.dreamwidth.org/249409.html We have been generating Chain of Trust artifacts for a while now. These are gpg-signed json blobs with the task definition, artifact shas, and other information needed to verify the task and follow the chain back to the tree. However, nothing has been verifying these artifacts until now. With the latest scriptworker changes, scriptworker follows and verifies the chain of trust before proceeding with its task. If there is any discrepancy in the verification step, it marks the task invalid before proceeding further. This is effectively a second factor to verify task request authenticity.
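One part of the verification described in note 29, checking artifacts against the hashes recorded in a Chain of Trust blob, can be sketched as follows. This assumes SHA-256 hashes and a simple name-to-hash mapping, both of which are assumptions for illustration; it deliberately leaves out GPG signature checking and following the chain back to the tree, and it is not scriptworker's actual code.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of some artifact content."""
    return hashlib.sha256(data).hexdigest()


def verify_artifacts(recorded, artifacts):
    """Return the names of artifacts that are missing or do not match.

    `recorded` maps artifact name -> expected hex digest (hypothetical
    layout); `artifacts` maps artifact name -> downloaded bytes.
    """
    mismatches = []
    for name, expected in recorded.items():
        content = artifacts.get(name)
        if content is None or sha256_hex(content) != expected:
            mismatches.append(name)
    return mismatches


# Made-up artifact name and content, for illustration only.
original = {"public/build/target.tar.bz2": b"build output"}
recorded = {name: sha256_hex(data) for name, data in original.items()}
tampered = {"public/build/target.tar.bz2": b"build output, modified"}

print(verify_artifacts(recorded, original))
print(verify_artifacts(recorded, tampered))
```

A non-empty result is the kind of discrepancy that would lead a worker to mark the task invalid rather than proceed.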
  30. Photo by ARTHUR YAO on Unsplash
  31. This is a timeline for some of the work we did this year. There was a lot of work done in the previous year to migrate the ci builds to tc.
  32. Tested in project branches Mention phoenix project
  33. Who loves to delete code? I do, it’s one of my favourite things.
  34. From Jez Humble’s Continuous delivery page https://continuousdelivery.com/implementing/architecture/ “One pattern that is particularly valuable in this context is the strangler application. In this pattern, we iteratively replace a monolithic architecture with a more componentized one by ensuring that new work is done following the principles of a service-oriented architecture, while accepting that the new architecture may well delegate to the system it is replacing. Over time, more and more functionality will be performed in the new architecture, and the old system being replaced is “strangled”.” One of the things that really helped us achieve this in our transition was an application called buildbot bridge. This allowed us to schedule jobs on taskcluster, but continue to run them on buildbot. This is similar to the dispatcher function shown in the diagram above.
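The dispatcher role in the strangler pattern quoted in note 34 can be sketched in a few lines: every job goes through one entry point, which hands it to the new system once that job type has been migrated and to the legacy system otherwise. The class, method and job names are hypothetical, and this illustrates the general pattern rather than how buildbot bridge was implemented.

```python
class Dispatcher:
    """Route each job to the new system once migrated, else to the legacy one."""

    def __init__(self, legacy_runner, new_runner):
        self.legacy_runner = legacy_runner
        self.new_runner = new_runner
        self.migrated = set()

    def migrate(self, job_type):
        """Mark a job type as handled by the new system from now on."""
        self.migrated.add(job_type)

    def run(self, job_type):
        runner = self.new_runner if job_type in self.migrated else self.legacy_runner
        return runner(job_type)


dispatcher = Dispatcher(
    legacy_runner=lambda job: f"legacy ran {job}",
    new_runner=lambda job: f"new ran {job}",
)
dispatcher.migrate("linux64-build")

print(dispatcher.run("linux64-build"))
print(dispatcher.run("win32-test"))
```

Because callers only ever see the dispatcher, job types can move over one at a time while releases keep shipping.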
  35. Trivia time: 22 beta builds, 12 Betas, 6 RCs toward 56.0. Watersheds are rules that define an upgrade path to a newer release through a previous release.
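The watershed idea in note 35 can be illustrated with a small function that works out which releases a client passes through on its way to the latest one. Versions are modeled as tuples, and the version numbers below are made up for the example; they are not Firefox's actual watershed releases, and real update rules carry more conditions than this.

```python
def update_path(current, latest, watersheds):
    """List the releases a client installs, in order, to reach `latest`.

    A client must stop at every watershed release that lies strictly
    between its current version and the latest version.
    """
    if current >= latest:
        return []
    stops = sorted(w for w in watersheds if current < w < latest)
    return stops + [latest]


# Hypothetical versions, for illustration only.
watersheds = [(54, 0), (50, 0)]
latest = (57, 0)

print(update_path((48, 0), latest, watersheds))
print(update_path((55, 0), latest, watersheds))
```

Every extra watershed adds another update scenario that has to be generated and verified, which is part of why release cycles with many builds are tiring.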
  36. Photo by Lance Anderson on Unsplash This is an accurate portrayal of the releng team after the 56 release cycle. We were very tired. Not just us. Other teams too. And we knew that it wouldn’t be good for the team to run through 57 hitting some of the same problems given the importance of that release.
  37. Not sure who coined this term, but I have heard it circulate. There were many facets to this approach from other groups. For releng, it centered on staging releases. Running staging releases had been a magical incantation that only a few releng folks knew how to do. So we set about documenting the process. Where were the pain points? How could more things be automated? How could we share the knowledge so everyone involved with releases could fire up a staging release and test that their changes worked as expected? Trivia question: How many beta builds were there before the 57 release? 11 betas - 16 builds, 4 RCs before final release.
  38. This is an excellent talk on code rewrites as well So you want to rewrite that - Camille Fournier https://www.youtube.com/watch?v=PhYUvtifJXk
  39. What is in-tree release promotion? Currently the code that promotes/ships our builds for beta and release doesn’t all reside in tree (that is, in a repo such as mozilla-central). There are several github repos for that code. So there is a lot of ongoing work to migrate that functionality so it resides in tree. This will allow us to run staging releases on try, among other things.
  40. This was a really huge rewrite. We learned a lot. We made mistakes. We learned from them and will take those lessons forward in future work. I learned a lot from this process. In the end, our build and release pipeline is more resilient, more scalable, and more self-serve for developers.