From not knowing Python (let alone Airflow), through submitting a first PR that fixed a typo, to becoming an Airflow Committer, PMC Member, Release Manager, and the #1 committer this year, this talk walks through Kaxil’s journey in the Airflow world.
The second part of this talk covers:
how you can also start your OSS journey by contributing to Airflow
expanding familiarity with different parts of the Airflow codebase
committing regularly & steadily to become an Airflow Committer (including the current guidelines for becoming a committer)
the different communication channels (dev list, users list, Slack, GitHub Discussions, etc.)
Contributing to Apache Airflow | Journey to becoming Airflow's leading contributor
1. Contributing to Apache Airflow
Airflow Summit
8 July 2021
Kaxil Naik
Airflow Committer and PMC Member
OSS Airflow Team @ Astronomer
2. Who am I?
● Airflow Committer & PMC Member
● Manager of Airflow Engineering team @ Astronomer
○ Work full-time on Airflow
● Previously worked at DataReply
● Masters in Data Science & Analytics from Royal
Holloway, University of London
● Twitter: https://twitter.com/kaxil
● Github: https://github.com/kaxil/
● LinkedIn: https://www.linkedin.com/in/kaxil/
3. Agenda
● My Journey
● How to start contributing
● Communication channels
● Guidelines to become a committer
16. What did I learn by working on Airflow?
● Writing unit-tests
● Improved Coding skills
● Got to know many companies & devs across the globe
● Improved communication skills
○ Commit messages & PR descriptions
○ Email threads on dev list
○ Presentations (Public Speaking was one of my fears !!)
19. How to start contributing?
● Contributing Guidelines: CONTRIBUTING.rst
● Contributing Quick Start Guide: CONTRIBUTORS_QUICK_START.rst
● Good First Issues: https://github.com/apache/airflow/contribute
20. Contribution Workflow
1. Find the issue you want to work on
2. Setup a local dev environment
3. Understand the codebase
4. Write Code & add tests
5. Run tests locally
6. Create PR and wait for reviews
7. Address any suggestions by reviewers
8. Nudge politely if your PR is pending reviews for a while
22. Finding issues to work on
● Start small: the aim should be to understand the process
● Bugs / features impacting you or your work
● Documentation Issues (including Contribution Guides)
○ Missing or outdated info, typos, formatting issues, broken links etc
● Good First Issues: https://github.com/apache/airflow/contribute
● Other open GitHub Issues: https://github.com/apache/airflow/issues
23. Finding issues to work on - Open Unassigned Issues
If the issue is open and unassigned, comment that you want to work on it. A committer will assign the issue to you; then it is all yours.
24. Finding issues to work on - Improving Documentation
● If you faced an issue with docs, fix it for future readers
● Documentation PRs are great first contributions
● Missing or outdated info, typos, formatting issues, broken links etc
● No need to write unit tests
● Examples:
○ https://github.com/apache/airflow/pull/16275
○ https://github.com/apache/airflow/pull/13462
○ https://github.com/apache/airflow/pull/15265
26. Set Up Local Development Environment
● Fork Apache Airflow repo & clone it locally
● Install pre-commit hooks (link) to detect minor issues before creating a PR
○ Some of them even automatically fix issues, e.g. ‘black’ formats Python code
○ Install pre-commit framework: pip install pre-commit
○ Install pre-commit hooks: pre-commit install
● Use breeze - a wrapper around docker-compose for Airflow development.
○ Mac Users: Increase resources available to Docker for Mac
○ Check Prerequisites: https://github.com/apache/airflow/blob/main/BREEZE.rst#prerequisites
○ Setup autocomplete: ./breeze setup-autocomplete
27. Set Up Local Development Environment - Breeze
● Airflow CI uses breeze too, so CI failures can be reproduced locally
● Allows running Airflow with different environments (different Python versions,
different Metadata db, etc):
○ ./breeze --python 3.6 --backend postgres --postgres-version 12
● You can also run a local instance of Airflow using:
○ ./breeze start-airflow --python 3.6 --backend postgres
● You can then access the Webserver on http://localhost:28080
30. Understand the Codebase
● apache/airflow is a mono-repo containing code for:
○ Apache Airflow Python package
○ More than 60 Providers (Google, Amazon, Postgres, etc)
○ Container image
○ Helm Chart
● Each of these items is released and versioned separately
● The contribution process for the entire repo is the same
31. Understand the Codebase
● Do not try to understand the entire codebase at once
● Get familiar with the directory structure first
● Dive into the source code related to your issue
● Similar to: if you are moving to a new house, you would first get familiar with your immediate neighbours and then the others (unless you have a memory like Sheldon Cooper!)
32. Understand the Codebase - Directory Structure
(Paths relative to the repository root)
● Core Airflow Docs: docs/apache-airflow
● Stable API: airflow/api_connexion
● CLI: airflow/cli
● Webserver / UI: airflow/www
● Scheduler: airflow/jobs/scheduler_job.py
● Dag Parsing: airflow/dag_processing
● Executors: airflow/executors
● DAG Serialization: airflow/serialization
● Helm Chart (& its tests): chart
● Container Image: Dockerfile
● Tests: tests
33. Understand the Codebase - Directory Structure
(Paths relative to the repository root)
● Providers: airflow/providers
● Core Operators: airflow/operators
● Core Hooks: airflow/hooks
● Core Sensors: airflow/sensors
● DB Migrations: airflow/migrations
● ORM Models (Python Class -> DB Tables): airflow/models
● Secrets Backend: airflow/secrets
● Configuration: airflow/configuration.py
● Permission Model: airflow/www/security.py
● All Docs (incl. docs for Chart & Container image): docs
34. Understand the Codebase - Areas
● Get expertise in a certain area before diving into a different one.
● Easy: Docs, CLI, Operators / Hooks / Sensors (Providers), Stable API
● Medium: Webserver, Helm Chart, Dockerfile, Secrets Backend, DB Migrations
● Complex (core): Scheduler, Executors, Configuration, Permission Model, Dag Parsing
36. Write code
● Take inspiration from existing code
● E.g. when writing a hook, look at:
○ Code for other similar hooks
○ PRs that added other hooks to see everything that changed including docs & tests
● Check out Coding style and best practices in CONTRIBUTING.rst
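To make that concrete, here is a minimal sketch of the pattern most hooks follow. Everything here is hypothetical: "WeatherHook" is not a real provider, and the BaseHook below is a local stub standing in for airflow.hooks.base.BaseHook so the sketch runs without an Airflow installation. A real hook would subclass the real BaseHook and read credentials via get_connection().

```python
class BaseHook:
    """Stub of Airflow's BaseHook, for illustration only."""


class WeatherHook(BaseHook):
    """Hypothetical hook wrapping an imaginary weather REST API."""

    def __init__(self, weather_conn_id: str = "weather_default"):
        # Existing hooks take a *_conn_id argument naming an Airflow Connection
        self.weather_conn_id = weather_conn_id
        self._client = None

    def get_conn(self):
        # Existing hooks create their client lazily and cache it; mimic that here
        if self._client is None:
            self._client = {"conn_id": self.weather_conn_id}  # placeholder client
        return self._client
```

The parts worth copying from existing hooks are the *_conn_id constructor argument and the lazy, cached get_conn(); the PRs that added similar hooks show the matching tests and docs.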
37. Add tests and docs
● The tests directory has the same structure as the airflow package
● E.g. if the code file is airflow/providers/google/cloud/operators/bigquery.py, tests for it should be added at tests/providers/google/cloud/operators/test_bigquery.py
● Docs for it would be at docs/apache-airflow-providers-google/operators/cloud/bigquery.rst
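The mirroring convention for tests can be written down as a small helper. This function is not part of Airflow, just a sketch of the rule: swap the top-level airflow directory for tests and prefix the filename with test_.

```python
from pathlib import PurePosixPath


def expected_test_path(code_path: str) -> str:
    """Map an airflow/... source file to its conventional tests/... location.

    Illustrative helper only (not an Airflow tool).
    """
    parts = list(PurePosixPath(code_path).parts)
    if parts[0] != "airflow":
        raise ValueError("expected a path under airflow/")
    parts[0] = "tests"                    # tests/ mirrors airflow/
    parts[-1] = "test_" + parts[-1]       # filename gets a test_ prefix
    return str(PurePosixPath(*parts))
```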
39. Run tests locally - Single Test
● Start breeze: ./breeze --backend postgres --python 3.7
● Run a single test from a file:
pytest tests/secrets/test_secrets.py -k test_backends_kwargs
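For orientation, pytest's -k flag selects tests whose names match the given expression. A throwaway test file (made up for illustration, not from the Airflow repo) shows the behaviour:

```python
# test_math_example.py - an illustrative, made-up test file.
# `pytest test_math_example.py` runs both tests;
# `pytest test_math_example.py -k test_doubling` selects only the first.


def double(x):
    return 2 * x


def test_doubling():
    assert double(21) == 42


def test_identity_on_zero():
    assert double(0) == 0
```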
40. Run tests locally - Multiple Tests
● Start breeze: ./breeze --backend postgres --python 3.7
● Run all tests in a file:
pytest tests/secrets/test_secrets.py
41. Run tests locally
● Similarly, you can run several other kinds of tests locally:
○ Integration Tests (with Celery, Redis, etc)
○ Kubernetes Tests with the Helm Chart
○ System Tests (useful for testing providers)
● Check TESTING.rst for more details on how you can run them
42. Build docs locally
● If you have updated docs including docstrings, build docs locally
● Two types of tests for docs:
1. Docs are built successfully with Sphinx
2. Spelling Checks
43. Build docs locally
Example: If you updated Helm Chart docs (docs/helm-chart), build docs using
./breeze build-docs -- --package-filter helm-chart
44. Ready to commit - Static Code Checks
● Once you are happy with your code, commit it
● Pre-commit hooks will run as you run git commit
● ~90 pre-commit hooks (flake8, black, mypy, trim trailing whitespaces etc)
● All these hooks are documented in STATIC_CODE_CHECKS.rst
● Fix any failing hooks and run git add . && git commit again until all pass
● These checks will run on CI too when you create the PR
46. Write a good git commit message (Very Important)
1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how
Source: https://chris.beams.io/posts/git-commit/
Example: https://github.com/apache/airflow/commit/73b9163a8f55ce3d5bf6aec0a558952c27dd1b55
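The mechanically checkable rules above can even be scripted. The checker below is a sketch of my own, not an Airflow tool: it covers rules 1-4 and 6, while imperative mood (5) and what/why vs. how (7) need human judgement.

```python
def check_commit_message(message: str) -> list[str]:
    """Return violations of the mechanically checkable commit-message rules.

    Illustrative helper only (not part of Airflow's tooling).
    """
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    if len(subject) > 50:
        problems.append("subject exceeds 50 characters")
    if subject and not subject[0].isupper():
        problems.append("subject is not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("body lines exceed 72 characters")
    return problems
```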
48. Create PR
● Finally create a PR from your fork to apache/airflow repo
● Make sure to add an appropriate PR title and description (similar to commit messages)
● You can add commits to your branch after creating the PR too
● Wait for one of the Committers to review the PR
● Reviewers of the PR might leave suggestions or ask for clarification
● Ask for help on the PR itself if you have any questions by tagging Committers
49. Wait for Reviews
● Be patient; sometimes it may take multiple days or weeks before you get a review
● If you don’t get any reviews after a couple of weeks, you can ping on #development
channel in Airflow Slack Workspace.
50. Tests on CI
● Tests will run via GitHub Actions as soon as you create the PR
● Fix any failing tests
51. Tests on CI
● Sometimes you might see CI failures unrelated to your PR
● It can be due to one of the following reasons:
○ Flaky tests
○ Tests/Code on “main” branch might be broken
○ GitHub Runner failures -- these are transient errors
○ Timeouts due to no available slot to run on Workers
● Failure of “Quarantined Tests” can be ignored -- those are expected to fail randomly
52. When and who will merge the PR?
● One approving vote from a committer is needed before a PR can be merged
● One of the committers will merge the PR once the tests are completed
● Mention the committer who reviewed your PR if it is approved but not merged for a while
54. Communication channels
● Mailing Lists
○ Dev List - dev@airflow.apache.org (Public Archive Link)
■ official source for any decisions, discussions & announcements
■ "If it didn't happen on the dev list, it didn't happen"
■ Subscribe by sending email to dev-subscribe@airflow.apache.org
○ User List - users@airflow.apache.org (Public Archive Link)
● Airflow Slack Workspace: https://s.apache.org/airflow-slack (Public Archive Link)
● GitHub Discussions: https://github.com/apache/airflow/discussions
56. Roles
● Contributors: Anyone who contributes code, documentation etc by creating PRs
● Committers: Community members that have ‘write access’ to the project’s repositories
● PMC Members: Members who are responsible for governance of the project
○ Binding votes on releases
○ Responsible for voting in new committers and PMC members to the project
○ Making sure code licenses and all ASF’s legal policies & brand are complied with
○ Dealing with vulnerability reports
57. How to become a Committer - Prerequisites
● Guidelines are documented at https://github.com/apache/airflow/blob/main/COMMITTERS.rst
● You can become committer either by (1) Code Contributions or (2) Community Contributions
● Prerequisites
○ Consistent contribution over last few months
○ Visibility on discussions on the dev mailing list, Slack channels or GitHub issues/discussions
○ Contributions to community health and project's sustainability for the long-term
○ Understands contributor/committer guidelines: Contributors' Guide
58. How to become a Committer - Code Contributions
1. High-quality commits (especially commit messages), including upgrade paths or deprecation policies
2. Testing Release Candidates
3. Proposed and led Airflow Improvement Proposal(s) (AIPs) to completion
4. Champions one of the areas in the codebase like Airflow Core, API, Docker Image, Helm Chart, etc
5. Made a significant improvement or added an integration that is important to the Airflow Ecosystem
59. How to become a Committer - Community contributions
1. Instrumental in triaging issues
2. Improved documentation of Airflow in a significant way
3. Led changes and improvements in the “community” processes and tools
4. Actively spreads the word about Airflow, for example by organising the Airflow Summit, running workshops for community members, giving and recording talks at meetups & conferences, and writing blogs
5. Reporting bugs with detailed reproduction steps
60. Airflow Improvement Proposal (AIP)
● The purpose of an AIP is to introduce any major change to Apache Airflow, mostly the ones that
require architectural changes after planning and discussing with the community
● Details on https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
● Proposal lifecycle:
○ Discuss - discussions on the dev mailing list
○ Draft - create a proposal on the WIKI
○ Vote - vote on dev mailing list (only Committers & PMC Members have a binding vote)
○ Accepted - work is started if vote passes
○ Completed - once all PRs related to the AIPs are merged