SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
It’s a Breeze
to develop
Airflow
Polidea
Apache
Airflow
Polidea
Apache Airflow
Airflow is a platform to programmatically author,
schedule and monitor workflows.
Dynamic/Elegant
Extensible
Scalable
Polidea
Team @
Polidea
Polidea
Logo or mockup
Hi!
Jarek Potiuk
Principal Software Engineer @Polidea
Apache Airflow Committer
Certified GCP Architect
ex-Googler, ex-CTO, ex-choir member
@higrys
Polidea
70+
TALENTS
100+
PROJECTS
DELIVERED
3m
USERS OF
OUR APPS
75%
OF BUSINESS
THROUGH
REFERRALS
Team
@Polidea
Polidea
All-time Apache Airflow team at Polidea
Jarek Potiuk Kamil Breguła Tomasz Urbaszek Karolina Rosół
Dariusz Aniszewski Szymon Przedwojski Antoni Smoliński
Tobiasz Kędzierski Michał Słowikowski
Polidea
Polidea &
Apache Airflow
Polidea
August 2018
2 people
Timeline September 2019
6 (9) people
Polidea
Our tasks
● 100+ operators
● 18+ GCP services
● Oozie-To-Airflow
Polidea
It’s a Breeze to develop [https://polidea.com]
Polidea
What we delivered extra
● 2 Apache Airflow committers
● Documentation improvements
● Breeze - improved development environment
● Py2 -> Py3
● Pylint compatibility
● CI environment reimplemented
● Operator scaffolding
Polidea
Integration
Test
Challenges
Polidea
Integration tests on Travis CI
Polidea
Integration tests challenges
● Multiple backends: postgres, mysql, sqlite
● Multiple python versions (2.7) - 3.5, 3.6. 3.7
● Multiple executors: Local/Sequential/Kubernetes
● Automated static code analysis
● Automated documentation building
Polidea
● Long time to set it up
● Frustrations of fresh developer experience
● High friction/learning curve for Airflow development environment
● Slow iteration speed
● Complicated Development Environment
The problems with Integration Tests
Polidea
● Scripts only designed for CI, not local environment
● Dependencies installed every time you start the environment
● Always full database reset
● Minutes to run one test
● No guidance how to iterate over tests
Original CI environment
Polidea
Ash’s “Hacking on Airflow”
Polidea
Challenge
accepted
Polidea
● Focus on developer productivity
● Faster development cycle
● Decrease developer frustration
● Improve the teamwork
● Easy for ad-hoc contributors to code & test
The Goal
Polidea
● AIP-10: Multi-layered and multi-stage official Airflow image
● AIP-7: Simplified Development Workflow
● AIP-23: Migrate out of Travis CI
● AIP-4: Support for System Tests for external systems
Improvements
Polidea
● Local virtualenv
● Own Travis CI fork
● Docker compose (Travis CI equivalent)
The environments
Polidea
Previous testing experience
● Total time: 7 minutes
● Running one test only
● Failure at the end (!)
● Re-run - 10-20 seconds for DB
● Re-enter - same time (!)
● No bash history
Polidea
Improved
Integration
Tests
Polidea
● Docker images built from master automatically (DockerHub)
● Local images use cached images
● Tests and static checks run using Docker Compose/Docker environment
● Can be run on Kubernetes Cluster (Docker-In-Docker)
● CI system - independent
Step 1 - Multi-stage, multi-layered Docker image
Polidea
● Entering the environment
○ PYTHON_VERSION=3.5 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_enter_environment.sh
● Static checks run in Docker
○ Mypy: ./scripts/ci/ci_mypy.sh
○ Pylint main: ./scripts/ci/ci_pylint_main.sh,
○ Pylint tests: ./scripts/ci/ci_pylint_test.sh
○ Flake8: ./scripts/ci/ci_flake8.sh
○ Licence check: ./scripts/ci/ci_check_licence.sh
○ Documentation build: ./scripts/ci/ci_docs.sh
● Run static checks on individual files/packages
○ ./scripts/ci/ci_pylint.sh ./airflow/stats.py
● Update images
○ ./scripts/ci/local_ci_build.sh
○ ./scripts/ci/pull_and_build.sh
Step 2 - local scripts to manage the environment
Polidea
Step 3 - Easy way of running tests
● works out-of-the-box
● initializes DB when needed
● environment variables set
● sub-second test overhead
● ipdb debugging
● verbose output
Polidea
Breeze
Polidea
● entering the environment: ./breeze --backend sqlite --python 3.5
● re-entering the environment: ./breeze
● automated image management
● autocomplete of options
● sub-second test execution overhead
● host sources mounted to Docker container
● ports forwarded
● hints for ad-hoc developers
The ideal workflow
Polidea
● run-tests tests.core<TAB><TAB> autocomplete
● bash history across sessions
● run static checks with Breeze
● easy debugging (including debugging with IDE)
● pre-commit checks
Extras - nice to haves
Polidea
Feel the Breeze
Polidea
Breeze goodies
Polidea
Additional sources
Polidea
Breeze features
● Docker images management
● Pre-commit checks (almost all merged)
● Run-tests with DB initialisation
● Travis CI integration
● Comprehensive documentation
Polidea
Breeze Documentation
Polidea
Breeze
spin-offs
Polidea
Pre-commit checks
● easy to use
○ pre-commit install
○ pre-commit run
○ pre-commit run mypy
○ pre-commit run --all-files
● run only for changed files (fast)
● catches errors early
● make committers time efficient
● promotes good practices
Polidea
Example errors with pre-commit
Polidea
What’s next ?
● Migrating out of Travis CI
○ GitLab CI (only CI) or GitHub Actions
○ Kubernetes Cluster on Google Kubernetes Engine (Thanks Google!)
● Better onboarding documentation (Google Season of Docs !)
● Production-optimised Docker images
Polidea
Thanks!
hello@polidea.com

Weitere ähnliche Inhalte

Was ist angesagt?

How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
PyData
 

Was ist angesagt? (20)

Apache Airflow at Dailymotion
Apache Airflow at DailymotionApache Airflow at Dailymotion
Apache Airflow at Dailymotion
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
Upgrading to Apache Airflow 2 | Airflow Summit 2021
Upgrading to Apache Airflow 2 | Airflow Summit 2021Upgrading to Apache Airflow 2 | Airflow Summit 2021
Upgrading to Apache Airflow 2 | Airflow Summit 2021
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
From business requirements to working pipelines with apache airflow
From business requirements to working pipelines with apache airflowFrom business requirements to working pipelines with apache airflow
From business requirements to working pipelines with apache airflow
 
Apache airflow
Apache airflowApache airflow
Apache airflow
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Cloud-Native Drupal: a survival guide
Cloud-Native Drupal: a survival guideCloud-Native Drupal: a survival guide
Cloud-Native Drupal: a survival guide
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Mój przepis na skalowalną architekturę mikroserwisową? Apollo Federation i Gr...
Mój przepis na skalowalną architekturę mikroserwisową? Apollo Federation i Gr...Mój przepis na skalowalną architekturę mikroserwisową? Apollo Federation i Gr...
Mój przepis na skalowalną architekturę mikroserwisową? Apollo Federation i Gr...
 
Introduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OKIntroduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OK
 
Deploying Web Apps with PaaS and Docker Tools
Deploying Web Apps with PaaS and Docker ToolsDeploying Web Apps with PaaS and Docker Tools
Deploying Web Apps with PaaS and Docker Tools
 
Airflow tutorials hands_on
Airflow tutorials hands_onAirflow tutorials hands_on
Airflow tutorials hands_on
 
So you want to write a cloud function
So you want to write a cloud functionSo you want to write a cloud function
So you want to write a cloud function
 
Introduction to Modern DevOps Technologies
Introduction to  Modern DevOps TechnologiesIntroduction to  Modern DevOps Technologies
Introduction to Modern DevOps Technologies
 

Ähnlich wie It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)

Ähnlich wie It's a Breeze to develop Apache Airflow (London Apache Airflow meetup) (20)

It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)
 
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
 
Gocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous DeploymentGocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous Deployment
 
Webinar - Unbox GitLab CI/CD
Webinar - Unbox GitLab CI/CD Webinar - Unbox GitLab CI/CD
Webinar - Unbox GitLab CI/CD
 
Gitlab ci, cncf.sk
Gitlab ci, cncf.skGitlab ci, cncf.sk
Gitlab ci, cncf.sk
 
Continuous testing
Continuous testingContinuous testing
Continuous testing
 
CI/CD with Azure DevOps and Azure Databricks
CI/CD with Azure DevOps and Azure DatabricksCI/CD with Azure DevOps and Azure Databricks
CI/CD with Azure DevOps and Azure Databricks
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development Pipeline
 
CI/CD with Github Actions
CI/CD with Github ActionsCI/CD with Github Actions
CI/CD with Github Actions
 
Advanced Code Flow, Notes From the Field
Advanced Code Flow, Notes From the FieldAdvanced Code Flow, Notes From the Field
Advanced Code Flow, Notes From the Field
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
 
Docking with Docker
Docking with DockerDocking with Docker
Docking with Docker
 
Reducing DevOps Burden with Git-based CI/CD Pipelines for APIs
Reducing DevOps Burden with Git-based CI/CD Pipelines for APIsReducing DevOps Burden with Git-based CI/CD Pipelines for APIs
Reducing DevOps Burden with Git-based CI/CD Pipelines for APIs
 
Drupal 8 DevOps . Profile and SQL flows.
Drupal 8 DevOps . Profile and SQL flows.Drupal 8 DevOps . Profile and SQL flows.
Drupal 8 DevOps . Profile and SQL flows.
 
Bgoug 2019.11 building free, open-source, plsql products in cloud
Bgoug 2019.11   building free, open-source, plsql products in cloudBgoug 2019.11   building free, open-source, plsql products in cloud
Bgoug 2019.11 building free, open-source, plsql products in cloud
 
DevOps Practices @Pipedrive
DevOps Practices @PipedriveDevOps Practices @Pipedrive
DevOps Practices @Pipedrive
 
Automate Your Automation | DrupalCon Vienna
Automate Your Automation | DrupalCon ViennaAutomate Your Automation | DrupalCon Vienna
Automate Your Automation | DrupalCon Vienna
 
Chef - Administration for programmers
Chef - Administration for programmersChef - Administration for programmers
Chef - Administration for programmers
 
Сontinuous Integration - step to continuous deployment
Сontinuous Integration - step to continuous deploymentСontinuous Integration - step to continuous deployment
Сontinuous Integration - step to continuous deployment
 
DCEU 18: Building Your Development Pipeline
DCEU 18: Building Your Development PipelineDCEU 18: Building Your Development Pipeline
DCEU 18: Building Your Development Pipeline
 

Mehr von Jarek Potiuk

Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Jarek Potiuk
 

Mehr von Jarek Potiuk (7)

Subtle Differences between Python versions
Subtle Differences between Python versionsSubtle Differences between Python versions
Subtle Differences between Python versions
 
Caching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer scienceCaching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer science
 
Off time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social mediaOff time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social media
 
Berlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow WorkshopsBerlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow Workshops
 
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
 
Ci for android OS
Ci for android OSCi for android OS
Ci for android OS
 
React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

It's a Breeze to develop Apache Airflow (London Apache Airflow meetup)