Data Contracts: Consensus as Code - Pycon 2023

Ryan Collingwood, Business Analyst | Requirements Wrangler | Boundary Spanner | Continuously Learning at Containerchain
Data Contracts
Consensus as Code
Ryan Collingwood
2023-08-18
Who am I and my current context
• Ryan Collingwood, Head of Data & Analytics at Oroton
• Australia’s oldest luxury fashion company
• Centralised Data Team
• Monoliths (ERP & POS) surrounded by a number of SaaS products
• Data is mostly moved in batch
Why I think you might care about this
Responsibility in the modern data stack
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Shout out to Andrew Jones
https://data-contracts.com/
Similar, Related, and Complementary Concepts
• APIs
• Data Dictionaries
• Data Mesh
• Event Storming
• Data Catalogs
• Domain Driven Design
I’d be curious to know what else you might add to this list
Advice is a form of nostalgia. Dispensing it is a way
of fishing the past from the disposal, wiping it off,
painting over the ugly parts and recycling it for
more than it's worth
Mary Schmich
https://www.chicagotribune.com/columns/chi-schmich-sunscreen-column-column.htm
“If I could offer you only one tip for the future, sunscreen would be it.”
What are Data Contracts?
... outlines how data can get exchanged between two parties.
It defines the structure, format, and rules of exchange in a
distributed data architecture. These formal agreements make
sure that there aren’t any uncertainties or undocumented
assumptions about data.
https://atlan.com/data-contracts/
... is an agreed interface between the generators of data and
its consumers. It sets the expectations around that data,
defines how it should be governed, and facilitates the explicit
generation of quality data that meets the business
requirements.
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Data Producers and Data Consumers
Team A Team B
Team C
You can be a Data Producer without knowing about it
[Diagram labels: Non-consensual API; Team C]
Broken pipelines, broken non-promises
[Diagram labels: Non-consensual API (three times); Team A; Team B; Team C]
One of the largest impediments to addressing data quality at any organization is the
lack of collaboration between data producers and data consumers.
...
A common workaround (is the) proliferation of non-consensual APIs.
Can’t get a software engineer to emit the data you need to solve some business
problem?
Connect your ELT tool to a production source and extract a batch dump on a
schedule.
Easy
(Until things start breaking…whoops).
Chad Sanderson - https://dataproducts.substack.com/p/the-production-grade-data-pipeline
What makes up a Data Contract
https://github.com/PacktPublishing/Driving-Data-Quality-with-Data-Contracts/blob/main/Chapter03/order_events.yaml
However, data contracts are more than just a
schema... we need our data contracts to capture
metadata that describes how the data can be used,
how it is governed, and the controls around the data
Driving Data Quality with Data Contracts - Andrew Jones (2023)
What makes up a Data Contract
• Schema
• Contract Governance
• Semantics
• Service Level Objectives
• Dataset Governance
• Mechanisms of Transmission
• People
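One way to picture these components together is as a small typed model. The sketch below is only an illustration using standard-library dataclasses (not Pydantic, to stay dependency-free). Every class name, field name, and value in it (`DataContract`, `order_events`, the latency figure, and so on) is hypothetical and not taken from the talk or from Andrew Jones's book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One attribute of the dataset: schema plus semantics."""
    name: str
    dtype: str            # schema: how systems store and exchange the value
    meaning: str          # semantics: how people should interpret the value
    required: bool = True


@dataclass(frozen=True)
class ServiceLevelObjectives:
    max_latency_hours: int     # how stale the data may be on arrival
    min_completeness: float    # fraction of required values present, 0.0 to 1.0


@dataclass(frozen=True)
class DataContract:
    dataset: str
    version: str
    owner: str                 # people: who to talk to about this data
    transmission: str          # mechanism of transmission, e.g. "batch"
    classification: str        # governance: how the data may be used
    slos: ServiceLevelObjectives
    fields: tuple[Field, ...] = ()

    def field_names(self) -> list[str]:
        return [f.name for f in self.fields]


# Hypothetical contract instance, invented for illustration only.
order_events = DataContract(
    dataset="order_events",
    version="1.0.0",
    owner="Team A",
    transmission="batch",
    classification="internal",
    slos=ServiceLevelObjectives(max_latency_hours=24, min_completeness=0.99),
    fields=(
        Field("order_id", "string", "Unique reference for one customer order"),
        Field("total", "decimal", "Order value including tax"),
    ),
)
```

Freezing the dataclasses means a contract cannot be changed in place by accident; a change has to be a new, reviewable definition.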
Schema versus Semantics
Schema | Semantics
Systems interoperability | Human expectations
Support for implicit validation by database technologies | Tends to require explicit validation by complementary solutions
Ensuring we capture and retrieve the data consistently | Ensuring we interpret the data consistently
Dates / times, monetary values - are a trap if considered only as schema.
What are your “schema” but “secretly semantic” situations?
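As a sketch of one "schema but secretly semantic" situation: both records below pass a type-level (schema) check, yet only the semantic check notices a timestamp with no timezone and an amount with no currency. The function names, the record keys, and the rule that an amount needs a three-letter uppercase currency code are assumptions made for illustration.

```python
from datetime import datetime, timezone
from decimal import Decimal


def schema_ok(record: dict) -> bool:
    """Schema check: are the values of the expected types?"""
    return (
        isinstance(record.get("sold_at"), datetime)
        and isinstance(record.get("amount"), Decimal)
    )


def semantic_problems(record: dict) -> list[str]:
    """Semantic check: can a human interpret the values unambiguously?"""
    problems = []
    sold_at = record["sold_at"]
    if sold_at.tzinfo is None or sold_at.utcoffset() is None:
        problems.append("sold_at has no timezone, so the instant is ambiguous")
    currency = record.get("currency")
    if not (isinstance(currency, str) and len(currency) == 3 and currency.isupper()):
        problems.append("amount has no three-letter currency code")
    return problems


# Invented sample records.
ambiguous = {"sold_at": datetime(2023, 8, 18, 9, 30), "amount": Decimal("199.00")}
explicit = {
    "sold_at": datetime(2023, 8, 18, 9, 30, tzinfo=timezone.utc),
    "amount": Decimal("199.00"),
    "currency": "AUD",
}
```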
Minimum Viable Data Contract Tooling
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Operate
Meta-Data Powered Tooling
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Data Quality Checks
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Data Contract Tooling - My Context
[Diagram labels: Producer Boundaries; Semantics; Schema & SLOs; Checks and Tests]
Ok so how are we going to make this all happen?
Awesome humans who understand models, abstractions, constraints
You could even do it in ✨code✨
... and you should definitely version control it
Why Code? Why not Text?
● Entanglement of meaning and representation
● Finding References instead of text matches
● Enforcement of structure
● Refactoring
● Testable constraints
● More options for document generation
○ Including JSON and YAML
Although... I’ve been having a blast using Logseq (a graph-like outliner) and
I might be crazy enough to give that a go as an IDE for this
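A minimal sketch of a few of these points, assuming a contract vocabulary written as plain Python objects: a definition is referred to by name (so tools can find references rather than text matches), an `Enum` enforces structure, and JSON is generated from the same objects. `Measure`, `Unit`, `ORDER_TOTAL`, and `REPORT_MEASURES` are hypothetical names invented here.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Unit(Enum):
    """Closed set of allowed units: a typo fails loudly instead of silently."""
    AUD_CENTS = "AUD cents"
    COUNT = "count"


@dataclass(frozen=True)
class Measure:
    name: str
    unit: Unit
    description: str


# Defined once; other definitions refer to the object, not to a copy of its text.
ORDER_TOTAL = Measure("order_total", Unit.AUD_CENTS, "Order value including tax")

# A reference that an IDE, linter, or refactoring tool can find and rename.
REPORT_MEASURES = (ORDER_TOTAL,)


def to_json(measures) -> str:
    """Generate a JSON document from the in-code definitions."""
    def encode(measure: Measure) -> dict:
        data = asdict(measure)
        data["unit"] = measure.unit.value   # Enum members are not JSON serialisable
        return data

    return json.dumps([encode(m) for m in measures], indent=2)
```

Calling `to_json(REPORT_MEASURES)` produces the document form, while `Unit("furlongs")` raises a `ValueError`, which is the kind of structural enforcement free text does not give you.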
“Refactoring” Text
Expectation Reality
https://xkcd.com/208/
What was considered
• Scope & Allies
• Constraints & Guiding Principles
• People and Process Centric
• Contract Meta Schema
• Maximise Contribution Opportunities
Guiding Principles
● Primary Objective: Consensus
● Evolution
● Quick Feedback
● First Outcome: Data Tests
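To make "First Outcome: Data Tests" concrete, here is a rough sketch of deriving row-level checks from a contract definition. The contract shape (field name mapped to an expected type and a nullable flag) and all field names are assumptions for illustration, not the structure used in the talk. In practice the generated checks could be run under Pytest for quick feedback.

```python
from typing import Callable

Check = Callable[[dict], bool]

# Hypothetical contract fragment: field name -> (expected type, nullable)
CONTRACT = {
    "order_id": (str, False),
    "quantity": (int, False),
    "gift_note": (str, True),
}


def build_checks(contract: dict) -> dict[str, Check]:
    """Turn each contract entry into a function that tests one row."""
    checks: dict[str, Check] = {}
    for name, (expected_type, nullable) in contract.items():
        # Default arguments bind the current loop values to this check.
        def check(row, name=name, expected_type=expected_type, nullable=nullable):
            value = row.get(name)
            if value is None:
                return nullable
            return isinstance(value, expected_type)

        checks[name] = check
    return checks


def failed_fields(row: dict, checks: dict[str, Check]) -> list[str]:
    """Names of the fields in `row` that break the contract."""
    return [name for name, check in checks.items() if not check(row)]


checks = build_checks(CONTRACT)
```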
Creating a Meta Model
● Focused around Events
● From UI to DB
● Schema and Semantics
● People
... still figuring it out
Don’t have to do it all at once!
The optimistic path to capturing and generating contracts
The Event Capture spreadsheet
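The slide itself is a screenshot, so the actual spreadsheet layout is not recoverable here. As a sketch of the general idea only, the code below reads spreadsheet rows exported as CSV and groups them into one contract-like dictionary per event. The column names (`event`, `field`, `type`, `meaning`, `owner`) and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a spreadsheet exported as CSV; columns and rows are invented.
SHEET = """event,field,type,meaning,owner
order_placed,order_id,string,Unique order reference,Team A
order_placed,total,decimal,Order value including tax,Team A
refund_issued,order_id,string,Order being refunded,Team B
"""


def contracts_from_sheet(text: str) -> dict[str, dict]:
    """Group one-row-per-field capture data into one entry per event."""
    contracts: dict[str, dict] = defaultdict(lambda: {"owner": None, "fields": []})
    for row in csv.DictReader(io.StringIO(text)):
        contract = contracts[row["event"]]
        contract["owner"] = row["owner"]
        contract["fields"].append(
            {"name": row["field"], "type": row["type"], "meaning": row["meaning"]}
        )
    return dict(contracts)


contracts = contracts_from_sheet(SHEET)
```

Keeping capture in a spreadsheet lets people contribute without writing code, while the generated structures feed the tooling.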
Who’s Going to Do The Work?
Andrew Jones - Driving Data Quality with Data Contracts (2023)
Probably these people
Hopefully these people
Why Python?
● Gradual Typing*
● Static Analysis
● Well understood within the team
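A small sketch of what gradual typing can look like for contract code: an untyped function and a typed one live side by side, and a static checker such as Mypy only constrains the typed one. `EventName` and `FieldName` are hypothetical types invented for this example.

```python
from typing import NewType

# Distinct types for a static checker; at runtime both are plain strings.
EventName = NewType("EventName", str)
FieldName = NewType("FieldName", str)


def describe(event, field):
    """Untyped: arguments are treated as Any, so the checker does not constrain them."""
    return f"{event}.{field}"


def describe_typed(event: EventName, field: FieldName) -> str:
    """Typed: a checker such as Mypy can flag swapped or plain-str arguments."""
    return f"{event}.{field}"


label = describe_typed(EventName("order_placed"), FieldName("order_id"))
```

Types can be added where mistakes are costly and left off elsewhere, so the code base can tighten up over time rather than all at once.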
Helpful Python Libraries
● Pandas
● Pydantic
● Rope
● Pytest
● Mypy
● Black
Refactoring, doing variable extraction with Rope
https://colab.research.google.com/drive/1fHLit3hF2G0dFV0Xl11jnovcdPR87s-E
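The linked notebook is not reproduced here. As a rough sketch of variable extraction with Rope's library interface (`Project`, `ExtractVariable`, `get_changes`, `project.do`), the helper below writes a snippet to a temporary project, extracts one expression into a named variable, and returns the rewritten source. The helper name `extract_variable` and the file name `contract.py` are made up for this example, and Rope is imported inside the function so the module still loads where Rope is not installed.

```python
import tempfile
from pathlib import Path


def extract_variable(source: str, expression: str, new_name: str) -> str:
    """Extract the first occurrence of `expression` into a variable `new_name`."""
    # Imported lazily: Rope is a third-party dependency.
    from rope.base.project import Project
    from rope.refactor.extract import ExtractVariable

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "contract.py").write_text(source)
        project = Project(tmp)
        try:
            resource = project.get_resource("contract.py")
            # Rope works with character offsets into the file.
            start = source.index(expression)
            end = start + len(expression)
            changes = ExtractVariable(project, resource, start, end).get_changes(new_name)
            project.do(changes)
            return resource.read()
        finally:
            project.close()
```

For example, `extract_variable("total = price * quantity * 2\n", "price * quantity", "subtotal")` should return source in which the product is assigned to `subtotal` first.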
Code Refactoring - Other Libraries
• https://pybowler.io/ - doesn't have variable extraction, and hasn't seen much development activity recently
• https://github.com/hchasestevens/astpath - useful for finding parts of the AST, but I'm not sure how to proceed from there; it seems to be powering a number of meta-programming libraries though
• traad - https://av.tib.eu/en/media/19947
Further explorations for wrangling generated code
• Abstract Syntax Tree - Options for querying
• Linting - Define my own rules as they apply to the meta schema
• Code duplication detection
• Network (Graph) Analysis
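As one possible starting point for querying the Abstract Syntax Tree, the sketch below uses the standard-library `ast` module to find where a name is actually referenced in generated code, as opposed to where the same text merely appears inside a string. The generated snippet and the `Field` and `Contract` names in it are invented; the snippet is only parsed, never executed.

```python
import ast

# Invented example of generated contract code (parsed, not run).
GENERATED = '''
order_id = Field("order_id", "string")
total = Field("total", "decimal")
order_placed = Contract("order_placed", [order_id, total])
refund_issued = Contract("refund_issued", [order_id])
'''


def lines_referencing(source: str, name: str) -> list[int]:
    """Line numbers where `name` is read (not assigned), found via the AST."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and node.id == name
        and isinstance(node.ctx, ast.Load)
    )


references = lines_referencing(GENERATED, "order_id")   # [4, 5]
text_matches = GENERATED.count("order_id")              # 4, includes the string literal
```

The gap between the two counts is the difference between finding references and finding text matches.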
Key Takeaways
• You can be a Data Producer without knowing about it; make it worthwhile for Consumers to “register” with you
• You can do this through having a contract which provides clarity and can be used to power tooling and generate artefacts
• Code is easier to refactor, find references in, and generally maintain than the alternatives
linkedin.com/in/ryancollingwood
mastodon.social/@ryancollingwood
twitter.com/ryancollingwood
www.meetup.com/en-AU/data-engineering-melbourne
My References
• Andrew Jones - Driving Data Quality with Data Contracts (2023) - ISBN 13 978-1837635009
• Data Contracts: The Key to Scaling Distributed Data Architecture and Reducing Data Chaos - https://atlan.com/data-contracts/
• Chad Sanderson - The Production-Grade Data Pipeline - https://dataproducts.substack.com/p/the-production-grade-data-pipeline
• Chad Sanderson and Adrian Kreuziger - An Engineers Guide to Data Contracts - https://mlops.community/an-engineers-guide-to-data-contracts-pt-1/
• Green Tree Snakes - the missing Python AST docs - https://greentreesnakes.readthedocs.io/en/latest/
• Rope - Refactoring Variable Extraction - https://rope.readthedocs.io/en/latest/library.html#performing-refactorings
Questions?