Composition in ML:
in Models, Tools, and Teams
ODSC West - Nov 16, 2021
Dr. Bryan Bischof
– Head of Data Science @ Weights and Biases –
1
In collaboration with Dr. Eric Bunch
Email: bryan.bischof@gmail.com
What is composition?
2
Definition
Compositionality, also known as Frege's principle, states that the
meaning of a complex expression is determined by
1. the meanings of its constituent parts,
and
2. the rules for how those parts are combined.
3
c.f. Fong, Spivak, 2018
Model Composition
4
Examples
Matrix Factorization–or more specifically Singular Value Decomposition–is an
extremely popular latent factor model for recommendation systems. Recall that given a
user-item matrix with rating elements:
We wish to approximate this matrix via training; our approximation technique is to
factorize the matrix into three parts:
- U: representing the relationship between users and latent factors
- 𝚺: describing the strength of each latent factor
- V: indicating the similarity between items and latent factors
5
c.f. Koren, Bell, Volinsky, 2009
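The factorization above can be sketched in a few lines of NumPy; the rating matrix here is invented for illustration:

```python
import numpy as np

# A small illustrative user-item rating matrix (rows: users, columns: items).
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Factorize R = U @ diag(s) @ Vt: U relates users to latent factors,
# s holds each factor's strength, and Vt relates items to latent factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keeping only the top-k latent factors gives the low-rank approximation.
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Truncating to the top-k singular values is what turns the exact decomposition into an approximation suitable for scoring unseen user-item pairs.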
Definition
So sometimes,
'constituent parts' are metric embeddings
and
'how they are combined' is linear-algebraic.
6
Examples
Seasonal Average Pooling–and other composite forecasting methods–are extremely
simple forecasting techniques that repeatedly fit models on residuals-of-residuals.
For example, let's build a univariate forecasting model for a series using only seasonal
components: during the training sequence f(t), consider Month-of-year, Week-of-month,
and Day-of-week as categorical features on each day.
7
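A minimal sketch of this residuals-of-residuals scheme, with invented data and the seasonal keys reduced to month-of-year and day-of-week:

```python
from collections import defaultdict
from datetime import date, timedelta

def seasonal_means(values, keys):
    """Average the values within each seasonal key (e.g. day-of-week)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def fit_residuals_of_residuals(values, key_sets):
    """Fit one seasonal component at a time, each on the previous residuals."""
    residuals = list(values)
    components = []
    for keys in key_sets:
        means = seasonal_means(residuals, keys)
        components.append(means)
        residuals = [v - means[k] for v, k in zip(residuals, keys)]
    return components

# An invented daily series with month and weekday structure.
days = [date(2021, 1, 1) + timedelta(d) for d in range(60)]
y = [10 + d.month + 0.5 * d.weekday() for d in days]
comps = fit_residuals_of_residuals(
    y, [[d.month for d in days], [d.weekday() for d in days]]
)
# A forecast for a new day sums the pooled means for its seasonal keys,
# e.g. a Monday in February:
f = comps[0][2] + comps[1][0]
```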
Definition
So sometimes,
'constituent parts' are pooling layers
and
'how they are combined' is a recursive residual additive process.
8
Examples
Boosted Trees are an ensemble fit by sequentially training decision trees on the
residuals of the iteratively composed model. In particular, the model at each iteration
is the weighted sum of the previous model and the i'th tree, fit on the residuals of the (i-1)'th model:
with learnable weighting parameters we get a powerful learner!
9
c.f. Friedman, 2009
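A toy sketch of that boosting recursion, with depth-1 stumps standing in for the trees and a fixed shrinkage weight in place of learned weights (the data is invented):

```python
def fit_stump(x, y):
    """Fit a depth-1 regression tree: the threshold minimizing squared error."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (ml if xi <= t else mr)) ** 2 for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi, t=t, ml=ml, mr=mr: ml if xi <= t else mr

def boost(x, y, n_rounds=20, lr=0.5):
    """Sequentially fit stumps on the residuals of the composed model."""
    stumps = []
    residuals = list(y)
    for _ in range(n_rounds):
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        residuals = [r - lr * stump(xi) for r, xi in zip(residuals, x)]
    # The final model is the (shrinkage-weighted) sum of all stumps.
    return lambda xi: sum(lr * s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 4.0, 4.2, 3.9]
f = boost(x, y)
```

Each round fits only the part of the signal the composed model has not yet explained, which is exactly the recursive additive structure of the slide.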
Definition
So sometimes,
'constituent parts' are weighted learners
and
'how they are combined' is recursive addition.
10
Examples
Foundational models–pretrained models combined with downstream task-specific
training–are becoming ubiquitous in deep learning research and applications.
11
c.f. Standley et al., 2020, Li, Hoiem, 2017
There are numerous architectures for
model transfer. Some of the most exciting
in my opinion are those which jointly train
multiple downstream tasks, e.g. overall
network performance via minimization of
the aggregate loss over all tasks:
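The joint objective can be sketched as the sum of per-task losses over shared parameters; here two invented quadratic "tasks" stand in for real task heads, and a numerical gradient keeps the sketch dependency-free:

```python
def aggregate_loss(w, task_losses):
    """Overall network objective: the sum of each downstream task's loss."""
    return sum(loss(w) for loss in task_losses)

def grad(f, w, eps=1e-6):
    """Central-difference numerical gradient of f at w."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += eps
        wm[i] -= eps
        g.append((f(wp) - f(wm)) / (2 * eps))
    return g

# Two toy 'tasks' pulling a shared parameter toward different targets.
tasks = [lambda w: (w[0] - 1.0) ** 2, lambda w: (w[0] - 3.0) ** 2]
w = [0.0]
for _ in range(200):
    g = grad(lambda v: aggregate_loss(v, tasks), w)
    w = [wi - 0.05 * gi for wi, gi in zip(w, g)]
# w settles at the compromise minimizing the aggregate loss (here, 2.0).
```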
Definition
So sometimes,
'constituent parts' are parameters trained via learning
and
'how they are combined' is layer composition and loss sharing.
12
Examples
Equivariant Globally Natural DL–or Graph DL with invariance up to graph
isomorphism–pushes the emerging domain of graph learning to accommodate not only
global isomorphisms, but also those built from local mappings.
13
c.f. Haan, Cohen, Welling, 2021
In this example the compositional structure is more
obvious, but nonetheless essential to the
formulation.
Node features may be embedded onto edge
features, and passed into convolution as in
normal GNNs.
Definition
So sometimes,
'constituent parts' are action mappings on the data structure
and
'how they are combined' is function composition and kernel
convolution.
14
Ok, ok. So composition is very much a part of the
structural modeling we do as Machine Learning
practitioners.
But I'm more on the applied side...
15
Compositional tools
16
YAFPT? YAMST?
I want to sell you an ML pipeline:
- It's composed of pure components, i.e. they return the same output every time
for the same input, and have no side effects
- It is higher-order–each component provides APIs for a function
- It is composable, i.e. components are easily combined via knowledge only of the types of
their inputs and outputs
- It is curryable–providing a fixed set of parameters and inputs allows you to execute
the entire pipeline.
Are you buying? These happen to align with the core principles of Functional
Programming, but also micro-services. Why does MLOps care about these?
17
c.f. fklearn
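A minimal sketch of such a pipeline in plain Python, in the style of (but not taken from) fklearn, using `functools.partial` for the currying; the components are invented:

```python
from functools import partial, reduce

def compose(*steps):
    """Chain components; each consumes the previous component's output."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Pure components: same input -> same output, no side effects.
def impute(fill_value, rows):
    return [[fill_value if v is None else v for v in row] for row in rows]

def scale(factor, rows):
    return [[v * factor for v in row] for row in rows]

# Currying via partial fixes the parameters, leaving data -> data components
# that compose via knowledge of their input and output types alone.
pipeline = compose(partial(impute, 0.0), partial(scale, 2.0))
result = pipeline([[1.0, None], [3.0, 4.0]])
```

Because every component is a pure `data -> data` function once its parameters are fixed, any component can be swapped, reordered, or tested in isolation.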
Models? Data? No!
Andrew Ng has recently been proselytizing the gains of a data-centric approach to AI.
He rightly recognizes both the effectiveness of data improvement and preparation, and
of systematic attention to the data that your product is built on.
In particular he identifies, correctly, that one formulation of the data pipeline is as
follows:
And he rightly identifies the importance of those backwards arrows in this flow. But...
18
c.f. From Model-centric to Data-centric AI
Right answer; wrong test.
Dr. Ng's recommendation:
Don't: hold the data fixed and iteratively improve the model;
Do: hold the code fixed and iteratively improve the data.
While I deeply appreciate this suggestion to be modular and flexible, it aims too low!
The recommendation from compositional thinking:
Hold the (composition) fixed and iteratively improve (one component).
i.e. Pipeline-centric AI!
19
It's about the process
Data changes, but so do the other components!
The needs of the data change, the expectations of the model change, the objective
functions change, the sources change, etc. If you focus only on the data, you're
focusing too closely on short-term goals, and over-constraining your solution.
Instead, make primary the data transformations, the data assumptions, and the
compositions (input and output types).
This allows you to rapidly iterate at multiple locations across the stack, wherever you
see the most opportunity.
20
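One way to make those input and output types primary is to have each component declare them, so the validity of a composition can be checked mechanically before anything runs; a hypothetical sketch with invented component names:

```python
# Each component declares (name, input_type, output_type); the composition is
# valid only when adjacent types line up, so any one component can be swapped
# for another with the same interface.
components = [
    ("load",  "path",     "raw_rows"),
    ("clean", "raw_rows", "features"),
    ("train", "features", "model"),
]

def composition_ok(components):
    """Check that each component's output type matches the next one's input."""
    return all(
        out_t == next_in
        for (_, _, out_t), (_, next_in, _) in zip(components, components[1:])
    )
```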
YAAICP
Let's bring in yet another AI catch-phrase:
the data flywheel.
What makes sense about this analogy is the implication
that the inertia of the spinning wheel ramps up.
In the data flywheel strategy, data products provide
personalization and insight to drive more customer
interactions, which may be converted back into
learnable structures.
Notice here the focus on composition!
21
c.f. Matt Turck, Building an AI Startup
Let's look at a real ML system architecture
Consider this incredible
overview of just about
every RecSys out there.
This diagram is
data-structure,
infrastructure, and model
architecture agnostic!
And yet, via only the
composition rules, we
have a full system design.
22
c.f. Higley, Oldridge, 2021, Yan, 2021
Is there anything that can help?
MLOps is a somewhat nascent field focused on the overall structure of ML products
and pipelines.
Technology is beginning to be developed around these needs, both to manage the
components of a pipeline-centric system and to execute the type alignment.
People are starting to align on explicit composition coherence:
23
c.f. Shreya Shankar, 2021
And some of us are building the platform
Like these compositional pipelines, our platform is built of components
24
c.f. Weights and Biases
and our platform handles the coherence.
In practice
Machine learning engineers
can avoid writing glue code,
and assert statements, and
drift monitors, and hard
coding url-slugs, and reading
local data into dataloaders,
and training loops, and
ensemble DAGs, and can get
back to focusing on the data,
the models, and the tasks.
25
c.f. W&B Launch
Don't start from scratch
In their Dota 2 challenge landmark paper, the OpenAI team described an essential
component in their mission to train better and better models:
In order to train without restarting from the beginning after each change, we developed
a collection of tools to resume training with minimal loss in performance which we call
surgery.... we performed approximately one surgery per two weeks.
If your dog hasn't learned to catch a frisbee by the time they're six weeks old, don't get a
new dog–get a new training methodology. 🐕
Composable tools allow you to swap in and out your strategies wherever necessary.
26
c.f. OpenAI, Dota 2, 2019, OpenAI & Weights and Biases
My ML products don't look like this!
Well Andrej Karpathy's do
27
c.f. Karpathy, ICML 2019
Composable teams
28
There's more?
An even bigger challenge than building effective ML systems is building effective
team structures to support the people who can build those systems.
- What is the right team architecture to enable people to do their best work, and
yet provide opportunities for growth?
- How do you create robustness to team departures, vacations, or burnout?
29
Take from engineering
In much of the above, we took engineering's learnings as a foundation and built on
top of them. Here too, we can take away important lessons:
- Atomic tasks, clearly specced
- Assignee-agnostic tasks
- PR processes
- Component expertise
While being a full-stack data scientist creates plenty of opportunity for innovation,
over time that stack owns you, and buckles you into a full-time maintenance role.
30
c.f. Eric Colson, 2019
Focus on the relationships
Like our components and interfaces throughout this talk, we as ML practitioners
should–at any given time–focus on executing one task.
We should be given clear inputs and expectations for our outputs.
And we should understand how to communicate and exchange with others.
When it comes time for someone else to work on this task, it should be frictionless
and context-rich, with clear documentation of what's been done and a system of
record for how to reproduce it.
31
c.f. Collaborative Reports
One more Karpathy reference
Karpathyʼs team on self-driving was distributed
over many components of a
massively-multitask problem. In addition to
adversarial collaboration, he generally found
difficulty in optimizing how to compose their
efforts.
Maybe he should try back-propagation to learn
a better weighting.
32
c.f. Karpathy, ICML 2019
Thanks!
Check out W&B's composable tools at:
Wandb.ai
Totally free for individuals & academics.
Come chat with us at our booth today and tomorrow, or email contact@wandb.ai.
33
