Some Things I Wish I Had Known Before Scaling Machine Learning Solutions
Invector Labs
Today’s session is about differentiating BS from reality…
Agenda
• Myths and realities of machine learning solutions in the real world
• 15 lessons I learned when building large-scale machine learning systems
  • Challenge
  • What we learned
  • Solution
The different dimensions of machine intelligence solutions…
We can discuss the theoretical definitions or, instead, focus on the pragmatic one…
But the reality is that building machine learning solutions remains brutally difficult
But not just because of the obvious reasons…
Challenges of Machine Learning in the Real World
• High Technological Barrier
• Limited Talent Availability
• Labeled Datasets
• Cost
• …
A lifecycle we haven’t seen before…
We are dealing with a new app lifecycle…
Traditional App Lifecycle: Design, Implementation, Deployment, Management/Monitoring
Machine Learning App Lifecycle: Experimentation, Model Creation, Training, Testing, Regularization, Deployment, Monitoring, Optimization
The Ecosystem is Incredibly Crowded
The Aspects of a Machine Learning Solution that will Drive You Crazy
• Strategy & Processes
• Data Engineering
• Experimentation
• Model Training
• Model Operationalization
• Runtime Execution
• Security
• Lifecycle Management
• Optimization
• …
Lessons learned when building high-scale machine learning solutions…
Strategy & Processes…
Lesson #1: Data scientists make horrible engineers…
Challenges
• Data scientists are great at experimentation, not so much at writing high-quality code
• Experimentation deep learning frameworks don’t necessarily make great production frameworks, e.g. PyTorch vs. TensorFlow
Some Ideas to Consider
Data Science Team
• Write notebooks and experimentation models
Engineering Team
• Refactor or rewrite models for production environments
• Automate training and optimization jobs
DevOps Teams
• Deploy models
• Monitor, retrain, and optimize models
• Divide data science and data engineering teams
Lesson #2: Neither Agile nor Waterfall Methodologies Work in Machine Learning
Challenges
• Waterfall methods don’t work because you rarely know what machine learning methods are going to work for a specific problem
• Agile methods don’t work because you need very specific requirements
Some Ideas to Consider
Diagram: Agile, Waterfall, Agile
• Split the development lifecycle into agile and waterfall iterations
Data Engineering…
Lesson #3: Feature extraction can become a reusability nightmare…
Challenges
• Different models require the same features from a dataset
• Feature extraction jobs are computationally expensive
• Different teams create proprietary ways to capture and store feature information
Some Ideas to Consider
Diagram: Dataset Preparation Jobs (Job1, Job2, JobN), Representation Learning Tasks, Feature Store, Models (Model 1 to Model N)
• Implement a centralized feature store
• Leverage representation learning to extract relevant features from a dataset
• Look for reference architectures, e.g. Uber’s Michelangelo
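A minimal sketch of the centralized feature store idea, in Python. The `FeatureStore` class, its method names, and the `avg_order_value` feature are hypothetical and are not taken from Michelangelo or any other product; the only point is that a feature is computed once and then reused by several models.

```python
from typing import Callable, Dict, Tuple


class FeatureStore:
    """Toy in-memory feature store: compute each feature once, serve it many times."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Callable[[dict], float]] = {}
        self._cache: Dict[Tuple[str, str], float] = {}
        self.computations = 0  # counts how often an extractor actually ran

    def register(self, name: str, extractor: Callable[[dict], float]) -> None:
        self._extractors[name] = extractor

    def get(self, name: str, entity_id: str, record: dict) -> float:
        key = (name, entity_id)
        if key not in self._cache:
            # Expensive extraction happens only on the first request.
            self._cache[key] = self._extractors[name](record)
            self.computations += 1
        return self._cache[key]


store = FeatureStore()
store.register("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))

record = {"orders": [10.0, 20.0, 30.0]}
# Two different models ask for the same feature; it is only computed once.
churn_input = store.get("avg_order_value", "customer-1", record)
fraud_input = store.get("avg_order_value", "customer-1", record)
```

A production store would add persistence, versioning, and freshness rules; none of that is modeled here.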
Lesson #4: Data labeling is so easy to underestimate
Challenges
• Data experts spend a lot of time labeling datasets
• The logic for data labeling is often not reusable
• Subjective data labeling strategies fail to differentiate between useful and useless features
Some Ideas to Consider
• Implement an automated data labeling strategy
• Generative learning can help to structure more effective labels
• Project Snorkel is one of the leading automated data labeling frameworks in the market
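The automated labeling idea can be sketched as a handful of labeling functions whose votes are combined. This is plain Python with a simple majority vote, not Snorkel's API; Snorkel learns a generative model over the labeling functions rather than counting votes. The function names and the spam example are made up for illustration.

```python
from collections import Counter
from typing import Callable, List

SPAM, HAM, ABSTAIN = 1, 0, -1


def lf_contains_link(text: str) -> int:
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> int:
    words = text.lower().split()
    return HAM if len(words) <= 4 and "hi" in words else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_greeting,
]


def weak_label(text: str) -> int:
    """Majority vote over the labeling functions that did not abstain."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


labels = [
    weak_label(t)
    for t in [
        "Claim your prize at http://example.com",
        "hi there",
        "Meeting moved to 3pm",
    ]
]
```

Because the labeling logic lives in small named functions, it can be reviewed, tested, and reused across datasets, which is the reusability problem the slide points at.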
Model Experimentation…
Lesson #5: The single machine learning framework fallacy
Challenges
• Enterprises like to standardize on a single machine learning framework
• Different teams have different technology preferences
• Providing a consistent machine learning platform across different machine learning frameworks is no easy task
Some Ideas to Consider
Diagram: Experimentation Framework, Intermediate Representation, Production Framework
• Optimize for productivity, not consistency
• Enable enough flexibility to leverage different frameworks for experimentation and production
• ONNX is a great solution for intermediate representations
Lesson #6: Too much time going from notebooks to production programs
Challenges
• Notebooks are ideal for model experimentation and testing
• Notebooks typically have performance challenges when executed at scale
• Scaling notebook environments can be challenging
• Parametrizing notebook executions is far from trivial
Some Ideas To Consider
Model Experimentation
• Jupyter, Zeppelin
Scheduling Notebooks
• Papermill
• Netflix’s Meson
Running Complex Workflows
• Docker Containers
• Kubernetes
• Enable an infrastructure to operationalize data science notebooks
• Use containers for the most complex machine learning workflows
Lesson #7: Model selection can be a machine learning problem
Challenges
• Data scientists make very subjective decisions when it comes to model selection
• The same problem can be solved using different machine learning models
• Very often it is almost impossible to differentiate between similar models
Some Ideas To Consider
• Represent machine learning requirements as a dataset with an objective attribute
• Leverage AutoML-based techniques for model selection
Diagram: Problem Dataset, AutoML, Proposed Models
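To make model selection less subjective, candidates can be scored on the same held-out folds and the lowest error wins. The sketch below is a hand-rolled k-fold comparison in plain Python, far simpler than a real AutoML system; the two candidate models and the synthetic dataset are assumptions chosen only to keep the example self-contained.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Model = Callable[[float], float]


def fit_mean(train: List[Point]) -> Model:
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train: List[Point]) -> Model:
    """Ordinary least squares for a single input feature."""
    xs = [x for x, _ in train]
    mx, my = mean(xs), mean(y for _, y in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def cv_error(fit: Callable[[List[Point]], Model], data: List[Point], folds: int = 3) -> float:
    """Mean squared error averaged over k held-out folds."""
    errors = []
    for k in range(folds):
        test = data[k::folds]
        train = [p for i, p in enumerate(data) if i % folds != k]
        model = fit(train)
        errors.append(mean((model(x) - y) ** 2 for x, y in test))
    return mean(errors)


def select_model(candidates: Dict[str, Callable], data: List[Point]):
    scores = {name: cv_error(fit, data) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


data = [(float(x), 2.0 * x + 1.0) for x in range(9)]
best, scores = select_model({"mean": fit_mean, "line": fit_line}, data)
```

An AutoML system would also search hyperparameters and architectures; the shared, objective scoring loop is the part this sketch illustrates.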
Machine learning training…
Lesson #8: Training is a continuous task…
Challenges
• The No Free Lunch Theorem
• Trained models can perform poorly against new datasets
• New engineers and DevOps need to understand how to re-train existing models
Some Ideas to Consider
Diagram: DataLake, Data Outcomes/Feature Store, Training Job1, Training Job2, Training JobN
• Automate training jobs
• Orchestrate scheduled execution of training jobs
Lesson #9: Training should be incremental…
Challenges
• Training machine learning models can be computationally expensive
• Most machine learning models need to be retrained entirely when new data arrives
• It’s nearly impossible to quantify the impact that new datasets have on the performance of a model
Some Ideas to Consider
• Implement continual learning models
• Consider transfer learning as a fundamental enabler
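As a rough illustration of incremental training, the sketch below updates a one-feature linear model with stochastic gradient descent using only each newly arrived batch, instead of retraining on all data. The `OnlineLinearModel` class and its `partial_fit` method are hypothetical names; real continual learning also has to deal with forgetting older data, which this toy example does not address.

```python
from typing import Iterable, Tuple


class OnlineLinearModel:
    """y ~ w * x + b, updated one batch at a time with stochastic gradient descent."""

    def __init__(self, learning_rate: float = 0.05) -> None:
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, batch: Iterable[Tuple[float, float]], epochs: int = 200) -> None:
        """Continue from the current weights using only the new batch."""
        batch = list(batch)
        for _ in range(epochs):
            for x, y in batch:
                error = self.predict(x) - y
                self.w -= self.learning_rate * error * x
                self.b -= self.learning_rate * error


model = OnlineLinearModel()
model.partial_fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])  # first batch of data
model.partial_fit([(3.0, 7.0), (4.0, 9.0)])  # new data arrives later
```

The second call starts from the weights learned on the first batch, so the cost of an update depends on the size of the new data rather than the full history.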
Lesson #10: Training a model requires as much coding as creating it…
Challenges
• Data engineers spend a lot of time writing training routines for machine learning models
• Comparing the performance of different models on the same datasets remains tricky
• Changes to a training dataset often imply changes to the training code
Some Ideas to Consider
• Explore a configuration-driven training process
• Uber’s Ludwig is an innovative, no-code framework for training machine learning models
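A configuration-driven training process might look roughly like this: the training code is written once, and the dataset columns and model type come from a configuration object. The config keys, the `TRAINERS` registry, and both toy trainers are assumptions for illustration and do not reproduce Ludwig's actual configuration schema.

```python
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, float]
Predictor = Callable[[Row], float]


def train_mean(rows: List[Row], target: str, features: List[str]) -> Predictor:
    avg = mean(r[target] for r in rows)
    return lambda row: avg


def train_nearest(rows: List[Row], target: str, features: List[str]) -> Predictor:
    def predict(row: Row) -> float:
        # Return the target of the closest training row (squared distance).
        closest = min(rows, key=lambda r: sum((r[f] - row[f]) ** 2 for f in features))
        return closest[target]

    return predict


# Hypothetical registry mapping a config value to a training routine.
TRAINERS = {"mean": train_mean, "nearest_neighbor": train_nearest}


def train_from_config(config: dict, rows: List[Row]) -> Predictor:
    trainer = TRAINERS[config["model"]["type"]]
    return trainer(rows, config["target"], config["input_features"])


config = {
    "input_features": ["sqft"],
    "target": "price",
    "model": {"type": "nearest_neighbor"},
}
rows = [{"sqft": 50.0, "price": 100.0}, {"sqft": 100.0, "price": 210.0}]

model = train_from_config(config, rows)
prediction = model({"sqft": 90.0})

# Switching models is a config change, not a code change.
baseline = train_from_config({**config, "model": {"type": "mean"}}, rows)
baseline_prediction = baseline({"sqft": 90.0})
```

Comparing models on the same dataset then reduces to running the same function with different configurations.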
Executing Machine Learning Models…
Lesson #11: Different models require different execution patterns…
Challenges
• Not all models can be executed via APIs
• Some models take a long time to run
• In some scenarios, different models need to be executed at the same time based on a specific condition
Some Ideas to Consider
Diagram: a Model API Gateway and an Event Gateway in front of models with Scheduled Activation, Pub-Sub Activation, and On-Demand Activation
• Enable different execution modes based on the client’s requirements
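The three activation modes can be sketched as one small runtime that exposes the same model in different ways. `ModelRuntime` and its method names are hypothetical; in a real system the on-demand path would sit behind an API gateway, the pub-sub path behind a message broker, and the scheduled path behind a job scheduler.

```python
from typing import Callable, Dict, List

Model = Callable[[dict], dict]


class ModelRuntime:
    """Routes registered models through on-demand, pub-sub, and scheduled activation."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}
        self._subscriptions: Dict[str, List[str]] = {}
        self.results: List[dict] = []  # stands in for an output queue or table

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def invoke(self, name: str, payload: dict) -> dict:
        """On-demand activation: synchronous request and response."""
        return self._models[name](payload)

    def subscribe(self, topic: str, name: str) -> None:
        self._subscriptions.setdefault(topic, []).append(name)

    def publish(self, topic: str, event: dict) -> None:
        """Pub-sub activation: every model subscribed to the topic runs on the event."""
        for name in self._subscriptions.get(topic, []):
            self.results.append(self._models[name](event))

    def run_scheduled(self, name: str, batch: List[dict]) -> None:
        """Scheduled activation: what a scheduler would call with a batch of inputs."""
        for payload in batch:
            self.results.append(self._models[name](payload))


runtime = ModelRuntime()
runtime.register("score", lambda p: {"score": p["amount"] * 2})

on_demand = runtime.invoke("score", {"amount": 1})
runtime.subscribe("payments", "score")
runtime.publish("payments", {"amount": 2})
runtime.run_scheduled("score", [{"amount": 3}, {"amount": 4}])
```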
Lesson #12: Mobile deep learning is more complicated than you think
Challenges
• Centralized cloud deep learning models don’t scale
• On-device deep learning models are hard to distribute and train
• Tons of privacy challenges
Some Ideas to Consider
• Consider using federated learning or similar patterns for mobile-based machine learning
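A toy version of federated averaging, assuming a one-feature linear model and two simulated devices: each device trains locally on data that never leaves it, and only the resulting weights are averaged on the server. The function names and the data are illustrative; real deployments add client sampling, secure aggregation, and compression.

```python
from typing import List, Tuple

Weights = List[float]  # [w, b] for the model y ~ w * x + b


def local_update(weights: Weights, data: List[Tuple[float, float]],
                 lr: float = 0.1, epochs: int = 20) -> Weights:
    """One device refines the global weights on its own private data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            error = w * x + b - y
            w -= lr * error * x
            b -= lr * error
    return [w, b]


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Server step: average the device weights, weighted by local dataset size."""
    total = sum(sizes)
    return [
        sum(update[i] * n for update, n in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


# Each inner list is the private data of one simulated device.
device_data = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

global_weights: Weights = [0.0, 0.0]
for _ in range(30):  # communication rounds
    updates = [local_update(global_weights, data) for data in device_data]
    global_weights = federated_average(updates, [len(d) for d in device_data])
```

Only `updates` crosses the network in this sketch, which is what makes the pattern attractive for the privacy concerns listed above.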
Machine Learning Operationalization…
Lesson #13: Debugging is a nightmare
Challenges
• The accuracy-interpretability friction
• The unpredictability factor
• Limited toolset
Some Ideas to Consider
Visualize the Network and its Results
• Use tools like TensorBoard to visualize the structure of neural networks
Compare Training and Test Errors
• High training error is a sign of underfitting
• High test error and low training error is a sign of overfitting
Test with Small Datasets
• Helps to determine whether the error is in the code or in the data
Monitor Activations and Gradient Values
• Monitor the number of activations in hidden units
Interpretability
• Understanding How Nodes are Activated
• Understanding what Hidden Layers Do
• Understanding How Concepts are Formed
• Establish systematic practices to debug machine learning models
• Onboard modeling visualization and interpretability tools
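The rules of thumb for comparing training and test errors can be turned into a small, repeatable check. The `diagnose` function and both thresholds below are arbitrary assumptions; sensible values depend on the task and the error metric.

```python
def diagnose(train_error: float, test_error: float,
             high: float = 0.2, gap: float = 0.1) -> str:
    """Classify a training run using the train/test error rules of thumb.

    high: training error at or above this counts as "high" (assumed threshold).
    gap:  test error exceeding training error by this much counts as overfitting.
    """
    if train_error >= high:
        return "underfitting"
    if test_error - train_error >= gap:
        return "overfitting"
    return "ok"


runs = {
    "small network": diagnose(train_error=0.35, test_error=0.37),
    "large network": diagnose(train_error=0.02, test_error=0.30),
    "regularized": diagnose(train_error=0.05, test_error=0.08),
}
```

Running a check like this after every training job is one way to make debugging practices systematic rather than ad hoc.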
Security…
Lesson #14: Machine learning models are so easy to hack
Challenges
• Most neural networks are vulnerable to adversarial attacks
• Attackers don’t need access to the models but can simply manipulate input datasets
• Most of the time adversarial attacks go undetected
Some Ideas to Consider
• Test your neural networks for adversarial robustness
• IBM’s Adversarial Robustness Toolbox is one of the leading stacks in neural network security
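To show how small an adversarial change can be, here is a sketch of the fast gradient sign method (FGSM) against a tiny logistic regression classifier in plain Python. It does not use the Adversarial Robustness Toolbox, and the weights, input, and epsilon are made-up values chosen so the effect is visible.

```python
import math
from typing import List


def predict_proba(weights: List[float], bias: float, x: List[float]) -> float:
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value: float) -> int:
    return (value > 0) - (value < 0)


def fgsm(weights: List[float], bias: float, x: List[float],
         label: int, epsilon: float) -> List[float]:
    """Fast gradient sign method for logistic regression.

    For cross-entropy loss, d(loss)/d(x_i) = (p - label) * w_i, so each input
    is nudged by epsilon in the direction that increases the loss.
    """
    p = predict_proba(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


weights, bias = [2.0, -1.0], 0.0
x = [0.3, 0.1]  # classified as class 1 (probability above 0.5)
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.3)

clean_score = predict_proba(weights, bias, x)
adversarial_score = predict_proba(weights, bias, x_adv)
```

A robustness test would run an attack like this across a validation set and report how often the predicted class flips for a given epsilon.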
Lesson #15: Data privacy is the elephant in the machine learning room
Challenges
• Machine learning models intrinsically build knowledge about private datasets
• Most machine learning techniques require clear access to data which, in many cases, contains sensitive information
• There are no established techniques for evaluating the privacy robustness of machine learning models
Some Ideas to Consider
• Private machine learning is an emerging area of research
• Leverage techniques such as secure multi-party computation or zero-knowledge proofs to obfuscate training datasets
• PySyft is an emerging framework to enable privacy in machine learning models
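One building block behind secure multi-party computation is additive secret sharing, sketched below with the Python standard library only. This is not PySyft's API, and the modulus, party count, and salary figures are assumptions for illustration: each value is split into random-looking shares, the parties add shares locally, and only the total is ever reconstructed.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # all arithmetic is done modulo this number (assumed choice)


def share(value: int, parties: int) -> List[int]:
    """Split a value into additive shares; any subset short of all shares looks random."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % MODULUS


salaries = [52000, 61000, 47000]  # one private value per data owner

# Each owner splits its value into one share per computing party.
all_shares = [share(s, parties=3) for s in salaries]

# Party j adds the j-th share of every input without seeing any raw value.
partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(3)]

# Only the combined result is revealed.
total = reconstruct(partial_sums)
```

The same trick extends to sums of gradients, which is how it connects to training on data that no single party is allowed to see in the clear.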
Some not-well-known reference architectures that might help…
• DAWN Project from Stanford University
• Michelangelo from Uber
• MLflow from Databricks
• FBLearner from Facebook
• TFX from Google
The challenges go beyond the obvious…
Three Foundational Challenges for the Mainstream Adoption of Machine Learning
Lowering the Technological Entry Point
• Can mainstream developers embrace machine learning stacks?
Talent Availability
• Can companies and governments nurture local data science talent?
Data Democratization
• Can rich datasets stop being a privilege of large corporations and governments?
Some Initiatives to Consider
Lowering the Technological Entry Point
• AutoML, low-code machine learning frameworks
Talent Availability
• Google AI Academy, Coursera, Udacity…
Data Democratization
• Decentralized AI platforms
Summary
• Implementing machine learning solutions in the real world remains incredibly challenging
• There is a large gap between the advancements in AI research and the practical viability of those techniques
• Machine learning applications require a new lifecycle different from traditional software models
• Each aspect of that lifecycle brings a unique set of challenges
• Start small, iterate…
Thanks
jr@invectoriq.com
jr@intotheblock.io
https://medium.com/@jrodthoughts
https://twitter.com/jrdothoughts

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

15 Lessons I Learned Before Scaling ML Solutions

  • 1. Some Things I Wish I Had Known Before Scaling Machine Learning Solutions Invector Labs
  • 3. Agenda • Myths and realities of machine learning solutions in the real world • 15 Lessons I learned when building large scale machine learning systems • Challenge • What we learned? • Solution
  • 5. We can discuss the theoretical definitions or, instead, focus on the pragmatic one…
  • 8. But the reality remains that building machine learning solutions remains brutally difficult
  • 9. But not just because of the obvious reasons…
  • 10. Challenges of Machine Learning in the Real World High Technological Barrier Limited Talent Availability Labeled Datasets Cost …
  • 12. We are dealing with a new app lifecycle… Traditional App Lifecycle Machine Learning App Lifecycle Experimentation Model Creation Training Testing Regularization Deployment Monitoring Optimization Design Implementation Deployment Management/ Monitoring
  • 14. The Aspects of a Machine Learning Solution that will Drive You Crazy Strategy & Processes Data Engineering Experimentation Model Training Model Operationalization Runtime Execution Security Lifecycle Management Optimization …
  • 15. Lessons learned when building high scale machine learning solutions…
  • 18. Challenges Data scientists are great at experimentation Not so much at writing high quality code Experimentation deep learning frameworks don’t necessarily make great production frameworks, ex: PyTorch vs. TensorFlow
  • 19. Some Ideas to Consider •Write notebooks and experimentation models Data Science Team •Refactor or rewrite models for production environments •Automate training and optimization jobs Engineering Team •Deploy models •Monitor, retrain, and optimize models DevOps Teams • Divide data science and data engineering teams
  • 20. Lesson #2 Neither Agile nor Waterfall Methodologies Work in Machine Learning
  • 21. Challenges Waterfall methods don’t work because you rarely know what machine learning methods are going to work for a specific problem Agile methods don’t work because you need very specific requirements
  • 22. Some Ideas to Consider Agile Waterfall Agile • Split the development lifecycle into agile and waterfall iterations
  • 24. Lesson # 3 : Feature extraction can become a reusability nightmare…
  • 25. Challenges Different models require the same features from a dataset Feature extraction jobs are computationally expensive Different teams create proprietary ways to capture and store feature information
  • 26. Some Ideas to Consider Dataset Preparation Job1 Dataset Preparation Job2 Dataset Preparation JobN Representation Learning Task1 Representation Learning Task1 Representation Learning Task1 Feature Store Model 1 Model N • Implement a centralized feature store • Leverage representation learning to extract relevant features from a dataset • Look for reference architectures, e.g. Uber’s Michelangelo
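To make the centralized feature store idea concrete, here is a minimal in-memory sketch. The `FeatureStore` class, its methods, and the `avg_order_value` feature are hypothetical names invented for illustration; this is not Michelangelo's design or API. The point is only that a feature is computed once and then reused by every model that asks for it.

```python
from typing import Callable, Dict, Hashable, Tuple


class FeatureStore:
    """Toy in-memory stand-in for a centralized feature store (hypothetical API)."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Callable[[dict], float]] = {}
        self._cache: Dict[Tuple[str, Hashable], float] = {}
        self.compute_count = 0  # how many times an extractor actually ran

    def register(self, name: str, extractor: Callable[[dict], float]) -> None:
        """Register one shared definition of a feature."""
        self._extractors[name] = extractor

    def get(self, name: str, entity_id: Hashable, record: dict) -> float:
        """Return the cached feature value, computing it only on first request."""
        key = (name, entity_id)
        if key not in self._cache:
            self._cache[key] = self._extractors[name](record)
            self.compute_count += 1
        return self._cache[key]


store = FeatureStore()
store.register("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))

record = {"orders": [10.0, 20.0, 30.0]}
# Two different models request the same feature for the same entity.
churn_input = store.get("avg_order_value", "customer-1", record)
upsell_input = store.get("avg_order_value", "customer-1", record)
```

Both models receive the same value while the extraction job runs once, which is the reuse the slide is asking for. A production store would also need persistence, versioning, and point-in-time correctness, none of which is shown here.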
  • 27. Lesson #4 : Data labeling is so easy to underestimate
  • 28. Challenges Data experts spend a lot of time labeling datasets The logic for data labeling is often not reusable Subjective data labeling strategies fail to differentiate between useful and useless features
  • 29. Some Ideas to Consider • Implement an automated data labeling strategy • Generative learning can help to structure more effective labels • Project Snorkel is one of the leading automated data labeling frameworks in the market
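One way to picture automated labeling is programmatic weak supervision: small, reusable labeling functions vote on each example. The sketch below is written in plain Python and does not use Snorkel's API. The three heuristics and the simple majority vote are assumptions chosen for illustration; a real framework would typically combine the votes with a learned model.

```python
from collections import Counter
from typing import Callable, List, Optional

SPAM, HAM = 1, 0
ABSTAIN = None  # a labeling function may decline to vote


def lf_contains_link(text: str) -> Optional[int]:
    return SPAM if "http" in text else ABSTAIN


def lf_money_words(text: str) -> Optional[int]:
    return SPAM if "free money" in text.lower() else ABSTAIN


def lf_short_message(text: str) -> Optional[int]:
    return HAM if len(text.split()) <= 4 else ABSTAIN


def majority_label(text: str, lfs: List[Callable[[str], Optional[int]]]) -> Optional[int]:
    """Combine labeling-function votes; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


LFS = [lf_contains_link, lf_money_words, lf_short_message]
label_spam = majority_label("Claim your free money at http://x.example", LFS)
label_ham = majority_label("see you tomorrow", LFS)
label_unknown = majority_label("The quarterly report is attached for review", LFS)
```

Because the labeling logic lives in named functions, it can be reviewed, versioned, and reused across datasets, which addresses the reusability problem raised in the challenges.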
  • 31. Lesson #5: The single machine learning framework fallacy
  • 32. Challenges Enterprises like to standardize on a single machine learning framework Different teams have different technology preferences Providing a consistent machine learning platform across different machine learning frameworks is no easy task
  • 33. Some Ideas to Consider Experimentation Framework Intermediate Representation Production Framework • Optimize for productivity, not consistency • Enable enough flexibility to leverage different frameworks for experimentation and production • ONNX is a great solution for intermediate representations
  • 34. Lesson #6: Too much time going from notebooks to production programs
  • 35. Challenges Notebooks are ideal for model experimentation and testing Notebooks typically have performance challenges when executed at scale Scaling Notebook environments can be challenging Parametrizing Notebook executions is far from trivial
  • 36. Some Ideas To Consider • Jupyter, Zeppelin Model Experimentation • Papermill • Netflix’s Meson Scheduling Notebooks • Docker Containers • Kubernetes Running Complex Workflows • Enable an infrastructure to operationalize data science notebooks • Use containers for the most complex machine learning workflows
  • 37. Lesson #7: Model selection can be a machine learning problem
  • 38. Challenges Data scientists make very subjective decisions when it comes to model selection The same problem can be solved using different machine learning models Very often it is almost impossible to differentiate between similar models
  • 39. Some Ideas To Consider • Represent machine learning requirements as a dataset with an objective attribute • Leverage AutoML-based techniques for model selection Problem Dataset AutoML Proposed Models
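The "problem dataset in, proposed model out" flow can be sketched as a search over candidate models scored against one objective. This is a heavily simplified stand-in for AutoML, not any particular AutoML library: the two candidates, the mean-squared-error objective, and the toy data are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]
Dataset = Tuple[List[float], List[float]]


def fit_mean(xs: List[float], ys: List[float]) -> Predictor:
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs: List[float], ys: List[float]) -> Predictor:
    """Candidate: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def mse(model: Predictor, xs: List[float], ys: List[float]) -> float:
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(candidates: Dict[str, Callable[..., Predictor]],
                 train: Dataset, valid: Dataset) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on train, score on validation, return the best name and all scores."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
valid = ([6, 7], [13, 15])
best, scores = select_model({"mean": fit_mean, "linear": fit_linear}, train, valid)
```

The selection is driven by a measured objective on held-out data rather than by individual preference, which is the shift the slide proposes. Real AutoML systems also search over hyperparameters and architectures.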
  • 42. Challenges The No Free Lunch Theorem Trained models can perform poorly against new datasets New engineers and DevOps need to understand how to re-train existing models
  • 43. Some Ideas to Consider DataLake Data Outcomes/Feature Store Training Job1 Training Job2 Training JobN • Automate Training Jobs • Orchestrate scheduled execution of training jobs
  • 45. Challenges Training machine learning models can be computationally expensive Most machine learning models need to be retrained entirely based on the arrival of new data It’s nearly impossible to quantify the impact that new datasets have on the performance of a model
  • 46. Some Ideas to Consider • Implement continual learning models • Consider transfer learning as a fundamental enabler
  • 47. Lesson #10: Training a model requires as much coding as creating it…
  • 48. Challenges Data engineers spend a lot of time writing training routines for machine learning models Comparing the performance of different models on the same datasets remains tricky Changes on a training dataset often imply changes on the training code
  • 49. Some Ideas to Consider • Explore a configuration-driven training process • Uber’s Ludwig is an innovative, no-code framework for training machine learning models
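A configuration-driven training process can be sketched as a generic routine that reads everything it needs from a declarative config. The config keys below (`input_feature`, `output_feature`, `training`) are hypothetical and are not Ludwig's schema; the model is a one-variable linear regression trained by gradient descent, chosen only to keep the example self-contained.

```python
from typing import Dict, List, Tuple

# Hypothetical config: changing the dataset columns or hyperparameters
# means editing this dictionary, not the training code.
CONFIG = {
    "input_feature": "x",
    "output_feature": "y",
    "training": {"epochs": 1000, "learning_rate": 0.05},
}


def train_from_config(config: Dict, rows: List[Dict[str, float]]) -> Tuple[float, float]:
    """Fit y = w * x + b by full-batch gradient descent on mean squared error."""
    xs = [row[config["input_feature"]] for row in rows]
    ys = [row[config["output_feature"]] for row in rows]
    lr = config["training"]["learning_rate"]
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(config["training"]["epochs"]):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(2 * r for r in residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rows = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(5)]
w, b = train_from_config(CONFIG, rows)
```

When the training dataset changes, only the config changes, which speaks to the challenge that dataset changes often imply changes in the training code.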
  • 51. Lesson #11: Different models require different execution patterns…
  • 52. Challenges Not all models can be executed via APIs Some models take a long time to run In some scenarios, different models need to be executed at the same time based on a specific condition
  • 53. Some Ideas to Consider Scheduled Activation Model Model Pub-Sub Activation Model Model On-Demand Activation Model Model Model API Gateway Event Gateway Enable different execution modes based on client’s requirements
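The three activation modes on this slide (on-demand, pub-sub, scheduled) can be sketched behind a single gateway object. `ModelGateway` and its method names are hypothetical, the "models" are plain functions, and time is passed in explicitly so the example stays deterministic; a real deployment would sit behind an API gateway, a message broker, and a scheduler.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Model = Callable[[Any], Any]


class ModelGateway:
    """Toy dispatcher exposing three ways to activate a registered model."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}
        self._subscribers: Dict[str, List[str]] = defaultdict(list)
        self._jobs: List[dict] = []
        self.results: List[Tuple[str, str, Any]] = []  # (mode, model name, output)

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def invoke(self, name: str, payload: Any) -> Any:
        """On-demand activation: synchronous request and response."""
        return self._models[name](payload)

    def subscribe(self, topic: str, name: str) -> None:
        self._subscribers[topic].append(name)

    def publish(self, topic: str, payload: Any) -> None:
        """Pub-sub activation: every model subscribed to the topic runs."""
        for name in self._subscribers[topic]:
            self.results.append(("pub-sub", name, self._models[name](payload)))

    def schedule(self, name: str, interval: int, payload: Any) -> None:
        self._jobs.append({"name": name, "interval": interval,
                           "next_run": interval, "payload": payload})

    def tick(self, now: int) -> None:
        """Scheduled activation: run every job whose next run time has passed."""
        for job in self._jobs:
            while job["next_run"] <= now:
                output = self._models[job["name"]](job["payload"])
                self.results.append(("scheduled", job["name"], output))
                job["next_run"] += job["interval"]


gateway = ModelGateway()
gateway.register("double", lambda x: 2 * x)
gateway.register("negate", lambda x: -x)

on_demand_result = gateway.invoke("double", 4)
gateway.subscribe("orders", "negate")
gateway.publish("orders", 3)
gateway.schedule("double", interval=10, payload=1)
gateway.tick(now=25)  # the job is due at t=10 and t=20
```

Keeping model registration separate from activation lets the same model be exposed through whichever mode a client needs.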
  • 54. Lesson #12: Mobile deep learning is more complicated than you think
  • 55. Challenges Centralized cloud deep learning models don’t scale On-device deep learning models are hard to distribute and train Tons of privacy challenges
  • 56. Some Ideas to Consider • Consider using federated learning or similar patterns for mobile-based machine learning
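The federated learning pattern can be sketched as federated averaging: each device trains on its own data and only the resulting weights travel to the server. The example below is a deliberately tiny illustration under stated assumptions: a single-weight model `y = w * x`, two simulated clients, and hypothetical function names. It is not any framework's API.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_update(w: float, data: List[Sample], lr: float = 0.05, epochs: int = 2) -> float:
    """Run a few gradient steps on one device; the raw data never leaves this function."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w: float, client_datasets: List[List[Sample]]) -> float:
    """Server step: average client weights, weighted by local dataset size."""
    updates = [local_update(global_w, data) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    return sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)


# Two simulated devices whose private data follows y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)],
]

global_w = 0.0
for _ in range(20):
    global_w = federated_round(global_w, clients)
```

The server only ever sees weights, which is what makes the pattern attractive for the privacy and distribution challenges listed on the previous slide. Production systems usually add secure aggregation and client sampling on top.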
  • 59. Challenges The accuracy-interpretability friction The unpredictability factor Limited toolset
  • 60. Some Ideas to Consider •Use tools like TensorBoard to visualize the structure of neural networks Visualize the Network and its Results •High training error is a sign of underfitting •High test error and low training error is a sign of overfitting Compare Training and Test Errors •Helps to determine whether the error is in the code or in the data Test with Small Datasets •Monitor the number of activations in hidden units Monitor Activations and Gradient Values Understanding How Nodes are Activated Understanding what Hidden Layers Do Understanding How Concepts are Formed Interpretability • Establish systematic practices to debug machine learning models • Onboard modeling visualization and interpretability tools
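The "compare training and test errors" rule of thumb can be written down as a small check that runs after every training job. The thresholds `acceptable_error` and `max_gap` are hypothetical values picked for illustration; suitable values depend on the task and the metric.

```python
def diagnose(train_error: float, test_error: float,
             acceptable_error: float = 0.10, max_gap: float = 0.05) -> str:
    """Classify a training run using the rule of thumb from the slide.

    High training error suggests underfitting; low training error with a
    much higher test error suggests overfitting.
    """
    if train_error > acceptable_error:
        return "underfitting"
    if test_error - train_error > max_gap:
        return "overfitting"
    return "ok"


run_a = diagnose(train_error=0.30, test_error=0.32)
run_b = diagnose(train_error=0.02, test_error=0.25)
run_c = diagnose(train_error=0.05, test_error=0.07)
```

Encoding the heuristic as code makes it part of a systematic debugging practice rather than something each engineer eyeballs differently.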
  • 63. Challenges Most neural networks are vulnerable to adversarial attacks Attackers don’t need access to the models but can simply manipulate input datasets Most of the time adversarial attacks go undetected
  • 64. Some Ideas to Consider • Test your neural networks for adversarial robustness • IBM’s adversarial robustness toolbox is one of the leading stacks in neural network security
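A robustness test can be as simple as perturbing an input in the direction that increases the loss and checking whether the prediction flips. The sketch below hand-rolls the fast gradient sign method against a fixed logistic regression model. It does not use the Adversarial Robustness Toolbox, and the weights, input, and `eps` budget are made-up values for illustration.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Fast gradient sign method: move each feature by eps in the direction that raises the loss."""
    p = predict_proba(w, b, x)
    # For cross-entropy loss, the gradient with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1  # correctly classified as class 1
eps = 0.3

x_adv = fgsm(w, b, x, y, eps)
clean_score = predict_proba(w, b, x)
adv_score = predict_proba(w, b, x_adv)
```

Here a perturbation bounded by `eps` per feature pushes the score across the 0.5 decision boundary. Running this kind of check across a validation set gives a rough measure of how fragile a model is before an attacker finds out.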
  • 65. Lesson # 15: Data privacy is the elephant in the machine learning room
  • 66. Challenges Machine learning models intrinsically build knowledge about private datasets Most machine learning techniques require clear access to data which, in many cases, contains sensitive information There are no established techniques for evaluating the privacy robustness of machine learning models
  • 67. Some Ideas to Consider • Private machine learning is an emerging area of research • Leverage techniques such as secure multi-party computation or zero-knowledge proofs to obfuscate training datasets • PySyft is an emerging framework to enable privacy in machine learning models
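Secure multi-party computation can be illustrated with additive secret sharing, one of its basic building blocks. In this sketch several parties compute the sum of their private values without any party seeing another's input. The function names and the modulus `PRIME` are choices made for this example; it is not PySyft's API, and it simulates all parties in one process.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % PRIME


def secure_sum(private_values: List[int]) -> int:
    """Each party shares its value; each party then publishes only the sum of the shares it holds."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j receives the j-th share from every party and adds them up locally.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return reconstruct(partial_sums)


total = secure_sum([12, 30, 7])
roundtrip = reconstruct(share(123, 3))
```

Any single share is a uniformly random number and reveals nothing on its own, yet the partial sums still combine into the correct total. The same idea extends to aggregating gradients or statistics across data owners.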
  • 69. DAWN Project from Stanford University Michelangelo from Uber MLflow from Databricks FBLearner from Facebook TFX from Google
  • 70. The challenges go beyond the obvious…
  • 71. Three Foundational Challenges for the Mainstream Adoption of Machine Learning Lowering the Technological Entry Point • Can mainstream developers embrace machine learning stacks? Talent Availability • Can companies and governments nurture local data science talent? Data Democratization • Can rich datasets stop being a privilege of large corporations and governments?
  • 72. Some Initiatives to Consider Lowering the Technological Entry Point • AutoML, low-code machine learning frameworks Talent Availability • Google AI Academy, Coursera, Udacity… Data Democratization • Decentralized AI platforms
  • 73. Summary • Implementing machine learning solutions in the real world remains incredibly challenging • There is a large gap between the advancements in AI research and the practical viability of those techniques • Machine learning applications require a new lifecycle different from traditional software models • Each aspect of that lifecycle brings a unique set of challenges • Start small, iterate…