Automating Machine Learning
Features and Workflows
jao@bigml.com
PAPIs Connect Valencia, 2016
Outline
Introduction: ML as a System Service
Feature Engineering Automation
Workflow Automation
Challenges and Outlook
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
APIs: ML building blocks
Abstraction layer over feature engineering
Abstraction layer over algorithms
Automation
Machine Learning Workflows
Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
Machine Learning Workflows for real
Jeannine Takaki, Microsoft Azure Team
Machine Learning Automation Today
from bigml.api import BigML

# BigML() reads credentials from the BIGML_USERNAME and BIGML_API_KEY
# environment variables.
api = BigML()
project = api.create_project({'name': 'ToyBoost'})

# NOTE: `source` (path or URL of the raw data) is not defined on the slide.
orig_source = api.create_source(source,
                                {"name": "ToyBoost",
                                 "project": project['resource']})
api.ok(orig_source)  # wait until the source is ready
orig_dataset = api.create_dataset(orig_source, {"name": "Boost"})
api.ok(orig_dataset)

# NOTE: the slide omits how `trainset` is derived from orig_dataset.
trainset = api.get_dataset(trainset)
for loop in range(0, 10):
    api.ok(trainset)
    model = api.create_model(trainset, {
        "name": "ToyBoost - Model%d" % loop,
        "objective_fields": ["letter"],
        "excluded_fields": ["weight"],
        "weight_field": "100011"})
    api.ok(model)
    batchp = api.create_batch_prediction(model, trainset, {
        "name": "ToyBoost - Result%d" % loop,
        "all_fields": True,
        "header": True})
    api.ok(batchp)
    batchp = api.get_batch_prediction(batchp)
    batchp_dataset = api.get_dataset(batchp['object'])
Machine Learning Automation Today
Problems of current solutions
Complexity: lots of details outside the problem domain
Reuse: no inter-language compatibility
Scalability: client-side workflows are hard to optimize
Not enough abstraction
Machine Learning Automation Tomorrow
Solution: Domain-specific languages
Outline
Introduction: ML as a System Service
Feature Engineering Automation
Workflow Automation
Challenges and Outlook
Domain-specific Expressions (sexps)
(if (missing? "height")
(random-value "height")
(field "height"))
(window "income" 10)
(within-percentiles? "age" 0.5 0.95)
(cond (> (field "score") (mean "score")) "above average"
      (< (field "score") (mean "score")) "below average"
      "mediocre")
Domain-specific Expressions (JSON)
["if", ["missing?", "height"],
["random-value", "height"],
["field", "height"]]
["window", "income", 10]
["within-percentiles?", "age", 0.5, 0.95]
["cond", [">", ["field", "score"], ["mean", "score"]], "above average",
         ["<", ["field", "score"], ["mean", "score"]], "below average",
         "mediocre"]
Abstraction via the Language
;; (if (missing? "height")
;; (random-value "height")
;; (field "height"))
(ensure-value "height")
(window "income" 10)
(within-percentiles? "age" 0.5 0.95)
;; (cond (> (field "score") (mean "score")) "above average"
;;       (< (field "score") (mean "score")) "below average"
;;       "mediocre")
(discretize "score" "above average" "below average" "mediocre")
Abstraction via the User Interface
Remote for efficiency and reuse, local for discoverability
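To make local evaluation concrete, here is a toy Python evaluator for a handful of the operators used in the expressions above, working on their JSON form. It is a sketch only: the operator semantics (treating `None` as a missing value, computing `mean` over the non-missing rows) and the function name `evaluate` are assumptions for illustration, not Flatline's actual implementation.

```python
from statistics import mean


def evaluate(expr, row, rows):
    """Evaluate a JSON-form expression for one row of a dataset.

    `row` is a dict of field values; `rows` is the whole dataset,
    needed for dataset-wide summaries such as "mean".
    """
    if not isinstance(expr, list):
        return expr  # literal: string or number
    op, *args = expr
    if op == "field":
        return row.get(args[0])
    if op == "missing?":
        return row.get(args[0]) is None
    if op == "mean":
        return mean(r[args[0]] for r in rows if r.get(args[0]) is not None)
    if op == "if":
        test, then, otherwise = args
        chosen = then if evaluate(test, row, rows) else otherwise
        return evaluate(chosen, row, rows)
    if op in (">", "<", "="):
        left, right = (evaluate(a, row, rows) for a in args)
        return {">": left > right, "<": left < right, "=": left == right}[op]
    if op == "cond":
        # Alternating (test, value) entries followed by a default value.
        *pairs, default = args
        for test, value in zip(pairs[0::2], pairs[1::2]):
            if evaluate(test, row, rows):
                return evaluate(value, row, rows)
        return evaluate(default, row, rows)
    raise ValueError(f"unsupported operator: {op}")


# A three-way labelling in the style of the `cond` example.
rows = [{"score": 1}, {"score": 2}, {"score": 3}]
label = ["cond",
         [">", ["field", "score"], ["mean", "score"]], "above average",
         ["<", ["field", "score"], ["mean", "score"]], "below average",
         "mediocre"]
labels = [evaluate(label, r, rows) for r in rows]
print(labels)  # ['below average', 'mediocre', 'above average']
```

Because the expression is plain data, the same list could be sent to a server unchanged, which is the reuse argument made on the slides.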
Flatline: A DSL for Feature Engineering
Domain-specific: new fields from an input sliding window as declarative expressions
Simple syntax: JSON → s-expressions
Efficient: full server-side implementation
Discoverable: in-browser client-side implementation
Reusable: the same expressions usable from any language binding
Bonus: applicable to filtering
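The "simple syntax" point rests on s-expressions and JSON lists describing the same tree. A minimal Python sketch of that mapping, assuming a toy reader that handles only parentheses, double-quoted strings without escapes, numbers and bare symbols; the names `tokenize`, `read` and `sexp_to_json` are illustrative and not part of Flatline:

```python
import json
import re

# Tokens: parentheses, double-quoted strings (no escapes), or bare atoms.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text):
    """Split an s-expression into a flat list of tokens."""
    return TOKEN.findall(text)


def read(tokens):
    """Consume tokens from the front and return one nested Python value."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    if token.startswith('"'):
        return token[1:-1]  # string literal
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token  # operator or function name, e.g. "missing?"


def sexp_to_json(text):
    """Render an s-expression as its JSON list form."""
    return json.dumps(read(tokenize(text)))


print(sexp_to_json('(within-percentiles? "age" 0.5 0.95)'))
# ["within-percentiles?", "age", 0.5, 0.95]
```

Symbols and string literals both become JSON strings here, which matches the JSON examples on the earlier slide.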
Outline
Introduction: ML as a System Service
Feature Engineering Automation
Workflow Automation
Challenges and Outlook
Machine Learning Workflows
A DSL for Machine Learning
Workflows? Absolutely!
Machine Learning Workflows
Same problems, only worse...
Complexity: hairy logic and control flow
Reuse: more complex algorithms and behaviour, very hard to port to other languages
Scalability: lots of iterations and intermediate resources, very hard to make efficient on the client side
Machine Learning Workflows
WhizzML: same solution, only better...
WhizzML: A sexp-based, domain-specific language
(define apple
  "https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv")
(define source (create-and-wait-source {"remote" apple
                                        "name" "whizz"}))
(define dataset (create-and-wait-dataset {"source" source}))
(define anomaly (create-and-wait-anomaly {"dataset" dataset}))
(define input {"Open" 275 "High" 300 "Low" 250})
(define score
  (create-and-wait-anomalyscore {"anomaly" anomaly
                                 "input_data" input}))
(get (fetch score) "score")
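The `create-and-wait-*` calls bundle two steps that client code otherwise performs separately: create a resource, then wait until it is finished. A minimal Python sketch of that pattern, using a stand-in `FakeClient` rather than any real bindings; the class, its methods and the status strings are hypothetical and exist only to make the example self-contained:

```python
import itertools
import time


class FakeClient:
    """Hypothetical stand-in for a remote ML service (not the BigML API)."""

    def __init__(self, polls_until_done=3):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def create(self, kind, args):
        resource_id = f"{kind}/{next(self._ids)}"
        self._remaining[resource_id] = self._polls_until_done
        return resource_id

    def status(self, resource_id):
        # Each poll brings the resource one step closer to completion.
        self._remaining[resource_id] -= 1
        return "finished" if self._remaining[resource_id] <= 0 else "in-progress"


def create_and_wait(client, kind, args, poll_interval=0.01, timeout=5.0):
    """Create a resource, then poll until it is finished or the timeout hits."""
    resource_id = client.create(kind, args)
    deadline = time.monotonic() + timeout
    while client.status(resource_id) != "finished":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{resource_id} not finished after {timeout}s")
        time.sleep(poll_interval)
    return resource_id


client = FakeClient()
source = create_and_wait(client, "source", {"remote": "https://example.com/data.csv"})
dataset = create_and_wait(client, "dataset", {"source": source})
print(source, dataset)  # source/1 dataset/2
```

Each step takes the identifier returned by the previous one, which is the same chaining the WhizzML `define` forms express.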
WhizzML vs Flatline (as languages)
A better language:
Better data structures (dictionaries, sets...)
Better control-flow: (tail) recursion, iteration, loops
Better abstraction: procedures
WhizzML: Lambda Abstraction
Abstraction
(define (score-stock name input)
  (let (base "https://s3.amazonaws.com/bigml-public/csv"
        stock (str base "/" name)
        source (create-and-wait-source {"remote" stock})
        dataset (create-and-wait-dataset {"source" source})
        anomaly (create-and-wait-anomaly {"dataset" dataset}))
    (create-and-wait-anomalyscore {"anomaly" anomaly
                                   "input_data" input})))
WhizzML: Reusable Procedures
Abstraction
(score-stock "aapl" {"Open" 275 "High" 300 "Low" 250})
WhizzML: Server-side fortes
A better server-side:
Better reusability: scripts, executions and libraries as
first-class ML resources
Higher efficiency gains: automatic parallelism
More opportunities for UI extensions
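The parallelism point can be illustrated from the client side, although on the slides it is something the server does automatically and its scheduling strategy is not described. A Python sketch, assuming three model-building steps that do not depend on each other; `build_model` is a hypothetical placeholder that only sleeps:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_model(name, seconds=0.1):
    """Hypothetical placeholder for a slow, independent remote step."""
    time.sleep(seconds)
    return f"model-{name}"


names = ["a", "b", "c"]

with ThreadPoolExecutor(max_workers=len(names)) as pool:
    # map preserves input order even though the calls overlap in time.
    models = list(pool.map(build_model, names))

print(models)  # ['model-a', 'model-b', 'model-c']
```

Steps that consume each other's output, such as source, dataset and model in a single chain, cannot be overlapped this way; only independent branches can.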
WhizzML Source Code as a Machine Learning Resource
{"library": {
   "imports": ["12343addb343f2890f23492d"],
   "source_code": "(define (mu2) (mu (g 3 8)))",
   "exports": [{"name": "mu2", "signature": []}]}}
{"script": {
   "parameters": [{"name": "remote_uri", "type": "string"},
                  {"name": "timeout", "type": "number",
                   "default": 10000}],
   "source_code":
     "(define id (create-source {\"remote\" remote_uri}))\n(wait id timeout)",
   "outputs": [{"name": "id", "type": "source-id"}]}}
Rich metadata, reuse and shareability of WhizzML code
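Because the WhizzML source travels inside a JSON string, its double quotes and line breaks have to be escaped. A small Python sketch that builds the script description shown above and lets `json.dumps` do the escaping; the payload layout simply mirrors the slide and has not been checked against the live API:

```python
import json

source_code = (
    '(define id (create-source {"remote" remote_uri}))\n'
    "(wait id timeout)"
)

script = {
    "script": {
        "parameters": [
            {"name": "remote_uri", "type": "string"},
            {"name": "timeout", "type": "number", "default": 10000},
        ],
        "source_code": source_code,
        "outputs": [{"name": "id", "type": "source-id"}],
    }
}

payload = json.dumps(script, indent=2)
print(payload)

# Round trip: decoding gives back the original, unescaped source code.
decoded = json.loads(payload)["script"]["source_code"]
```

Building the payload from native data structures avoids hand-escaping mistakes when scripts grow beyond a couple of lines.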
Executions as a Machine Learning Resource
{"execution": {"script_id": "1a2232bf3498f95dde",
               "username": "bittwidler",
               "tlp": 4,
               "resource_limits": {"total": 50,
                                   "source": 10,
                                   "dataset": 5,
                                   "model": 10},
               "max_execution_time": 3600,
               "max_execution_steps": 10000,
               "max_recursion_depth": 1024}}
WhizzML: Client-side fortes
A better client-side:
Better interactive experience: read-eval-print loop
Scripts usable from the user’s machine
Interoperability: Java, JavaScript and NodeJS REPLs
Challenge: behavioural coherence between server and client sides
Outline
Introduction: ML as a System Service
Feature Engineering Automation
Workflow Automation
Challenges and Outlook
Challenges
Solved
Local REPL and remote shared implementation
Automatic parallelization
Error reporting
Traceability: stack traces and stepwise execution
Open
Better error management (dynamic typing, type inferencer)
Resumable workflows
Data locality: optimizing repeated access to the same datasets

 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Automating Machine Learning Workflows

  • 1. Automating Machine Learning Features and Workflows jao@bigml.com PAPIs Connect Valencia, 2016
  • 2. Outline: Introduction: ML as a System Service; Feature Engineering Automation; Workflow Automation; Challenges and Outlook
  • 4. Machine Learning as a System Service. The goal: Machine Learning as a system-level service. The means: APIs (ML building blocks); an abstraction layer over feature engineering; an abstraction layer over algorithms; automation.
  • 5. Machine Learning Workflows Dr. Natalia Konstantinova (http://nkonst.com/machine-learning-explained-simple-words/)
  • 6. Machine Learning Workflows for real Jeannine Takaki, Microsoft Azure Team
  • 7. Machine Learning Automation Today
      from bigml.api import BigML
      api = BigML()
      project = api.create_project({'name': 'ToyBoost'})
      orig_source = api.create_source(source,
                                      {"name": "ToyBoost",
                                       "project": project['resource']})
      api.ok(orig_source)
      orig_dataset = api.create_dataset(orig_source, {"name": "Boost"})
      api.ok(orig_dataset)
      trainset = api.get_dataset(trainset)
      for loop in range(0,10):
          api.ok(trainset)
          model = api.create_model(trainset, {
              "name": "ToyBoost - Model%d" % loop,
              "objective_fields": ["letter"],
              "excluded_fields": ["weight"],
              "weight_field": "100011"})
          api.ok(model)
          batchp = api.create_batch_prediction(model, trainset, {
              "name": "ToyBoost - Result%d" % loop,
              "all_fields": True,
              "header": True})
          api.ok(batchp)
          batchp = api.get_batch_prediction(batchp)
          batchp_dataset = api.get_dataset(batchp['object'])
  • 9. Machine Learning Automation Today. Problems of current solutions: Complexity: lots of details outside the problem domain. Reuse: no inter-language compatibility. Scalability: client-side workflows are hard to optimize. Not enough abstraction.
  • 10. Machine Learning Automation Tomorrow Solution: Domain-specific languages
  • 12. Domain-specific Expressions (sexps)
      (if (missing? "height")
          (random-value "height")
          (field "height"))
      (window "income" 10)
      (within-percentiles? "age" 0.5 0.95)
      (cond (> (field "score") (mean "score")) "above average"
            (= (field "score") (mean "score")) "below average"
            "mediocre")
  • 13. Domain-specific Expressions (JSON)
      ["if", ["missing?", "height"],
             ["random-value", "height"],
             ["field", "height"]]
      ["window", "income", 10]
      ["within-percentiles?", "age", 0.5, 0.95]
      ["cond", [">", ["field", "score"], ["mean", "score"]], "above av
               ["=", ["field", "score"], ["mean", "score"]], "below av
               "mediocre"]
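To make the semantics of the expressions on slides 12 and 13 concrete, here is a minimal Python sketch of an evaluator for a handful of the operators shown, working on the JSON (nested list) form. It is an illustration only, not Flatline's implementation: the function name `evaluate`, the row-as-dictionary representation and the sample data are assumptions, and `mean` stands in for `random-value` in the missing-value example so that the result is deterministic.

```python
from statistics import mean


def evaluate(expr, row, rows):
    """Evaluate a JSON-style expression for one row.

    `row` is a dict of field values; `rows` is the whole (toy) dataset,
    needed by aggregate operators such as "mean". Hypothetical sketch.
    """
    if not isinstance(expr, list):
        return expr  # literals (strings, numbers) evaluate to themselves
    op, *args = expr
    if op == "field":
        return row.get(args[0])
    if op == "missing?":
        return row.get(args[0]) is None
    if op == "mean":
        # Mean of the non-missing values of the named field.
        return mean(r[args[0]] for r in rows if r.get(args[0]) is not None)
    if op == "if":
        test, then, otherwise = args
        chosen = then if evaluate(test, row, rows) else otherwise
        return evaluate(chosen, row, rows)
    if op in (">", "="):
        left, right = (evaluate(a, row, rows) for a in args)
        return left > right if op == ">" else left == right
    if op == "cond":
        # Alternating (test, value) pairs followed by a default value.
        *pairs, default = args
        for test, value in zip(pairs[0::2], pairs[1::2]):
            if evaluate(test, row, rows):
                return evaluate(value, row, rows)
        return evaluate(default, row, rows)
    raise ValueError(f"unsupported operator: {op}")


rows = [
    {"score": 1, "height": 150},
    {"score": 2, "height": None},
    {"score": 3, "height": 170},
]

# The "cond" expression exactly as written on the slide.
label = ["cond",
         [">", ["field", "score"], ["mean", "score"]], "above average",
         ["=", ["field", "score"], ["mean", "score"]], "below average",
         "mediocre"]
labels = [evaluate(label, r, rows) for r in rows]

# Missing-value fill; "mean" replaces the slide's "random-value" here.
fill = ["if", ["missing?", "height"], ["mean", "height"], ["field", "height"]]
filled = [evaluate(fill, r, rows) for r in rows]
```

Each expression produces one new value per input row, which is how a declarative expression turns into a new field.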
  • 15. Abstraction via the Language
      ;; (if (missing? "height")
      ;;     (random-value "height")
      ;;     (field "height"))
      (ensure-value "height")
      (window "income" 10)
      (within-percentiles? "age" 0.5 0.95)
      ;; (cond (> (field "score") (mean "score")) "above average"
      ;;       (= (field "score") (mean "score")) "below average"
      ;;       "mediocre")
      (discretize "score" "above average" "below average" "mediocre")
  • 16. Abstraction via the User Interface
  • 17. Remote for efficiency and reuse, local for discoverability
  • 18. Flatline: A DSL for Feature Engineering. Domain-specific: new fields from an input sliding window as declarative expressions. Simple syntax: JSON → s-expressions. Efficient: full server-side implementation. Discoverable: in-browser client-side implementation. Reusable: the same expressions are usable from any language binding. Bonus: applicable to filtering.
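The "JSON → s-expressions" correspondence is mechanical: a JSON array whose first element is an operator name maps to a parenthesised form. The Python sketch below shows one way to do the conversion; the function name `to_sexp` is an assumption made for illustration and is not part of any BigML binding.

```python
import json


def to_sexp(expr):
    """Render a JSON-style (nested list) expression as an s-expression string."""
    if isinstance(expr, list):
        if not expr:
            return "()"
        head, *rest = expr
        # The head of a form is an operator symbol, so it is written bare;
        # every other string is a literal and keeps its quotes.
        head_text = head if isinstance(head, str) else to_sexp(head)
        return "(" + " ".join([head_text, *map(to_sexp, rest)]) + ")"
    return json.dumps(expr)


window = to_sexp(["window", "income", 10])
ensure = to_sexp(["if", ["missing?", "height"],
                  ["random-value", "height"],
                  ["field", "height"]])
print(window)
print(ensure)
```

Because both forms carry the same tree, any language binding that can build JSON can build the same expression, which is what the reuse claim on the slide relies on.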
  • 20. Machine Learning Workflows A DSL for Machine Learning Workflows?
  • 21. Machine Learning Workflows A DSL for Machine Learning Workflows? Absolutely!
  • 22. Machine Learning Workflows. Same problems, only worse... Complexity: hairy logic and control-flow. Reuse: more complex algorithms and behaviour, very hard to port to other languages. Scalability: lots of iterations and intermediate resources, very hard to make efficient on the client side.
  • 23. Machine Learning Workflows. WhizzML: same solution, only better...
  • 24. WhizzML: A sexp-based, domain-specific language
      (define apple "https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv")
      (define source
        (create-and-wait-source {"remote" apple "name" "whizz"}))
      (define dataset (create-and-wait-dataset {"source" source}))
      (define anomaly (create-and-wait-anomaly {"dataset" dataset}))
      (define input {"Open" 275 "High" 300 "Low" 250})
      (define score
        (create-and-wait-anomalyscore {"anomaly" anomaly "input_data" input}))
      (get (fetch score) "score")
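The `create-and-wait-*` forms fold the create-then-poll pattern (the `api.create_*` plus `api.ok` pairs of the Python listing on slide 7) into a single step. The Python sketch below illustrates that pattern against an in-memory stand-in for a remote service. Everything here is hypothetical: `FakeService`, its `create` and `status` methods, the resource-id format and the polling limits are assumptions for illustration, not BigML's API.

```python
import time


class FakeService:
    """In-memory stand-in for a remote ML API (hypothetical)."""

    def __init__(self, polls_until_done=2):
        self._polls = {}               # resource id -> status checks so far
        self._polls_until_done = polls_until_done
        self._counter = 0

    def create(self, kind, args):
        """Register a new resource and return its id immediately."""
        self._counter += 1
        resource_id = f"{kind}/{self._counter}"
        self._polls[resource_id] = 0
        return resource_id

    def status(self, resource_id):
        """Report 'finished' once the resource has been polled enough times."""
        self._polls[resource_id] += 1
        done = self._polls[resource_id] >= self._polls_until_done
        return "finished" if done else "in-progress"


def create_and_wait(service, kind, args, poll_interval=0.0, max_polls=100):
    """Create a resource, then poll until the service reports it finished."""
    resource_id = service.create(kind, args)
    for _ in range(max_polls):
        if service.status(resource_id) == "finished":
            return resource_id
        time.sleep(poll_interval)
    raise TimeoutError(f"{resource_id} did not finish after {max_polls} polls")


service = FakeService()
source = create_and_wait(service, "source", {"remote": "nasdaq_aapl.csv"})
dataset = create_and_wait(service, "dataset", {"source": source})
```

Written this way each step reads as a single call, so the script stays close to the problem domain; the waiting logic is the kind of detail the talk argues should live behind the language.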
  • 25. WhizzML vs Flatline (as languages). A better language: better data structures (dictionaries, sets...); better control-flow ((tail) recursion, iteration, loops); better abstraction (procedures).
  • 26. WhizzML: Lambda Abstraction
      (define (score-stock name input)
        (let (base "https://s3.amazonaws.com/bigml-public/csv"
              stock (str base "/" name)
              source (create-and-wait-source {"remote" stock})
              dataset (create-and-wait-dataset {"source" source})
              anomaly (create-and-wait-anomaly {"dataset" dataset}))
          (create-and-wait-anomalyscore {"anomaly" anomaly
                                         "input_data" input})))
  • 27. WhizzML: Reusable Procedures Abstraction (score-stock "aapl" {"Open" 275 "High" 300 "Low" 250})
  • 28. WhizzML: Server-side fortes. A better server-side: better reusability (scripts, executions and libraries as first-class ML resources); higher efficiency gains (automatic parallelism); more opportunities for UI extensions.
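One way to picture automatic parallelism: if the steps of a workflow declare what they depend on, steps with no pending dependencies can run concurrently without the script author asking for it. The Python sketch below schedules such a dependency graph in waves on a thread pool. It is an illustration under stated assumptions, not a description of how WhizzML's server parallelizes code; `run_workflow` and the step names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(steps, max_workers=4):
    """Run steps in dependency order, parallelizing independent ones.

    `steps` maps a step name to (function, [names of steps it depends on]).
    Each function receives the results of its dependencies as arguments.
    """
    results = {}
    pending = dict(steps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Every step whose dependencies are all done can start now.
            ready = [name for name, (_, deps) in pending.items()
                     if all(dep in results for dep in deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {
                name: pool.submit(pending[name][0],
                                  *(results[dep] for dep in pending[name][1]))
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


steps = {
    "dataset": (lambda: [1, 2, 3, 4], []),
    "model_a": (lambda data: sum(data), ["dataset"]),   # independent of model_b
    "model_b": (lambda data: max(data), ["dataset"]),   # independent of model_a
    "report": (lambda a, b: (a, b), ["model_a", "model_b"]),
}
results = run_workflow(steps)
```

Here `model_a` and `model_b` are submitted in the same wave, so they may run at the same time, while `report` waits for both.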
  • 29. WhizzML Source Code as a Machine Learning Resource
      {"library": {
         "imports": ["12343addb343f2890f23492d"],
         "source_code": "(define (mu2) (mu (g 3 8)))",
         "exports": [{"name": "mu2", "signature": []}]}}
      {"script": {
         "parameters": [{"name": "remote_uri", "type": "string"},
                        {"name": "timeout", "type": "number", "default": 10000}],
         "source_code": "(define id (create-source {\"remote\" remote_uri})) (wait id timeout)",
         "outputs": [{"name": "id", "type": "source-id"}]}}
      Rich metadata, reuse and shareability of WhizzML code
  • 30. Executions as a Machine Learning Resource
      {"execution": {
         "script_id": "1a2232bf3498f95dde",
         "username": "bittwidler",
         "tlp": 4,
         "resource_limits": {"total": 50, "source": 10, "dataset": 5, "model": 10},
         "max_execution_time": 3600,
         "max_execution_steps": 10000,
         "max_recursion_depth": 1024}}
  • 32. WhizzML: Client-side fortes. A better client-side: better interactive experience (read-eval-print loop); scripts usable from the user’s machine; interoperability (Java, JavaScript and NodeJS REPLs). Challenge: behavioural coherence between server and client sides.
  • 34. Challenges. Solved: local REPL and remote shared implementation; automatic parallelization; error reporting; traceability (stack traces and stepwise execution). Open: better error management (dynamic typing, type inferencer); resumable workflows; data locality (optimizing repeated access to the same datasets).