The document discusses automating machine learning workflows and feature engineering. It proposes domain-specific languages like Flatline and WhizzML to provide abstraction and automation. Flatline allows declarative feature engineering expressions. WhizzML provides abstraction, reusable procedures, and handles workflows and algorithms more efficiently by executing remotely on servers. Challenges remain around error management, resumable workflows, and optimizing data access across distributed executions.
2. Outline
Introduction: ML as a System Service
Feature Engineering Automation
Workflow Automation
Challenges and Outlook
4. Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
APIs: ML building blocks
Abstraction layer over feature engineering
Abstraction layer over algorithms
Automation
9. Machine Learning Automation Today
Problems of current solutions
Complexity: lots of details outside the problem domain
Reuse: no inter-language compatibility
Scalability: client-side workflows hard to optimize
Not enough abstraction
18. Flatline: A DSL for Feature Engineering
Domain-specific: new fields from an input sliding window as declarative expressions
Simple syntax: JSON → s-expressions
Efficient: full server-side implementation
Discoverable: in-browser client-side implementation
Reusable: the same expressions usable from any language binding
Bonus: applicable to filtering
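To make the idea of declarative expressions over a sliding window concrete, here is a minimal sketch of a toy evaluator. The operator vocabulary ("field" with a row offset, "+", "/", ">") and the helper names are assumptions made for illustration; they are not Flatline's actual syntax or implementation.

```python
import json

def evaluate(expr, rows, i):
    """Evaluate a JSON-encoded expression at row index i.

    The operators handled here are a toy vocabulary chosen for
    illustration; they are not Flatline's actual syntax.
    """
    if not isinstance(expr, list):
        return expr  # literal number or string
    op, *args = expr
    if op == "field":
        name, shift = args  # shift is a row offset relative to i
        j = i + shift
        # Outside the input, the window has no value.
        return rows[j][name] if 0 <= j < len(rows) else None
    vals = [evaluate(a, rows, i) for a in args]
    if any(v is None for v in vals):
        return None  # propagate missing values
    if op == "+":
        return sum(vals)
    if op == "/":
        return vals[0] / vals[1]
    if op == ">":
        return vals[0] > vals[1]
    raise ValueError(f"unknown operator: {op}")

def to_sexp(expr):
    """Render the JSON form as an s-expression string."""
    if isinstance(expr, list):
        return "(" + " ".join([expr[0]] + [to_sexp(e) for e in expr[1:]]) + ")"
    return json.dumps(expr)

def add_field(rows, name, expr):
    """Return new rows extended with a field computed from expr."""
    return [{**r, name: evaluate(expr, rows, i)} for i, r in enumerate(rows)]

def filter_rows(rows, expr):
    """Keep the rows for which expr evaluates to True."""
    return [r for i, r in enumerate(rows) if evaluate(expr, rows, i) is True]

rows = [{"x": 2}, {"x": 4}, {"x": 10}]
# Mean of the previous and the current value of x.
mean2 = json.loads('["/", ["+", ["field", "x", -1], ["field", "x", 0]], 2]')
print(to_sexp(mean2))
print(add_field(rows, "x_mean2", mean2))
print(filter_rows(rows, [">", ["field", "x", 0], 3]))
```

The same expression drives both the new field and the filter, which is the sense in which one declarative form can be reused from any client that can emit JSON.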
22. Machine Learning Workflows
Same problems, only worse...
Complexity: hairy logic and control flow
Reuse: more complex algorithms and behaviour, very hard to port to other languages
Scalability: lots of iterations and intermediate resources, very hard to make efficient on the client side
28. WhizzML: Server-side fortes
A better server-side:
Better reusability: scripts, executions and libraries as first-class ML resources
Higher efficiency gains: automatic parallelism
More opportunities for UI extensions
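One way to picture automatic parallelism is a runner that derives concurrency from the data dependencies between steps, so the workflow author never writes threading code. The sketch below is an assumption about the general technique, not a description of how WhizzML is implemented; `run_workflow` and the step names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_workflow(steps):
    """Run steps level by level; steps in one level run concurrently.

    `steps` maps a step name to (dependencies, function). The function
    receives the results of its dependencies, in the order listed.
    """
    results = {}
    pending = dict(steps)
    with ThreadPoolExecutor() as pool:
        while pending:
            # A step is ready once all of its dependencies have results.
            ready = [name for name, (deps, _) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {name: pool.submit(pending[name][1],
                                         *(results[d] for d in pending[name][0]))
                       for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results

# Hypothetical workflow: "train" and "test" depend only on "source",
# so they are submitted to the pool together.
steps = {
    "source": ((), lambda: [1, 2, 3, 4]),
    "train": (("source",), lambda s: s[:3]),
    "test": (("source",), lambda s: s[3:]),
    "evaluate": (("train", "test"), lambda tr, te: (len(tr), len(te))),
}
results = run_workflow(steps)
print(results["evaluate"])
```

On a server that owns both the runner and the resources, the same dependency information is available without any round trips to the client.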
29. WhizzML Source Code as a Machine Learning Resource
{"library": {
   "imports": ["12343addb343f2890f23492d"],
   "source_code": "(define (mu2) (mu (g 3 8)))",
   "exports": [{"name": "mu2", "signature": []}]}}
{"script": {
   "parameters": [{"name": "remote_uri", "type": "string"},
                  {"name": "timeout", "type": "number",
                   "default": 10000}],
   "source_code":
     "(define id (create-source {\"remote\" remote_uri}))\n(wait id timeout)",
   "outputs": [{"name": "id", "type": "source-id"}]}}
Rich metadata, reuse and shareability of WhizzML code
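Because the source code travels as a JSON string, its inner quotes and line breaks must be escaped. A minimal sketch of building the script body programmatically, so that the serializer handles the escaping, follows. The field names are taken from the slide; `make_script` is a hypothetical helper, and no request is sent because the slide does not name an endpoint.

```python
import json

def make_script(source_code, parameters, outputs):
    """Wrap WhizzML source and its metadata in a script resource body."""
    return {"script": {"parameters": parameters,
                       "source_code": source_code,
                       "outputs": outputs}}

# The WhizzML source from the slide, as an ordinary Python string.
source = ('(define id (create-source {"remote" remote_uri}))\n'
          '(wait id timeout)')

script = make_script(
    source,
    parameters=[{"name": "remote_uri", "type": "string"},
                {"name": "timeout", "type": "number", "default": 10000}],
    outputs=[{"name": "id", "type": "source-id"}],
)

# json.dumps escapes the quotes and the newline inside source_code.
body = json.dumps(script, indent=2)
print(body)
```

Parsing `body` again yields the original structure, so the metadata (parameters, outputs) stays machine-readable alongside the code.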
32. WhizzML: Client-side fortes
A better client-side:
Better interactive experience: read-eval-print loop
Scripts usable from the user’s machine
Interoperability: Java, JavaScript and NodeJS REPLs
Challenge: behavioural coherence between server and client sides
34. Challenges
Solved
Local REPL and remote shared implementation
Automatic parallelization
Error reporting
Traceability: stack traces and stepwise execution
Open
Better error management (dynamic typing, type inferencer)
Resumable workflows
Data locality: optimizing repeated access to the same datasets
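Resumable workflows are listed as open, so the following is only a sketch of one possible approach, assuming each step's result can be serialized: checkpoint every completed step and skip it on the next run. `run_resumable`, the step names and the checkpoint file are hypothetical.

```python
import json
import os
import tempfile

def run_resumable(steps, checkpoint_path):
    """Run (name, function) steps in order, skipping those already done.

    Each function receives the dict of results so far. Results are
    written to checkpoint_path after every completed step.
    """
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    executed = []
    for name, fn in steps:
        if name in done:
            continue  # completed in an earlier run
        done[name] = fn(done)
        executed.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    return done, executed

calls = {"n": 0}

def flaky_model(done):
    """Fail on the first call only, to simulate a transient error."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return done["dataset"] + "/model"

steps = [("dataset", lambda done: "dataset-1"), ("model", flaky_model)]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_resumable(steps, path)  # fails while building the model
except RuntimeError:
    pass

results, executed = run_resumable(steps, path)  # resumes after "dataset"
print(results, executed)
```

The second run executes only the step that failed; the dataset is reused from the checkpoint rather than rebuilt.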