The Challenges of Bringing Machine Learning to the Masses
Alice Zheng and Sethu Raman
GraphLab Inc.
NIPS workshop on Software Engineering for Machine Learning
December 13, 2014
Self introduction
ML Research
“Accessible ML”
The need for accessible ML
• So much potential in ML
• Everyone trying to make sense of their data
• ML is transforming lives and industries:
personalized medicine, internet search, social
networks, advertising, etc.
• But success is unattainable to most
Building a predictive app
Was using 217 business rules
hoping world doesn’t change
Have an inspiring idea to
reinvent their business
Key pains:
Hiring Talent
35% shortfall in data-savvy workers needed to make sense of big data by 2018 [McKinsey 2011]
Noisy Space of Tools
“Data scientists use a variety of tools, across different programming languages… require a lot of context-switching… affects productivity and impedes reproducibility.”
Ben Lorica, “Data Analysis: Just one component of the Data Science workflow”
Building a predictive app
[Diagram: Data, Feature engineering, Model definition, Training, Evaluation, Deployment, Monitoring]
Pure ML is not enough
• Building a predictive application involves much
more than just building ML models
• System engineering: data storage, computation
infrastructure, networking…
• Data Science: problem definition, data cleaning,
feature engineering
• Software development: turn prototype model into
bullet-proof production code
• Operations engineering: deploy and monitor app
• …
Pain points
• What are the right features?
• What model should I use?
• How do I train it?
• How do I set the tuning parameters?
• Do I even have the right data?
• Ok, I have a working prototype, now what?
Pain points
• Increase in data size or decrease in
latency requires complete rewrite of code
and new toolset
• GB: R/scikit-learn/Matlab
• TB-PB: Hadoop/Mahout/Spark
• Many forms of data and data structures
• Images, text, speech, logs
• Dense lists, sparse dictionaries, time series
• Tables, graphs, matrices, tensors
The need for an ML platform
• Minimize tool/code switching, maximize
performance (speed/accuracy/scale)
• Graceful transition from small to large
dataset sizes
• Flexible, interoperable data types
• Minimize complexity
• System-agnostic
• Simple API
• Auto-tune parameters
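As a rough illustration of what "Simple API" plus "Auto-tune parameters" could look like together, here is a minimal sketch of an exhaustive grid search hidden behind one function call. The names `auto_tune`, `toy_train`, and `toy_score` are hypothetical and exist only for this example; it is not how any particular platform tunes parameters.

```python
from itertools import product


def auto_tune(train, score, grid):
    """Try every parameter combination in `grid` and keep the best one.

    train(**params) returns a model; score(model) returns a float,
    where higher is better.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(train(**params))
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Toy stand-ins for real training and evaluation code: the "model" is just
# its parameters, and the score peaks at step=0.1, depth=3.
def toy_train(step, depth):
    return {"step": step, "depth": depth}


def toy_score(model):
    return -abs(model["step"] - 0.1) - abs(model["depth"] - 3)


best, value = auto_tune(
    toy_train, toy_score, {"step": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
)
print(best)
```

The user supplies only a training function, a scoring function, and candidate values; the search loop stays out of sight, which is the kind of wrapped-away complexity the list above asks for.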
The parallel to databases
• What’s an example of a mega-successful
platform for data operations?
• Databases!
• SQL, Oracle, NoSQL, …
• What lessons can we bring in from the
database world?
Database engine components
[Diagram: Query optimizer, Query execution, Storage engine, Storage]
Database engine components
Complex but self-contained, has clean API,
only changes when there’s new hardware.
Database engine components
Complex bag of tricks, no formalism,
constantly changing to adapt to
data, query, disk characteristics.
ML engine components
[Diagram: Data, Feature engineering, Model definition, Training, Evaluation]
Bags of tricks,
expert knowledge,
experience,
lots of trial and error
Advances in databases
• Reasonable abstraction: relational DB
• Hardware speedups
• Pragmatic software implementation
Successful platform
• Take-away lesson: fast computation
engine + “good enough” execution plan
To advance ML platforms
• ML will be end-user friendly when the
platform is clever enough to handle less-than-optimal
directions from the user
• What needs to happen?
• The complexity needs to be automated and
wrapped away with neat interfaces between
components
• Fast components, “good enough” directions
GraphLab
• Started as a research project at CMU in
2009
• Now a Seattle-based startup
The GraphLab Create™ Solution
• Flexible, interoperable data types
• SArray+SFrame+SGraph inter-translatable
• dense list, sparse array, image, text, tables, graphs
• Graceful transition between data sizes
• SFrame: memory to disk to distributed
• One environment, many substrates
• Python front-end
• Localhost, cluster, Hadoop, EC2
• End-to-end
• Data ingestion+feature engineering+model building+
deployment in a single environment
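To make the "graceful transition between data sizes" idea concrete, here is a small sketch in plain Python where the same function handles data held in memory and data streamed from disk. It only illustrates the principle; it is not how SFrame is implemented, and `mean_of` and `read_numbers` are hypothetical names.

```python
import os
import tempfile


def mean_of(values):
    """Running mean over any iterable, holding one value in memory at a time."""
    count, total = 0, 0.0
    for value in values:
        count += 1
        total += value
    return total / count if count else None


def read_numbers(path):
    """Lazily yield one float per line, so the file is never fully loaded."""
    with open(path) as handle:
        for line in handle:
            yield float(line)


small = [1.0, 2.0, 3.0, 4.0]
memory_mean = mean_of(small)  # data held in a list

# Same call, but the data now lives on disk and is streamed line by line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(str(v) for v in small))
    path = tmp.name
disk_mean = mean_of(read_numbers(path))
os.remove(path)

print(memory_mean, disk_mean)
```

Because the caller's code does not change when the data moves from memory to disk, no rewrite is needed as the dataset grows.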
GraphLab Create ML Toolkits
• Business Task (Developers): Recommender, Target, Social Match, …
• Machine Learning Task (Savvy Dev & Data Sci.): Regression, Classification, Data Matching, …
• Algorithms & SDK (ML experts): SVM, Matrix Factorization, LDA, …
Demos
GLC SDK example
• Task: fill in missing value in an array using
previous value
• Existing solution:
• E.g., use Pandas, a Python library providing in-memory dataframes
• Problem:
• Given, say, 25M rows and 50 cols, takes
forever to even load the data
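For reference, the fill task itself can be sketched in a few lines of plain Python. This only illustrates the intended semantics; `fill_forward` is a hypothetical name, not part of Pandas or GraphLab Create, and leaving a leading missing value as `None` is an assumption of this sketch.

```python
def fill_forward(values):
    """Replace each None with the most recent non-None value before it."""
    filled = []
    last_value = None  # stays None until the first real value is seen
    for value in values:
        if value is not None:
            last_value = value
        filled.append(last_value)
    return filled


print(fill_forward([1, 2, 3, None, 6]))  # [1, 2, 3, 3, 6]
```

The logic is a single pass with one remembered value, so the difficulty described in the slide lies in loading and iterating over the data at scale rather than in the algorithm.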
GLC SDK solution
> cat fill.cpp
#include <flexible_type/flexible_type.hpp>
#include <unity/lib/toolkit_function_macros.hpp>
#include <unity/lib/gl_sarray.hpp>
using namespace graphlab;

// Replace each missing value with the most recent non-missing value.
gl_sarray fill(gl_sarray sa) {
  gl_sarray_writer writer(sa.dtype(), 1);  // one output segment
  flexible_type last_value = sa[0];
  for (const auto &elem: sa.range_iterator()) {
    if (elem != FLEX_UNDEFINED)
      last_value = elem;
    writer.write(last_value, 0);  // write to segment 0
  }
  return writer.close();
}

// Expose fill() to Python; "sa" names its argument.
BEGIN_FUNCTION_REGISTRATION
REGISTER_FUNCTION(fill, "sa");
END_FUNCTION_REGISTRATION
GLC SDK solution
> cat Makefile
all: fill.so
fill.so: fill.cpp
	g++ -std=c++11 $^ -I graphlab -I ~/graphlab-dev/deps/ -shared -fPIC -o $@ -O3
> python
>>> import graphlab as gl
>>> gl.ext_import('fill.so', 'example')
>>> sa = gl.SArray([1, 2, 3, None, 6])
>>> print gl.extensions.example.fill.fill(sa)
[1, 2, 3, 3, 6]
Join the revolution!
• Research methods to make the following
efficient and automatic:
• Feature engineering
• Model selection
• Model debugging
• Problem formulation (??)
• Develop novel algorithms on top of our SDK
• Backed by scalable, flexible typed data structures
• Automatic Python wrappers
• Make them available to many other people
• We’re hiring! jobs@graphlab.com

Comprehensive energy systems.pdf Comprehensive energy systems.pdf
 
ASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductosASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductos
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 

The Challenges of Bringing Machine Learning to the Masses

  • 1. The Challenges of Bringing Machine Learning to the Masses Alice Zheng and Sethu Raman GraphLab Inc. NIPS workshop on Software Engineering for Machine Learning December 13, 2014
  • 3. The need for accessible ML • So much potential in ML • Everyone trying to make sense of their data • ML is transforming lives and industries: personalized medicine, internet search, social networks, advertising, etc. • But success is unattainable to most
  • 4. Building a predictive app • Was using 217 business rules, hoping the world doesn’t change • Have an inspiring idea to reinvent their business • Key pains: • Hiring talent: 35% shortfall in data-savvy workers needed to make sense of big data by 2018 [McKinsey 2011] • Noisy space of tools: “Data scientists use a variety of tools, across different programming languages… require a lot of context-switching… affects productivity and impedes reproducibility.” (Ben Lorica, Data Analysis: Just one component of the Data Science workflow)
  • 5. Building a predictive app Feature engineering Model definition Training evaluation Data Deployment Monitoring
  • 6. Pure ML is not enough • Building a predictive application involves much more than just building ML models • System engineering: data storage, computation infrastructure, networking… • Data Science: problem definition, data cleaning, feature engineering • Software development: turn prototype model into bullet-proof production code • Operations engineering: deploy and monitor app • …
  • 7. Pain points • What are the right features? • What model should I use? • How do I train it? • How do I set the tuning parameters? • Do I even have the right data? • Ok, I have a working prototype, now what?
  • 8. Pain points • Increase in data size or decrease in latency requires complete rewrite of code and new toolset • GB – R/scikit-learn/Matlab • TB-PB—Hadoop/Mahout/Spark • Many forms of data and data structures • Images, text, speech, logs • Dense lists, sparse dictionaries, time series • Tables, graphs, matrices, tensors
  • 9. The need for an ML platform • Minimize tool/code switching, maximize performance (speed/accuracy/scale) • Graceful transition from small to large dataset sizes • Flexible, interoperable data types • Minimize complexity • System-agnostic • Simple API • Auto-tune parameters
  • 10. The parallel to databases • What’s an example of a mega-successful platform for data operations? • Databases! • SQL, Oracle, NoSQL, … • What lessons can we bring in from the database world?
  • 12. Database engine components Storage engine Query execution Query optimizer Storage Complex but self-contained, has clean API, only changes when there’s new hardware.
  • 13. Database engine components Storage engine Query execution Query optimizer Storage Complex bag of tricks, no formalism, constantly changing to adapt to data, query, disk characteristics.
  • 14. ML engine components Feature engineering Model definition Training evaluation Data Bags of tricks, expert knowledge, experience, lots of trial and error
  • 15. Advances in databases • Reasonable abstraction—relational DB • Hardware speedups • Pragmatic software implementation Successful platform • Take-away lesson: fast computation engine + “good enough” execution plan
  • 16. To advance ML platforms • ML will be end-user friendly when the platform is clever enough to handle less-than-optimal directions from the user • What needs to happen? • The complexity needs to be automated and wrapped away with neat interfaces between components • Fast components, “good enough” directions
  • 17. GraphLab • Started as a research project at CMU in 2009 • Now a Seattle-based startup
  • 18. The GraphLab Create™ Solution • Flexible, interoperable data types • SArray + SFrame + SGraph inter-translatable • dense list, sparse array, image, text, tables, graphs • Graceful transition between data sizes • SFrame: memory to disk to distributed • One environment, many substrates • Python front-end • Localhost, cluster, Hadoop, EC2 • End-to-end • Data ingestion + feature engineering + model building + deployment in a single environment
  • 19. GraphLab Create ML Toolkits • Business Task (Recommender, Target, Social Match, …): Developers • Machine Learning Task (Regression, Classification, Data Matching, …): Savvy Dev & Data Sci. • Algorithms & SDK (SVM, Matrix Factorization, LDA, …): ML experts
  • 20. Demos
  • 21. GLC SDK example • Task: fill in each missing value in an array using the previous value • Existing solution: • E.g., use Pandas, a Python library providing in-memory dataframes • Problem: • Given, say, 25M rows and 50 cols, it takes forever to even load the data
  • 22. GLC SDK solution
      > cat fill.cpp
      #include <flexible_type/flexible_type.hpp>
      #include <unity/lib/toolkit_function_macros.hpp>
      #include <unity/lib/gl_sarray.hpp>
      using namespace graphlab;

      gl_sarray fill(gl_sarray sa) {
        gl_sarray_writer writer(sa.dtype(), 1);
        flexible_type last_value = sa[0];
        // Stream through the array, carrying the last non-missing value forward.
        for (const auto &elem: sa.range_iterator()) {
          if (elem != FLEX_UNDEFINED) last_value = elem;
          writer.write(last_value, 0);
        }
        return writer.close();
      }

      // Expose fill() to Python, with "sa" as the argument name.
      BEGIN_FUNCTION_REGISTRATION
      REGISTER_FUNCTION(fill, "sa");
      END_FUNCTION_REGISTRATION
  • 23. GLC SDK solution
      > cat Makefile
      all: fill.so
      fill.so: fill.cpp
      	g++ -std=c++11 $^ -l graphlab -l ~/graphlab-dev/deps/shared-fPIC -o $@ -O3

      > python
      >>> import graphlab as gl
      >>> gl.ext_import('fill.so', 'example')
      >>> sa = gl.SArray([1, 2, 3, None, 6])
      >>> print gl.extensions.example.fill.fill(sa)
      [1, 2, 3, 3, 6]
  • 24. Join the revolution! • Research methods to make the following efficient and automatic: • Feature engineering • Model selection • Model debugging • Problem formulation (??) • Develop novel algorithms on top of our SDK • Backed by scalable, flexible typed data structures • Automatic Python wrappers • Make them available to many other people • We’re hiring! jobs@graphlab.com
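
For readers without the SDK at hand, the fill task from slides 21 to 23 can be sketched in plain Python. This is only a reference for the expected behavior, not the GraphLab implementation: the name `fill_forward` is hypothetical, `None` stands in for a missing value, and the code is written for Python 3. It is a generator, so like the C++ loop it handles one element at a time and never needs the whole column in memory.

```python
def fill_forward(values):
    """Yield each element, replacing None with the last non-missing value.

    Hypothetical helper for illustration; leading None values stay None
    because there is no earlier value to carry forward.
    """
    last_value = None
    for elem in values:
        if elem is not None:
            last_value = elem
        yield last_value


if __name__ == "__main__":
    # Same input as the slide's Python session.
    print(list(fill_forward([1, 2, 3, None, 6])))  # [1, 2, 3, 3, 6]
```

Because the input can be any iterable, the same function could be fed rows streamed from a file, which is the access pattern the slide's `range_iterator()` loop relies on.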