SlideShare a Scribd company logo
1 of 23
Download to read offline
D E C E M B E R 8 - 9 , 2 0 1 6
BigML, Inc 2
Poul Petersen
CIO, BigML, Inc.
Feature Engineering
Creating Machine Learning Ready Data
BigML, Inc 3Feature Engineering
Machine Learning Secret
“…the largest improvements in accuracy often came from
quick experiments, feature engineering, and model tuning
rather than applying fundamentally different algorithms.”
Facebook FBLearner 2016
Feature Engineering: applying domain knowledge of
the data to create features that make machine
learning algorithms work better or at all.
BigML, Inc 4Feature Engineering
Obstacles
• Data Structure
• Scattered across systems
• Wrong "shape"
• Unlabelled data
• Data Value
• Format: spelling, units
• Missing values
• Non-optimal correlation
• Non-existant correlation
• Data Significance
• Unwanted: PII, Non-Preferred
• Expensive to collect
• Insidious: Leakage, obviously correlated
Data Transformation
Feature Engineering
Feature Selection
BigML, Inc 5Feature Engineering
Feature Engineering
2013-09-25 10:02
Automatic Date Transformation
… year month day hour minute …
… 2013 Sep 25 10 2 …
… … … … … … …
NUM NUMCAT NUM NUM
DATE-TIME
BigML, Inc 6Feature Engineering
Feature Engineering
Automatic Categorical Transformation
… alchemy_category …
… business …
… recreation …
… health …
… … …
CAT
business health recreation …
… 1 0 0 …
… 0 0 1 …
… 0 1 0 …
… … … … …
NUM NUM NUM
BigML, Inc 7Feature Engineering
Feature Engineering
Be not afraid of greatness:
some are born great, some achieve
greatness, and some have greatness
thrust upon ‘em.
TEXT
Automatic Text Transformation
… great afraid born achieve …
… 4 1 1 1 …
… … … … … …
NUM NUM NUM NUM
BigML, Inc 8Feature Engineering
Feature Engineering
{
“url":"cbsnews",
"title":"Breaking News Headlines
Business Entertainment World News “,
"body":" news covering all the latest
breaking national and world news
headlines, including politics, sports,
entertainment, business and more.”
}
Fixing "non-optimal correlations"
title body
Breaking News… news covering…
… …
TEXT TEXT
TEXT
BigML, Inc 9Feature Engineering
Feature Engineering
Discretization
Total Spend
7.342,99
304,12
4,56
345,87
8.546,32
NUM
“Predict will spend
$3,521 with error
$1,232”
Spend Category
Top 33%
Bottom 33%
Bottom 33%
Middle 33%
Top 33%
CAT
“Predict customer
will be Top 33% in
spending”
BigML, Inc 10Feature Engineering
Feature Engineering
Combinations of Multiple Features
Kg M2
101,4 3,24
85,2 2,8
56,2 2,9
136,1 3,6
95,9 4,1
NUM NUM
BMI
31,29
30,42
19,38
37,81
23,39
NUM
Kg
M2
BigML, Inc 11Feature Engineering
Feature Engineering
Flatline
• BigML’s Domain-Specific Language (DSL) for
Transforming Datasets
• Limited programming language structures
• let, cond, if, maps, list operators, */+-
• Dataset Fields are first-class citizens
• (field “diabetes pedigree”)
• Built-in transformations
• statistics, strings, timestamps, windows
BigML, Inc 12Basic Transformations
Data Labelling
Data may not have labels needed for doing classification
Create specific metrics for adding labels
Name Month - 3 Month - 2 Month - 1
Joe Schmo 123,23 0 0
Jane Plain 0 0 0
Mary Happy 0 55,22 243,33
Tom Thumb 12,34 8,34 14,56
Un-Labelled Data
Labelled data
Name Month - 3 Month - 2 Month - 1 Default
Joe Schmo 123,23 0 0 FALSE
Jane Plain 0 0 0 TRUE
Mary Happy 0 55,22 243,33 FALSE
Tom Thumb 12,34 8,34 14,56 FALSE
(= 0 (+ (abs ( f "Month - 3" ) ) (abs ( f "Month - 2")) (abs ( f "Month - 1") ) ))
BigML, Inc 13Feature Engineering
Feature Engineering
(/ (- ( f "price") (avg-window "price" -4, -1)) (standard-deviation "price"))
date volume price
1 34353 314
2 44455 315
3 22333 315
4 52322 321
5 28000 320
6 31254 319
7 56544 323
8 44331 324
9 81111 287
10 65422 294
11 59999 300
12 45556 302
13 19899 301
14 21453 302
day-4 day-3 day-2 day-1 4davg
-
314 -
314 315 -
314 315 315 -
314 315 315 321 316,25
315 315 321 320 317,75
315 321 320 319 318,75
Current - (4-day avg)
std dev
Shock: Deviations from a Trend
BigML, Inc 14Feature Engineering
Feature Engineering
(/ (- (f "price") (avg-window "price" -4, -1)) (standard-deviation "price"))
Current - (4-day avg)
std dev
Shock: Deviations from a Trend
Current : (field “price”)
4-day avg: (avg-window “price” -4 -1)
std dev: (standard-deviation “price”)
BigML, Inc 15Feature Engineering
Feature Engineering
Moon Phase%
( / ( mod ( - ( / ( epoch ( field {{date-field}} )) 1000 ) 621300 ) 2551443 ) 2551442 )
BigML, Inc 16Feature Engineering
Feature Engineering
Fixing "non-existant correlations"
Highway Number Direction Is Long
2 East-West FALSE
4 East-West FALSE
5 North-South TRUE
8 East-West FALSE
10 East-West TRUE
… … …
Goal: Predict principle direction from highway number
( = (mod (field "Highway Number") 2) 0)
BigML, Inc 17Feature Engineering
Feature Engineering
Fix Missing Values in a “Meaningful” Way
Filter Zeros
Model 

insulin
Predict 

insulin
Select 

insulin
Fixed

Dataset
Amended

Dataset
Original

Dataset
Clean

Dataset
( if ( = (field "insulin") 0) (field "predicted insulin") (field "insulin"))
BigML, Inc 18
Feature Selection
BigML, Inc 19Feature Engineering
Feature Selection
• Model Summary
• Field Importance
• Algorithmic
• Best-First Feature Selection
• Boruta
• Leakage
• Tight Correlations (AD, Plot, Correlations)
• Test Data
• Perfect future knowledge
cat diabetes.csv diabetes_testset.csv | sort | uniq -d | wc -l
BigML, Inc 20
Evaluate & Automate
BigML, Inc 21Feature Engineering
Evaluate & Automate
• Evaluate
• Did you meet the goal?
• If not, did you discover something else useful?
• If not, start over
• If you did…
• Automate - You don’t want to hand code that every time, right?
• Consider tools that are easy to automate
• scripting interface
• APIs
• Ability to maintenance is important
BigML, Inc 22Feature Engineering
The Process
Data
Transform
Define Goal
Model &
Evaluate
no
yes
Better

Data
Not

Possible
Tune

Algorithm
Goal
Met?
Automate
Feature
Engineer &
Selection
Better

Features
BSSML16 L7. Feature Engineering

More Related Content

What's hot

VSSML16 L6. Feature Engineering
VSSML16 L6. Feature EngineeringVSSML16 L6. Feature Engineering
VSSML16 L6. Feature EngineeringBigML, Inc
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBigML, Inc
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and AppsBigML, Inc
 
VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsBigML, Inc
 
VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...BigML, Inc
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsBigML, Inc
 
BigML Summer 2016 Release
BigML Summer 2016 ReleaseBigML Summer 2016 Release
BigML Summer 2016 ReleaseBigML, Inc
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data TransformationsBigML, Inc
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1BigML, Inc
 
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML, Inc
 
BSSML17 - Ensembles
BSSML17 - EnsemblesBSSML17 - Ensembles
BSSML17 - EnsemblesBigML, Inc
 
VSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsVSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsBigML, Inc
 
A developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsA developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsLouis Dorard
 
Hundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaHundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaSpark Summit
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger AnalyticsItzhak Kameli
 
BigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML, Inc
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsBigML, Inc
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15MLconf
 

What's hot (20)

VSSML16 L6. Feature Engineering
VSSML16 L6. Feature EngineeringVSSML16 L6. Feature Engineering
VSSML16 L6. Feature Engineering
 
BSSML17 - API and WhizzML
BSSML17 - API and WhizzMLBSSML17 - API and WhizzML
BSSML17 - API and WhizzML
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
 
VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data Transformations
 
VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
VSSML16 L8. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
 
BigML Summer 2016 Release
BigML Summer 2016 ReleaseBigML Summer 2016 Release
BigML Summer 2016 Release
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
 
BigML Summer 2017 Release
BigML Summer 2017 ReleaseBigML Summer 2017 Release
BigML Summer 2017 Release
 
BSSML17 - Ensembles
BSSML17 - EnsemblesBSSML17 - Ensembles
BSSML17 - Ensembles
 
VSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic WorkflowsVSSML16 L7. REST API, Bindings, and Basic Workflows
VSSML16 L7. REST API, Bindings, and Basic Workflows
 
A developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIsA developer's overview of the world of predictive APIs
A developer's overview of the world of predictive APIs
 
Hundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario SpacagnaHundreds of queries in the time of one - Gianmario Spacagna
Hundreds of queries in the time of one - Gianmario Spacagna
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
 
BigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML Fall 2016 Release
BigML Fall 2016 Release
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, Deepnets
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
 

Similar to BSSML16 L7. Feature Engineering

MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision MakingBigML, Inc
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingBigML, Inc
 
BigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with FlatlineBigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with FlatlineBigML, Inc
 
BSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBigML, Inc
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015NoSQLmatters
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
 
MLSEV. Use Case: Predicting Manufacturing Results with Machine Learning
MLSEV. Use Case: Predicting Manufacturing Results with Machine LearningMLSEV. Use Case: Predicting Manufacturing Results with Machine Learning
MLSEV. Use Case: Predicting Manufacturing Results with Machine LearningBigML, Inc
 
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTXDecision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTXSanjayKPrasad2
 
Horizon 20110928
Horizon 20110928Horizon 20110928
Horizon 20110928Mike Miller
 
Dimensional Modelling Session 2
Dimensional Modelling Session 2Dimensional Modelling Session 2
Dimensional Modelling Session 2akitda
 
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
IncQuery Labs Models 2020 MIP Talk
IncQuery Labs Models 2020 MIP TalkIncQuery Labs Models 2020 MIP Talk
IncQuery Labs Models 2020 MIP TalkIncQuery Labs
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBMongoDB
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Databricks
 
The five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar finalThe five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar finalNeo4j
 
The five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar finalThe five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar finalNeo4j
 
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...Neo4j
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingEray Cakici
 
Interactive Cad Solutions
Interactive Cad SolutionsInteractive Cad Solutions
Interactive Cad Solutionsrobertconleysr
 
Katmanlı Üretim (Additive manufacturing) bilgi görseli
Katmanlı Üretim (Additive manufacturing) bilgi görseliKatmanlı Üretim (Additive manufacturing) bilgi görseli
Katmanlı Üretim (Additive manufacturing) bilgi görseliAdem Çelik
 

Similar to BSSML16 L7. Feature Engineering (20)

MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision Making
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
BigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with FlatlineBigML Education - Feature Engineering with Flatline
BigML Education - Feature Engineering with Flatline
 
BSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBSSML17 - Basic Data Transformations
BSSML17 - Basic Data Transformations
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
MLSEV. Use Case: Predicting Manufacturing Results with Machine Learning
MLSEV. Use Case: Predicting Manufacturing Results with Machine LearningMLSEV. Use Case: Predicting Manufacturing Results with Machine Learning
MLSEV. Use Case: Predicting Manufacturing Results with Machine Learning
 
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTXDecision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
 
Horizon 20110928
Horizon 20110928Horizon 20110928
Horizon 20110928
 
Dimensional Modelling Session 2
Dimensional Modelling Session 2Dimensional Modelling Session 2
Dimensional Modelling Session 2
 
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
 
IncQuery Labs Models 2020 MIP Talk
IncQuery Labs Models 2020 MIP TalkIncQuery Labs Models 2020 MIP Talk
IncQuery Labs Models 2020 MIP Talk
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
 
The five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar finalThe five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar final
 
The five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar finalThe five graphs of telecommunications may 22 2013 webinar final
The five graphs of telecommunications may 22 2013 webinar final
 
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
New Opportunities for Connected Data - Emil Eifrem @ GraphConnect Boston + Ch...
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
 
Interactive Cad Solutions
Interactive Cad SolutionsInteractive Cad Solutions
Interactive Cad Solutions
 
Katmanlı Üretim (Additive manufacturing) bilgi görseli
Katmanlı Üretim (Additive manufacturing) bilgi görseliKatmanlı Üretim (Additive manufacturing) bilgi görseli
Katmanlı Üretim (Additive manufacturing) bilgi görseli
 

More from BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 

More from BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 

Recently uploaded

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

BSSML16 L7. Feature Engineering

  • 1. D E C E M B E R 8 - 9 , 2 0 1 6
  • 2. BigML, Inc 2 Poul Petersen CIO, BigML, Inc. Feature Engineering Creating Machine Learning Ready Data
  • 3. BigML, Inc 3Feature Engineering Machine Learning Secret “…the largest improvements in accuracy often came from quick experiments, feature engineering, and model tuning rather than applying fundamentally different algorithms.” Facebook FBLearner 2016 Feature Engineering: applying domain knowledge of the data to create features that make machine learning algorithms work better or at all.
  • 4. BigML, Inc 4Feature Engineering Obstacles • Data Structure • Scattered across systems • Wrong "shape" • Unlabelled data • Data Value • Format: spelling, units • Missing values • Non-optimal correlation • Non-existant correlation • Data Significance • Unwanted: PII, Non-Preferred • Expensive to collect • Insidious: Leakage, obviously correlated Data Transformation Feature Engineering Feature Selection
  • 5. BigML, Inc 5Feature Engineering Feature Engineering 2013-09-25 10:02 Automatic Date Transformation … year month day hour minute … … 2013 Sep 25 10 2 … … … … … … … … NUM NUMCAT NUM NUM DATE-TIME
  • 6. BigML, Inc 6Feature Engineering Feature Engineering Automatic Categorical Transformation … alchemy_category … … business … … recreation … … health … … … … CAT business health recreation … … 1 0 0 … … 0 0 1 … … 0 1 0 … … … … … … NUM NUM NUM
  • 7. BigML, Inc 7Feature Engineering Feature Engineering Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon ‘em. TEXT Automatic Text Transformation … great afraid born achieve … … 4 1 1 1 … … … … … … … NUM NUM NUM NUM
  • 8. BigML, Inc 8Feature Engineering Feature Engineering { “url":"cbsnews", "title":"Breaking News Headlines Business Entertainment World News “, "body":" news covering all the latest breaking national and world news headlines, including politics, sports, entertainment, business and more.” } Fixing "non-optimal correlations" title body Breaking News… news covering… … … TEXT TEXT TEXT
  • 9. BigML, Inc 9Feature Engineering Feature Engineering Discretization Total Spend 7.342,99 304,12 4,56 345,87 8.546,32 NUM “Predict will spend $3,521 with error $1,232” Spend Category Top 33% Bottom 33% Bottom 33% Middle 33% Top 33% CAT “Predict customer will be Top 33% in spending”
  • 10. BigML, Inc 10Feature Engineering Feature Engineering Combinations of Multiple Features Kg M2 101,4 3,24 85,2 2,8 56,2 2,9 136,1 3,6 95,9 4,1 NUM NUM BMI 31,29 30,42 19,38 37,81 23,39 NUM Kg M2
  • 11. BigML, Inc 11Feature Engineering Feature Engineering Flatline • BigML’s Domain-Specific Language (DSL) for Transforming Datasets • Limited programming language structures • let, cond, if, maps, list operators, */+- • Dataset Fields are first-class citizens • (field “diabetes pedigree”) • Built-in transformations • statistics, strings, timestamps, windows
  • 12. BigML, Inc 12Basic Transformations Data Labelling Data may not have labels needed for doing classification Create specific metrics for adding labels Name Month - 3 Month - 2 Month - 1 Joe Schmo 123,23 0 0 Jane Plain 0 0 0 Mary Happy 0 55,22 243,33 Tom Thumb 12,34 8,34 14,56 Un-Labelled Data Labelled data Name Month - 3 Month - 2 Month - 1 Default Joe Schmo 123,23 0 0 FALSE Jane Plain 0 0 0 TRUE Mary Happy 0 55,22 243,33 FALSE Tom Thumb 12,34 8,34 14,56 FALSE (= 0 (+ (abs ( f "Month - 3" ) ) (abs ( f "Month - 2")) (abs ( f "Month - 1") ) ))
  • 13. BigML, Inc 13Feature Engineering Feature Engineering (/ (- ( f "price") (avg-window "price" -4, -1)) (standard-deviation "price")) date volume price 1 34353 314 2 44455 315 3 22333 315 4 52322 321 5 28000 320 6 31254 319 7 56544 323 8 44331 324 9 81111 287 10 65422 294 11 59999 300 12 45556 302 13 19899 301 14 21453 302 day-4 day-3 day-2 day-1 4davg - 314 - 314 315 - 314 315 315 - 314 315 315 321 316,25 315 315 321 320 317,75 315 321 320 319 318,75 Current - (4-day avg) std dev Shock: Deviations from a Trend
  • 14. BigML, Inc 14Feature Engineering Feature Engineering (/ (- (f "price") (avg-window "price" -4, -1)) (standard-deviation "price")) Current - (4-day avg) std dev Shock: Deviations from a Trend Current : (field “price”) 4-day avg: (avg-window “price” -4 -1) std dev: (standard-deviation “price”)
  • 15. BigML, Inc 15Feature Engineering Feature Engineering Moon Phase% ( / ( mod ( - ( / ( epoch ( field {{date-field}} )) 1000 ) 621300 ) 2551443 ) 2551442 )
  • 16. BigML, Inc 16Feature Engineering Feature Engineering Fixing "non-existant correlations" Highway Number Direction Is Long 2 East-West FALSE 4 East-West FALSE 5 North-South TRUE 8 East-West FALSE 10 East-West TRUE … … … Goal: Predict principle direction from highway number ( = (mod (field "Highway Number") 2) 0)
  • 17. BigML, Inc 17Feature Engineering Feature Engineering Fix Missing Values in a “Meaningful” Way Filter Zeros Model 
 insulin Predict 
 insulin Select 
 insulin Fixed
 Dataset Amended
 Dataset Original
 Dataset Clean
 Dataset ( if ( = (field "insulin") 0) (field "predicted insulin") (field "insulin"))
  • 19. BigML, Inc 19Feature Engineering Feature Selection • Model Summary • Field Importance • Algorithmic • Best-First Feature Selection • Boruta • Leakage • Tight Correlations (AD, Plot, Correlations) • Test Data • Perfect future knowledge cat diabetes.csv diabetes_testset.csv | sort | uniq -d | wc -l
  • 21. BigML, Inc 21Feature Engineering Evaluate & Automate • Evaluate • Did you meet the goal? • If not, did you discover something else useful? • If not, start over • If you did… • Automate - You don’t want to hand code that every time, right? • Consider tools that are easy to automate • scripting interface • APIs • Ability to maintenance is important
  • 22. BigML, Inc 22Feature Engineering The Process Data Transform Define Goal Model & Evaluate no yes Better Data Not Possible Tune Algorithm Goal Met? Automate Feature Engineer & Selection Better
 Features