SlideShare a Scribd company logo
1 of 35
Download to read offline
Introducing Principal Component Analysis
PCA Release
BigML, Inc BigML PCA Release Webinar
Fall 2018 Release
GREGORY ANTELL, Ph.D. - Machine Learning
Architect and Product Manager
Please enter questions into chat box – We will answer
some via chat and others at the end of the session
https://bigml.com/releases/fall-2018
ATAKAN CETINSOY - VP of Predictive Applications
Resources
Moderator
Speaker
Contact support@bigml.com
Twitter @bigmlcom
Questions
2
BigML, Inc BigML PCA Release Webinar
Agenda: Principal Component Analysis
3
1 Utility in Machine Learning Workflows
2 High Dimensional Data in Machine Learning
3 PCA Intuition and Methodology
4 Use Cases with the BigML Dashboard
5 BigML Implementation
BigML, Inc BigML PCA Release Webinar
Agenda: Principal Component Analysis
4
1 Utility in Machine Learning Workflows
2 High Dimensional Data in Machine Learning
3 PCA Intuition and Methodology
4 Use Cases with the BigML Dashboard
5 BigML Implementation
BigML, Inc BigML PCA Release Webinar
Problem Formulation
Data Acquisition
Feature Engineering
Modeling and Evaluations
Predictions
Measure Results
Data Transformations
Task
5
Steps of a ML Application
BigML, Inc BigML PCA Release Webinar
Steps of a ML Application
Problem Formulation
Data Acquisition
Feature Engineering
Modeling and Evaluations
Predictions
Measure Results
Data Transformations
Task
6
• More often than changing models,
improvement comes from more data
or better features
• Garbage In, Garbage Out principle
• Model training and hyper-parameter
tuning can be automated, feature
engineering (mostly) cannot
BigML, Inc BigML PCA Release Webinar
Steps of a ML Application
Problem Formulation
Data Acquisition
Feature Engineering
Modeling and Evaluations
Predictions
Measure Results
Data Transformations
Today’s release
further expands what
is possible in
Task
7
BigML, Inc BigML PCA Release Webinar
Agenda: Principal Component Analysis
8
1 Utility in Machine Learning Workflows
2 High Dimensional Data in Machine Learning
3 PCA Intuition and Methodology
4 Use Cases with the BigML Dashboard
5 BigML Implementation
BigML, Inc BigML PCA Release Webinar
High-dimensional Data
9
F1 F2 F3 F4 F5 … FN
I1
I2
I3
I4
I5
…
IN
Features (p)
Instances (n)
Machine Learning typically performs better when n >>> p
BigML, Inc BigML PCA Release Webinar
Dangers of high-dimensional Data
• Implicitly increases model complexity, prone to overfitting
• Requires more observations in order to generalize well
• Contains correlated or useless variables
• Data is difficult to visualize
• Takes a longer time to train models or make predictions
10
Principal Component Analysis
addresses all of these issues
BigML, Inc BigML PCA Release Webinar
Model Complexity and Training Data
11
• Models with lower complexity
will converge to higher test error
rates
Number of training examples
TestError
Model 1
Model 2
BigML, Inc BigML PCA Release Webinar
Model Complexity and Training Data
12
• Models with lower complexity
will converge to higher test error
rates
• A threshold exists where
enough training data is available
to favor the more complex
model
• With a fixed amount of data,
less complex models are often
favoredNumber of training examples
TestError
Less Complex
Model Wins
Model 1
Model 2
More Complex
Model Wins
BigML, Inc BigML PCA Release Webinar
Combating High-dimensional Data
13
MODEL Pruning, Node threshold
ENSEMBLE Bagging, Randomization
LOGISTIC
REGRESSION
L1 and L2 penalties
DEEPNET Dropout
BigML, Inc BigML PCA Release Webinar
Dimensionality Reduction
14
Feature Selection
• Preserves the original variables and selects a subset
• Often uses recursive methods or statistical thresholds
• Examples: RFE, Chi-Squared Test, Boruta
Feature Extraction
• Transforms original variables into variables better suited for modeling
• Examples: word vectors, clustering
• PCA falls into this category
Reducing the dimensions will decrease model complexity
BigML, Inc BigML PCA Release Webinar
Agenda: Principal Component Analysis
15
1 Utility in Machine Learning Workflows
2 High Dimensional Data in Machine Learning
3 PCA Intuition and Methodology
4 Use Cases with the BigML Dashboard
5 BigML Implementation
BigML, Inc BigML PCA Release Webinar
Why Consider Using PCA?
1. You want to reduce the number of variables in your model, but
it is not clear which should be eliminated
2. You want to generate variables that are not correlated
3. You are okay with sacrificing some amount of interpretability
for potential downstream performance gains
16
BigML, Inc BigML PCA Release Webinar
PCA in Machine Learning Workflows
17
SOURCE DATASET
TRAIN
TEST
BigML, Inc BigML PCA Release Webinar 18
PCA
PCA in Machine Learning Workflows
SOURCE DATASET
TRAIN
TEST
BigML, Inc BigML PCA Release Webinar 19
BATCH
PROJECTION
PCA in Machine Learning Workflows
BATCH
PROJECTION
SOURCE DATASET
TRAIN
TEST
PCA
BigML, Inc BigML PCA Release Webinar 20
NEW TRAIN
FEATURES
NEW TEST
FEATURES
PCA in Machine Learning Workflows
BATCH
PROJECTION
BATCH
PROJECTION
SOURCE DATASET
TRAIN
TEST
PCA
BigML, Inc BigML PCA Release Webinar 21
PCA in Machine Learning Workflows
NEW TRAIN
FEATURES
NEW TEST
FEATURES
BATCH
PROJECTION
BATCH
PROJECTION
SOURCE DATASET
TRAIN
TEST
PCA
What’s special about
these new features?
BigML, Inc BigML PCA Release Webinar 22
Original Data Matrix
F1 F2 F3 F4 F5 … FN
I1
I2
I3
I4
I5
…
IN
Transformed Data Matrix
PC1 PC2 PC3 PC4 PC5 … PCN
I1
I2
I3
I4
I5
…
IN
The new variables are the “principal components”
What Does PCA Yield?
BigML, Inc BigML PCA Release Webinar 23
Properties of Principal Components
Each PC is a linear combination of original variables
PC1 = w1F1 + w2F2 + w3F3 + … + wNFN
PC2 = w1F1 + w2F2 + w3F3 + … + wNFN
PCN = w1F1 + w2F2 + w3F3 + … + wNFN
…
BigML, Inc BigML PCA Release Webinar 24
Geometric Interpretation of PCA
BigML, Inc BigML PCA Release Webinar 25
Intuition Behind Principal Components
BigML, Inc BigML PCA Release Webinar 26
Intuition Behind Principal Components
BigML, Inc BigML PCA Release Webinar 27
Properties of Principal Components
Original Data Transformed Data
Principal Components are not correlated
BigML, Inc BigML PCA Release Webinar 28
Properties of Principal Components
Principal Components are sorted by the percentage
of variance explained in the original data
BigML, Inc BigML PCA Release Webinar 29
How to Reduce Dimensions
Approach #1
Directly select how
many PCs to keep
BigML, Inc BigML PCA Release Webinar 30
How to Reduce Dimensions
Approach #2
Select a threshold for
the cumulative Percent
Variance Explained
BigML, Inc BigML PCA Release Webinar
Agenda: Principal Component Analysis
31
1 Utility in Machine Learning Workflows
2 High Dimensional Data in Machine Learning
3 PCA Intuition and Methodology
4 Use Cases with the BigML Dashboard
5 BigML Implementation
BigML, Inc BigML PCA Release Webinar
Agenda: Principal Component Analysis
32
1 Utility in Machine Learning Workflows
2 High Dimensional Data in Machine Learning
3 PCA Intuition and Methodology
4 Use Cases with BigML Dashboard
5 BigML Implementation
BigML, Inc BigML PCA Release Webinar 33
BigML-Specific Implementation
• Standard PCA only applies to numerical data
• BigML uses three different data transformation methods in order to
handle different data types
• Numeric data: Principal Component Analysis (PCA)
• Categorical data: Multiple Correspondence Analysis (MCA)
• Mixed data: Factorial Analysis of Mixed Data (FAMD)
• BigML will automatically handle numeric, text, items, and categorical
data without needing user input
BigML, Inc BigML PCA Release Webinar
https://bigml.com/releases/fall-2018
34
More Info
Questions?
@bigmlcom support@bigml.com

More Related Content

What's hot

Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Ed Fernandez
 
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
Dataconomy Media
 
DN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLX
DN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLXDN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLX
DN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLX
Dataconomy Media
 

What's hot (20)

FrugalML: Using ML APIs More Accurately and Cheaply
FrugalML: Using ML APIs More Accurately and CheaplyFrugalML: Using ML APIs More Accurately and Cheaply
FrugalML: Using ML APIs More Accurately and Cheaply
 
Demystifying Data Science
Demystifying Data ScienceDemystifying Data Science
Demystifying Data Science
 
MLSEV Virtual. Applying Topic Modelling to improve Operations
MLSEV Virtual. Applying Topic Modelling to improve OperationsMLSEV Virtual. Applying Topic Modelling to improve Operations
MLSEV Virtual. Applying Topic Modelling to improve Operations
 
The Past, Present, and Future of Machine Learning APIs
The Past, Present, and Future of Machine Learning APIsThe Past, Present, and Future of Machine Learning APIs
The Past, Present, and Future of Machine Learning APIs
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
 
MLSD18. Supervised Workshop
MLSD18. Supervised WorkshopMLSD18. Supervised Workshop
MLSD18. Supervised Workshop
 
Machine Learning: Past, Present and Future - by Tom Dietterich
Machine Learning: Past, Present and Future - by Tom DietterichMachine Learning: Past, Present and Future - by Tom Dietterich
Machine Learning: Past, Present and Future - by Tom Dietterich
 
MLSD18. Supervised Summary
MLSD18. Supervised SummaryMLSD18. Supervised Summary
MLSD18. Supervised Summary
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
MLSD18 Evaluations
MLSD18 EvaluationsMLSD18 Evaluations
MLSD18 Evaluations
 
MLSD18. End-to-End Machine Learning
MLSD18. End-to-End Machine LearningMLSD18. End-to-End Machine Learning
MLSD18. End-to-End Machine Learning
 
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Building A Feature Factory
Building A Feature FactoryBuilding A Feature Factory
Building A Feature Factory
 
MLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, DeepnetsMLSD18. Ensembles, Logistic Regression, Deepnets
MLSD18. Ensembles, Logistic Regression, Deepnets
 
DN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLX
DN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLXDN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLX
DN18 | Technical Debt in Machine Learning | Jaroslaw Szymczak | OLX
 
MLSD18. OptiML and Fusions
MLSD18. OptiML and FusionsMLSD18. OptiML and Fusions
MLSD18. OptiML and Fusions
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
 
Pm.ais ummit 180917 final
Pm.ais ummit 180917 finalPm.ais ummit 180917 final
Pm.ais ummit 180917 final
 
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
DN18 | The Evolution and Future of Graph Technology: Intelligent Systems | Ax...
 

Similar to BigML Release: PCA

Similar to BigML Release: PCA (20)

BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
BigML Release: OptiML
BigML Release: OptiMLBigML Release: OptiML
BigML Release: OptiML
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
VSSML18. Introduction to Machine Learning and the BigML Platform
VSSML18. Introduction to Machine Learning and the BigML PlatformVSSML18. Introduction to Machine Learning and the BigML Platform
VSSML18. Introduction to Machine Learning and the BigML Platform
 
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
Auto­matic Para­meter Tun­ing for Data­bases and Big Data Sys­tems
 
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTXDecision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
Decision Optimization - CPLEX Optimization Studio - Product Overview(2).PPTX
 
Certified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasCertified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. Nickholas
 
Data Science: Good, Bad and Ugly by Irina Kukuyeva
Data Science: Good, Bad and Ugly by Irina KukuyevaData Science: Good, Bad and Ugly by Irina Kukuyeva
Data Science: Good, Bad and Ugly by Irina Kukuyeva
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The UglyProductionalizing Machine Learning Models: The Good, The Bad and The Ugly
Productionalizing Machine Learning Models: The Good, The Bad and The Ugly
 
VSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and DeepnetsVSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and Deepnets
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive Sector
 
Scaling up deep learning by scaling down
Scaling up deep learning by scaling downScaling up deep learning by scaling down
Scaling up deep learning by scaling down
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
VSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML WorkflowsVSSML18. Advanced WhizzML Workflows
VSSML18. Advanced WhizzML Workflows
 
Productionalizing Machine Learning Models: The Good, the Bad, and the Ugly
Productionalizing Machine Learning Models: The Good, the Bad, and the UglyProductionalizing Machine Learning Models: The Good, the Bad, and the Ugly
Productionalizing Machine Learning Models: The Good, the Bad, and the Ugly
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques Webinar
 
BigML Release: Fusions
BigML Release: FusionsBigML Release: Fusions
BigML Release: Fusions
 
Scaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling DownScaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling Down
 
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
 

More from BigML, Inc

More from BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
 

Recently uploaded

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 

Recently uploaded (20)

Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 

BigML Release: PCA

  • 1. Introducing Principal Component Analysis PCA Release
  • 2. BigML, Inc BigML PCA Release Webinar Fall 2018 Release GREGORY ANTELL, Ph.D. - Machine Learning Architect and Product Manager Please enter questions into chat box – We will answer some via chat and others at the end of the session https://bigml.com/releases/fall-2018 ATAKAN CETINSOY - VP of Predictive Applications Resources Moderator Speaker Contact support@bigml.com Twitter @bigmlcom Questions 2
  • 3. BigML, Inc BigML PCA Release Webinar Agenda: Principal Component Analysis 3 1 Utility in Machine Learning Workflows 2 High Dimensional Data in Machine Learning 3 PCA Intuition and Methodology 4 Use Cases with the BigML Dashboard 5 BigML Implementation
  • 4. BigML, Inc BigML PCA Release Webinar Agenda: Principal Component Analysis 4 1 Utility in Machine Learning Workflows 2 High Dimensional Data in Machine Learning 3 PCA Intuition and Methodology 4 Use Cases with the BigML Dashboard 5 BigML Implementation
  • 5. BigML, Inc BigML PCA Release Webinar Problem Formulation Data Acquisition Feature Engineering Modeling and Evaluations Predictions Measure Results Data Transformations Task 5 Steps of a ML Application
  • 6. BigML, Inc BigML PCA Release Webinar Steps of a ML Application Problem Formulation Data Acquisition Feature Engineering Modeling and Evaluations Predictions Measure Results Data Transformations Task 6 • More often than changing models, improvement comes from more data or better features • Garbage In, Garbage Out principle • Model training and hyper-parameter tuning can be automated, feature engineering (mostly) cannot
  • 7. BigML, Inc BigML PCA Release Webinar Steps of a ML Application Problem Formulation Data Acquisition Feature Engineering Modeling and Evaluations Predictions Measure Results Data Transformations Today’s release further expands what is possible in Task 7
  • 8. BigML, Inc BigML PCA Release Webinar Agenda: Principal Component Analysis 8 1 Utility in Machine Learning Workflows 2 High Dimensional Data in Machine Learning 3 PCA Intuition and Methodology 4 Use Cases with the BigML Dashboard 5 BigML Implementation
  • 9. BigML, Inc BigML PCA Release Webinar High-dimensional Data 9 F1 F2 F3 F4 F5 … FN I1 I2 I3 I4 I5 … IN Features (p) Instances (n) Machine Learning typically performs better when n >>> p
  • 10. BigML, Inc BigML PCA Release Webinar Dangers of high-dimensional Data • Implicitly increases model complexity, prone to overfitting • Requires more observations in order to generalize well • Contains correlated or useless variables • Data is difficult to visualize • Takes a longer time to train models or make predictions 10 Principal Component Analysis addresses all of these issues
  • 11. BigML, Inc BigML PCA Release Webinar Model Complexity and Training Data 11 • Models with lower complexity will converge to higher test error rates Number of training examples TestError Model 1 Model 2
  • 12. BigML, Inc BigML PCA Release Webinar Model Complexity and Training Data 12 • Models with lower complexity will converge to higher test error rates • A threshold exists where enough training data is available to favor the more complex model • With a fixed amount of data, less complex models are often favoredNumber of training examples TestError Less Complex Model Wins Model 1 Model 2 More Complex Model Wins
  • 13. BigML, Inc BigML PCA Release Webinar Combating High-dimensional Data 13 MODEL Pruning, Node threshold ENSEMBLE Bagging, Randomization LOGISTIC REGRESSION L1 and L2 penalties DEEPNET Dropout
  • 14. BigML, Inc BigML PCA Release Webinar Dimensionality Reduction 14 Feature Selection • Preserves the original variables and selects a subset • Often uses recursive methods or statistical thresholds • Examples: RFE, Chi-Squared Test, Boruta Feature Extraction • Transforms original variables into variables better suited for modeling • Examples: word vectors, clustering • PCA falls into this category Reducing the dimensions will decrease model complexity
  • 15. BigML, Inc BigML PCA Release Webinar Agenda: Principal Component Analysis 15 1 Utility in Machine Learning Workflows 2 High Dimensional Data in Machine Learning 3 PCA Intuition and Methodology 4 Use Cases with the BigML Dashboard 5 BigML Implementation
  • 16. BigML, Inc BigML PCA Release Webinar Why Consider Using PCA? 1. You want to reduce the number of variables in your model, but it is not clear which should be eliminated 2. You want to generate variables that are not correlated 3. You are okay with sacrificing some amount of interpretability for potential downstream performance gains 16
  • 17. BigML, Inc BigML PCA Release Webinar PCA in Machine Learning Workflows 17 SOURCE DATASET TRAIN TEST
  • 18. BigML, Inc BigML PCA Release Webinar 18 PCA PCA in Machine Learning Workflows SOURCE DATASET TRAIN TEST
  • 19. BigML, Inc BigML PCA Release Webinar 19 BATCH PROJECTION PCA in Machine Learning Workflows BATCH PROJECTION SOURCE DATASET TRAIN TEST PCA
  • 20. BigML, Inc BigML PCA Release Webinar 20 NEW TRAIN FEATURES NEW TEST FEATURES PCA in Machine Learning Workflows BATCH PROJECTION BATCH PROJECTION SOURCE DATASET TRAIN TEST PCA
  • 21. BigML, Inc BigML PCA Release Webinar 21 PCA in Machine Learning Workflows NEW TRAIN FEATURES NEW TEST FEATURES BATCH PROJECTION BATCH PROJECTION SOURCE DATASET TRAIN TEST PCA What’s special about these new features?
  • 22. BigML, Inc BigML PCA Release Webinar 22 Original Data Matrix F1 F2 F3 F4 F5 … FN I1 I2 I3 I4 I5 … IN Transformed Data Matrix PC1 PC2 PC3 PC4 PC5 … PCN I1 I2 I3 I4 I5 … IN The new variables are the “principal components” What Does PCA Yield?
  • 23. BigML, Inc BigML PCA Release Webinar 23 Properties of Principal Components Each PC is a linear combination of original variables PC1 = w1F1 + w2F2 + w3F3 + … + wNFN PC2 = w1F1 + w2F2 + w3F3 + … + wNFN PCN = w1F1 + w2F2 + w3F3 + … + wNFN …
  • 24. BigML, Inc BigML PCA Release Webinar 24 Geometric Interpretation of PCA
  • 25. BigML, Inc BigML PCA Release Webinar 25 Intuition Behind Principal Components
  • 26. BigML, Inc BigML PCA Release Webinar 26 Intuition Behind Principal Components
  • 27. BigML, Inc BigML PCA Release Webinar 27 Properties of Principal Components Original Data Transformed Data Principal Components are not correlated
  • 28. BigML, Inc BigML PCA Release Webinar 28 Properties of Principal Components Principal Components are sorted by the percentage of variance explained in the original data
  • 29. BigML, Inc BigML PCA Release Webinar 29 How to Reduce Dimensions Approach #1 Directly select how many PCs to keep
  • 30. BigML, Inc BigML PCA Release Webinar 30 How to Reduce Dimensions Approach #2 Select a threshold for the cumulative Percent Variance Explained
  • 31. BigML, Inc BigML PCA Release Webinar Agenda: Principal Component Analysis 31 1 Utility in Machine Learning Workflows 2 High Dimensional Data in Machine Learning 3 PCA Intuition and Methodology 4 Use Cases with the BigML Dashboard 5 BigML Implementation
  • 32. BigML, Inc BigML PCA Release Webinar Agenda: Principal Component Analysis 32 1 Utility in Machine Learning Workflows 2 High Dimensional Data in Machine Learning 3 PCA Intuition and Methodology 4 Use Cases with BigML Dashboard 5 BigML Implementation
  • 33. BigML, Inc BigML PCA Release Webinar 33 BigML-Specific Implementation • Standard PCA only applies to numerical data • BigML uses three different data transformation methods in order to handle different data types • Numeric data: Principal Component Analysis (PCA) • Categorical data: Multiple Correspondence Analysis (MCA) • Mixed data: Factorial Analysis of Mixed Data (FAMD) • BigML will automatically handle numeric, text, items, and categorical data without needing user input
  • 34. BigML, Inc BigML PCA Release Webinar https://bigml.com/releases/fall-2018 34 More Info