SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Downloaden Sie, um offline zu lesen
Principal
Component
Analysis
Ricardo Wendell
Aug 2013
2
Feature Engineering
(Our motivation)
Introduction to Principal Component
Analysis
(And some statistical concepts)
Agile Analytics and PCA
(Helping visualization…)
Agenda
3
Feature
Engineering
4
Given a
classification
problem…
How do we choose
the right features?
5
Intuition
fails in high
dimensions
Building a classifier in two or three
dimensions is relatively easy…
It’s usually possible to find a
reasonable frontier between
examples of different
classes just by visual inspection.
6
Feature
engineering
Intuitively, one might
think that gathering
more features never
hurts, right?
At worst they provide
no new information
about the domain…
7
The curse of
dimensionality
Many algorithms that work fine in low
dimensions become intractable
when the input is high-dimensional.
Bellman, 1961
8
How do we
solve it?
Feature Selection
Feature Extraction
9
Feature
extraction
“In most applications
examples are not spread
uniformly throughout the
examples space, but are
concentrated on or near
a lower-dimensional
subspace.”
10
Introduction to
PCA
11
Objective of
PCA
To perform dimensionality
reduction while preserving
as much of the randomness
in the high-dimensional
space as possible
12
Principal
Component
Analysis
It takes your cloud of data
points, and rotates it such
that the maximum variability
is visible.
PCA is mainly concerned
with identifying correlations
in the data.
13
Measuring
Correlation
Degree and type of relationship
between any two or more quantities
(variables) in which they vary together
over a period
Correlation can vary from +1 to -1.
Values close to +1 indicate a high-
degree of positive correlation, and
values close to -1 indicate a high
degree of negative correlation.
Values close to zero indicate poor
correlation of either kind, and 0
indicates no correlation at all
14
Measuring
Correlation
15
Beware: Correlation does not
imply causation
16
Correlation
matrix
It shows at a glance how
variables correlate with
each other
17
Eingenvalues
and
eingevectors
18
Steps for PCA 1. Standardize the data
2. Calculate the covariance matrix
3. Find the eigenvalues and
eingenvectors of the covariance
matrix
4. Plot the eigenvectors / principal
components over the scaled data
19
Demo
with R
Let’s check the products
of PCA…
20
Agile analytics
and PCA
21
Agile
Analytics
Machine learning and data
mining tools and techniques
+
Knowledge of the
domain at hand
+
Short feedback cycles
22
Agile
Analytics
We could use PCA as a tool to
quickly identify correlation
between features, helping
feature extraction and
selection.
Reducing dimensionality using
PCA or other similar technique
can help us achieve better and
quicker results.
23	

QA & Next Steps
23

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Pca
PcaPca
Pca
 
Lasso and ridge regression
Lasso and ridge regressionLasso and ridge regression
Lasso and ridge regression
 
Pca(principal components analysis)
Pca(principal components analysis)Pca(principal components analysis)
Pca(principal components analysis)
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and Clustering
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)
 
Implement principal component analysis (PCA) in python from scratch
Implement principal component analysis (PCA) in python from scratchImplement principal component analysis (PCA) in python from scratch
Implement principal component analysis (PCA) in python from scratch
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detection
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
pca.ppt
pca.pptpca.ppt
pca.ppt
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 

Ähnlich wie Principal Component Analysis

Machine Learning for the System Administrator
Machine Learning for the System AdministratorMachine Learning for the System Administrator
Machine Learning for the System Administrator
butest
 
fmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensorsfmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensors
Fridtjof Melle
 
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solutionDA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
gitikasingh2004
 
Slides distancecovariance
Slides distancecovarianceSlides distancecovariance
Slides distancecovariance
Shrey Nishchal
 

Ähnlich wie Principal Component Analysis (20)

Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
 
Data analysis
Data analysisData analysis
Data analysis
 
Knowledge And Patterns
Knowledge And PatternsKnowledge And Patterns
Knowledge And Patterns
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine Learning
 
Anomaly Detection for Real-World Systems
Anomaly Detection for Real-World SystemsAnomaly Detection for Real-World Systems
Anomaly Detection for Real-World Systems
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
Data visualization
Data visualizationData visualization
Data visualization
 
Six sigma tools an overview
Six sigma tools  an overviewSix sigma tools  an overview
Six sigma tools an overview
 
Machine Learning for the System Administrator
Machine Learning for the System AdministratorMachine Learning for the System Administrator
Machine Learning for the System Administrator
 
fmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensorsfmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensors
 
Analyzing Performance Test Data
Analyzing Performance Test DataAnalyzing Performance Test Data
Analyzing Performance Test Data
 
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solutionDA ST-1 SET-B-Solution.pdf we also provide the many type of solution
DA ST-1 SET-B-Solution.pdf we also provide the many type of solution
 
Machine Learning Algorithm for Business Strategy.pdf
Machine Learning Algorithm for Business Strategy.pdfMachine Learning Algorithm for Business Strategy.pdf
Machine Learning Algorithm for Business Strategy.pdf
 
Slides distancecovariance
Slides distancecovarianceSlides distancecovariance
Slides distancecovariance
 
AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Software/Application Development Estimation
Software/Application Development EstimationSoftware/Application Development Estimation
Software/Application Development Estimation
 
Drupalcon la estimation john_nollin
Drupalcon la estimation john_nollinDrupalcon la estimation john_nollin
Drupalcon la estimation john_nollin
 
PG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data AnalysisPG STAT 531 Lecture 4 Exploratory Data Analysis
PG STAT 531 Lecture 4 Exploratory Data Analysis
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction Techniques
 

Mehr von Ricardo Wendell Rodrigues da Silveira (6)

Data Lakes com Hadoop e Spark: Agile Analytics na prática
Data Lakes com Hadoop e Spark: Agile Analytics na práticaData Lakes com Hadoop e Spark: Agile Analytics na prática
Data Lakes com Hadoop e Spark: Agile Analytics na prática
 
Data Science e Python: entendendo e aplicando
Data Science e Python: entendendo e aplicandoData Science e Python: entendendo e aplicando
Data Science e Python: entendendo e aplicando
 
Você, apresentador
Você, apresentadorVocê, apresentador
Você, apresentador
 
Kintsugi: The beauty in imperfection
Kintsugi: The beauty in imperfectionKintsugi: The beauty in imperfection
Kintsugi: The beauty in imperfection
 
Apresentando Groovy e Grails
Apresentando Groovy e GrailsApresentando Groovy e Grails
Apresentando Groovy e Grails
 
Machine learning
Machine learningMachine learning
Machine learning
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Principal Component Analysis