A Comparative Study between
ICA (Independent Component Analysis) and PCA (Principal Component Analysis)

Md. Sahidul Islam
Roll No. 08054718
Department of Statistics
University of Rajshahi
ripon.ru.statistics@gmail.com
Overview
o Motivation of the study
o Objectives
o Definition of ICA
o FastICA algorithm
o Results of the study
    o Latent structure detection
    o Cluster analysis
    o Outlier detection
o Conclusions
Department of Statistics, University of Rajshahi-6205

Motivation of the study
o In multivariate statistics, PCA is a long-established and still
promising technique for latent structure detection, cluster
analysis, and outlier detection.

o In many cases ICA performs better than PCA.

o Our motivation in this thesis is to carry out latent structure
detection, cluster analysis, and outlier detection using ICA and
to compare the results with those of PCA.

Objectives
o Study the algorithms of ICA.
o Apply ICA to latent structure detection, cluster analysis,
and outlier detection.
o Compare its performance with that of PCA.

Independent Component Analysis
The simple "Cocktail Party" problem

[Figure: two sources s1 and s2 are combined by the mixing matrix A into
two observations x1 and x2; ICA and PCA are then applied to the observations.]

Mixing model (x = As):
    x1 = a11 s1 + a12 s2
    x2 = a21 s1 + a22 s2

Estimated components: y = W^T x
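The mixing model x = As can be sketched numerically as follows. This is only an illustration: the two source distributions, the sample size, and the entries of A are assumptions chosen for the example, not values from the study.

```python
import numpy as np

# Hypothetical setup: two independent, non-Gaussian sources and an arbitrary
# mixing matrix A (all values chosen only for illustration).
rng = np.random.default_rng(0)
n_samples = 1000

s = np.vstack([
    rng.uniform(-1.0, 1.0, n_samples),   # s1: uniform source
    rng.laplace(0.0, 1.0, n_samples),    # s2: Laplace source
])                                        # shape (2, n_samples)

A = np.array([[1.0, 0.5],                 # [[a11, a12],
              [0.3, 1.0]])                #  [a21, a22]]

x = A @ s                                 # observations, x = A s

# Written out row by row, this is the pair of mixing equations.
x1 = A[0, 0] * s[0] + A[0, 1] * s[1]
x2 = A[1, 0] * s[0] + A[1, 1] * s[1]

# If A were known, the sources would follow from solving A s = x.
# ICA has to estimate an unmixing matrix from x alone.
s_back = np.linalg.solve(A, x)
```

In practice only x is observed, so the last line is not available to the analyst; it is included only to show that the model is invertible when A is.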

Non-Gaussian is independent
Central limit theorem
Under mild conditions, the distribution of a sum of independent random
variables tends toward a Gaussian distribution.

Observed signal (closer to Gaussian) = a1 S1 + a2 S2 + ... + an Sn,
where each source S1, S2, ..., Sn is non-Gaussian.

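The central limit effect can be checked with a small simulation. The sketch below assumes uniform sources and equal weights a_i = 1, and uses excess kurtosis (zero for a Gaussian) as the measure of nongaussianity; these choices are assumptions made for illustration.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis: 0 for a Gaussian, -1.2 for a uniform variable."""
    v = (v - v.mean()) / v.std()
    return float(np.mean(v ** 4) - 3.0)

rng = np.random.default_rng(0)
n_samples = 200_000

kurt_by_n = {}
for n_sources in (1, 2, 5, 20):
    # n_sources independent uniform sources, mixed with equal weights a_i = 1
    s = rng.uniform(-1.0, 1.0, size=(n_sources, n_samples))
    observed = s.sum(axis=0)
    kurt_by_n[n_sources] = excess_kurtosis(observed)

for n_sources, k in kurt_by_n.items():
    print(f"{n_sources:2d} sources: excess kurtosis = {k:+.3f}")
```

For n independent, identically distributed sources the excess kurtosis of the sum is the single-source value divided by n, so the printed values should shrink toward zero as more sources are mixed.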
Non-Gaussian is independent
Estimating an independent component by maximizing nongaussianity
o Consider a linear combination y = w^T x = w^T A s = z^T s,
where z = A^T w.
o y is a linear combination of the s_i, so by the central limit theorem
z^T s is more Gaussian than any single s_i.
o z^T s is least Gaussian when it equals one of the s_i (up to sign and
scale), that is, when only one element of z is nonzero.
o In that case w^T x = z^T s equals an independent component.

Maximizing the nongaussianity of w^T x therefore gives us one of the
independent components.
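The argument above can be illustrated by sweeping z over unit vectors and measuring the nongaussianity of z^T s. The sketch assumes two unit-variance uniform sources and uses the absolute excess kurtosis as the nongaussianity measure; both are assumptions for the example.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis (0 for a Gaussian)."""
    v = (v - v.mean()) / v.std()
    return float(np.mean(v ** 4) - 3.0)

rng = np.random.default_rng(1)
n_samples = 100_000

# Two independent, unit-variance uniform sources (an illustrative assumption).
half_width = np.sqrt(3.0)
s = rng.uniform(-half_width, half_width, size=(2, n_samples))

# Unit-norm z = (cos t, sin t), so y = z^T s always has unit variance.
angles = np.linspace(0.0, np.pi / 2, 19)      # 0, 5, ..., 90 degrees
nongauss = np.array([
    abs(excess_kurtosis(np.cos(t) * s[0] + np.sin(t) * s[1]))
    for t in angles
])

for t, k in zip(angles, nongauss):
    print(f"angle {np.degrees(t):5.1f} deg: |excess kurtosis| = {k:.3f}")
```

The nongaussianity should be largest at 0 and 90 degrees, where y equals s1 or s2, and smallest near 45 degrees, where the two sources are mixed equally.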

FastICA algorithm
Iteration procedure for maximizing nongaussianity
(x is assumed to be centered and whitened)
Step 1: Choose an initial weight vector w.
Step 2: Let w+ = E[x g(w^T x)] - E[g'(w^T x)] w
(g: the derivative of a non-quadratic function G, for example g(u) = tanh(u))
Step 3: Let w = w+ / ||w+||.
Step 4: If not converged, go back to Step 2.
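The four steps can be sketched in NumPy as below. This is a minimal one-unit version under stated assumptions, not the implementation used in the study: the data are whitened first, g(u) = u^3 (the kurtosis-based choice) is used as the default nonlinearity, and the helper names, sources, and mixing matrix in the demo are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center x (shape: n_signals x n_samples) and give it identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    return (eigvec / np.sqrt(eigval)).T @ x

def fastica_one_unit(x_white, rng, g=lambda u: u ** 3,
                     g_prime=lambda u: 3.0 * u ** 2,
                     max_iter=200, tol=1e-8):
    """Steps 1-4 for a single weight vector on whitened data."""
    w = rng.standard_normal(x_white.shape[0])            # Step 1
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ x_white                                  # w^T x for every sample
        # Step 2: sample averages replace the expectations
        w_new = (x_white * g(u)).mean(axis=1) - g_prime(u).mean() * w
        w_new /= np.linalg.norm(w_new)                   # Step 3
        converged = abs(w_new @ w) > 1.0 - tol           # same direction up to sign
        w = w_new
        if converged:                                    # Step 4
            break
    return w

# Demo on a hypothetical two-source mixture.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, size=(2, 20_000))             # assumed sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])                   # assumed mixing matrix
x_white = whiten(A @ s)
w = fastica_one_unit(x_white, rng)
y = w @ x_white                                          # estimated component
corr = [abs(np.corrcoef(y, s_i)[0, 1]) for s_i in s]
print("abs correlation of y with s1, s2:", np.round(corr, 3))
```

Convergence is tested on the absolute value of the inner product because w and -w describe the same component; the sign of w+ can flip between iterations when the sources are sub-Gaussian.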

Results and Discussions

Latent structure detection

Simulated dataset-1

Figure: Matrix plot of the 10 original source variables, each drawn
from a uniform distribution.
Simulated dataset-1

Figure: (a) Matrix plot of 10 principal components. (b) Matrix plot of source variables.
Simulated dataset-1

Figure: (a) Matrix plot of 10 independent components. (b) Matrix plot of source variables.
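A comparison of this kind can be reproduced in outline with the sketch below. It is not the code behind the figures: the random mixing matrix, the sample size, the symmetric FastICA variant with g(u) = u^3, and the recovery score are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_samples = 10, 10_000

# 10 independent uniform sources, mixed by a random matrix A (assumed here).
s = rng.uniform(-1.0, 1.0, size=(n_sources, n_samples))
A = rng.standard_normal((n_sources, n_sources))
x = A @ s
x = x - x.mean(axis=1, keepdims=True)

# PCA: project onto the eigenvectors of the covariance matrix.
eigval, eigvec = np.linalg.eigh(np.cov(x))
pcs = eigvec.T @ x                                   # principal components

# Whitening: principal components rescaled to unit variance.
z = pcs / np.sqrt(eigval)[:, None]

def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

# Symmetric FastICA with g(u) = u^3, so g'(u) = 3 u^2.
W = sym_decorrelate(rng.standard_normal((n_sources, n_sources)))
for _ in range(200):
    u = W @ z
    W_new = (u ** 3) @ z.T / n_samples - 3.0 * (u ** 2).mean(axis=1)[:, None] * W
    W_new = sym_decorrelate(W_new)
    # Each row should stop rotating (up to sign) once converged.
    change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
    W = W_new
    if change < 1e-6:
        break
ics = W @ z                                          # independent components

def recovery_score(components, sources):
    """Mean over sources of the best absolute correlation with any component."""
    k = len(sources)
    c = np.abs(np.corrcoef(components, sources)[:k, k:])   # components x sources
    return float(c.max(axis=0).mean())

pca_score = recovery_score(pcs, s)
ica_score = recovery_score(ics, s)
print(f"PCA recovery score: {pca_score:.3f}")
print(f"ICA recovery score: {ica_score:.3f}")
```

A score near 1 means every source is matched closely by some component. Under these assumptions the ICA score should be clearly higher than the PCA score, since the principal components remain mixtures of the uniform sources.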
Simulated dataset-2

Simulated dataset-2 consists of
5 variables drawn from Laplace
(super-Gaussian), uniform
(sub-Gaussian), binomial,
multinomial, and normal
distributions, each with 10,000
observations.

Figure: Matrix plot of the 5 original source variables, each
drawn from a different distribution.
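Source variables of this kind could be generated as in the sketch below. The slides do not give the distribution parameters, so every parameter here is an assumption, and the multinomial variable is represented as the category index of a single trial per observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 10_000

# All distribution parameters below are illustrative assumptions.
laplace = rng.laplace(loc=0.0, scale=1.0, size=n_obs)          # super-Gaussian
uniform = rng.uniform(low=-1.0, high=1.0, size=n_obs)          # sub-Gaussian
binomial = rng.binomial(n=10, p=0.5, size=n_obs)
# A single multinomial trial per observation, recorded as the category index.
multinomial = rng.choice(4, size=n_obs, p=[0.1, 0.2, 0.3, 0.4])
normal = rng.normal(loc=0.0, scale=1.0, size=n_obs)

sources = np.column_stack([laplace, uniform, binomial, multinomial, normal])

def excess_kurtosis(v):
    """Sample excess kurtosis (0 for a Gaussian)."""
    v = (v - v.mean()) / v.std()
    return float(np.mean(v ** 4) - 3.0)

names = ["laplace", "uniform", "binomial", "multinomial", "normal"]
kurt = {name: excess_kurtosis(col) for name, col in zip(names, sources.T)}
for name, k in kurt.items():
    print(f"{name:12s} excess kurtosis = {k:+.2f}")
```

The Laplace column should show positive excess kurtosis and the uniform column negative excess kurtosis, matching the super-Gaussian and sub-Gaussian labels. Having a single normal source is compatible with ICA, which can identify the components only when at most one of them is Gaussian.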

Simulated dataset-2

Figure: (Left) Matrix plot of principal components. (Right) The 5 original source
variables, each drawn from a different distribution.

Simulated dataset-2

Figure: (Left) Matrix plot of independent components. (Right) The 5 original source
variables, each drawn from a different distribution.

Cluster Analysis

Australian Crabs dataset
The first real dataset used for clustering is the Australian crabs dataset, which has
200 rows and 8 columns, including 5 morphological measurements (frontal lobe
size, rear width, carapace length, carapace width, body depth). The dataset covers
two species of the genus Leptograpsus, each with both sexes (male, female).
There are 50 specimens of each sex of each species, collected on site
at Fremantle, Western Australia (N. A. Campbell et al., 1974).

Fisher Iris dataset
The second real dataset is the well-known Fisher's Iris dataset, which
reports four characteristics (sepal width, sepal length, petal width, and
petal length) of three species (setosa, versicolor, virginica) of Iris
flowers.

Outlier detection

Scottish hill racing dataset

The data give the record winning times for 35 hill races in Scotland (Atkinson,
1986). The purpose of that study was to investigate relationships involving the
record times of the 35 hill races.

Epilepsy dataset
Thall and Vail reported data from a clinical trial of 59 patients with
epilepsy, 31 of whom were randomized to receive the anti-epilepsy
drug Progabide and 28 to receive a placebo.

Stackloss data
This dataset consists of 21 days of operation of a plant for the
oxidation of ammonia, a stage in the production of nitric acid. The
response, called stack loss, is the percentage of unconverted
ammonia that escapes from the plant. There are three explanatory
variables and one response variable in the dataset.

Education expenditure dataset
These data are used by Chatterjee, Hadi, and Price as an example
of heteroscedasticity. The data give the education expenditures of
U.S. states as projected in 1975.

Conclusions
If the subject domain supports the assumption of
independent non-Gaussian source variables, we
recommend using ICA in place of PCA for latent
structure detection, clustering, and outlier detection.

Future Research
The following are the areas we want to study further:
o Use kernel ICA techniques for shape study, clustering, and outlier
detection.
o Separation of nonlinear mixtures.
o Data mining (sometimes called data or knowledge discovery) is a
recent approach in multivariate analysis for extracting information
from a dataset and transforming it into an understandable structure for
further use. Text data mining or medical data mining using ICA would
be a topic for future research.

Thank you
