Principal Component Analysis Using R
1.
2. R is a language and environment for statistical
computing and graphics.
R provides a wide variety of statistical and
graphical techniques, including linear and
nonlinear modeling, classical statistical tests,
time-series analysis, classification, clustering, and
others.
R can be considered a different implementation
of S.
It compiles and runs on a wide variety of platforms,
such as UNIX, Windows, and Mac OS.
3. R includes:
• An effective data handling and storage facility
• A suite of operators for calculations on arrays
and matrices
• A large, coherent, integrated collection of tools
for data analysis
• Graphical facilities for data analysis and
display, either on-screen or on hardcopy
• A well-developed, simple, and effective
programming language which includes
conditionals, loops, user-defined recursive
functions, and input and output facilities
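A few of these facilities can be illustrated with a minimal sketch in base R. The names `m`, `m_sq`, `factorial_rec`, and `total` are made up for this illustration and do not come from the slides.

```r
# Operators for calculations on matrices
m <- matrix(1:4, nrow = 2)   # filled column by column: rows are (1, 3) and (2, 4)
m_sq <- m %*% m              # matrix product
m_t <- t(m)                  # transpose

# A user-defined recursive function built on a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that accumulates a running total
total <- 0
for (i in 1:5) {
  total <- total + i
}

# Output facilities
print(m_sq)
print(factorial_rec(5))
print(total)
```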
4. R provides a comprehensive set of statistical
analysis techniques
• Classical statistical tests
• Linear and nonlinear modeling
• Time-series analysis
• Classification & cluster analysis
• Spatial statistics
• Almost any statistical technique you can think of is
available in a contributed R package
5. Why is Principal Component Analysis used?
PCA is a data dimension reduction technique.
Principal Component Analysis (PCA) is a powerful tool
for analysis when the data have n variables. PCA
replaces the original variables with a smaller number of
combinations of them while retaining most of the
variation in the data.
The principal components are formed as linear combinations
of the original variables, chosen to preserve as much of the
information as possible.
By extracting hidden predictive information from large
databases with PCA, organizations can identify valuable
customers, predict future behaviors, and make proactive,
knowledge-driven decisions.
6. There are four student applications.
The Graduate Admission Office wants to select two graduate students.
Who should be selected?
STUDENT   GPA   GRE    PROFESSOR RATING
1         3.2   1270   38
2         3.9   1600   42
3         2.9   1500   22
4         3.0   1400   32
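The table above could be entered in R as a data frame. This is a minimal sketch: the name `Student` and the column names `Gpa`, `Gre`, and `Prof.rat` are taken from the R output shown later in the deck, but how the authors actually loaded the data is not shown, so the construction below is an assumption.

```r
# One row per applicant, one column per criterion
Student <- data.frame(
  Gpa      = c(3.2, 3.9, 2.9, 3.0),
  Gre      = c(1270, 1600, 1500, 1400),
  Prof.rat = c(38, 42, 22, 32)
)

# Inspect the structure and per-column summaries
str(Student)
summary(Student)
```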
7. PCA in R takes five steps to select the two best graduate
students from the given table:
• Enter the data in R.
• Calculate the correlation matrix.
• Calculate the eigenvectors and eigenvalues of the correlation
matrix.
• Choose the number of principal components to be retained.
• Derive the new data set.
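The five steps can be sketched end to end in base R. This is an illustration under stated assumptions, not the deck's own script: the 90% cumulative-variance cutoff used to choose `k` is a hypothetical rule, and projecting the unscaled columns in step 5 is also an assumption (passing `scale(Student)` instead is a common alternative).

```r
# Step 1: enter the data
Student <- data.frame(
  Gpa      = c(3.2, 3.9, 2.9, 3.0),
  Gre      = c(1270, 1600, 1500, 1400),
  Prof.rat = c(38, 42, 22, 32)
)

# Step 2: correlation matrix
R <- cor(Student)

# Step 3: eigenvalues and eigenvectors of the correlation matrix
e <- eigen(R)

# Step 4: choose how many components to retain
prop_var <- e$values / sum(e$values)   # share of variance per component
cum_var  <- cumsum(prop_var)
k <- which(cum_var >= 0.90)[1]         # hypothetical 90% threshold

# Step 5: derive the new data set (assumption: unscaled columns)
new_data <- as.matrix(Student) %*% e$vectors[, 1:k, drop = FALSE]
colnames(new_data) <- paste0("PC", seq_len(k))

print(round(prop_var, 4))
print(k)
print(new_data)
```

The eigenvalues of a correlation matrix sum to the number of variables (three here), so `prop_var` can be read directly as the fraction of total variance carried by each component.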
9. > stud = cor(Student)
> stud
         Gpa        Gre          Prof.rat
Gpa      1.0000000  0.531991767  0.824316301
Gre      0.5319918  1.000000000  -0.009509527
Prof.rat 0.8243163  -0.009509527 1.000000000
The correlation matrix measures the strength of the linear
relationship between each pair of variables.
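A correlation matrix is symmetric, so the Gre and Prof.rat entries must agree in both positions. A minimal sketch that checks this and recomputes one entry from the Pearson formula follows; the helper `pearson` is a made-up name for illustration, and the data frame is rebuilt here from the admission table.

```r
Student <- data.frame(
  Gpa      = c(3.2, 3.9, 2.9, 3.0),
  Gre      = c(1270, 1600, 1500, 1400),
  Prof.rat = c(38, 42, 22, 32)
)

R <- cor(Student)

# Pearson correlation written out by hand (illustrative helper)
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_gre_prof <- pearson(Student$Gre, Student$Prof.rat)

print(round(R, 6))
print(isSymmetric(R))   # a correlation matrix equals its transpose
print(r_gre_prof)       # slightly negative: GRE and rating are nearly uncorrelated
```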
13. Students 2 and 3 will be selected if the first principal
component (PC1) is used to calculate the score.
STUDENT   GPA   GRE    PROFESSOR RATING   SCORE
1         3.2   1270   38                 507.6873
2         3.9   1600   42                 636.0216
3         2.9   1500   22                 585.4074
4         3.0   1400   32                 553.4034
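The score column appears to be reproducible by weighting the raw, unstandardized columns with the first eigenvector of the correlation matrix. That reading is inferred from the numbers and is not stated in the slides, so the sketch below is an assumption about the method. Standardizing the columns first with `scale()` is the more usual convention and can change the ranking.

```r
Student <- data.frame(
  Gpa      = c(3.2, 3.9, 2.9, 3.0),
  Gre      = c(1270, 1600, 1500, 1400),
  Prof.rat = c(38, 42, 22, 32)
)

# Loadings of the first principal component
v1 <- eigen(cor(Student))$vectors[, 1]
if (sum(v1) < 0) v1 <- -v1   # eigenvectors are only defined up to sign

# PC1 score per student (assumption: raw columns, no centering or scaling)
score <- as.vector(as.matrix(Student) %*% v1)

# Rank students by score and keep the top two
ranked   <- order(score, decreasing = TRUE)
selected <- ranked[1:2]

print(round(v1, 4))
print(round(score, 2))
print(selected)
```

Because the GRE values are two to three orders of magnitude larger than the other columns, they dominate an unscaled score even though GRE has the smallest loading.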
14. • PCA is limited to re-expressing the data
as a linear combination of its basis
vectors.
• PCA is a non-parametric method: it is
independent of the user and cannot be
configured for specific inputs.
• Principal components are orthogonal.
• Mean and variance are assumed to be
sufficient statistics for describing the data.
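The orthogonality of the principal components can be checked numerically. In this minimal sketch the data frame is rebuilt from the admission table and the names `V` and `G` are illustrative; if the eigenvectors are orthonormal, the product of the transposed eigenvector matrix with itself should be the identity matrix up to rounding error.

```r
Student <- data.frame(
  Gpa      = c(3.2, 3.9, 2.9, 3.0),
  Gre      = c(1270, 1600, 1500, 1400),
  Prof.rat = c(38, 42, 22, 32)
)

# Columns of V are the principal component directions
V <- eigen(cor(Student))$vectors

# crossprod(V) computes t(V) %*% V
G <- crossprod(V)
max_dev <- max(abs(G - diag(3)))   # largest deviation from the identity

print(round(G, 10))
print(max_dev)
```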