4. Objective
To summarize and compare well-known pattern recognition methods.
Goal
The goal of pattern recognition (PR) is supervised or unsupervised classification.
Pattern
As the opposite of chaos, a pattern is an entity, vaguely defined,
that can be given a name.
Examples: fingerprint image, human face, speech signal,
handwritten cursive word.
5. Design of a pattern recognition system involves:
Selection of training and test samples
Definition of pattern classes
Sensing environment
Pattern representation
Feature extraction and selection
Cluster analysis
Classifier design
8. A template, a 2-D shape or prototype of the pattern, is matched
against the stored templates.
The similarity between the two entities is determined, typically
by correlation.
Disadvantage
It fails when the patterns are distorted.
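A minimal sketch of template matching via normalized cross-correlation, assuming grayscale patterns stored as NumPy arrays; the function names and toy templates are illustrative, not part of the original slides.

```python
import numpy as np

def match_score(template, candidate):
    """Normalized cross-correlation between a stored template and a
    candidate pattern of the same shape (higher = more similar)."""
    t = template - template.mean()
    c = candidate - candidate.mean()
    denom = np.sqrt((t ** 2).sum() * (c ** 2).sum())
    return float((t * c).sum() / denom) if denom > 0 else 0.0

def classify(candidate, stored_templates):
    """Assign the candidate to the class of the best-matching template."""
    return max(stored_templates,
               key=lambda label: match_score(stored_templates[label], candidate))

# Usage: two toy 3x3 templates and a slightly noisy probe pattern.
templates = {"bar": np.eye(3), "cross": np.ones((3, 3)) - np.eye(3)}
probe = np.eye(3) + 0.1 * np.random.default_rng(0).normal(size=(3, 3))
print(classify(probe, templates))  # -> "bar"
```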
9. Each pattern is represented by d features, i.e., as a point in
d-dimensional feature space.
The objective is to establish decision boundaries in the feature
space that separate patterns of different classes.
Discriminant-analysis-based approach to classification:
decision boundaries of a specified form are constructed,
e.g., using a mean-squared-error criterion.
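A minimal sketch of a discriminant-analysis-style classifier of the kind described above: a linear decision boundary is fitted by minimizing a mean-squared-error criterion against ±1 class labels. The toy data and variable names are assumptions for illustration.

```python
import numpy as np

# Toy two-class data: each pattern is a point in 2-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])     # class labels -1 / +1

# Least-squares (MSE) fit of a linear discriminant w.x + b.
A = np.hstack([X, np.ones((100, 1))])          # append a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes ||Aw - y||^2

# Decision rule: the sign of the discriminant; w.x + b = 0 is the boundary.
predictions = np.sign(A @ w)
print("training accuracy:", (predictions == y).mean())
```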
10. The simplest/elementary subpatterns are called primitives.
Complex patterns are represented as interrelations of these
primitives.
A formal analogy is drawn between the structure of patterns and
the syntax of a language, in which patterns are viewed as
sentences and primitives as the alphabet of the language.
Challenges
Segmentation of noisy patterns.
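A minimal sketch of the syntactic idea: primitives act as alphabet symbols, a pattern is a sentence over them, and class membership is a grammar check (here expressed as a regular expression). The primitives and the "peak" grammar are invented for illustration.

```python
import re

# Primitives: 'u' = up-stroke, 'd' = down-stroke.
# Illustrative grammar for a "peak" pattern: one or more up-strokes
# followed by one or more down-strokes.
PEAK_GRAMMAR = re.compile(r"u+d+")

def is_peak(primitive_string):
    """A pattern belongs to the class if its primitive sentence
    is generated by the class grammar."""
    return PEAK_GRAMMAR.fullmatch(primitive_string) is not None

print(is_peak("uuudd"))  # True  - a valid sentence of the grammar
print(is_peak("udud"))   # False - not derivable from the grammar
```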
11. A massively parallel computing system consisting of an
extremely large number of simple processors with many
interconnections.
Ability to learn complex nonlinear input/output relationships.
Examples: feed-forward networks, Self-Organizing Maps (SOM).
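A minimal sketch of a feed-forward network's forward pass, showing the structure the slide describes: many simple units with dense interconnections computing a nonlinear input/output mapping. The layer sizes are arbitrary and the weights here are random, not trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer feed-forward network: 4 inputs -> 8 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    """Each layer: a weighted sum over incoming connections, then a
    nonlinearity, giving a nonlinear input/output relationship."""
    h = np.tanh(x @ W1 + b1)        # hidden units (simple processors)
    return np.tanh(h @ W2 + b2)     # output activations

print(forward(rng.normal(size=4)))  # 2 output activations
```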
13. A pattern is represented by a set of d features/attributes,
viewed as a point in d-dimensional feature space.
The system operates in two modes: training and classification.
14. Decision-Making Process
A pattern is assigned to one of c categories/classes
w1, w2, ..., wc based on a vector of d feature values
x = (x1, x2, ..., xd).
Class-conditional probability: P(x|wi)
Conditional risk: R(wi|x) = Σj L(wi, wj) P(wj|x),
where L(wi, wj) is the loss incurred in deciding wi when the
true class is wj.
Posterior probability: P(wj|x)
For the 0/1 loss function: L(wi, wj) = 0 if i = j, 1 if i ≠ j.
Assign input pattern x to class wi if
P(wi|x) > P(wj|x) for all j ≠ i.
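A minimal sketch of the 0/1-loss Bayes rule above: compute P(wi|x) for each class and assign x to the class with the largest posterior. The 1-D Gaussian class-conditional densities and equal priors are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D problem: two classes with Gaussian class-conditional
# densities p(x|wi) and prior probabilities P(wi).
priors = np.array([0.5, 0.5])
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

def classify(x):
    """Bayes rule with 0/1 loss: pick the class with maximum posterior.
    P(wi|x) is proportional to p(x|wi) * P(wi); the evidence p(x) cancels."""
    joint = norm.pdf(x, means, stds) * priors  # p(x|wi) * P(wi) per class
    return int(np.argmax(joint))               # index of the winning class

print(classify(0.3))  # -> 0 (closer to the class-0 mean)
print(classify(1.8))  # -> 1
```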
15. Statistical Pattern Recognition (cont.)
If all of the class-conditional densities are known, then the
Bayes decision rule can be used to design a classifier.
If the form of the class-conditional densities is known (e.g.,
multivariate Gaussian) but parameters such as the mean vectors
and covariance matrices are not, then we have a parametric
decision problem: replace the unknown parameters with their
estimates.
If the form of the class-conditional densities is not known, we
are in nonparametric mode. In such cases we can use Parzen
windows (to estimate the density functions) or directly
construct the decision boundary using the k-NN rule.
Optimizing the classifier to maximize its performance on the
training data will not necessarily give the same result on
test data.
17. The number of features can be too large relative to the number
of training samples.
The performance of a classifier depends on
◦ the sample size,
◦ the number of features, and
◦ the classifier complexity.
Curse of dimensionality
◦ A naive table-lookup technique requires the number of
training data points to be an exponential function of the
feature dimension.
A small number of features can reduce the curse of
dimensionality when the training sample is limited.
18. If the number of training samples is small relative to the
number of features, the performance of the classifier degrades.
Trunk's Example
Two-class classification with equal prior probabilities,
multivariate Gaussian densities, and identity covariance
matrices.
The mean vectors of the two classes are m and -m, where the
components of m are mi = 1/√i, i = 1, ..., d.
19. Cases
Case 1: Mean vector m is known.
Use the Bayes decision rule with the 0/1 loss function to
construct the decision boundary; the error probability Pe(d)
decreases to zero as d → ∞.
Case 2: Mean vector m is unknown and must be estimated from the
n training samples. Then
lim d→∞ Pe(n, d) = 1/2
20. Result
We cannot keep increasing the number of features when the
parameters of the class-conditional densities are estimated
from a finite number of samples.
21. The dimensionality of the pattern, i.e., the number of features,
should be kept small because of measurement cost and
classification accuracy.
A small feature set can reduce the curse of dimensionality when
the training sample is limited.
Disadvantage:
A reduction in the number of features may lead to a loss in
discrimination power and lower the accuracy of the recognition
system.
Feature selection:
Feature selection refers to algorithms that select the best
subset of the input feature set.
Feature extraction:
Feature extraction algorithms are methods that create new
features based on transformations of the original feature set.
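A minimal sketch contrasting the two, using scikit-learn (assumed available) on the Iris data: SelectKBest keeps a subset of the original features (selection), while PCA creates new features as linear combinations of all of them (extraction).

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)       # 4 original features

# Feature selection: keep the best 2 of the 4 original features.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: create 2 new features from all 4 originals.
X_ext = PCA(n_components=2).fit_transform(X)

print(X_sel.shape, X_ext.shape)         # (150, 2) (150, 2)
```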
22. Chernoff faces represent each pattern as a cartoon face, with
nose length, face curvature, and eye size as features.
In the Iris data, Setosa looks quite different from the other
two classes.
Two-dimensional plots: PCA and Fisher mapping.
26. The designer has access to multiple classifiers.
A single training set collected at different times and in
different environments may use different features.
Each classifier has its own region of competence in the feature
space.
Some classifiers show different results with different
initializations.
Schemes to combine multiple classifiers (see the sketch below):
Parallel: all individual classifiers are invoked independently.
Cascading: individual classifiers are invoked in a linear
sequence.
Tree-like: individual classifiers are combined into a structure
similar to a decision-tree classifier.
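A minimal sketch of the parallel scheme using scikit-learn's VotingClassifier (assumed available): each individual classifier is invoked independently and the class votes are merged by majority rule. The choice of base classifiers is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Parallel combination: all individual classifiers run independently,
# and their predicted classes are merged by majority vote.
combo = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("tree", DecisionTreeClassifier(max_depth=3)),
], voting="hard")

print(combo.fit(X, y).score(X, y))  # training accuracy of the ensemble
```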
30. The classification error, or error rate Pe, is the ultimate
measure of the performance of a classifier.
For a consistent training rule, the value of Pe approaches the
Bayes error as the sample size increases.
A simple analytical expression for Pe is impossible to write
down even for multivariate Gaussian densities.
The maximum-likelihood estimate P̂e of Pe is P̂e = T/N, where T
is the number of misclassified test samples out of N.
34. The objective is to construct decision boundaries based on
unlabeled training data.
Clustering algorithms are based on two techniques (both sketched
below):
◦ Iterative square-error clustering.
◦ Agglomerative hierarchical clustering.
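A minimal sketch of the agglomerative hierarchical technique using SciPy (assumed available): start with each pattern as its own cluster and repeatedly merge the closest pair. The data and the choice of Ward linkage are illustrative; the square-error technique is sketched after the next slide.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(4, 0.5, (20, 2))])

# Agglomerative hierarchical clustering: build the full merge tree,
# then cut it to obtain 2 clusters.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # cluster index per pattern
```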
36. A given set of n patterns in d dimensions is partitioned into k
clusters. The mean vector of cluster Ck is defined as
mk = (1/nk) Σ{x ∈ Ck} x,
where nk is the number of patterns in Ck.
The square error for cluster Ck is the sum of squared Euclidean
distances between each pattern in Ck and the cluster centre mk:
ek² = Σ{x ∈ Ck} ||x − mk||².
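A minimal sketch of iterative square-error (k-means-style) clustering that alternates between assigning patterns to the nearest mean and recomputing each cluster mean mk, then reports the square error defined above. The data, k, and iteration count are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Iterative square-error clustering: alternate between assigning
    each pattern to its nearest mean and recomputing each mean mk."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=False)]   # initial centres
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - means) ** 2).sum(-1), axis=1)
        means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    # Square error: sum of squared distances to the assigned cluster mean.
    e2 = ((X - means[labels]) ** 2).sum()
    return labels, means, e2

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (25, 2)), rng.normal(4, 0.5, (25, 2))])
labels, means, e2 = kmeans(X, k=2)
print(means, e2)
```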