Feature selection (FS) is a classical combinatorial problem in pattern recognition and data mining, of major importance in classification and regression scenarios. In this paper, a hybrid approach that combines branch-and-bound (BB) search with Bhattacharyya distance based feature selection is presented for classifying hyperspectral data using Support Vector Machine (SVM) classifiers. The performance of this hybrid approach is compared to another hybrid approach that uses genetic algorithm (GA) based feature selection in place of BB, and to baseline SVMs with no feature reduction. Experimental results on hyperspectral data show that, under small-sample-size conditions, the BB approach performs better than both the GA approach and SVMs with no feature selection.
Branch and Bound Feature Selection for Hyperspectral Image Classification
1. BRANCH AND BOUND BASED FEATURE ELIMINATION FOR SUPPORT VECTOR MACHINE BASED CLASSIFICATION OF HYPERSPECTRAL IMAGES
Sathishkumar Samiappan, Saurabh Prasad, Lori M. Bruce & Eric Hansen
Mississippi State University
2. INTRODUCTION
• Hyperspectral images (HSI) are widely used for ground cover
classification problems.
• The problem is very challenging because of
1) the high-dimensional feature space and
2) the high correlation between successive features (bands).
• In the last decade, Support Vector Machines (SVMs) have been shown to
perform well on this problem.
• Traditional view of SVMs:
They can handle high dimensionality, hence feature selection (FS) is
not required.
• Recently, Waske et al. showed that FS can improve the classification
performance of SVMs.
3. MOTIVATION
Feature selection algorithms come in two families:
• FS based on metrics such as Bhattacharyya distance, mutual information,
correlation, etc.
- Easy to compute
- Often sub-optimal
• FS based on search, ranging from exhaustive search to search guided by
some form of intelligence.
Can the two be married?
A HYBRID approach?
4. SELECTION OF ALGORITHMS
• Rank-based approach: feature selection based on Bhattacharyya
distance (BD) and correlation.
• Features are ranked in descending order of their BDs
and correlation.
• Select the first m features.
• BDs are widely used for selecting features in hyperspectral
image classification.
Bhattacharyya distance between classes i and j:
B_{ij} = \frac{1}{8}(\mu_i - \mu_j)^T \left[\frac{\Sigma_i + \Sigma_j}{2}\right]^{-1} (\mu_i - \mu_j) + \frac{1}{2}\ln\frac{\left|\frac{\Sigma_i + \Sigma_j}{2}\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}
where µi and µj are the means of the two classes
and ∑i and ∑j are their covariance matrices.
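The definitions above can be sketched in code. Below is a minimal NumPy implementation of the Bhattacharyya distance between two Gaussian class models, plus a per-band ranking helper in the spirit of the deck's rank-based approach; the function names and the 1-D per-band scoring are illustrative assumptions, not taken from the deck.

```python
import numpy as np

def bhattacharyya_distance(mu_i, mu_j, cov_i, cov_j):
    """Bhattacharyya distance between two Gaussian class models."""
    mu_diff = mu_i - mu_j
    cov_avg = (cov_i + cov_j) / 2.0
    # Mean-separation term
    term1 = 0.125 * mu_diff @ np.linalg.inv(cov_avg) @ mu_diff
    # Covariance-dissimilarity term
    term2 = 0.5 * np.log(np.linalg.det(cov_avg) /
                         np.sqrt(np.linalg.det(cov_i) * np.linalg.det(cov_j)))
    return term1 + term2

def rank_bands(X_i, X_j):
    """Rank bands by per-band (1-D) Bhattacharyya distance, descending.
    X_i, X_j: (samples, bands) arrays for two classes."""
    scores = []
    for b in range(X_i.shape[1]):
        mu1, mu2 = X_i[:, b].mean(), X_j[:, b].mean()
        v1, v2 = X_i[:, b].var(), X_j[:, b].var()
        bd = 0.125 * (mu1 - mu2) ** 2 / ((v1 + v2) / 2) \
             + 0.5 * np.log(((v1 + v2) / 2) / np.sqrt(v1 * v2))
        scores.append(bd)
    return np.argsort(scores)[::-1]
```

For identical class models the distance is zero; it grows with mean separation and covariance mismatch, which is why it serves as a class-separability score for ranking bands.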
5. SELECTION OF ALGORITHMS
• Search approach: Branch and Bound (B&B) search and Genetic
Algorithms (GA).
• Branch and Bound is a modification of simple tree search with
backtracking.
• With a good estimate of the bound, B&B reaches the same solution as
exhaustive search while visiting far fewer nodes.
• A Genetic Algorithm is a very popular optimization procedure
inspired by natural evolution.
• GA is essentially a guided random search; it converges toward good
solutions, but in finite time it is not guaranteed to find the optimum.
7. OBJECTIVE
• To remove a subset of features such that the remaining features
achieve the best performance during SVM classification.
• Bhattacharyya distance and correlation:
to rank the features by their usefulness in discriminating
between classes.
• Branch and Bound or Genetic Algorithms:
to select a subset of lower-ranked features to remove from the
feature set.
• To create an elimination strategy over different combinations of
the lower-ranked features.
8. BRANCH AND BOUND
• B&B is a general algorithmic strategy for solving optimization
problems.
• It divides the problem to be solved into a number of sub-problems.
• Instead of solving all the sub-problems, B&B first finds one viable
solution and records its value as the upper bound.
• Any subsequent partial solution is abandoned as soon as its cost
reaches the upper bound.
• If a better solution is found, the upper bound is updated.
• In this way many sub-problems can safely be left unsolved.
9. EXAMPLE : BRANCH AND BOUND
Total number of features q = 6
Features to be removed m = 4
Features selected (q - m) = 2
Initial upper bound B = Bo
Step 1: Rank the features in descending order of importance and
select the first m.
Step 2: Compute the cost B1 at node 1.
If B1 < B, descend and repeat from Step 1;
else backtrack and select B2 -
the entire subtree under the failed node can be discarded.
10. EXAMPLE : BRANCH AND BOUND
S1
S2
S3
S4
Maximum Depth of the tree = m
So if depth reaches m,
Compute new Bound for the path
If new bound is better
Update bound B
else
backtrack
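The walk-through above can be sketched as a depth-first search with bound-based pruning. This is an illustrative sketch, not the authors' code: it assumes a monotone criterion (removing a feature never increases its value), which is what makes discarding a whole subtree safe, and the `criterion` callable and toy weights below are hypothetical stand-ins for the deck's cost B.

```python
def branch_and_bound_remove(q, m, criterion):
    """Choose m of q features to remove so that the criterion of the
    remaining (q - m) features is maximized.  Assumes a monotone
    criterion: removing a feature never increases it, so any partial
    removal whose value already fails the bound can be pruned.
    criterion: callable taking a frozenset of remaining feature indices."""
    best_value = -float("inf")
    best_removed = None

    def dfs(start, removed, remaining):
        nonlocal best_value, best_removed
        value = criterion(remaining)
        # Prune: by monotonicity, no descendant can beat the bound
        if value <= best_value:
            return
        if len(removed) == m:          # maximum depth m reached
            best_value, best_removed = value, removed  # update bound
            return
        # Branch: try removing each not-yet-considered feature
        for f in range(start, q):
            dfs(f + 1, removed | {f}, remaining - {f})

    dfs(0, frozenset(), frozenset(range(q)))
    return best_removed, best_value

# Slide's setup: q = 6, m = 4, with toy per-feature importance weights
weights = [5, 4, 3, 2, 1, 0.5]
removed, value = branch_and_bound_remove(
    6, 4, lambda kept: sum(weights[i] for i in kept))
```

With these toy weights the search keeps the two highest-weight features and removes the rest, matching what exhaustive search over all 2-feature subsets would find, but pruning spares it from evaluating every branch.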
12. GENETIC ALGORITHMS
The parameters used for the GA:
Fitness function - multiclass Spider SVM implementation
using an RBF kernel with sigma = 0.5
Number of generations = 20
Length of chromosome = 50
Population size = 30
Crossover probability = 0.6
Mutation probability = 0.003
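With those parameters, the GA loop might look like the following toy sketch. The real fitness function is the multiclass Spider SVM with an RBF kernel (sigma = 0.5); here any scoring function over a binary feature mask can be plugged in. Tournament selection and one-point crossover are assumptions of this sketch, as the deck does not specify the selection or crossover scheme.

```python
import random

def genetic_feature_search(fitness, length=50, pop_size=30,
                           generations=20, p_cross=0.6, p_mut=0.003,
                           seed=0):
    """Toy GA over binary feature masks (1 = keep feature) using the
    slide's parameters.  fitness maps a 0/1 list to a score."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: pick two, keep the fitter individual
        a, b = rng.sample(scored, 2)
        return (a if a[1] >= b[1] else b)[0]

    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        scored = [(ind, fitness(ind)) for ind in pop]
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            c1, c2 = p1[:], p2[:]
            if rng.random() < p_cross:        # one-point crossover
                cut = rng.randrange(1, length)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for c in (c1, c2):                # bit-flip mutation
                for i in range(length):
                    if rng.random() < p_mut:
                        c[i] ^= 1
                nxt.append(c)
        pop = nxt[:pop_size]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best[:]
    return best
```

As a smoke test, a "one-max" fitness (count of 1-bits) should steadily push the best mask toward all ones over the 20 generations.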
13. RESULTS
• Dataset used - AVIRIS Indian
Pines with 220 features (spectral bands).
• It is a 7-class dataset with 200
training samples from each
class.
• The classes are corn no-till,
corn min-till, grass pasture,
hay windrowed, soybeans no-
till, soybeans clean and
woods.
14. DISCUSSION
• Pros
- A compromise between rank-based FS and
exhaustive search
- Computationally efficient compared to
GA and other search techniques
- Potential to significantly increase the
performance of SVMs
- Robust with small sample sizes (few training
samples)
• Cons
- Potential for overtraining (overfitting)