2. OUTLINE
1. Introduction
2. Comparison with previous work
3. Algorithm
4. Experiments and Results
5. Conclusion
3. Feature Hierarchies for Object Classification
Automatically extracting informative feature hierarchies for object classification
Top-down manner
The entire hierarchy is learned during a training phase
4. Overview of Feature Hierarchy
Hierarchies are significantly more informative compared with holistic features
Selection of effective image features is crucial
Identify common object parts
Combine the parts in a manner that allows variations learned from training data
Input: A set of class & non-class images
Output: Hierarchical features with learned parameters
5. Previous Work
Non-hierarchical
Feature hierarchies
Architecture of the hierarchy is pre-defined
Advantages of both methods are combined in this paper
6. Construction of Feature Hierarchies
Algorithm
Initial informative fragments are selected
Selected fragments are used to extract the sub-features
Optimize the parameters of the feature hierarchy
Classification
7. Selecting informative image fragments
The detection threshold for each fragment is selected to maximize MI(fi;C)
Then identify the next fragment that delivers the maximal amount of additional information
8. Extracting sub-fragments
Constructing positive and negative examples
• Positive examples are fragments in class (positive) images where the feature is detected or almost detected
• Negative examples are fragments in non-class (negative) images where the feature is detected or almost detected
9. Extracting sub-fragments
[Diagram: a parent fragment is decomposed into child fragments. If the decomposition increases the delivered information, keep it.]
10. Extracting sub-fragments
[Diagram: grandparent, parent, and child fragments. If a decomposition does NOT increase the delivered information, stop decomposing; the fragment is atomic.]
11. Optimizing ROI
Size of ROI
ROI too small → information low
ROI too large → information low
The size of the ROI should be chosen to maximize the mutual information between the fragment and the class
Optimized in a top-down manner (see the sketch below)
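As a rough illustration (not the paper's code), the ROI search can be sketched in Python. The helper fragment.detected_in_roi (true if the fragment fires anywhere inside an ROI of the given size) is an assumption; scikit-learn's mutual_info_score supplies the MI computation.

from sklearn.metrics import mutual_info_score

def optimize_roi_size(fragment, pos_imgs, neg_imgs, candidate_sizes):
    labels = [1] * len(pos_imgs) + [0] * len(neg_imgs)
    def mi_for(size):
        # Binary detection of the fragment within an ROI of this size.
        detections = [int(fragment.detected_in_roi(img, size))
                      for img in pos_imgs + neg_imgs]
        return mutual_info_score(labels, detections)
    # Too small an ROI misses shifted parts, too large admits false
    # detections; MI peaks at an intermediate size.
    return max(candidate_sizes, key=mi_for)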
12. Classification by hierarchy
The responses of all sub-features are combined into a final response Sp
-1 < Sp < 1
At the top level, compare Sp with 0
13. Classification by hierarchy
During training, weights and positions are updated alternately (a sketch follows below):
Position step: fix the weights, optimize the positions
Weight step: fix the positions, optimize the weights
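A minimal coordinate-ascent sketch of this alternating scheme (the hierarchy interface nodes(), best_position and best_weight are hypothetical; only the alternation itself is from the slide):

def train(hierarchy, images, labels, n_rounds=10):
    for _ in range(n_rounds):
        # Position step: weights held fixed, re-estimate optimal positions.
        for node in hierarchy.nodes():
            node.position = best_position(node, hierarchy, images, labels)
        # Weight step: positions held fixed, re-fit combination weights.
        for node in hierarchy.nodes():
            node.weight = best_weight(node, hierarchy, images, labels)
    return hierarchy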
14. Summary of algorithm
[Flowchart: hierarchical feature construction. Positive and negative images feed the fragment score S(f), whose mutual information is evaluated.]
15. Summary of algorithm
[Flowchart: hierarchical feature construction, continued. The hierarchy H is grown from positive and negative images via S(f) and MI evaluation; unhelpful decompositions are marked atomic and ROIs are optimized.]
20. Conclusions
Pros:
The extraction of image fragments is automatic
The hierarchies outperform holistic features
Feature hierarchies can be used to improve the performance of classification schemes
Cons:
Optimization of features is not quite complete
The application (detection) process is not as computationally efficient
Feature Hierarchies for Object Classification is a method for automatically extracting informative feature hierarchies for object classification in a top-down manner: informative top-level fragments are extracted first, and, by repeated application of the same feature-extraction process, the classification fragments are successively broken down into their own optimal components. The hierarchical decomposition terminates with atomic features that cannot be usefully decomposed into simpler features. The entire hierarchy, that is, the different features and sub-features together with their optimal parameters, is learned during a training phase using training examples.
Hierarchies are significantly more informative and better for classification. Experimental evaluations show that the decomposition produced by this method increases the amount of information delivered by the fragments by a wide margin, improves the detection rate, and increases the tolerance to local distortions and illumination changes. The selection of effective image features is crucial for a successful classification scheme: first, the features identify common object parts that characterize the different objects within the class, and second, the parts are combined in a manner that allows variations learned from training data. Output: hierarchical features with learned parameters (combination weights, geometric relations).
The features used by these methods were non-hierarchical; that is, they were not broken down into distinct simpler sub-parts but were detected directly by comparing the fragment to the image. Their similarity can be measured by different measures, including normalized cross-correlation, affine-invariant measures [6], and the SIFT measure [7]. A number of classification schemes have also used feature hierarchies rather than holistic features. Such schemes were often based on biological modeling, motivated by the structure of the primate visual system, which has been shown to use a hierarchy of features of increasing complexity, from simple local features in the primary visual cortex to complex shapes and object views in higher cortical areas. In a number of these models [8, 9], the architecture of the hierarchy (size, position, and shape of features and their sub-features) is pre-defined rather than learned for different classification tasks. One study uses a network model in which both the combination weights and the convolution templates were learned from examples by backpropagation, whereas the number of hierarchy levels and the positional tolerance were pre-defined. In the present work, we combine the advantages of learning informative classification fragments with the learning of a hierarchical structure with adaptive parameters. In summary, classification features used in the past were either highly informative but non-hierarchical, or hierarchical but less informative and not as useful.
These two steps are applied recursively until a level of 'atomic fragments' is reached.
The process identifies fragments that deliver the maximal amount of information about the class. The mutual information is a function of the detection threshold θi. If the threshold is too low, the information delivered by the fragment about the class will be low, because the fragment will be detected with high frequency in both class and non-class images. A high threshold will also yield low mutual information, since the fragment will be seldom detected in both class and non-class images. At some intermediate value of the threshold, the mutual information reaches a maximum. The detection threshold for each fragment is therefore selected to maximize the information MI(fi;C) between the fragment and the class. After finding the fragment with the highest mutual information score, the search identifies the next fragment that delivers the maximal amount of additional information with respect to the previously selected fragments. At iteration i, the fragment fi is selected to increase the mutual information of the fragment set by maximizing the minimal addition in mutual information:

fi = arg max (f in Ki) min (fj in Si) [ MI(f, fj; C) − MI(fj; C) ]

Here Ki is the set of candidate fragments, Si is the set of fragments selected up to iteration i, and fi is the fragment to be selected at iteration i. The min is taken over all previously selected fj to avoid redundancy: if a candidate is similar to one of the selected fragments, this minimum will be small. The max stage then finds the candidate in the pool with the largest additional contribution. In empirical testing, this algorithm was shown to…
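A sketch of both steps in Python, assuming each candidate fragment is summarized by its per-image detection scores; mutual_info_score (scikit-learn) computes MI between discrete variables, and encoding the pair (f, fj) as a single 4-valued variable gives the joint term MI(f, fj; C). All names are illustrative:

import numpy as np
from sklearn.metrics import mutual_info_score

def best_threshold(scores, labels, thresholds):
    # scores: the fragment's maximal correlation in each training image.
    # Choose the threshold maximizing MI(f; C) of the binarized detections.
    return max(thresholds,
               key=lambda t: mutual_info_score(labels, (scores >= t).astype(int)))

def select_fragments(detections, labels, k):
    # detections: dict of fragment id -> 0/1 numpy vector over all images.
    pool = set(detections)
    first = max(pool, key=lambda f: mutual_info_score(labels, detections[f]))
    selected = [first]
    pool.remove(first)
    while pool and len(selected) < k:
        def min_gain(f):
            # Additional information MI(f, fj; C) - MI(fj; C); the min over
            # already-selected fj penalizes redundant candidates.
            return min(mutual_info_score(labels, 2 * detections[f] + detections[fj])
                       - mutual_info_score(labels, detections[fj])
                       for fj in selected)
        fi = max(pool, key=min_gain)   # max-min: best worst-case addition
        selected.append(fi)
        pool.remove(fi)
    return selected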
For any feature, the aim is to repeat the above process using the new feature as the target, rather than the overall object. First, we need to prepare training example sets. For the top level of features, the example sets simply consist of images where the feature is detected or not detected. With the hierarchical structure, however, features can be decomposed into many sub-features, which can detect more difficult examples.
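A hypothetical sketch of assembling such example sets for a parent feature: "detected or almost detected" is modeled with a relaxed threshold (theta - margin), and best_response, crop, and the margin value are assumptions rather than the paper's exact procedure.

def example_sets(feature, class_imgs, nonclass_imgs, theta, margin=0.1):
    positives, negatives = [], []
    for img in class_imgs:
        resp, loc = feature.best_response(img)    # max correlation + position
        if resp >= theta - margin:                # detected or almost detected
            positives.append(crop(img, loc, feature.size))
    for img in nonclass_imgs:
        resp, loc = feature.best_response(img)
        if resp >= theta - margin:                # (near) false detections
            negatives.append(crop(img, loc, feature.size))
    return positives, negatives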
After the positive and negative examples are set up, we can extract sub-fragments using the same procedure introduced earlier. If the set of sub-features increases the delivered information, they are added to the hierarchy. This process is easy to explain using a family tree of fragments.
If a fragment cannot be usefully decomposed again, it is considered an atomic fragment and the decomposition stops there. Usually such a fragment contains edges, corners, or lines.
The input is a set of positive class images (here, faces) and a set of negative non-class images. First, we initialize H as a tree with a single node. We extract a set of first-level fragments and add them as children. Then we evaluate the mutual information between the tree and the class. For each leaf fragment f, we construct its positive and negative example sets and find the set of the most informative sub-fragments of f.
We add these fragments as children and evaluate the mutual information again. If it does not increase (compared with the case when we do not use these fragments), we remove them and mark the leaf node as an 'atomic' fragment. Otherwise, we keep these fragments in our tree. We repeat these steps until all of the fragments are marked as atomic. We should also optimize the ROI size as described before. A skeleton of the loop follows.
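A skeleton of this loop (only the control flow is meant literally; extract_fragments, example_sets, informative_subfragments, tree_mi, and optimize_roi_size stand in for the steps sketched earlier):

class Node:
    def __init__(self, fragment, theta=0.0):
        self.fragment, self.theta = fragment, theta
        self.children, self.atomic = [], False

def build_hierarchy(pos_imgs, neg_imgs):
    root = Node(None)                                  # H starts as one node
    root.children = [Node(f) for f in extract_fragments(pos_imgs, neg_imgs)]
    frontier = list(root.children)
    while frontier:                                    # until all leaves atomic
        leaf = frontier.pop()
        pos, neg = example_sets(leaf.fragment, pos_imgs, neg_imgs, leaf.theta)
        subs = informative_subfragments(leaf.fragment, pos, neg)
        mi_before = tree_mi(root, pos_imgs, neg_imgs)
        leaf.children = [Node(s) for s in subs]
        if tree_mi(root, pos_imgs, neg_imgs) <= mi_before:
            leaf.children, leaf.atomic = [], True      # no MI gain: atomic
        else:
            for child in leaf.children:
                child.roi = optimize_roi_size(child.fragment, pos_imgs,
                                              neg_imgs, range(1, 9))
                frontier.append(child)
    return root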
For the classification stage, we compute the correlations of ALL the leaf nodes of H with the image and store these values, which gives us the response maps.
For every node of H whose children's response maps have been computed, we compute its own response map. How? At each position of the feature within the image, we find the maximal response of each child within its ROI and combine these responses. We repeat this process from the bottom (the children) to the top (the parents) of the hierarchy. Using the response map of the top node, we take the maximal response within its ROI and compare it to 0: if it is higher, we classify the image as 1; otherwise, as 0. A sketch of this bottom-up pass follows.
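In Python-like form (the data layout is assumed: response maps are 2-D numpy arrays, each child stores a learned weight, and roi_window / normalized_cross_correlation are placeholder helpers):

import numpy as np

def response_map(node, image):
    if node.atomic:
        # Leaves: correlate the stored fragment with the image.
        return normalized_cross_correlation(image, node.fragment)
    child_maps = [(c, response_map(c, image)) for c in node.children]
    h, w = child_maps[0][1].shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            # Each child contributes its maximal response inside its ROI
            # around (y, x); children are combined by their learned weights.
            out[y, x] = sum(c.weight * cmap[c.roi_window(y, x, (h, w))].max()
                            for c, cmap in child_maps)
    return out

def classify(root, image):
    s_p = response_map(root, image).max()   # final response, -1 < Sp < 1
    return 1 if s_p > 0 else 0              # top level: compare Sp with 0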
For each object class, the most informative holistic feature was determined and, for comparison, a hierarchy of sub-features was extracted. Using 150 top-level fragments in the hierarchies, the ROC is better than with the holistic features. The ROC detection curves also improve when a full hierarchy is used for classification, while the lowest ROC is obtained for a decomposition using fixed spacing and sub-fragment size. From this we conclude that optimizing the size and location of sub-fragments adds significantly to the MI. This curve shows the mean difference between the ROC curves of classifiers based on a single holistic feature and on its hierarchical decomposition.
Next, we compare the performance of full classifiers. First, to determine the number of fragments required for a full classifier, the Equal Error Probabilities (EEP) were calculated using up to 50 fragments; the classifier performs best at 30-40 fragments. Then the performance of full classifiers using 50 holistic features and 50 hierarchical features was compared. The higher ROC curve here shows the advantage of hierarchical features.
The extraction is fully automatic, including the selection of the sub-features as well as their combination weights and ROIs. The hierarchies win both in the amount of delivered information and in recognition performance, and they could be extended to provide a fuller description of the object, with its parts and sub-parts at different levels. On the negative side: features are chosen for maximal mutual information and then have their ROIs optimized, so some ultimately better features may be excluded; the response map must be calculated over the entire image for every node in the hierarchy; and the use of actual image fragments as features seems sub-optimal.