This document proposes two active learning methods, SVM-CC and SVM-CCMS, for hyperspectral image classification that focus on identifying and sampling from critical classes. The methods use a shifting hyperplane model to identify critical class pairs with high probability of being difficult to classify. SVM-CC randomly samples from the critical class set, while SVM-CCMS samples points closest to the decision margin within critical classes. Experimental results on two hyperspectral datasets show the proposed methods outperform random sampling and concentrate samples on support vectors, particularly improving performance for hard classes.
1. Critical Class Oriented Active Learning for Hyperspectral Image Classification. Wei Di and Melba Crawford, School of Civil Engineering, Purdue University, and Laboratory for Applications of Remote Sensing. Email: {wdi, mcrawford}@purdue.edu. July 28, 2011, IEEE International Geoscience and Remote Sensing Symposium
7. Focus on a specific task or requirement Target H
8. Active Learning (AL) - Iterative learning circle. [Diagram labels: Passive Learning; Supervised Classifier; Training; Output Classifier; Query Strategy; DL Pool; DU Pool; New xL; xU; f(xU); Uncertainty & Critical Class]
10. Classification: Tuia et al. [2009], Patra and Bruzzone [2011], Demir et al. [2011], Di and Crawford [2011].
16. Key Idea: Shifting Hyperplane. [Diagram labels: Pair-wise Class A and B; Hyperplane w; Margin; Support Vectors; New Samples; Changing Hyperplane]
39. Shifting Hyperplane – Provides valuable information for identifying difficult classes.
40. Critical Class Oriented Margin Sampling – Focuses on difficult classes, as well as informative samples; improves performance in multi-class problems.
41. Support Vectors - Concentrate on samples likely to be support vectors.
The added earth logo is from the website: http://rst.gsfc.nasa.gov/Sect19/Sect19_2a.html
Background: introduces the motivation and concepts related to active learning and guided learning.
Proposed methods: SVM-CC and SVM-CCMS. Both are guided active learning methods that focus sampling on difficult classes to improve performance in multi-class problems. SVM-CC is a purely critical-class-oriented class query. SVM-CCMS further incorporates margin sampling into the critical-class-based category sampling: it uses margin sampling as the uncertainty criterion and thus combines guided learning with informative sampling.
Experiments: conducted on two data sets, KSC and BOT.
A supervised classifier depends on the quality of its training data. This raises the question: which sampling strategy best exploits the information in the data and constructs the best training set for a given problem?
- Simple random sampling, which often yields a uniform approximation of the class proportions in the data, is unlikely to be the best strategy, especially in real applications with complex data sets. The resulting training set may contain a lot of redundancy and may not suit the chosen classifier. This has been shown repeatedly in the active learning literature.
- A good strategy should: achieve better performance at lower cost; allocate labeling resources economically and realistically; serve a specific task or customer purpose. One example is "active learning".
Active learning is an iterative learning circle. An additional component, the "query strategy", is added to the supervised learning process. It feeds back information about the classifier's performance on the data and uses it to guide the sampling strategy toward the most useful training data for labeling. Active learning helps to build a smaller training set with higher training utility, which reduces the cost and time of labeling.
The query strategy: beyond the usual strategies (e.g. an uncertainty criterion), it can also be designed to guide learning toward a specific task or purpose, or to query at the concept level. Here we focus on sampling difficult classes in a multi-class problem. The basic assumption is that the classes in a multi-class problem often differ in how difficult they are to classify well. By focusing on the hard classes, we can adjust the proportions of the training data in their favor and improve performance.
Backup: Motivation for Active Learning
Problems in passive learning:
- Labeling is expensive and time consuming.
- Labeled data are limited, while unlabeled data are abundant.
- Manual selection of Dtrain is subjective and redundant.
Goal of AL: a smaller training set with higher training utility.
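The iterative circle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the classifier used in the talk: the learner is a toy 1-D threshold (`fit_threshold`), the query strategy is a plain uncertainty criterion (`query_most_uncertain`), and `oracle`, `true_boundary`, and the synthetic pool are hypothetical names and data introduced for the example.

```python
import random


def fit_threshold(labeled):
    """Toy 1-D classifier (assumption): threshold halfway between class means."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


def query_most_uncertain(pool, threshold):
    """Query strategy: the unlabeled point closest to the decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learning_loop(labeled, pool, oracle, n_queries):
    """Train, query one sample, label it, move it from DU to DL, repeat."""
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(n_queries):
        threshold = fit_threshold(labeled)              # train on DL
        x_new = query_most_uncertain(pool, threshold)   # query strategy on DU
        pool.remove(x_new)                              # remove from DU
        labeled.append((x_new, oracle(x_new)))          # label and add to DL
    return fit_threshold(labeled), labeled, pool


# Hypothetical synthetic problem: the true class boundary sits at 0.3.
rng = random.Random(0)
true_boundary = 0.3


def oracle(x):
    return int(x > true_boundary)


pool = [rng.uniform(-1.0, 1.0) for _ in range(200)]
seed = [(-0.9, 0), (0.9, 1)]
threshold, labeled, remaining = active_learning_loop(seed, pool, oracle, 10)
```

Each pass through the loop costs one label, so the size of the labeled set grows by exactly one per query; the critical class methods in this talk change only how the queried sample is chosen.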
Recently, there has been increasing interest in active learning. In many multi-class problems, class complexity is highly skewed. Certain classes have more complex distributions that are hard to represent well. Significant differences also exist in how hard it is to discriminate any given class pair: some class pairs have very complex boundaries that require more training data to model well. These are often the classes that most damage the overall classification results, and they are typically the ones of greatest interest. Thus, rather than guiding which instance to label, guiding the query of additional training samples for those "critical classes" may help to tighten the worst classification boundary and yield better overall performance [9].
Our proposed method is critical class oriented learning:
- It uses the shifting hyperplane to identify the "hard classes" in a multi-class classification problem.
- It combines class query with margin-sampling-based uncertainty query. (Naïve random sampling and margin-based uncertainty sampling are used to query candidate samples.)
- It combines "Guided Learning" and "Active Learning": shifting hyperplane by pair-wise SVM; identification of "trouble classes"; concept-level, class-based query.
Guided Learning references:
1. R. Lomasky, C. E. Brodley, M. Aernecke, and S. Bencic, "Guiding class selection for an artificial nose," in Proc. NIPS, 2006.
2. R. Lomasky, C. E. Brodley, M. Aernecke, D. Walt, and M. Friedl, "Active class selection," in Proc. ECML, 2007.
3. J. Attenberg and F. Provost, "Why label when you can search? Alternatives to active learning for applying human resources to build classification models under extreme class imbalance," in Proc. KDD, 2010.
Most recent papers:
[1] S. Patra and L. Bruzzone, "A fast cluster-assumption based active learning technique for classification of remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, 2011. In press.
[2] B. Demir, C. Persello, and L. Bruzzone, "Batch mode active learning methods for the interactive classification of remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 3, pp. 1014-1031, 2011.
[3] J. Li, J. Bioucas-Dias, and A. Plaza, "Semi-supervised hyperspectral image segmentation using multinomial logistic regression with active learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 11, pp. 4085-4098, 2011.
1. We use a one-versus-one SVM to learn Nc(Nc-1)/2 binary classifiers for Nc classes (e.g. 45 classifiers for 10 classes).
2. The hyperplane w of each class pair can be expressed as a weighted sum of the support vectors. Its norm is inversely related to the margin, so it naturally carries information about the separability of the class pair.
3. In a sequential learning scenario, if the decision boundary between two classes is well established, few additional support vectors are required and the hyperplane changes little. A larger change in w indicates that the corresponding pair-wise hyperplane is still insufficient, or that the class pair is more complex.
4. Querying samples from those classes may help to concentrate on the most critical classes in the multi-class problem and to raise the lowest classification accuracy, which improves the overall classification accuracy.
5. Our goal is to use the cumulative changes in the hyperplane to estimate the difficulty level and rank each class pair in the multi-class classification problem.
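The quantities in these notes can be illustrated with a short sketch. It assumes a linear kernel so that w can be written out explicitly, uses the common convention that the margin width is 2/||w||, and measures the hyperplane change as a relative norm, ||w_new - w_prev|| / ||w_prev||. The exact scaling used in the talk is not given in the notes, so `scaled_change` is an assumption, and all function names are hypothetical.

```python
import math


def num_class_pairs(n_classes):
    """Number of one-versus-one binary classifiers for n_classes classes."""
    return n_classes * (n_classes - 1) // 2


def hyperplane_from_support_vectors(support_vectors, labels, alphas):
    """w = sum_i alpha_i * y_i * x_i (explicit form, linear kernel assumed)."""
    dim = len(support_vectors[0])
    w = [0.0] * dim
    for x, y, alpha in zip(support_vectors, labels, alphas):
        for d in range(dim):
            w[d] += alpha * y * x[d]
    return w


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def margin_width(w):
    """Margin width under the 2 / ||w|| convention: larger ||w||, smaller margin."""
    return 2.0 / norm(w)


def scaled_change(w_prev, w_new):
    """Assumed scaling: norm of the change relative to the previous hyperplane."""
    diff = [a - b for a, b in zip(w_new, w_prev)]
    return norm(diff) / norm(w_prev)


# Two support vectors at x = (+1, 0) and x = (-1, 0), labels +1 and -1.
w = hyperplane_from_support_vectors(
    support_vectors=[(1.0, 0.0), (-1.0, 0.0)],
    labels=[1, -1],
    alphas=[0.5, 0.5],
)
pairs_for_ten_classes = num_class_pairs(10)      # 45
width = margin_width(w)                          # 2.0 for w = (1, 0)
change = scaled_change([1.0, 0.0], [1.0, 1.0])   # 1.0
```

With a nonlinear kernel w lives in the kernel space and is not formed explicitly; the same norms can then be computed from kernel evaluations between support vectors.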
This slide shows the critical class identification process:
1. The scaled change in the hyperplane (w) is computed at each query step.
2. To integrate information from the learning sequence, the accumulated changes are obtained for each class pair.
3. An order statistic is defined to rank all class pairs: phi takes only a limited set of values over the discrete grid of critical levels (CL), which represent the ranking of the hyperplane changes across all class pairs. A larger value indicates that class pair k is more critical.
4. The probability that class pair k has difficulty level CL is estimated from the frequency with which it occurs at level CL.
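The four steps above can be sketched as follows. This is one plausible reading of the notes rather than the authors' exact estimator: changes are accumulated by simple summation, class pairs are ranked at every query step (level 1 = smallest accumulated change, level K = largest), and the probability of pair k at level CL is the fraction of query steps at which it held that rank. The function names and the small 3-pair example are hypothetical.

```python
def rank_levels(values):
    """Critical level per class pair: 1 = smallest value, K = largest value."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    levels = [0] * len(values)
    for level, k in enumerate(order, start=1):
        levels[k] = level
    return levels


def critical_level_probabilities(changes_per_step):
    """changes_per_step[t][k]: scaled hyperplane change of pair k at step t.

    Returns prob, where prob[k][cl - 1] is the estimated probability that
    class pair k sits at critical level cl (frequency over query steps).
    """
    n_pairs = len(changes_per_step[0])
    cumulative = [0.0] * n_pairs
    counts = [[0] * n_pairs for _ in range(n_pairs)]
    for step in changes_per_step:
        for k, delta in enumerate(step):
            cumulative[k] += delta                  # step 2: accumulate
        for k, level in enumerate(rank_levels(cumulative)):
            counts[k][level - 1] += 1               # step 3: rank, then count
    n_steps = len(changes_per_step)
    return [[c / n_steps for c in row] for row in counts]  # step 4


# Hypothetical changes for 3 class pairs over 5 query steps.
changes = [
    [0.1, 0.5, 0.2],
    [0.1, 0.4, 0.3],
    [0.0, 0.3, 0.9],
    [0.1, 0.2, 0.1],
    [0.1, 0.1, 0.2],
]
prob = critical_level_probabilities(changes)
```

In this example pair 0 always has the smallest accumulated change, while pairs 1 and 2 trade places at the top level, so the top-level probability mass is split between them.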
A higher probability at CL indicates that class pair k is likely to have this level (CL) of difficulty in the multi-class classification problem, i.e. it is likely to be ranked at CL among all pairs. Class pairs with a high probability at the top difficulty levels (e.g. CL = 45 for 10 classes) are identified as critical class pairs. The critical class set (CCs) is then obtained as the union of the classes in the selected class pairs.
Based on this critical class set, the query is conducted in two ways:
a. SVM-CC
- Randomly select the next query sample from the samples whose estimated labels belong to the critical class set.
- By querying from this contention pool, we may learn either from samples that truly belong to the critical classes or from samples that were incorrectly classified into them.
b. SVM-CCMS
- SVM-CC uses only a concept-level query strategy, which cannot guarantee that the learner focuses on the most informative samples.
- We therefore borrow the idea of SVM-MS and propose SVM-CCMS, in which the samples in the contention pool are further ranked by their distance to the hyperplane in the kernel space.
- For each sample, the minimum distance over all the hyperplanes represents its uncertainty. The samples with the smallest distance, which indicates the greatest uncertainty, are selected for the next query.
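A minimal sketch of the two query rules, under stated assumptions: critical pairs are taken as the `n_select` pairs with the highest probability at the top critical level (the selection rule is not spelled out in the notes), `predicted_labels` holds the classifier's estimated labels for the unlabeled pool, and `distances[i]` holds the absolute distances of sample i to every pair-wise hyperplane. All names and the example values are hypothetical.

```python
import random


def critical_class_set(top_level_prob, pairs, n_select):
    """Union of classes in the n_select pairs with highest top-level probability."""
    ranked = sorted(range(len(pairs)), key=lambda k: top_level_prob[k], reverse=True)
    classes = set()
    for k in ranked[:n_select]:
        classes.update(pairs[k])
    return classes


def contention_pool(predicted_labels, critical_classes):
    """Indices of unlabeled samples whose estimated label is a critical class."""
    return [i for i, lab in enumerate(predicted_labels) if lab in critical_classes]


def query_svm_cc(predicted_labels, critical_classes, rng):
    """SVM-CC: random sample from the contention pool (None if it is empty)."""
    pool = contention_pool(predicted_labels, critical_classes)
    return rng.choice(pool) if pool else None


def query_svm_ccms(predicted_labels, distances, critical_classes):
    """SVM-CCMS: pool sample with the smallest minimum distance to any hyperplane."""
    pool = contention_pool(predicted_labels, critical_classes)
    if not pool:
        return None
    return min(pool, key=lambda i: min(distances[i]))


# Hypothetical example with 3 classes, hence 3 class pairs.
pairs = [(1, 2), (1, 3), (2, 3)]
top_level_prob = [0.0, 0.4, 0.6]
ccs = critical_class_set(top_level_prob, pairs, n_select=1)   # {2, 3}

predicted_labels = [1, 2, 3, 2, 1]
distances = [
    [0.05, 0.9, 0.8],
    [0.7, 0.6, 0.4],
    [0.3, 0.2, 0.5],
    [0.9, 0.1, 0.6],
    [0.01, 0.3, 0.2],
]
cc_pick = query_svm_cc(predicted_labels, ccs, random.Random(0))
ccms_pick = query_svm_ccms(predicted_labels, distances, ccs)  # sample 3
```

Sample 4 lies closest to a hyperplane overall, but it is predicted as class 1, which is not critical, so SVM-CCMS skips it; plain margin sampling would have selected it.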
Figure: KSC. Table: KSC and BOT. Hard classes are tagged with *.
KSC: acquired by the NASA Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Kennedy Space Center (KSC), 1996; 224 bands of 10 nm width from 400-2500 nm; 176 remaining bands; 18 m spatial resolution.
BOT: acquired by the NASA EO-1 satellite over the Okavango Delta; 242 bands of 10 nm width covering 400-2500 nm; 145 remaining bands; 30 m spatial resolution.
Initial data set: KSC: 3 samples x 10 classes = 30; BOT: 6 samples x 9 classes = 54.
Runs: KSC: 870; BOT: 400.
Upper-left figure:
1. Shows examples of the AMI values for the SVM-CC approach on KSC at the 10th and 30th query, with each class pair corresponding to one bar.
2. Higher AMI values correspond to pairs 18 {C3, C4} and 25-28 {C4 vs. C5, C6, C7, C8}, which is consistent with our previous study showing that these are the most difficult classes in this data set.
3. Low values correspond to class pairs such as 9, 15-17, and 22-24, which all involve the easiest classes, 8-10, in this data set. Comparing Fig. (a) and Fig. (b), AMI changes across learning stages, and the values for the hard classes increase as learning progresses.
Bottom figure: shows the AMI values over the course of learning for all class pairs, for KSC and BOT respectively.
Table:
1. Compares the per-class classification accuracy improvement for DT relative to RS at the 600th and 400th query step for KSC and BOT, respectively.
2. The proposed methods clearly lead to better results than SVM-MS and RS, especially for the hard classes.
3. SVM-CCMS performs better than SVM-CC because it also incorporates the uncertainty measure and can therefore target the most informative samples.
4. Some classes got worse because fewer samples were acquired for them, but the decrease is not significant.
5. Note that for all AL methods the water class (C10 in KSC, C1 in BOT) gains zero improvement, since it is the easiest class to discriminate from the others.
Figure: shows an example of the learning curves for KSC with SVM-CCMS. Although we are more interested in per-class performance, improvements are also achieved in the overall evaluation.
Table:
1. Per-class sampling ratio for the KSC and BOT data, compared across 4 different methods. The proposed methods concentrate more on the hard classes.
2. The lowest sampling ratio is 23% for KSC and 0% for BOT, indicating that the sampling complexity of this class is quite low and that far fewer samples are needed to model it well. Querying fewer samples for this class eliminates a lot of redundancy without sacrificing performance, and frees space in the training set for sampling the hard classes.
Figure: the right-middle figure, SVs Ratio (KSC), plots the ratio of the total number of SVs to the size of the training set over the course of learning. The figure at the bottom shows the number of support vectors (SVs) of each class over the course of learning. Our methods clearly yield more SVs for the hard classes, and the overall ratio is high. Since the SVM decision function depends only on the SVs, a higher ratio indicates higher utility of the training set constructed by the proposed methods [16].
Reference: J. Wang, P. Neskovic, and L. N. Cooper, "Training data selection for support vector machines," in Proc. ICNC, 2005, pp. 554-564.
Thanks very much.
Fig. 1. Examples of ELsmI over the course of learning for all class pairs: (a) KSC, (e) BOT; examples of the estimated probability for each class pair at different critical levels (highest at the bottom) at the last query: (b) KSC, (f) BOT; examples of ELsmI by SVM-CC for KSC at the 10th query (c) and the 30th query (d).
Fig. 1(a)(e) show examples of the accumulated hyperplane change, represented by ELsmI, over the course of learning by SVM-CC for the KSC and BOT data, respectively. Higher ELsmI values correspond to class pairs 18 {C3, C4} and 25-28 {C4 vs. C5, C6, C7, C8} for KSC, and 18 {C3, C6} for BOT, which is consistent with our previous study showing that these are poorly separated class pairs in each data set (see also Fig. 2). Fig. 1(c)(d) show the ELsmI by SVM-CC for KSC at the 10th and 30th query, respectively, with each class pair corresponding to one bar. Several obvious valleys correspond to class pairs such as 9, 15-17, and 22-24, which all involve the easiest classes, 8-10, in this data set. Fig. 1(b)(f) show examples of the estimated probability in Eq. 5 for each class pair (x-axis) at different critical levels (y-axis) at the last query for the KSC and BOT data, respectively. A higher value (brighter color) indicates a higher probability that a class pair is at a given critical level. Class pair 18 of KSC again has the highest probability at the highest critical level (45th, bottom), followed by class pairs 26 and 25 with high probability at levels 44 and 43, respectively. As the critical level goes down, more class pairs show comparable probabilities. For the BOT data, class pair 18 beats all the others at the highest level.