Summer School
“Achievements and Applications of Contemporary Informatics,
         Mathematics and Physics” (AACIMP 2011)
              August 8-20, 2011, Kiev, Ukraine




                       Classification

                                 Erik Kropat

                     University of the Bundeswehr Munich
                      Institute for Theoretical Computer Science,
                        Mathematics and Operations Research
                                Neubiberg, Germany
Examples

Clinical trials
In a clinical trial, 20 laboratory values of 10,000 patients are collected, together
with the diagnosis ( ill / not ill ).

   We measure the values of a new patient.
   Is he / she ill or not?

Credit ratings
An online shop collects data from its customers together with some information
about the credit rating ( good customer / bad customer ).

   We get the data of a new customer.
   Is he / she a good customer or not?
Machine Learning / Classification

   Labeled training examples  →  machine learning algorithm  →  classification rule

   New example  +  classification rule  →  predicted classification
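
To make the pipeline concrete, here is a minimal sketch using scikit-learn's k-nearest-neighbor
classifier as one possible learning algorithm; the data and labels are invented purely for illustration.

```python
# Minimal sketch of the pipeline: labeled examples -> learning algorithm -> rule -> prediction.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[20.1, 7.3], [18.4, 6.9], [35.2, 9.8], [33.0, 10.1]]  # labeled training examples
y_train = ["not ill", "not ill", "ill", "ill"]                   # class labels

model = KNeighborsClassifier(n_neighbors=3)   # machine learning algorithm
model.fit(X_train, y_train)                   # learn the classification rule

x_new = [[30.5, 9.2]]                         # new example
print(model.predict(x_new))                   # predicted classification
```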
k Nearest Neighbor Classification
            – kNN –
k Nearest Neighbor Classification
Idea: Classify a new object with regard to a set of training examples.
      Compare the new object with its k “nearest” objects (“nearest neighbors”).

[Figure: scatter plot of objects in class 1 ( − ) and class 2 ( + ) with a new object
and its 4 nearest neighbors highlighted — “4-nearest neighbor”]
k Nearest Neighbor Classification

[Figure: a new object and its 5 nearest neighbors among the training data — “5-nearest neighbor”]

• Required
   −   Training set, i.e. objects and their class labels
   −   Distance measure
   −   The number k of nearest neighbors

• Classification of a new object
   −   Calculate the distances between the new object and the objects of the training set.
   −   Identify the k nearest neighbors.
   −   Use the class labels of the k nearest neighbors to determine the class of the new object
       (e.g. by majority vote).
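
A minimal from-scratch sketch of these three steps (compute distances, pick the k nearest
neighbors, majority vote), assuming numeric feature vectors; the data and values are illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=5):
    """Classify x_new by a majority vote among its k nearest training points."""
    # 1. Calculate the (Euclidean) distances to every object of the training set.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Identify the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote over their class labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 0.9]])
y_train = np.array([-1, -1, +1, +1, -1])
print(knn_classify(X_train, y_train, np.array([5.5, 8.2]), k=3))   # -> 1
```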
k Nearest Neighbor Classification

[Figure: the same training data classified with 1, 2 and 3 nearest neighbors.
Classification of the new object:  1-nearest neighbor → − ;  2-nearest neighbor → tie,
class label decided by distance → − ;  3-nearest neighbor → + ]
1-nearest neighbor ⇒ Voronoi diagram
kNN – k Nearest Neighbor Classification

Distance

• The distance between the new object and the objects in the set of training samples
  is usually measured by the Euclidean metric or the squared Euclidean metric.

• In text mining, the Hamming distance is often used.
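
The distance measures mentioned above can be written directly in numpy; a small sketch,
assuming vectors of equal length.

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def squared_euclidean(x, y):
    return np.sum((x - y) ** 2)

def hamming(x, y):
    # number of positions in which the two vectors differ (common in text mining)
    return int(np.sum(x != y))
```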
kNN – k Nearest Neighbor Classification

Class label of the new object

• The class label of the new object is determined by the list of its k nearest neighbors.

  This can be achieved by

    − a majority vote with regard to the class labels of the k nearest neighbors, or
    − a distance-weighted vote of the k nearest neighbors (see the sketch below).
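
A minimal sketch of the distance-weighted alternative to the plain majority vote; the 1/d
weighting is one common choice, not prescribed by the slide.

```python
import numpy as np

def weighted_vote(labels, distances):
    """Distance-weighted vote: closer neighbors count more (weight 1/d)."""
    weights = 1.0 / (np.asarray(distances) + 1e-12)   # avoid division by zero
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# e.g. three nearest neighbors with their labels and distances
print(weighted_vote([+1, -1, -1], [0.2, 1.5, 1.7]))   # -> 1, the close neighbor dominates
```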
kNN – k Nearest Neighbor Classification

• The value of k has a strong influence on the classification result.

    − k too small: Noise can have a strong influence.
    − k too large:  Neighborhood can contain objects from different classes
                    (ambiguity / false classification)

[Figure: a noisy two-class scatter plot illustrating the effect of the choice of k]
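
One common way to choose k (not covered on the slide) is cross-validation over a grid of
candidate values; a sketch with scikit-learn on synthetic data, purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data (illustrative only).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 5-fold cross-validated search over candidate values of k.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9, 11, 15]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)   # the value of k with the best cross-validated accuracy
```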
Support Vector Machines
Support Vector Machines

A set of training samples with objects in Rn is divided into two categories:

              positive objects             and         negative objects
Support Vector Machines

Goal: “Learn” a decision rule from the training samples.
        Assign a new example into the “positive” or the “negative” category.
Idea:   Determine a separating hyperplane.




New objects are classified as
 − positive, if they are in the half space of the positive examples,
 − negative, if they are in the half space of the negative examples.
Support Vector Machines

INPUT:     Sample of training data

             T = { (x1, y1),...,(xk, yk) | xi ∈ Rn , yi ∈ { -1, +1 } },

           with   xi ∈ Rn          data          ( e.g. laboratory values of patients with confirmed diagnosis )
           and    yi ∈ {-1, +1}    class label   ( e.g. disease: yes / no )


Decision rule:

             f : Rn → {-1, +1}

           ( e.g. laboratory values of a new patient  →  decision: disease yes / no )
Separating Hyperplane

A separating hyperplane H is determined by
 − a normal vector w  and
 − a parameter b:

   H = { x ∈ Rn | 〈 w, x 〉 − b = 0 }          ( 〈 · , · 〉 denotes the scalar product )

Offset of the hyperplane from the origin along w:   b / ‖ w ‖

 Idea: Choose w and b such that the hyperplane separates the set of training samples
       in an optimal way.
What is a good separating hyperplane?

There exist many separating hyperplanes




                    Will this new object be in the “red” class?
Question:      What is the best separating hyperplane?
Answer:        Choose the separating hyperplane so that the distance from it
               to the nearest data point on each side is maximized.

[Figure: the maximum-margin hyperplane H, its margin, and the support vectors
lying on the margin boundaries on both sides]
Scaling of Hyperplanes

• A hyperplane can be defined in many ways:

     For c ≠ 0:   { x ∈ Rn | 〈 w, x 〉 + b = 0 } = { x ∈ Rn | 〈 cw, x 〉 + cb = 0 }


• Use the training samples to choose (w, b) such that

                   min_{xi} | 〈 w, xi 〉 + b | = 1          ( canonical hyperplane )
Definition
A training sample T = {(x1, y1),...,(xk, yk) | xi ∈ Rn , yi ∈ {-1, +1} } is separable
by the hyperplane

                   H = { x ∈ Rn | 〈 w, x 〉 + b = 0 },

if there exists a vector w ∈ Rn and a parameter b ∈ R, such that

                   〈 w, xi 〉 + b ≥ +1 , if yi = +1
                   〈 w, xi 〉 + b ≤ −1 , if yi = −1

for all i ∈ {1,...,k}.

[Figure: hyperplane H with normal vector w and the margin hyperplanes 〈 w, x 〉 + b = +1 and 〈 w, x 〉 + b = -1]
Maximal Margin

• The above conditions can be rewritten as

           yi · ( 〈 w, xi 〉 + b ) ≥ 1   for all i ∈ {1,...,k}.

• Distance between the two margin hyperplanes 〈 w, x 〉 + b = ±1:

           2 / ‖ w ‖

 ⇒ In order to maximize the margin we must minimize ‖ w ‖.
Optimization problem
Find a normal vector w and a parameter b, such that the distance between
the training samples and the hyperplane defined by w and b is maximized.


 Minimize    (1/2) ‖ w ‖²

 s.t.        yi · ( 〈 w, xi 〉 + b ) ≥ 1   for all i ∈ {1,...,k}


        ⇒ quadratic programming problem
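
A minimal sketch of this quadratic program handed to a general-purpose convex solver (cvxpy);
the toy data below is linearly separable and purely illustrative.

```python
import cvxpy as cp
import numpy as np

# Tiny linearly separable toy data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Minimize (1/2) ||w||^2  subject to  y_i * (<w, x_i> + b) >= 1 for all i.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)   # parameters of the maximal-margin hyperplane
```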
Dual Form

    Find parameters α1,...,αk, such that

    Max     Σ_{i=1}^{k} αi  −  (1/2) Σ_{i,j=1}^{k} αi αj yi yj 〈 xi, xj 〉

    with    αi ≥ 0                    for all i = 1,...,k

            Σ_{i=1}^{k} αi yi = 0

                                                  Kernel function:   k( xi, xj ) := 〈 xi, xj 〉


 ⇒      The maximal margin hyperplane (= the classification problem)
        is only a function of the support vectors.
Dual Form

• When the optimal parameters α1*,...,αk* are known, the normal vector w*
  of the separating hyperplane is given by

           w* =  Σ_{i=1}^{k} αi* yi xi              ( xi, yi : training data )


• The parameter b* is given by

           b* = − (1/2) · ( max { 〈 w*, xi 〉 | yi = −1 }  +  min { 〈 w*, xi 〉 | yi = +1 } )
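
A numpy sketch of these two recovery formulas; the α values below are placeholders standing
in for the output of a dual solver, not an actual optimum.

```python
import numpy as np

# Placeholder data and dual variables (not a real dual solution).
X     = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y     = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.125, 0.0, 0.125, 0.0])

# w* = sum_i alpha_i * y_i * x_i
w_star = (alpha * y) @ X

# b* = -(1/2) * ( max_{y_i = -1} <w*, x_i>  +  min_{y_i = +1} <w*, x_i> )
scores = X @ w_star
b_star = -0.5 * (scores[y == -1].max() + scores[y == +1].min())
print(w_star, b_star)
```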
Classifier

• A decision function f maps a new object x ∈ Rn to a category f(x) ∈ {-1, +1} :



               +1 , if   〈 w*, x 〉 + b* ≥ +1
      f (x) =
               −1 , if   〈 w*, x 〉 + b* ≤ −1

[Figure: hyperplane H with normal vector w, separating the +1 half space from the -1 half space]
Support Vector Machines
      – Soft Margins –
Soft Margin Support Vector Machines

• Until now: Hard margin SVMs
             The set of training samples can be separated by a hyperplane.

• Problem:   Some elements of the training samples may carry a wrong label.
             Then the set of training samples cannot be separated by a hyperplane
             and the hard margin SVM is not applicable.
Soft Margin Support Vector Machines

• Idea: Soft margin SVMs
  Modified maximum margin method for mislabeled examples.

• Choose a hyperplane that splits the training set as cleanly as possible,
  while still maximizing the distance to the nearest cleanly split examples.

• Introduce slack variables ξ1,…, ξ n which
  measure the degree of misclassification.
Soft Margin Support Vector Machines

• Interpretation
 The slack variables measure the degree of misclassification of the training examples
 with regard to a given hyperplane H.



[Figure: hyperplane H with two training examples violating the margin by ξi and ξj]
Soft Margin Support Vector Machines

• Replace the constraints

             yi · ( 〈 w, xi 〉 + b ) ≥ 1              for all i ∈ {1,...,n}

 by

             yi · ( 〈 w, xi 〉 + b ) ≥ 1 ̶ ξ i        for all i ∈ {1,...,n}


[Figure: hyperplane H with a training example violating the margin by ξi]
Soft Margin Support Vector Machines
• Idea
  If the slack variables ξi are chosen as small as possible, then:

         ξi = 0    ⇔     xi is correctly classified

    0 < ξi < 1     ⇔     xi lies between the margins

         ξi ≥ 1    ⇔     xi is misclassified
                         [ yi · ( 〈 w, xi 〉 + b ) < 0 ]

  [Figure: hyperplane H with a training example violating the margin by ξi]

  Constraint:             yi · ( 〈 w, xi 〉 + b ) ≥ 1 − ξi   for all i ∈ {1,...,n}
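
The constraint gives the slack values for any fixed hyperplane directly as
ξi = max(0, 1 − yi(〈 w, xi 〉 + b)); a small numpy sketch with illustrative data covering all three cases.

```python
import numpy as np

def slacks(X, y, w, b):
    """Slack variables xi_i = max(0, 1 - y_i * (<w, x_i> + b)) for hyperplane (w, b)."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

X = np.array([[2.0, 2.0], [0.5, 0.3], [-2.0, -2.0], [1.0, 1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
xi = slacks(X, y, w=np.array([0.5, 0.5]), b=0.0)
print(xi)   # 0 -> correctly classified, 0 < xi < 1 -> inside margin, xi >= 1 -> misclassified
```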
Soft Margin Support Vector Machines

• The sum of all slack variables is an upper bound for the total training error:


                       Σ_{i=1}^{n} ξi

[Figure: hyperplane H with margin violations ξi and ξj]
Soft Margin Support Vector Machines

Find a hyperplane with maximal margin and minimal training error.


        Minimize     (1/2) ‖ w ‖²  +  C Σ_{i=1}^{n} ξi          ( C > 0 : regularisation constant )

        s.t.         yi · ( 〈 w, xi 〉 + b ) ≥ 1 − ξi      for all i ∈ {1,...,n}

                     ξi ≥ 0                                for all i ∈ {1,...,n}
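
In practice this soft-margin problem is what library SVM implementations solve; a sketch with
scikit-learn, where the parameter C plays the role of the regularisation constant above
(synthetic data and illustrative settings).

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic, not perfectly separable data (illustrative only).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.05, random_state=0)

# Small C tolerates larger slacks; large C penalises training errors heavily.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print(clf.support_vectors_.shape)   # the support vectors determine the hyperplane
```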
Support Vector Machines
 – Nonlinear Classifiers –
Support Vector Machines – Nonlinear Separation

Question: Is it possible to create nonlinear classifiers?
Support Vector Machines – Nonlinear Separation

Idea:   Map data points into a higher dimensional feature space
        where a linear separation is possible.




[Figure: mapping Ф from the original space Rn into a higher dimensional feature space Rm
where the two classes become linearly separable]
Nonlinear Transformation




[Figure: nonlinear transformation Ф from the original feature space Rn
into the high dimensional feature space Rm]
Kernel Functions

Assume:     For a given set X of training examples we know a function Ф,
            such that a linear separation in the high-dimensional space is possible.


Decision:   When we have solved the corresponding optimization problem,
            we only need to evaluate a scalar product
            to decide about the class label of a new data object.


             f(xnew) = sign ( Σ_{i=1}^{n} αi* yi 〈 Ф(xi), Ф(xnew) 〉 + b* )   ∈ {-1, +1}
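
A small numpy sketch of this decision function using only kernel evaluations; the training
points and dual variables below are placeholders, not an optimised solution.

```python
import numpy as np

def decide(x_new, X_train, y_train, alpha, b, kernel):
    """Sign of sum_i alpha_i * y_i * K(x_i, x_new) + b  -- no explicit feature map needed."""
    s = sum(a * yi * kernel(xi, x_new) for a, yi, xi in zip(alpha, y_train, X_train))
    return 1 if s + b >= 0 else -1

linear_kernel = lambda u, v: float(np.dot(u, v))

# Placeholder training data and (not actually optimised) dual variables.
X_train = np.array([[2.0, 2.0], [-2.0, -2.0]])
y_train = np.array([1.0, -1.0])
alpha, b = np.array([0.125, 0.125]), 0.0
print(decide(np.array([1.0, 3.0]), X_train, y_train, alpha, b, linear_kernel))   # -> 1
```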
Kernel functions


Introduce a kernel function



                          K(xi, xj) = 〈 Ф (xi), Ф(xj) 〉

The kernel function defines a similarity measure between the objects xi and xj.


It is not necessary to know the function Ф or the dimension of the feature space!
Kernel Trick

Example:   Transformation into a higher dimensional feature space

               Ф : R² → R³ ,     Ф(x1, x2) = ( x1² , √2 · x1 x2 , x2² )

Input:     An element of the training sample x,
           a new object x̂

           〈 Ф(x), Ф(x̂) 〉   =   〈 ( x1², √2 x1 x2, x2² ) , ( x̂1², √2 x̂1 x̂2, x̂2² ) 〉

                             =   x1² x̂1²  +  2 x1 x̂1 x2 x̂2  +  x2² x̂2²

                             =   ( x1 x̂1 + x2 x̂2 )²

                             =   〈 x, x̂ 〉²   =   K( x, x̂ )

           The scalar product in the higher dimensional space (here: R³)
           can be evaluated in the low dimensional original space (here: R²).
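
A quick numerical check of this identity: the scalar product after the mapping Ф equals the
squared scalar product in R².

```python
import numpy as np

def phi(x):
    """Feature map R^2 -> R^3 from the example: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x, x_hat = np.array([1.0, 2.0]), np.array([3.0, -1.0])

lhs = np.dot(phi(x), phi(x_hat))   # scalar product in R^3
rhs = np.dot(x, x_hat) ** 2        # kernel K(x, x_hat) = <x, x_hat>^2, evaluated in R^2
print(lhs, rhs)                    # both equal 1.0 here
```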
Kernel Trick

It is not necessary to apply the nonlinear function Ф to transform
the set of training examples into a higher dimensional feature space.


Use a kernel function

                           K(xi, xj) = 〈 Ф (xi), Ф(xj) 〉

instead of the scalar product in the original optimization problem and the decision problem.
Kernel Functions


Linear kernel                    K(xi, xj) = 〈 xi, xj 〉

Radial basis function kernel     K(xi, xj) = exp( − ‖ xi − xj ‖² / (2 σ0²) ) ;   σ0² = mean ‖ xi − xj ‖²

Polynomial kernel                K(xi, xj) = ( s 〈 xi, xj 〉 + c )^d

Sigmoid kernel                   K(xi, xj) = tanh ( s 〈 xi, xj 〉 + c )

Convex combinations of kernels   K(xi, xj) = c1 K1(xi, xj) + c2 K2(xi, xj)

Normalization kernel             K(xi, xj) = K′(xi, xj) / √( K′(xi, xi) K′(xj, xj) )
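
These kernels translate directly into short numpy functions; the parameter values below
(s, c, d, σ0) are illustrative defaults, not recommendations.

```python
import numpy as np

def rbf_kernel(x, y, sigma0=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma0 ** 2))

def polynomial_kernel(x, y, s=1.0, c=1.0, d=3):
    return (s * np.dot(x, y) + c) ** d

def sigmoid_kernel(x, y, s=1.0, c=0.0):
    return np.tanh(s * np.dot(x, y) + c)

def normalized_kernel(x, y, k):
    return k(x, y) / np.sqrt(k(x, x) * k(y, y))

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(rbf_kernel(x, y), polynomial_kernel(x, y), normalized_kernel(x, y, polynomial_kernel))
```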
Summary

• Support vector machines can be used for binary classification.

• We can handle misclassified data if we introduce slack variables.

• If the sets to discriminate are not linearly separable we can use kernel functions.

• Applications → binary decisions

   −   Spam filter (spam / no spam)
   −   Face recognition ( access / no access)
    −   Credit rating ( good customer / bad customer )
Literature
• N. Cristianini, J. Shawe-Taylor
  An Introduction to Support Vector Machines and Other
  Kernel-based Learning Methods.
  Cambridge University Press, Cambridge, 2004.

• T. Hastie, R. Tibshirani, J. Friedman
  The Elements of Statistical Learning: Data Mining, Inference,
  and Prediction.
  Springer, New York, 2011.
Thank you very much!
