Introduction to Statistical Machine Learning

Christfried Webers

Statistical Machine Learning Group
NICTA
and
College of Engineering and Computer Science
The Australian National University

Canberra
February – June 2011

(Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")



Part III
Linear Algebra

Basic Concepts
Linear Transformations
Trace
Inner Product
Projection
Rank, Determinant, Trace
Matrix Inverse
Eigenvectors
Singular Value Decomposition
Directional Derivative, Gradient
Books




Intuition

Geometry
    Points and Lines
    Vector addition and scaling
    Humans have experience with 3 dimensions (less so with 1 or 2)
Generalisation to N dimensions (possibly N → ∞)
    Line → vector space V
    Point → vector x ∈ V
Example: X ∈ R^{n×m}
    The space of matrices R^{n×m} and the space of vectors R^{n·m} are isomorphic,
    as the short code sketch below illustrates.
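A quick way to see this isomorphism (a NumPy sketch, not from the slides): reshaping moves between R^{n×m} and R^{n·m} and back without losing any entries.

    import numpy as np

    X = np.arange(6).reshape(2, 3)      # a matrix in R^{2x3}
    x = X.reshape(-1)                   # the corresponding vector in R^{6}
    X_back = x.reshape(2, 3)            # and back again

    assert np.array_equal(X, X_back)    # no information is lost in either direction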

Vector Space V, underlying Field F

Given vectors u, v, w ∈ V and scalars α, β ∈ F, the following holds:
    Associativity of addition: u + (v + w) = (u + v) + w.
    Commutativity of addition: v + w = w + v.
    Identity element of addition: there exists 0 ∈ V such that v + 0 = v for all v ∈ V.
    Inverse of addition: for all v ∈ V there exists w ∈ V such that v + w = 0. The additive inverse is denoted −v.
    Distributivity of scalar multiplication with respect to vector addition: α(v + w) = αv + αw.
    Distributivity of scalar multiplication with respect to field addition: (α + β)v = αv + βv.
    Compatibility of scalar multiplication with field multiplication: α(βv) = (αβ)v.
    Identity element of scalar multiplication: 1v = v, where 1 is the identity in F.
Matrix-Vector Multiplication

    for i in xrange(m):
        R[i] = 0.0
        for j in xrange(n):
            R[i] = R[i] + A[i, j] * V[j]

    Listing 1: Code for elementwise matrix-vector multiplication.

A V = R:

    \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
    \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
    =
    \begin{pmatrix} a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n \\ a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n \\ \vdots \\ a_{m1} v_1 + a_{m2} v_2 + \cdots + a_{mn} v_n \end{pmatrix}
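A runnable check of Listing 1 (a sketch, assumptions mine: NumPy arrays and Python 3's range): the explicit double loop agrees with NumPy's built-in matrix-vector product.

    import numpy as np

    m, n = 4, 3
    A = np.random.randn(m, n)
    V = np.random.randn(n)

    R = np.zeros(m)
    for i in range(m):                  # elementwise computation as in Listing 1
        for j in range(n):
            R[i] += A[i, j] * V[j]

    assert np.allclose(R, A @ V)        # same result as NumPy's matrix-vector product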




Matrix-Vector Multiplication

A V = R:

    \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
    \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
    =
    \begin{pmatrix} a_{11} v_1 + a_{12} v_2 + \cdots + a_{1n} v_n \\ a_{21} v_1 + a_{22} v_2 + \cdots + a_{2n} v_n \\ \vdots \\ a_{m1} v_1 + a_{m2} v_2 + \cdots + a_{mn} v_n \end{pmatrix}

    R = A[:, 0] * V[0]
    for j in xrange(1, n):
        R += A[:, j] * V[j]

    Listing 2: Code for columnwise matrix-vector multiplication.


Matrix-Vector Multiplication

Denote the n columns of A by a_1, a_2, ..., a_n.
Each a_i is now a (column) vector.

A V = R:

    \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
    \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
    = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n
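The same product computed as a linear combination of the columns of A (a NumPy sketch, not part of the slides):

    import numpy as np

    m, n = 4, 3
    A = np.random.randn(m, n)
    v = np.random.randn(n)

    # R = a_1 v_1 + a_2 v_2 + ... + a_n v_n, built column by column
    R = sum(A[:, j] * v[j] for j in range(n))

    assert np.allclose(R, A @ v)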
Transpose of R

Given R = A V,

    R = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n

What is R^T?

    R^T = v_1 a_1^T + v_2 a_2^T + \cdots + v_n a_n^T

NOT equal to A^T V^T! (In fact, A^T V^T is not even defined.)

Transpose

Reverse order rule: (A V)^T = V^T A^T

V^T A^T = R^T:

    \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}
    \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix}
    = v_1 a_1^T + v_2 a_2^T + \cdots + v_n a_n^T
Trace

The trace of a square matrix A is the sum of all diagonal elements of A:

    tr{A} = \sum_{k=1}^{n} A_{kk}

Example

    A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \qquad tr{A} = 1 + 5 + 9 = 15

The trace does not exist for a non-square matrix.
vec(X) operator

Define vec(X) as the vector which results from stacking all columns of the matrix X on top of each other.
Example

    A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \qquad vec(A) = \begin{pmatrix} 1 \\ 4 \\ 7 \\ 2 \\ 5 \\ 8 \\ 3 \\ 6 \\ 9 \end{pmatrix}

Given two matrices X, Y ∈ R^{n×p}, the trace of the product X^T Y can be written as

    tr{X^T Y} = vec(X)^T vec(Y)
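A numerical check of tr{X^T Y} = vec(X)^T vec(Y) (a sketch, not from the slides; NumPy flattens row-wise by default, so stacking columns needs order='F'):

    import numpy as np

    n, p = 5, 3
    X = np.random.randn(n, p)
    Y = np.random.randn(n, p)

    vec = lambda M: M.flatten(order='F')    # stack the columns on top of each other

    assert np.isclose(np.trace(X.T @ Y), vec(X) @ vec(Y))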
Inner Product

Mapping from two vectors x, y ∈ V to a field of scalars F (F = R or F = C):

    ⟨·, ·⟩ : V × V → F

Conjugate symmetry:
    ⟨x, y⟩ = \overline{⟨y, x⟩}
Linearity:
    ⟨ax, y⟩ = a ⟨x, y⟩, and ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
Positive-definiteness:
    ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 only for x = 0.
Canonical Inner Products

Inner product for real numbers x, y ∈ R:

    ⟨x, y⟩ = x y

Dot product between two vectors: given x, y ∈ R^n,

    ⟨x, y⟩ = x^T y = \sum_{k=1}^{n} x_k y_k = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n

Canonical inner product for matrices: given X, Y ∈ R^{n×p},

    ⟨X, Y⟩ = tr{X^T Y} = \sum_{k=1}^{p} (X^T Y)_{kk} = \sum_{k=1}^{p} \sum_{l=1}^{n} (X^T)_{kl} (Y)_{lk} = \sum_{k=1}^{p} \sum_{l=1}^{n} X_{lk} Y_{lk}
Calculations with Matrices

Denote the (i, j)-th element of matrix X by X_{ij}.
Transpose: (X^T)_{ij} = X_{ji}
Product: (X Y)_{ij} = \sum_k X_{ik} Y_{kj}
Proof of the linearity of the canonical matrix inner product:

    ⟨X + Y, Z⟩ = tr{(X + Y)^T Z} = \sum_{k=1}^{p} ((X + Y)^T Z)_{kk}
               = \sum_{k=1}^{p} \sum_{l=1}^{n} ((X + Y)^T)_{kl} Z_{lk} = \sum_{k=1}^{p} \sum_{l=1}^{n} (X_{lk} + Y_{lk}) Z_{lk}
               = \sum_{k=1}^{p} \sum_{l=1}^{n} (X_{lk} Z_{lk} + Y_{lk} Z_{lk})
               = \sum_{k=1}^{p} \sum_{l=1}^{n} ((X^T)_{kl} Z_{lk} + (Y^T)_{kl} Z_{lk})
               = \sum_{k=1}^{p} ((X^T Z)_{kk} + (Y^T Z)_{kk})
               = tr{X^T Z} + tr{Y^T Z} = ⟨X, Z⟩ + ⟨Y, Z⟩
Projection

In linear algebra and functional analysis, a projection is a linear transformation P from a vector space V to itself such that

    P^2 = P

[Figure: the point (a, b) in the x-y plane is mapped to the point (a − b, 0) on the x-axis.]

    A \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a - b \\ 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}
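A small check of the example (a sketch, not part of the slides): the matrix A above is idempotent, hence a projection, but it is not symmetric, so it is not an orthogonal projection.

    import numpy as np

    A = np.array([[1.0, -1.0],
                  [0.0,  0.0]])

    assert np.allclose(A @ A, A)        # P^2 = P, so A is a projection
    print(np.allclose(A, A.T))          # False: A is not symmetric

    x = np.array([3.0, 2.0])            # the point (a, b) = (3, 2)
    print(A @ x)                        # [1. 0.], i.e. (a - b, 0)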
Orthogonal Projection

Orthogonality: need an inner product ⟨x, y⟩.
Choose two arbitrary vectors x and y. Then Px and y − Py are orthogonal:

    0 = ⟨Px, y − Py⟩ = (Px)^T (y − Py) = x^T (P^T − P^T P) y

Orthogonal projection:

    P^2 = P,    P^T = P

Example: given some unit vector u ∈ R^n characterising a line through the origin in R^n, project an arbitrary vector x ∈ R^n onto this line by

    P = u u^T

Proof: P = P^T, and

    P^2 x = (u u^T)(u u^T) x = u u^T x = P x
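A sketch (assumptions mine) of projecting onto the line spanned by a unit vector u:

    import numpy as np

    u = np.array([1.0, 2.0, 2.0])
    u = u / np.linalg.norm(u)           # make u a unit vector

    P = np.outer(u, u)                  # P = u u^T

    assert np.allclose(P @ P, P)        # idempotent
    assert np.allclose(P, P.T)          # symmetric, hence an orthogonal projection

    x = np.array([3.0, -1.0, 4.0])
    assert np.allclose(P @ x, (u @ x) * u)   # the component of x along u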
Orthogonal Projection

Given a matrix A ∈ R^{n×p} and a vector x, what is the closest point x̃ to x in the column space of A?
Orthogonal projection onto the column space of A:

    x̃ = A (A^T A)^{-1} A^T x

Projection matrix:

    P = A (A^T A)^{-1} A^T

Proof:

    P^2 = A (A^T A)^{-1} A^T A (A^T A)^{-1} A^T = A (A^T A)^{-1} A^T = P

Orthogonal projection?

    P^T = (A (A^T A)^{-1} A^T)^T = A (A^T A)^{-1} A^T = P
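A sketch (assumptions mine, using NumPy) showing that P = A (A^T A)^{-1} A^T projects onto the column space of A and agrees with the least-squares fit:

    import numpy as np

    n, p = 6, 3
    A = np.random.randn(n, p)           # tall, (almost surely) full column rank
    x = np.random.randn(n)

    P = A @ np.linalg.inv(A.T @ A) @ A.T
    x_proj = P @ x                      # closest point to x in the column space of A

    assert np.allclose(P @ P, P)        # idempotent
    assert np.allclose(P, P.T)          # symmetric

    coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
    assert np.allclose(x_proj, A @ coeffs)   # same point as the least-squares fit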

Linear Independence of Vectors

A set of vectors is linearly independent if none of them can be written as a linear combination of finitely many other vectors in the set.
Given a vector space V, k vectors v_1, ..., v_k ∈ V and scalars a_1, ..., a_k. The vectors are linearly independent if and only if

    a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = 0 ≡ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}

has only the trivial solution

    a_1 = a_2 = \cdots = a_k = 0


Rank

The column rank of a matrix A is the maximal number of linearly independent columns of A.
The row rank of a matrix A is the maximal number of linearly independent rows of A.
For every matrix A, the row rank and column rank are equal, and this common value is called the rank of A.
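A small illustration (a sketch, not from the slides): a matrix whose third column is a linear combination of the first two has rank 2, and the row rank agrees.

    import numpy as np

    a1 = np.array([1.0, 0.0, 1.0])
    a2 = np.array([0.0, 1.0, 1.0])
    a3 = 2.0 * a1 + 3.0 * a2            # linearly dependent on a1 and a2

    A = np.column_stack([a1, a2, a3])
    print(np.linalg.matrix_rank(A))     # 2: only two linearly independent columns
    print(np.linalg.matrix_rank(A.T))   # 2: row rank equals column rank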

Determinant

Let S_n be the set of all permutations of the numbers {1, ..., n}. The determinant of the square matrix A is then given by

    det{A} = \sum_{σ ∈ S_n} sgn(σ) \prod_{i=1}^{n} A_{i,σ(i)}

where sgn is the signature of the permutation (+1 if the permutation is even, −1 if the permutation is odd).

    det{AB} = det{A} det{B} for A, B square matrices.
    det{A^{-1}} = det{A}^{-1}.
    det{A^T} = det{A}.
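The permutation-sum definition, spelt out in code (a sketch, not from the slides) and checked against NumPy's determinant:

    from itertools import permutations
    import numpy as np

    def sgn(sigma):
        # signature of a permutation: (-1) to the number of inversions
        inversions = sum(1 for i in range(len(sigma))
                           for j in range(i + 1, len(sigma))
                           if sigma[i] > sigma[j])
        return -1 if inversions % 2 else 1

    def det(A):
        n = A.shape[0]
        return sum(sgn(sigma) * np.prod([A[i, sigma[i]] for i in range(n)])
                   for sigma in permutations(range(n)))

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 10.0]])
    print(det(A), np.linalg.det(A))     # both approximately -3.0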
Matrix Inverse

Identity matrix I:
    I = A A^{-1} = A^{-1} A
The matrix inverse A^{-1} exists only for square matrices which are NOT singular.
Singular matrix:
    at least one eigenvalue is zero,
    determinant |A| = 0.
Inversion 'reverses the order' (like transposition):

    (AB)^{-1} = B^{-1} A^{-1}

(Only then is (AB)(AB)^{-1} = (AB) B^{-1} A^{-1} = A A^{-1} = I.)

Matrix Inverse and Transpose

What is (A^{-1})^T ?
    Need to assume that A^{-1} exists.
    Rule: (A^{-1})^T = (A^T)^{-1}
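Both 'reverse order' rules checked numerically (a sketch; random square matrices are almost surely invertible, which is assumed here):

    import numpy as np

    A = np.random.randn(4, 4)
    B = np.random.randn(4, 4)
    inv = np.linalg.inv

    assert np.allclose(inv(A @ B), inv(B) @ inv(A))   # (AB)^{-1} = B^{-1} A^{-1}
    assert np.allclose(inv(A).T, inv(A.T))            # (A^{-1})^T = (A^T)^{-1}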

Useful Identity

    (A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}

How to analyse and prove such an equation?
    A^{-1} must exist, therefore A must be square, say A ∈ R^{n×n}.
    Same for C^{-1}, so let's assume C ∈ R^{m×m}.
    Therefore, B ∈ R^{m×n}.
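A numerical sanity check of the identity and of the shape analysis above (a sketch; A and C are made symmetric positive definite so that the inverses certainly exist):

    import numpy as np

    n, m = 4, 3
    rng = np.random.default_rng(0)

    A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # A in R^{n x n}
    C = rng.standard_normal((m, m)); C = C @ C.T + m * np.eye(m)   # C in R^{m x m}
    B = rng.standard_normal((m, n))                                # B in R^{m x n}

    inv = np.linalg.inv
    lhs = inv(inv(A) + B.T @ inv(C) @ B) @ B.T @ inv(C)
    rhs = A @ B.T @ inv(B @ A @ B.T + C)

    assert np.allclose(lhs, rhs)        # both sides are n x m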

Prove the Identity

    (A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}

Multiply by (B A B^T + C):

    (A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} (B A B^T + C) = A B^T

Simplify the left-hand side:

    (A^{-1} + B^T C^{-1} B)^{-1} [B^T C^{-1} B (A B^T) + B^T]
        = (A^{-1} + B^T C^{-1} B)^{-1} [B^T C^{-1} B + A^{-1}] A B^T
        = A B^T
Woodbury Identity

    (A + B D^{-1} C)^{-1} = A^{-1} − A^{-1} B (D + C A^{-1} B)^{-1} C A^{-1}

Useful if A is diagonal and easy to invert, B is very tall (many rows, but only a few columns), and C is very wide (few rows, many columns).
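A numerical check (a sketch with illustrative sizes): A is diagonal, B is tall, C is wide, and the Woodbury side only ever inverts the diagonal A and small k × k matrices.

    import numpy as np

    n, k = 200, 5
    rng = np.random.default_rng(1)

    a = rng.uniform(1.0, 2.0, size=n)            # diagonal of A, trivially invertible
    B = rng.standard_normal((n, k))              # tall
    C = rng.standard_normal((k, n))              # wide
    D = np.eye(k)

    A_inv = np.diag(1.0 / a)                     # A^{-1} without an n x n inversion
    direct = np.linalg.inv(np.diag(a) + B @ np.linalg.inv(D) @ C)
    woodbury = A_inv - A_inv @ B @ np.linalg.inv(D + C @ A_inv @ B) @ C @ A_inv

    assert np.allclose(direct, woodbury)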

Manipulating Matrix Equations

Don't multiply by a matrix which does not have full rank. Why? In the general case, you will lose solutions.
Don't multiply by non-square matrices.
Don't assume a matrix inverse exists just because the matrix is square. The matrix might be singular, and you would be making the same mistake as deducing a = b from a · 0 = b · 0.

Eigenvectors

Every square matrix A ∈ R^{n×n} has eigenvalues λ and eigenvectors x satisfying

    A x = λ x

where x ∈ C^n and λ ∈ C.
Example:

    \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} x = λ x

    λ = {−ı, ı}

    x = \left\{ \begin{pmatrix} ı \\ 1 \end{pmatrix}, \begin{pmatrix} -ı \\ 1 \end{pmatrix} \right\}
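The example checked with NumPy (a sketch, not part of the slides):

    import numpy as np

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])

    eigvals, eigvecs = np.linalg.eig(A)
    print(eigvals)                          # [0.+1.j 0.-1.j] (in some order)

    for lam, x in zip(eigvals, eigvecs.T):  # each column satisfies A x = lambda x
        assert np.allclose(A @ x, lam * x)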

Eigenvectors

How many eigenvalue/eigenvector pairs?

    A x = λ x

is equivalent to

    (A − λ I) x = 0

which has a non-trivial solution only for det{A − λ I} = 0, a polynomial of n-th order in λ; hence there are at most n distinct solutions.
Real Eigenvalues

How can we enforce real eigenvalues?
Let's look at matrices with complex entries A ∈ C^{n×n}.
Transposition is replaced by the Hermitian adjoint, e.g.

    \begin{pmatrix} 1 + ı2 & 3 + ı4 \\ 5 + ı6 & 7 + ı8 \end{pmatrix}^H = \begin{pmatrix} 1 - ı2 & 5 - ı6 \\ 3 - ı4 & 7 - ı8 \end{pmatrix}

Denote the complex conjugate of a complex number λ by \overline{λ}.
Real Eigenvalues

How can we enforce real eigenvalues?
Let's assume A ∈ C^{n×n} is Hermitian (A^H = A).
Calculate

    x^H A x = λ x^H x

for an eigenvector x ∈ C^n of A.
Another possibility to calculate x^H A x:

    x^H A x = x^H A^H x          (A is Hermitian)
            = (x^H A x)^H        (reverse order)
            = (λ x^H x)^H        (eigenvalue)
            = \overline{λ} x^H x

and therefore

    λ = \overline{λ}    (λ is real).

If A is Hermitian, then all eigenvalues are real.
Special case: if A has only real entries and is symmetric, then all eigenvalues are real.
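A quick numerical illustration (a sketch; the random Hermitian matrix is my own example):

    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = M + M.conj().T                      # Hermitian: A^H = A

    eigvals = np.linalg.eigvals(A)
    print(np.max(np.abs(eigvals.imag)))     # essentially zero: the eigenvalues are real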
Singular Value Decomposition

Every matrix A ∈ R^{n×p} can be decomposed into a product of three matrices

    A = U Σ V^T

where U ∈ R^{n×n} and V ∈ R^{p×p} are orthogonal matrices (U^T U = I and V^T V = I), and Σ ∈ R^{n×p} has nonnegative numbers on the diagonal.





                                                                             109of 112
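A minimal sketch of the decomposition with NumPy (assuming np.linalg.svd, which returns V^T rather than V):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))                   # A ∈ R^{n×p}, here n = 5, p = 3

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: 5×5, s: singular values, Vt = V^T: 3×3

Sigma = np.zeros((5, 3))                          # rebuild Σ ∈ R^{n×p}
Sigma[:3, :3] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))             # True: A = U Σ V^T
print(np.allclose(U.T @ U, np.eye(5)),            # True: U^T U = I
      np.allclose(Vt @ Vt.T, np.eye(3)))          # True: V^T V = I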
How to calculate a gradient?

    Given a vector space V, and a function f : V → R.

    Example: X ∈ R^{n×p}, arbitrary C ∈ R^{n×n},

                        f(X) = tr{X^T C X}.

    How to calculate the gradient of f?
    Ill-defined question: there is no gradient in a plain vector space
    (it needs an inner product; see the next slide).
    Only a Directional Derivative at point X ∈ V in direction ξ ∈ V:

                Df(X)(ξ) = lim_{h→0} ( f(X + hξ) − f(X) ) / h

    For the example f(X) = tr{X^T C X}:

       Df(X)(ξ) = lim_{h→0} ( tr{(X + hξ)^T C (X + hξ)} − tr{X^T C X} ) / h
                = tr{ξ^T C X + X^T C ξ} = tr{X^T (C^T + C) ξ}


                                                                                    110of 112
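A quick numerical check of this result (a sketch assuming NumPy): compare the finite-difference quotient with the closed form tr{X^T (C^T + C) ξ} for random X, C and direction ξ.

import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 3
X  = rng.standard_normal((n, p))
C  = rng.standard_normal((n, n))                  # arbitrary, not necessarily symmetric
xi = rng.standard_normal((n, p))                  # direction ξ

def f(X):
    return np.trace(X.T @ C @ X)                  # f(X) = tr{X^T C X}

h = 1e-6
finite_diff = (f(X + h * xi) - f(X)) / h          # (f(X + hξ) − f(X)) / h
closed_form = np.trace(X.T @ (C.T + C) @ xi)      # tr{X^T (C^T + C) ξ}

print(finite_diff, closed_form)                   # agree up to O(h)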
How to calculate a gradient?

      Given a vector space V, and a function f : V → R.
      How to calculate the gradient of f?

  1   Calculate the Directional Derivative.
  2   Define an inner product ⟨A, B⟩ on the vector space.
  3   The gradient is now defined by

                        Df(X)(ξ) = ⟨grad f, ξ⟩

      For the example f(X) = tr{X^T C X}:

      Define the inner product ⟨A, B⟩ = tr{A^T B}.
      Write the Directional Derivative as an inner product with ξ:

                Df(X)(ξ) = tr{X^T (C^T + C) ξ}
                         = ⟨(C + C^T) X, ξ⟩ = ⟨grad f, ξ⟩

      and therefore grad f = (C + C^T) X.

                                                                                     111of 112
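The same example as a NumPy sketch (same assumptions as above): form grad f = (C + C^T) X and verify that the inner product ⟨grad f, ξ⟩ = tr{(grad f)^T ξ} reproduces the directional derivative.

import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 3
X  = rng.standard_normal((n, p))
C  = rng.standard_normal((n, n))
xi = rng.standard_normal((n, p))

grad_f = (C + C.T) @ X                            # gradient w.r.t. <A, B> = tr{A^T B}

lhs = np.trace(X.T @ (C.T + C) @ xi)              # directional derivative Df(X)(ξ)
rhs = np.trace(grad_f.T @ xi)                     # <grad f, ξ>
print(np.isclose(lhs, rhs))                       # True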
Books




   Gilbert Strang, "Introduction to Linear Algebra", Wellesley-Cambridge
   Press, 2009.

   David C. Lay, "Linear Algebra and Its Applications",
   Addison Wesley, 2005.




                                                                                  112of 112

