Probability Densities in Data Mining

Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University

www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

Copyright © Andrew W. Moore




        Probability Densities in Data Mining
    • Why we should care
    • Notation and Fundamentals of continuous
      PDFs
    • Multivariate continuous PDFs
    • Combining continuous and discrete random
      variables




Why we should care
    • Real Numbers occur in at least 50% of
      database records
    • Can’t always quantize them
    • So need to understand how to describe
      where they come from
    • A great way of saying what’s a reasonable
      range of values
    • A great way of saying how multiple
      attributes should reasonably co-occur





                              Why we should care
    • Can immediately get us Bayes Classifiers
      that are sensible with real-valued data
    • You’ll need to intimately understand PDFs in
      order to do kernel methods, clustering with
      Mixture Models, analysis of variance, time
      series and many other things
    • Will introduce us to linear and non-linear
      regression


A PDF of American Ages in 2000

[Figure: the PDF curve of American ages in 2000]




A PDF of American Ages in 2000

Let X be a continuous random variable. If p(x) is a Probability Density Function for X then…

$$P(a < X \le b) = \int_{x=a}^{b} p(x)\,dx$$

$$P(30 < \mathrm{Age} \le 50) = \int_{\mathrm{age}=30}^{50} p(\mathrm{age})\,d\mathrm{age} = 0.36$$


Properties of PDFs

$$P(a < X \le b) = \int_{x=a}^{b} p(x)\,dx$$

That means…

$$p(x) = \lim_{h \to 0} \frac{P\!\left(x - \tfrac{h}{2} < X \le x + \tfrac{h}{2}\right)}{h}$$

$$\frac{\partial}{\partial x} P(X \le x) = p(x)$$





Properties of PDFs

$$P(a < X \le b) = \int_{x=a}^{b} p(x)\,dx \quad\text{Therefore…}\quad \int_{x=-\infty}^{\infty} p(x)\,dx = 1$$

$$\frac{\partial}{\partial x} P(X \le x) = p(x) \quad\text{Therefore…}\quad \forall x : p(x) \ge 0$$
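Both “Therefore” facts are easy to verify numerically for a concrete density. A minimal sketch (an illustration added here, not from the slides), assuming the exponential density p(x) = e^(−x) on x ≥ 0:

```python
# Sanity-check the PDF properties above for p(x) = exp(-x), x >= 0.
import numpy as np
from scipy.integrate import quad

p = lambda x: np.exp(-x)

total, _ = quad(p, 0, np.inf)   # integrate over the whole support
print(total)                    # -> 1.0: a PDF integrates to one

a, b = 0.5, 2.0
prob, _ = quad(p, a, b)         # P(a < X <= b) as an area under p
print(prob)                     # -> ~0.471, and never negative
```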

Talking to your stomach
    • What’s the gut-feel meaning of p(x)?

    If
                    p(5.31) = 0.06 and p(5.92) = 0.03
    then
when a value X is sampled from the distribution, you are 2 times as likely to find that X is “very close to” 5.31 as to find that X is “very close to” 5.92.





Talking to your stomach
• What’s the gut-feel meaning of p(x)?

If
p(a) = p(5.31) = 0.06 and p(b) = p(5.92) = 0.03
then
when a value X is sampled from the distribution, you are 2 times as likely to find that X is “very close to” a = 5.31 as to find that X is “very close to” b = 5.92.

Talking to your stomach
• What’s the gut-feel meaning of p(x)?

If
p(a) = 2z and p(b) = z (for any z > 0, replacing the concrete values above)
then
when a value X is sampled from the distribution, you are 2 times as likely to find that X is “very close to” a as to find that X is “very close to” b.





Talking to your stomach
• What’s the gut-feel meaning of p(x)?

If
p(a) = αz and p(b) = z
then
when a value X is sampled from the distribution, you are α times as likely to find that X is “very close to” a as to find that X is “very close to” b.

Talking to your stomach
• What’s the gut-feel meaning of p(x)?

If
$$\frac{p(a)}{p(b)} = \alpha$$
then
when a value X is sampled from the distribution, you are α times as likely to find that X is “very close to” a as to find that X is “very close to” b.





Talking to your stomach
• What’s the gut-feel meaning of p(x)?

If
$$\frac{p(a)}{p(b)} = \alpha$$
then
$$\lim_{h \to 0} \frac{P(a-h < X < a+h)}{P(b-h < X < b+h)} = \alpha$$
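The step from the ratio of densities to the ratio of probabilities deserves one line of justification (assuming p is continuous at a and b, with p(b) > 0): a window of width 2h around a point has probability approximately equal to the density at that point times the window width.

$$P(a-h < X < a+h) = \int_{a-h}^{a+h} p(x)\,dx \approx 2h\,p(a), \qquad \text{so} \qquad \frac{P(a-h < X < a+h)}{P(b-h < X < b+h)} \approx \frac{2h\,p(a)}{2h\,p(b)} = \frac{p(a)}{p(b)} = \alpha .$$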




Yet another way to view a PDF

A recipe for sampling a random age:
1. Generate a random dot from the rectangle surrounding the PDF curve. Call the dot (age, d).
2. If d < p(age), stop and return age.
3. Else try again: go to Step 1.
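This recipe is rejection sampling, and it translates directly into code. A sketch, where the density p and the bounding rectangle (ages 0 to 100, height cap p_max) are illustrative assumptions rather than values from the slides:

```python
import random

def sample_age(p, age_max=100.0, p_max=0.04):
    """Rejection sampling: the rectangle [0, age_max] x [0, p_max]
    must enclose the whole curve of the density p."""
    while True:
        age = random.uniform(0.0, age_max)   # Step 1: a random dot...
        d = random.uniform(0.0, p_max)       # ...(age, d) in the rectangle
        if d < p(age):                       # Step 2: under the curve?
            return age
        # Step 3: otherwise loop and try again

# Toy triangular density on [0, 100] (it integrates to 1):
p = lambda age: 0.02 * (1.0 - age / 100.0)
print(sample_age(p))
```

Accepted dots are uniform under the curve, so the accepted age values are distributed exactly according to p.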








Test your understanding
• True or False: $\forall x : p(x) \le 1$
• True or False: $\forall x : P(X = x) = 0$




Expectations

E[X] = the expected value of random variable X
= the average value we’d see if we took a very large number of random samples of X

$$E[X] = \int_{x=-\infty}^{\infty} x\, p(x)\, dx$$








Expectations

E[X] = the expected value of random variable X
= the average value we’d see if we took a very large number of random samples of X

$$E[X] = \int_{x=-\infty}^{\infty} x\, p(x)\, dx$$

E[age] = 35.897
= the first moment of the shape formed by the axes and the blue curve
= the best value to choose if you must guess an unknown person’s age and you’ll be fined the square of your error
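The “average over a very large number of samples” reading can be made literal. A sketch (illustrative, not the ages data): draw many samples from a distribution whose true mean is known and compare.

```python
# E[X] as a long-run sample average, for a distribution with known mean.
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=4.0, scale=9.0, size=1_000_000)  # true E[X] = 4 * 9 = 36
print(x.mean())  # -> ~36.0, matching the integral of x p(x) dx
```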
Expectation of a function

μ = E[f(X)] = the expected value of f(X) where X is drawn from X’s distribution
= the average value we’d see if we took a very large number of random samples of f(X)

$$\mu = \int_{x=-\infty}^{\infty} f(x)\, p(x)\, dx$$

E[age²] = 1786.64
(E[age])² = 1288.62

Note that in general: $E[f(X)] \ne f(E[X])$
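The two numbers above are not arbitrary: their gap is exactly the variance reported on the next slide, via the standard identity

$$\operatorname{Var}[X] = E[X^2] - (E[X])^2, \qquad \operatorname{Var}[\mathrm{age}] = 1786.64 - 1288.62 = 498.02 .$$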







Variance

σ² = Var[X] = the expected squared difference between X and E[X]

$$\sigma^2 = \int_{x=-\infty}^{\infty} (x-\mu)^2\, p(x)\, dx$$

= amount you’d expect to lose if you must guess an unknown person’s age and you’ll be fined the square of your error, and assuming you play optimally

Var[age] = 498.02




Standard Deviation

σ² = Var[X] = the expected squared difference between X and E[X]

$$\sigma^2 = \int_{x=-\infty}^{\infty} (x-\mu)^2\, p(x)\, dx$$

= amount you’d expect to lose if you must guess an unknown person’s age and you’ll be fined the square of your error, and assuming you play optimally

Var[age] = 498.02
σ = 22.32

σ = Standard Deviation = “typical” deviation of X from its mean

$$\sigma = \sqrt{\operatorname{Var}[X]}$$




In 2 dimensions

p(x, y) = probability density of random variables (X, Y) at location (x, y)




In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

$$P((X,Y) \in R) = \iint_{(x,y)\in R} p(x,y)\,dy\,dx$$








In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

$$P((X,Y) \in R) = \iint_{(x,y)\in R} p(x,y)\,dy\,dx$$

P(20 < mpg < 30 and 2500 < weight < 3000) =
area under the 2-d surface within the red rectangle




In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

$$P((X,Y) \in R) = \iint_{(x,y)\in R} p(x,y)\,dy\,dx$$

P([(mpg − 25)/10]² + [(weight − 3300)/1500]² < 1) =
area under the 2-d surface within the red oval








In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

$$P((X,Y) \in R) = \iint_{(x,y)\in R} p(x,y)\,dy\,dx$$

Take the special case of region R = “everywhere”.
Remember that with probability 1, (X, Y) will be drawn from “somewhere”.
So…

$$\int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} p(x,y)\,dy\,dx = 1$$
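A numeric sketch of this normalization and of the rectangle probability from slide 24. The joint density here is an assumption: independent Gaussians using the means (24.5, 2600) and spreads (σ_mpg = 8, σ_weight = 700) that appear on the covariance-intuition slides. The slides themselves note that mpg and weight are NOT independent, so this is purely an illustration of the integrals.

```python
# Verify that an assumed p(mpg, weight) integrates to 1, and compute
# P(20 < mpg < 30 and 2500 < weight < 3000) over the red rectangle.
from scipy.integrate import dblquad
from scipy.stats import norm

p = lambda mpg, w: norm.pdf(mpg, 24.5, 8.0) * norm.pdf(w, 2600.0, 700.0)

# dblquad integrates func(y, x): inner variable (w) first, outer (mpg) second.
total, _ = dblquad(lambda w, mpg: p(mpg, w), -60, 110,
                   lambda _: -4400, lambda _: 9600)  # generous finite bounds
print(total)  # -> ~1.0

rect, _ = dblquad(lambda w, mpg: p(mpg, w), 20, 30,
                  lambda _: 2500, lambda _: 3000)
print(rect)   # -> the rectangle probability
```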



In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space…

$$P((X,Y) \in R) = \iint_{(x,y)\in R} p(x,y)\,dy\,dx$$

$$p(x,y) = \lim_{h \to 0} \frac{P\!\left(x - \tfrac{h}{2} < X \le x + \tfrac{h}{2} \;\wedge\; y - \tfrac{h}{2} < Y \le y + \tfrac{h}{2}\right)}{h^2}$$








In m dimensions

Let (X₁, X₂, …, Xₘ) be an m-tuple of continuous random variables, and let R be some region of ℝᵐ…

$$P((X_1, X_2, \ldots, X_m) \in R) = \int\!\!\int \cdots \int_{(x_1, x_2, \ldots, x_m)\in R} p(x_1, x_2, \ldots, x_m)\, dx_m \cdots dx_2\, dx_1$$




Independence

$$X \perp Y \;\text{iff}\; \forall x, y : p(x,y) = p(x)\,p(y)$$

If X and Y are independent then knowing the value of X does not help predict the value of Y.

[Scatter plot: mpg and weight are NOT independent]








Independence

$$X \perp Y \;\text{iff}\; \forall x, y : p(x,y) = p(x)\,p(y)$$

If X and Y are independent then knowing the value of X does not help predict the value of Y.

[Contour plot: the contours say that acceleration and weight are independent]



Multivariate Expectation

$$\mu_{\mathbf{X}} = E[\mathbf{X}] = \int \mathbf{x}\, p(\mathbf{x})\, d\mathbf{x}$$

E[mpg, weight] = (24.5, 2600)

The centroid of the cloud







Multivariate Expectation

$$E[f(\mathbf{X})] = \int f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$




Test your understanding
Question: When (if ever) does E[X + Y] = E[X] + E[Y]?

• All the time?
• Only when X and Y are independent?
• It can fail even if X and Y are independent?








Bivariate Expectation

$$E[f(X,Y)] = \int f(x,y)\, p(x,y)\, dy\, dx$$

$$\text{if } f(x,y) = x \text{ then } E[f(X,Y)] = \int x\, p(x,y)\, dy\, dx$$

$$\text{if } f(x,y) = y \text{ then } E[f(X,Y)] = \int y\, p(x,y)\, dy\, dx$$

$$\text{if } f(x,y) = x + y \text{ then } E[f(X,Y)] = \int (x+y)\, p(x,y)\, dy\, dx$$

$$E[X+Y] = E[X] + E[Y]$$
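Spelling out the last step (and answering the quiz on slide 33: linearity holds all the time, with no independence needed), split the integral and use the marginal densities p(x) = ∫ p(x,y) dy and p(y) = ∫ p(x,y) dx from slide 42:

$$\int (x+y)\, p(x,y)\, dy\, dx = \int x \left( \int p(x,y)\, dy \right) dx + \int y \left( \int p(x,y)\, dx \right) dy = \int x\, p(x)\, dx + \int y\, p(y)\, dy = E[X] + E[Y].$$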
Bivariate Covariance

$$\sigma_{xy} = \operatorname{Cov}[X,Y] = E[(X-\mu_x)(Y-\mu_y)]$$

$$\sigma_{xx} = \sigma_x^2 = \operatorname{Cov}[X,X] = \operatorname{Var}[X] = E[(X-\mu_x)^2]$$

$$\sigma_{yy} = \sigma_y^2 = \operatorname{Cov}[Y,Y] = \operatorname{Var}[Y] = E[(Y-\mu_y)^2]$$








Bivariate Covariance

$$\sigma_{xy} = \operatorname{Cov}[X,Y] = E[(X-\mu_x)(Y-\mu_y)]$$

$$\sigma_{xx} = \sigma_x^2 = \operatorname{Cov}[X,X] = \operatorname{Var}[X] = E[(X-\mu_x)^2]$$

$$\sigma_{yy} = \sigma_y^2 = \operatorname{Cov}[Y,Y] = \operatorname{Var}[Y] = E[(Y-\mu_y)^2]$$

Write $\mathbf{X} = \begin{pmatrix} X \\ Y \end{pmatrix}$, then

$$\operatorname{Cov}[\mathbf{X}] = E[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{X}-\mu_{\mathbf{X}})^T] = \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$
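A sketch of estimating Σ from samples with NumPy; the correlated data here is synthetic, constructed so the true covariances are known.

```python
# Estimate the 2x2 covariance matrix from samples of (X, Y).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=100_000)            # Var[X] = 4
y = 0.5 * x + rng.normal(0.0, 1.0, size=100_000)  # Cov[X,Y] = 2, Var[Y] = 2

sigma = np.cov(np.stack([x, y]))  # rows = variables, columns = samples
print(sigma)  # -> approximately [[4.0, 2.0], [2.0, 2.0]]
```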
Covariance Intuition

[Scatter plot of mpg against weight, centered on E[mpg, weight] = (24.5, 2600), with σ_mpg = 8 and σ_weight = 700 marked along the axes]








Covariance Intuition

[The same scatter plot, now with the principal eigenvector of Σ drawn through E[mpg, weight] = (24.5, 2600); σ_mpg = 8, σ_weight = 700]




Covariance Fun Facts

$$\operatorname{Cov}[\mathbf{X}] = E[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{X}-\mu_{\mathbf{X}})^T] = \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$

• True or False: If σxy = 0 then X and Y are independent
• True or False: If X and Y are independent then σxy = 0
• True or False: If σxy = σx σy then X and Y are deterministically related
• True or False: If X and Y are deterministically related then σxy = σx σy

How could you prove or disprove these?





General Covariance

Let X = (X₁, X₂, …, X_k) be a vector of k continuous random variables

$$\operatorname{Cov}[\mathbf{X}] = E[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{X}-\mu_{\mathbf{X}})^T] = \Sigma$$

$$\Sigma_{ij} = \operatorname{Cov}[X_i, X_j] = \sigma_{x_i x_j}$$

Σ is a k × k symmetric non-negative definite matrix.
If all distributions are linearly independent it is positive definite.
If the distributions are linearly dependent it has determinant zero.


Test your understanding
Question: When (if ever) does Var[X + Y] = Var[X] + Var[Y]?

• All the time?
• Only when X and Y are independent?
• It can fail even if X and Y are independent?








Marginal Distributions

$$p(x) = \int_{y=-\infty}^{\infty} p(x,y)\,dy$$
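Numerically, marginalizing is just summing the joint over the unwanted variable, weighted by the grid spacing. A sketch using an assumed correlated-Gaussian joint density:

```python
# p(x) from an assumed joint p(x, y), integrated over y on a grid.
import numpy as np
from scipy.stats import multivariate_normal

xs = np.linspace(-5, 5, 201)
ys = np.linspace(-5, 5, 201)
X, Y = np.meshgrid(xs, ys, indexing="ij")
joint = multivariate_normal(mean=[0, 0], cov=[[1.0, 0.6], [0.6, 1.0]])
p_xy = joint.pdf(np.dstack([X, Y]))   # shape (len(xs), len(ys))

p_x = p_xy.sum(axis=1) * (ys[1] - ys[0])   # p(x) = integral of p(x, y) dy
print(p_x.sum() * (xs[1] - xs[0]))         # -> ~1.0: a valid marginal PDF
```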

Conditional Distributions

p(mpg | weight = 4600)
p(mpg | weight = 3200)
p(mpg | weight = 2000)

p(x | y) = p.d.f. of X when Y = y




Conditional Distributions

p(mpg | weight = 4600)

$$p(x \mid y) = \frac{p(x,y)}{p(y)}$$

Why?

p(x | y) = p.d.f. of X when Y = y
Independence Revisited

$$X \perp Y \;\text{iff}\; \forall x, y : p(x,y) = p(x)\,p(y)$$

It’s easy to prove that these statements are equivalent…

$$\forall x, y : p(x,y) = p(x)\,p(y) \;\Leftrightarrow\; \forall x, y : p(x \mid y) = p(x) \;\Leftrightarrow\; \forall x, y : p(y \mid x) = p(y)$$
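Here is the easy proof for the first equivalence, using the definition p(x | y) = p(x, y)/p(y) from the previous slide (wherever p(y) > 0); the third statement follows by the symmetric argument in x and y.

$$p(x,y) = p(x)\,p(y) \;\Longleftrightarrow\; p(x \mid y) = \frac{p(x,y)}{p(y)} = \frac{p(x)\,p(y)}{p(y)} = p(x).$$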




More useful stuff

$$\int_{x=-\infty}^{\infty} p(x \mid y)\,dx = 1$$

$$p(x \mid y, z) = \frac{p(x, y \mid z)}{p(y \mid z)}$$

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \qquad \text{(Bayes Rule)}$$

(These can all be proved from definitions on previous slides.)
Mixing discrete and continuous variables

$$p(x, A = v) = \lim_{h \to 0} \frac{P\!\left(x - \tfrac{h}{2} < X \le x + \tfrac{h}{2} \;\wedge\; A = v\right)}{h}$$

$$\sum_{v=1}^{n_A} \int_{x=-\infty}^{\infty} p(x, A = v)\,dx = 1$$

$$p(x \mid A) = \frac{P(A \mid x)\, p(x)}{P(A)} \qquad \text{(Bayes Rule)}$$

$$P(A \mid x) = \frac{p(x \mid A)\, P(A)}{p(x)} \qquad \text{(Bayes Rule)}$$
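The second Bayes rule is the heart of the Bayes classifiers promised on slide 4: discrete class A, continuous feature x. A sketch, with assumed class priors and Gaussian class-conditional densities (not values from the slides):

```python
# P(A = v | x) for discrete A and continuous x:
# P(A|x) = p(x|A) P(A) / p(x), where p(x) = sum over v of p(x|A=v) P(A=v).
from math import exp, pi, sqrt

def gauss_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

priors = {"wealthy": 0.2, "not_wealthy": 0.8}                # P(A = v)
cond = {"wealthy": (16.0, 3.0), "not_wealthy": (12.0, 3.0)}  # p(x | A = v) params

def posterior(x):
    joint = {v: gauss_pdf(x, *cond[v]) * priors[v] for v in priors}
    p_x = sum(joint.values())             # the evidence p(x)
    return {v: joint[v] / p_x for v in joint}

print(posterior(15.0))   # e.g. P(Wealthy | EduYears = 15)
```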





      Mixing discrete and continuous variables

   P(EduYears,Wealthy)




Mixing discrete and continuous variables

   P(EduYears,Wealthy)




                                      P(Wealthy| EduYears)








      Mixing discrete and continuous variables

   P(EduYears,Wealthy)




                                      P(Wealthy| EduYears)

P(EduYears|Wealthy) (renormalized axes)




What you should know
    • You should be able to play with discrete,
      continuous and mixed joint distributions
    • You should be happy with the difference
      between p(x) and P(A)
    • You should be intimate with expectations of
      continuous and discrete random variables
    • You should smile when you meet a
      covariance matrix
    • Independence and its consequences should
      be second nature




                                   Discussion
    • Are PDFs the only sensible way to handle analysis
      of real-valued variables?
    • Why is covariance an important concept?
    • Suppose X and Y are independent real-valued
      random variables distributed between 0 and 1:
            • What is p[min(X,Y)]?
            • What is E[min(X,Y)]?
• Prove that E[X] is the value u that minimizes E[(X−u)²]
    • What is the value u that minimizes E[|X-u|]?
