Classification
modified by Donghui Zhang; integrated with slides from Prof. Andrew W. Moore, http://www.cs.cmu.edu/~awm/tutorials
Content
- Overview: classification vs. prediction
- Decision tree induction
- Bayesian classification
- Bayesian networks
- Neural networks
- Support vector machines (SVM)
- Bagging and boosting
Classification vs. Prediction
- Classification: predicts categorical class labels; constructs a model from a training set whose records carry known class labels, then uses the model to classify new data.
- Prediction: models continuous-valued functions, i.e., predicts unknown or missing numeric values.
- Typical applications: credit approval, target marketing, medical diagnosis.
Classification: A Two-Step Process
- Step 1, model construction: describe a set of predetermined classes. Each tuple is assumed to belong to a predefined class given by its class label attribute; the set of tuples used for building the model is the training set; the model is represented as classification rules, a decision tree, or a mathematical formula.
- Step 2, model usage: classify future or unknown objects. First estimate the accuracy of the model on an independent test set (known labels of test samples are compared with the model's predictions); if the accuracy is acceptable, apply the model to unseen data.
Classification Process (1): Model Construction
Training data are fed to a classification algorithm, which outputs a classifier (the model), e.g. the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
Classification Process (2): Use the Model in Prediction
The classifier is first run against testing data to estimate accuracy, then applied to unseen data, e.g. (Jeff, Professor, 4) → Tenured?
Supervised vs. Unsupervised Learning
- Supervised learning (classification): the training data are accompanied by labels indicating the class of each observation; new data are classified based on the training set.
- Unsupervised learning (clustering): the class labels of the training data are unknown; given a set of observations, the aim is to establish the existence of classes or clusters in the data.
Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability: time to construct the model and time to use it
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases
- Interpretability: how much understanding and insight the model provides
- Goodness of rules: e.g. decision tree size, compactness of classification rules
Content
Training Dataset. This follows an example from Quinlan's ID3.

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31..40   high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31..40   low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31..40   medium  no       excellent      yes
31..40   high    yes      fair           yes
>40      medium  no       excellent      no
Output: A Decision Tree for "buys_computer"

age?
- <=30: student? (no → no; yes → yes)
- 30..40: yes
- >40: credit_rating? (excellent → no; fair → yes)
Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules: one rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction; the leaf node holds the class prediction. Rules are easier for humans to understand.
- Example:
IF age = '<=30' AND student = 'no' THEN buys_computer = 'no'
IF age = '<=30' AND student = 'yes' THEN buys_computer = 'yes'
IF age = '31..40' THEN buys_computer = 'yes'
IF age = '>40' AND credit_rating = 'excellent' THEN buys_computer = 'no'
IF age = '>40' AND credit_rating = 'fair' THEN buys_computer = 'yes'
Algorithm for Decision Tree Induction
- Basic algorithm (greedy): the tree is constructed top-down, recursively, divide-and-conquer style. At the start all training examples are at the root; attributes are categorical (continuous ones are discretized in advance); examples are partitioned recursively based on selected attributes, chosen by a heuristic or statistical measure such as information gain.
- Conditions for stopping the partitioning: all samples at a node belong to the same class; there are no remaining attributes (majority voting labels the leaf); there are no samples left. A minimal sketch of the procedure appears below.
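A minimal Python sketch of this greedy procedure (a toy ID3 for categorical attributes; the helper names are mine, not part of the original algorithm descriptions):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p*log2(p) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs, target):
    """Pick the attribute with the lowest conditional entropy H(target | attr),
    i.e. the highest information gain."""
    def cond_entropy(a):
        values = Counter(r[a] for r in rows)
        return sum((cnt / len(rows)) *
                   entropy([r[target] for r in rows if r[a] == v])
                   for v, cnt in values.items())
    return min(attrs, key=cond_entropy)

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # all samples in one class: leaf
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, attrs, target)
    rest = [x for x in attrs if x != a]
    return {a: {v: id3([r for r in rows if r[a] == v], rest, target)
                for v in set(r[a] for r in rows)}}
```

Run on the buys_computer table above (rows as dicts keyed by attribute name), it selects age at the root and reproduces the tree shown two slides earlier.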
Information gain slides adapted from Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email_address] 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials:  http://www.cs.cmu.edu/~awm/tutorials  . Comments and corrections gratefully received.
Bits
You are watching a set of independent random samples of X, which has four possible values, each equally likely: P(X=A) = P(X=B) = P(X=C) = P(X=D) = 1/4. An obvious code transmits 2 bits per symbol, e.g. A = 00, B = 01, C = 10, D = 11.
Fewer Bits
Now suppose the distribution is skewed: P(X=A) = 1/2, P(X=B) = 1/4, P(X=C) = 1/8, P(X=D) = 1/8. It is possible to invent a coding that uses only 1.75 bits per symbol on average.
Fewer Bits
Such a code assigns short codewords to frequent values: A = 0, B = 10, C = 110, D = 111. The expected length is 1/2·1 + 1/4·2 + 1/8·3 + 1/8·3 = 1.75 bits per symbol.
Fewer Bits
Suppose instead there are three equally likely values: P(X=A) = P(X=B) = P(X=C) = 1/3. A naive coding (A = 00, B = 01, C = 10) uses 2 bits per symbol; in theory it is possible to get down to log2(3) ≈ 1.585 bits per symbol.
General Case
Suppose X can take one of m values V1 ... Vm, with P(X=V1) = p1, ..., P(X=Vm) = pm. The smallest possible number of bits, on average, needed to transmit a stream of symbols drawn from X's distribution is the entropy of X:
H(X) = -p1 log2 p1 - p2 log2 p2 - ... - pm log2 pm = -Σj pj log2 pj
General Case
- High entropy: X is "as random as it can get"; a histogram of the frequency distribution of X's values would be flat, and so the values sampled from it would be all over the place.
- Low entropy: X comes from a peaks-and-valleys distribution; the histogram would have many lows and one or two highs, and so the sampled values would be more predictable.
Entropy in a nutshell
- High entropy: the values (locations of soup) are unpredictable, almost uniformly sampled throughout the dining room.
- Low entropy: the values (locations of soup) are sampled entirely from within the soup bowl.
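As a quick check of the numbers above, a short Python snippet (the function name is mine):

```python
import math

def H(ps):
    """Entropy in bits: H = -sum p*log2(p); terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(H([0.25] * 4))                  # 2.0 bits: four equally likely values
print(H([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits: the "fewer bits" example
print(H([1/3] * 3))                   # ~1.585 bits = log2(3)
```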
Exercise: compute H(X) for a few distributions of your own choosing, and verify that the uniform distribution gives the highest entropy.
Specific Conditional Entropy
Suppose I'm trying to predict output Y and I have input X. Here X = College Major and Y = Likes "Gladiator":

X        Y
Math     Yes
History  No
CS       Yes
Math     No
Math     No
CS       Yes
History  No
Math     Yes
Specific Conditional Entropy
Definition: H(Y | X=v) = the entropy of Y among only those records in which X has value v.
Specific Conditional Entropy
Example, from the table above:
H(Y | X=Math) = 1 (two Yes, two No)
H(Y | X=History) = 0 (all No)
H(Y | X=CS) = 0 (all Yes)
Conditional Entropy
Definition: H(Y | X) = the average specific conditional entropy of Y = if you choose a record at random, the conditional entropy of Y conditioned on that record's value of X = the expected number of bits needed to transmit Y if both sides know the value of X = Σj P(X=vj) H(Y | X=vj).
Conditional Entropy
Example, from the table above:

vj       P(X=vj)  H(Y | X=vj)
Math     0.5      1
History  0.25     0
CS       0.25     0

H(Y | X) = 0.5·1 + 0.25·0 + 0.25·0 = 0.5
Information Gain
Definition: IG(Y | X) = how many bits, on average, it would save me when transmitting Y if both ends of the line knew X: IG(Y | X) = H(Y) - H(Y | X).
Example, from the table above: H(Y) = 1, H(Y | X) = 0.5, so IG(Y | X) = 1 - 0.5 = 0.5. A small code check follows below.
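The whole chain H(Y), H(Y|X), IG(Y|X) for the Gladiator table, as a minimal Python check:

```python
import math
from collections import Counter

data = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
        ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

def H(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

ys = [y for _, y in data]
h_y = H(ys)                                       # H(Y) = 1.0
h_y_given_x = sum(
    (sum(1 for x, _ in data if x == v) / len(data))
    * H([y for x, y in data if x == v])
    for v in set(x for x, _ in data))             # H(Y|X) = 0.5
print(h_y, h_y_given_x, h_y - h_y_given_x)        # IG(Y|X) = 0.5
```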
What is Information Gain used for?
Suppose you are trying to predict some output attribute from historical data. The attributes with the highest information gain relative to the output are its best predictors, so decision tree induction tests, at each node, the attribute with the highest information gain (equivalently, the lowest conditional entropy).
Conditional entropy H(C|age)
For the buys_computer data, splitting on age gives: age <= 30: 5 records (2 yes, 3 no), H = 0.971; age 30..40: 4 records (all yes), H = 0; age > 40: 5 records (3 yes, 2 no), H = 0.971. So H(C | age) = 5/14·0.971 + 4/14·0 + 5/14·0.971 = 0.694.
Select the attribute with lowest conditional entropy
Comparing the candidate splits on the buys_computer data, H(C | age) = 0.694 is lower than the conditional entropies for income, student, and credit_rating, so age is selected at the root. Each branch is then grown recursively the same way, yielding the tree shown earlier: age <= 30 splits on student, age 30..40 is a pure 'yes' leaf, and age > 40 splits on credit_rating.
Goodness in Decision Tree Induction
Scalable Decision Tree Induction Methods in Data Mining Studies
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning, stopping tree construction early
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates scalability aspects from the criteria that determine tree quality; builds an AVC-list (attribute, value, class label)
Visualization of a Decision Tree in SGI/MineSet 3.0
Content
Bayesian Classification: Why?
- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when computationally intractable, Bayesian methods provide a standard of optimal decision making against which other methods can be measured
Bayesian Classification
Given a sample X with unknown class label, the task is to determine the class Ci that maximizes the posterior probability P(Ci | X): informally, the most probable class given what we observe. The machinery for this is Bayes' theorem.
Bayesian Theorem
Given training data D, the posterior probability of a hypothesis h follows Bayes' theorem:
P(h | D) = P(D | h) P(h) / P(D)
MAP (maximum a posteriori) hypothesis: h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h).
Practical difficulty: this requires initial knowledge of many probabilities, at significant computational cost.
Basic Idea
To classify a sample X, compute the posterior P(Ci | X) for each class Ci and assign X to the class with the highest posterior. By Bayes' theorem, P(Ci | X) = P(X | Ci) P(Ci) / P(X); since P(X) is the same for every class, it suffices to maximize P(X | Ci) P(Ci). Both P(Ci) and P(X | Ci) are estimated from the training data.
Naïve Bayes Classifier
A simplifying assumption: the attributes are conditionally independent given the class, so
P(X | Ci) = Πk P(xk | Ci)
This greatly reduces the computation cost: only the class distribution and the per-attribute conditionals P(xk | Ci) need to be estimated, each by simple counting over the training set.
Sample quiz questions
Naïve Bayesian Classifier: Example
Classify X = (age <= 30, income = medium, student = yes, credit_rating = fair) using the buys_computer training data.
Priors: P(yes) = 9/14, P(no) = 5/14.
Conditionals: P(age<=30 | yes) = 2/9, P(income=medium | yes) = 4/9, P(student=yes | yes) = 6/9, P(credit=fair | yes) = 6/9; P(age<=30 | no) = 3/5, P(income=medium | no) = 2/5, P(student=yes | no) = 1/5, P(credit=fair | no) = 2/5.
P(X | yes) P(yes) = 2/9 · 4/9 · 6/9 · 6/9 · 9/14 ≈ 0.028; P(X | no) P(no) = 3/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.007.
So X is classified as buys_computer = yes. Pitfall: don't forget to multiply by the prior P(Ci).
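A sketch that recomputes this example directly from the training table (attribute order: age, income, student, credit_rating):

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer): the Quinlan/Han table
rows = [
    ("<=30","high","no","fair","no"),      ("<=30","high","no","excellent","no"),
    ("31..40","high","no","fair","yes"),   (">40","medium","no","fair","yes"),
    (">40","low","yes","fair","yes"),      (">40","low","yes","excellent","no"),
    ("31..40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
    ("<=30","low","yes","fair","yes"),     (">40","medium","yes","fair","yes"),
    ("<=30","medium","yes","excellent","yes"), ("31..40","medium","no","excellent","yes"),
    ("31..40","high","yes","fair","yes"),  (">40","medium","no","excellent","no"),
]

def naive_bayes(x):
    """Score each class by P(C) * prod_k P(x_k | C)."""
    scores = {}
    for c, nc in Counter(r[-1] for r in rows).items():
        in_c = [r for r in rows if r[-1] == c]
        p = nc / len(rows)                 # the prior P(C): don't forget it
        for k, v in enumerate(x):          # conditional independence assumption
            p *= sum(1 for r in in_c if r[k] == v) / nc
        scores[c] = p
    return scores

print(naive_bayes(("<=30", "medium", "yes", "fair")))
# {'no': ~0.007, 'yes': ~0.028} -> predict buys_computer = yes
```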
Naïve Bayesian Classifier: Comments
- Advantages: easy to implement; good results obtained in most cases
- Disadvantages: the assumption of class-conditional independence causes a loss of accuracy, because in practice dependencies exist among variables (e.g., in medical data: symptoms and diseases) that a naïve Bayesian classifier cannot model
- How to deal with these dependencies? Bayesian belief networks (next section)
Content
Bayesian Networks slides adapted from Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email_address] 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
What we'll discuss
- The benefits of joint distributions for describing uncertain worlds
- The problem with using joint distributions directly
- How Bayes net methodology lets us build joint distributions in manageable chunks
- The lurking problem that remains: the computational cost of inference
Why this matters
Bayes nets are one of the most important technologies to emerge from machine learning research: a compact, principled way to represent and reason with uncertain knowledge. Typical uses include anomaly detection, inference (diagnosis and prediction from partial observations), and active data collection (deciding which measurement to take next).
Ways to deal with Uncertainty
- Three-valued logic: True / False / Maybe
- Fuzzy logic (truth values between 0 and 1)
- Non-monotonic reasoning (explicitly represent defaults and exceptions)
- Certainty factors (as in rule-based expert systems)
- Probability: the approach taken here
Discrete Random Variables
- A is a Boolean-valued random variable if A denotes an event and there is some degree of uncertainty as to whether A occurs.
- Examples: A = you wake up tomorrow with a headache; A = you have the flu.
- Later we generalize to multivalued random variables.
Probabilities
We write P(A) for "the fraction of possible worlds in which A is true". The next slide makes this picture precise.
Visualizing A
Event space of all possible worlds; its area is 1. The worlds in which A is true form an oval region, and P(A) = the area of that region; the remaining worlds are those in which A is false.
Interpreting the axioms
The axioms of probability: 0 <= P(A) <= 1; P(True) = 1; P(False) = 0; P(A or B) = P(A) + P(B) - P(A and B).
First, 0 <= P(A): the area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.
Interpreting the axioms
P(A) <= 1: the area of A can't get any bigger than 1, and an area of 1 would mean all worlds have A true.
Interpreting the axioms
P(A or B) = P(A) + P(B) - P(A and B): drawing A and B as overlapping ovals, the area of the union is the two areas added, minus the overlap that would otherwise be counted twice. Simple addition and subtraction.
These Axioms are Not to be Trifled With
There have been many attempts to construct alternative methodologies for uncertainty, but the axioms of probability have a unique defense: if you gamble according to beliefs that violate them, a clever opponent can construct a set of bets you will accept that guarantee you lose money (de Finetti, 1931). No other system of degrees of belief has this protection.
Theorems from the Axioms
From the axioms (0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B)) we can prove, for example: P(not A) = P(~A) = 1 - P(A).
Another important theorem
Also from the axioms: P(A) = P(A ^ B) + P(A ^ ~B), i.e. A's probability splits over the two cases for B.
Conditional Probability
P(A|B) = the fraction of worlds in which B is true that also have A true.
H = "Have a headache"; F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2.
"Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
Conditional Probability
P(H|F) = fraction of flu-inflicted worlds in which you have a headache
= (# worlds with flu and headache) / (# worlds with flu)
= (area of "H and F" region) / (area of "F" region)
= P(H ^ F) / P(F)
Definition of Conditional Probability
P(A|B) = P(A ^ B) / P(B)
Corollary, the Chain Rule: P(A ^ B) = P(A|B) P(B)
Bayes Rule
Combining the definition of conditional probability with the chain rule gives Bayes' rule:
P(A|B) = P(A ^ B) / P(B) = P(B|A) P(A) / P(B)
Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
Using Bayes Rule to Gamble
The "Win" envelope has a dollar and four beads in it ($1.00; beads R R B B). The "Lose" envelope has three beads (R B B) and no money. Trivial question: someone draws an envelope at random and offers to sell it to you; how much should you pay? (The expected value is $0.50.)
Using Bayes Rule to Gamble
Interesting question: before deciding, you are allowed to see one bead drawn from the chosen envelope. Suppose it's red: how much should you pay now? Suppose it's black: how much? (The red case is worked out below.)
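A worked application of Bayes' rule to the red-bead case (assuming, per the slide, Win holds beads R R B B and Lose holds R B B):

```python
p_win = p_lose = 0.5                  # envelopes drawn at random
p_red_win, p_red_lose = 2 / 4, 1 / 3  # chance of drawing a red bead in each

# P(red) via the theorem P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_red = p_red_win * p_win + p_red_lose * p_lose

# Bayes rule: P(Win | red) = P(red | Win) P(Win) / P(red)
p_win_given_red = p_red_win * p_win / p_red
print(p_win_given_red)                # 0.6 -> the envelope is worth $0.60
```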
Another Example
Multivalued Random Variables
Suppose A can take on more than two values: A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus P(A = vi ^ A = vj) = 0 if i ≠ j, and P(A = v1 or A = v2 or ... or A = vk) = 1.
An easy fact about Multivalued Random Variables
Using the axioms of probability and the facts on the previous slide, it is easy to prove that
P(A = v1 or A = v2 or ... or A = vi) = Σ(j=1..i) P(A = vj)
and therefore, taking i = k:
Σ(j=1..k) P(A = vj) = 1
Another fact about Multivalued Random Variables
Using the same axioms and facts, it is easy to prove that
P(B ^ [A = v1 or A = v2 or ... or A = vi]) = Σ(j=1..i) P(B ^ A = vj)
and therefore, taking i = k:
P(B) = Σ(j=1..k) P(B ^ A = vj)
More General Forms of Bayes Rule
P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|~A) P(~A)]
Conditioning everything on an extra event X: P(A | B ^ X) = P(B | A ^ X) P(A | X) / P(B | X)
For a multivalued A: P(A = vi | B) = P(B | A = vi) P(A = vi) / Σ(j=1..k) P(B | A = vj) P(A = vj)
Useful Easy-to-prove facts
P(A|B) + P(~A|B) = 1
Σ(k=1..nA) P(A = vk | B) = 1
From Probability to Bayesian Net
With these tools in hand, we now build up from joint distributions to Bayesian networks.
The Joint Distribution
Recipe for making a joint distribution of M variables (example: Boolean variables A, B, C):
1. Make a truth table listing all combinations of values of your variables (M Boolean variables give 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

A    B    C    Prob
0    0    0    0.30
0    0    1    0.05
0    1    0    0.10
0    1    1    0.05
1    0    0    0.05
1    0    1    0.10
1    1    0    0.25
1    1    1    0.10
Using the Joint
Once you have the joint distribution you can ask for the probability of any logical expression E involving your attributes: P(E) = the sum of P(row) over all rows matching E.
Using the Joint
Example (from a joint over gender, hours worked, and wealth): P(Poor ^ Male) = 0.4654.
Using the Joint P(Poor) = 0.7604
Inference with the Joint
P(E1 | E2) = P(E1 ^ E2) / P(E2) = (sum of P(row) over rows matching E1 and E2) / (sum of P(row) over rows matching E2)
Inference with the Joint
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
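The same recipe on the small A, B, C joint above, as a sketch:

```python
# joint[(a, b, c)] = P(A=a ^ B=b ^ C=c), the eight truth-table entries
joint = {(0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
         (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10}

def P(match):
    """Probability of a logical expression: sum the rows where it holds."""
    return sum(p for row, p in joint.items() if match(*row))

print(P(lambda a, b, c: a == 1))                          # P(A)   = 0.50
print(P(lambda a, b, c: a and b) / P(lambda a, b, c: b))  # P(A|B) = 0.70
```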
Joint distributions
- Good news: once you have a joint distribution, you can answer any probabilistic question about the domain, however much uncertainty is involved.
- Bad news: it is impossible to create one for more than about ten attributes, because of the sheer number of probabilities you would need to write down (2^M rows for M Boolean variables).
Using fewer numbers
Suppose there are two events: M (Manuela teaches the class, otherwise Andrew does) and S (it is sunny). The joint p.d.f. for these events contains four entries; to build it we would have to invent those four numbers. Can we describe the distribution with fewer? What extra assumption can you make?
Independence
Assume "the sunshine levels and who teaches the class are independent of each other". Formally: P(S|M) = P(S). From that, the axioms of probability imply: P(~S|M) = P(~S), P(M|S) = P(M), P(M ^ S) = P(M) P(S), P(M ^ ~S) = P(M) P(~S). And in general: P(M=u ^ S=v) = P(M=u) P(S=v) for each of the four combinations of u = True/False, v = True/False.
Independence
So the whole joint needs only two numbers, say P(M) and P(S): from the independence statements we can derive the full joint p.d.f. (each of its four rows is a product P(M=u) P(S=v)). And since we now have the joint p.d.f., we can make any queries we like.
A more interesting case
Now three events: M (Manuela teaches, otherwise Andrew), S (it is sunny), and L (the lecturer arrives slightly late). Assume both lecturers are sometimes delayed by bad weather, and Andrew is more likely to arrive late than Manuela. Knowledge we're happy to write down directly: P(S|M) = P(S) (who teaches is independent of the weather), P(S) = 0.3, P(M) = 0.6. Lateness, however, is not independent of the weather and not independent of the lecturer, so we must specify P(L | ...) for each of the four combinations of M and S.
A more interesting case
P(S|M) = P(S); P(S) = 0.3; P(M) = 0.6;
P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2.
Now we can derive a full joint p.d.f. with a "mere" six numbers instead of seven (the savings are larger for larger numbers of variables).
A more interesting case
Question: express P(L=x ^ M=y ^ S=z) in terms that only need the above expressions, where x, y and z may each be True or False. Answer: P(L=x ^ M=y ^ S=z) = P(L=x | M=y ^ S=z) P(M=y) P(S=z), by the chain rule and the independence of S and M.
A bit of notation
Draw S and M as parentless nodes with an arrow from each into L: S → L ← M, with P(S) = 0.3 and P(M) = 0.6 attached to S and M, and the table P(L | M ^ S) = 0.05, P(L | M ^ ~S) = 0.1, P(L | ~M ^ S) = 0.1, P(L | ~M ^ ~S) = 0.2 attached to L.
Read the absence of an arrow between S and M to mean "it would not help me predict M if I knew the value of S". Read the two arrows into L to mean that if I want to know the value of L it may help me to know M and to know S. This kind of thing will be thoroughly formalized later.
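Given the network's six numbers, every joint entry follows from the factorization on the previous slide; a sketch:

```python
from itertools import product

P_S, P_M = 0.3, 0.6
P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}  # P(L=1 | M, S)

def joint(l, m, s):
    """P(L=l ^ M=m ^ S=s) = P(L=l | M=m ^ S=s) * P(M=m) * P(S=s)."""
    pl = P_L[(m, s)] if l else 1 - P_L[(m, s)]
    return pl * (P_M if m else 1 - P_M) * (P_S if s else 1 - P_S)

print(sum(joint(*row) for row in product([0, 1], repeat=3)))  # 1.0: a valid joint
```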
An even cuter trick
Suppose we have these three events: M (lecture taught by Manuela), L (lecturer arrives late), R (lecture concerns robots). Suppose Andrew has a higher chance of being late than Manuela, and Andrew has a higher chance of giving robotics lectures. What kind of independence can we find? Is P(L|R) = P(L)? No: L and R are not independent. But they become independent once we know who is teaching.
Conditional independence
Once you know who the lecturer is, whether they arrive late gives no extra information about whether the lecture concerns robots. Formally: P(R | M, L) = P(R | M) and P(R | ~M, L) = P(R | ~M). Given knowledge of M, knowing anything else in the diagram won't help us with L, etc. This is notated by the diagram L ← M → R.
Conditional Independence formalized
R and L are conditionally independent given M if, for all x, y, z in {T, F}:
P(R=x | M=y ^ L=z) = P(R=x | M=y)
More generally: sets of variables S1 and S2 are conditionally independent given S3 if, for all assignments of values to the variables in the sets,
P(S1's assignments | S2's assignments ^ S3's assignments) = P(S1's assignments | S3's assignments).
Example:
"Shoe-size is conditionally independent of Glove-size given height, weight and age" means: for all s, g, h, w, a:
P(ShoeSize=s | Height=h, Weight=w, Age=a) = P(ShoeSize=s | Height=h, Weight=w, Age=a, GloveSize=g)
Example:
"Shoe-size is conditionally independent of Glove-size given height, weight and age" does not mean: for all s, g, h:
P(ShoeSize=s | Height=h) = P(ShoeSize=s | Height=h, GloveSize=g)
Conditional independence
With the diagram L ← M → R we can write down P(M). Then, since L is only directly influenced by M, we can write down P(L|M) and P(L|~M) and know we have fully specified L's behaviour; ditto for R. This is what "R and L conditionally independent given M" buys us.
Conditional independence
Given the five numbers P(M), P(L|M), P(L|~M), P(R|M), P(R|~M), conditional independence (P(R | M, L) = P(R | M) and P(R | ~M, L) = P(R | ~M)) lets us obtain any member of the joint distribution we desire:
P(L=x ^ R=y ^ M=z) = P(L=x | M=z) P(R=y | M=z) P(M=z)
Assume five variables
T: the lecture started by 10:35; L: the lecturer arrives late; R: the lecture concerns robots; M: the lecturer is Manuela; S: it is sunny.
Assumed (conditional) independencies: T is only directly influenced by L; L is only directly influenced by M and S; R is only directly influenced by M; M and S are independent.
Making a Bayes net
Step one: add the variables (S, M, R, L, T). Just choose the variables you'd like to be included in the net.
Step two: add links. The link structure must be acyclic. Giving node X parents Q1..Qm promises that X is conditionally independent of every non-descendant of X, given Q1..Qm. Here: S → L, M → L, M → R, L → T.
Making a Bayes net
Step three: add a probability table for each node, giving P(node | each combination of parent values):
P(S) = 0.3; P(M) = 0.6; P(R|M) = 0.3, P(R|~M) = 0.6; P(T|L) = 0.3, P(T|~L) = 0.8; P(L|M^S) = 0.05, P(L|M^~S) = 0.1, P(L|~M^S) = 0.1, P(L|~M^~S) = 0.2.
Making a Bayes net
Step four: you're done. Two unconnected variables may still be correlated, but each node is conditionally independent of all non-descendants, given its parents. The net (S, M, R, L, T with the tables above) compactly represents the full joint distribution over all five variables.
Bayes Nets Formalized
A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by a pair (V, E) where V is a set of vertices and E a set of directed edges joining vertices; no loops of any length are allowed. Each vertex in V contains the name of a random variable and a probability table indicating how that variable's probabilities depend on all possible combinations of its parents' values.
Building a Bayes Net
1. Choose a set of relevant variables.
2. Choose an ordering for them, X1 ... Xm.
3. For i = 1 to m: add node Xi to the network; set Parents(Xi) to be a minimal subset of {X1 ... Xi-1} such that Xi is conditionally independent of all other members of {X1 ... Xi-1} given Parents(Xi); define the probability table P(Xi = k | assignments of Parents(Xi)).
Computing a Joint Entry
How to compute an entry in the joint distribution, e.g. P(S ^ ~M ^ L ^ ~R ^ T)? Use the chain rule along the network structure:
P(S ^ ~M ^ L ^ ~R ^ T) = P(S) P(~M) P(L | S ^ ~M) P(~R | ~M) P(T | L) = 0.3 · 0.4 · 0.1 · 0.4 · 0.3 = 0.00144.
Computing with Bayes Net
The same pattern works for any row: multiply together, for each variable, its table entry conditioned on its parents' values in that row. With the tables P(S) = 0.3, P(M) = 0.6, P(R|M) = 0.3, P(R|~M) = 0.6, P(T|L) = 0.3, P(T|~L) = 0.8, and the four P(L | M, S) entries, every one of the 32 joint entries is computable.
The general case
P(X1=x1 ^ X2=x2 ^ ... ^ Xn=xn)
= Π(i=1..n) P(Xi=xi | Xi-1=xi-1 ^ ... ^ X1=x1)  (chain rule)
= Π(i=1..n) P(Xi=xi | assignments of Parents(Xi))  (conditional independence)
So any entry in the joint pdf table can be computed. And so any conditional probability can be computed.
Where are we now?
We have a methodology for building Bayes nets, and no exponential storage is needed: the number of table entries is exponential only in each node's number of parents. Queries are answered through the joint.
E.g., what could we do to compute P(R | T, ~S)?
Step 1: compute P(R ^ T ^ ~S): the sum of all rows of the joint that match R ^ T ^ ~S (4 joint computes, each obtained by the "computing a joint entry" method of the earlier slides).
Step 2: compute P(T ^ ~S): the sum of all rows that match T ^ ~S (8 joint computes).
Step 3: return P(R ^ T ^ ~S) / P(T ^ ~S). A sketch of the full computation follows below.
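Putting the recipe together for the five-variable net, as a sketch (this enumerates all 32 joint entries, i.e. exactly the exponential method the next slides worry about):

```python
from itertools import product

P_S, P_M = 0.3, 0.6
P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}  # P(L=1 | M, S)
P_R = {1: 0.3, 0: 0.6}                                          # P(R=1 | M)
P_T = {1: 0.3, 0: 0.8}                                          # P(T=1 | L)

def bern(p, v):
    """P(X=v) when P(X=1) = p."""
    return p if v else 1 - p

def joint(s, m, l, r, t):
    """Product of each node's table entry given its parents."""
    return (bern(P_S, s) * bern(P_M, m) * bern(P_L[(m, s)], l)
            * bern(P_R[m], r) * bern(P_T[l], t))

def prob(match):
    return sum(joint(*row) for row in product([0, 1], repeat=5) if match(*row))

# P(R | T ^ ~S) = P(R ^ T ^ ~S) / P(T ^ ~S)
num = prob(lambda s, m, l, r, t: r and t and not s)
den = prob(lambda s, m, l, r, t: t and not s)
print(num / den)
```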
The good news
We can do inference: we can compute any conditional probability from the Bayes net,
P(E1 | E2) = (sum of joint entries matching E1 and E2) / (sum of joint entries matching E2).
Suppose you have m binary-valued variables and the evidence E2 mentions k of them. Summing the matching rows takes O(2^(m-k)) joint-entry computations; each is cheap, but the count is still exponential in m.
The sad, bad news
Conducting inference by enumeration, as on the previous slides, is hideously expensive for large nets. Perhaps there are faster ways of querying Bayes nets? Indeed, when doing inference by hand one finds many tricks that save time, and these can be programmed (e.g. variable elimination, junction trees). The sadder and worse news: general querying of Bayes nets is NP-complete, so no algorithm can be fast on all nets and all queries.
Bayes nets inference algorithms
A poly tree is a directed acyclic graph in which no two nodes have more than one undirected path between them (our S, M, L, R, T net is one). If the net is a poly tree, exact inference can be done in time linear in the number of nodes (belief propagation). If it is not a poly tree (but still a legal Bayes net, e.g. X1 and X2 both parents of X3 and X4, which are both parents of X5), one can cluster nodes into a poly tree of super-nodes (join-tree methods) or fall back on approximate methods such as stochastic simulation.
Sampling from the Joint Distribution
It is easy to generate random samples from the joint distribution a Bayes net represents: sample each variable in an order where parents come first, conditioning on the values already drawn. For our net:
1. Randomly sample S according to P(S) = 0.3.
2. Sample M according to P(M) = 0.6.
3. Sample L according to P(L | M, S), using the sampled values of M and S.
4. Sample R according to P(R | M).
5. Sample T according to P(T | L).
A general sampling algorithm
Let X1 ... Xn be an ordering of the variables in which every node's parents precede it (a topological order). For i = 1 to n: look up Xi's parents Xp(1) ... Xp(k) and the values xp(1) ... xp(k) already sampled for them, then sample xi from P(Xi | Xp(1) = xp(1), ..., Xp(k) = xp(k)). A sketch for our net is below.
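The topological-order sampler for the five-variable net (a sketch, reusing the table numbers above):

```python
import random

P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}

def sample_net():
    """One joint sample (s, m, l, r, t), parents sampled before children."""
    s = random.random() < 0.3
    m = random.random() < 0.6
    l = random.random() < P_L[(m, s)]
    r = random.random() < (0.3 if m else 0.6)
    t = random.random() < (0.3 if l else 0.8)
    return s, m, l, r, t

n = 100_000
print(sum(sample_net()[4] for _ in range(n)) / n)  # Monte Carlo estimate of P(T)
```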
Stochastic Simulation Example
Someone wants to know P(R = True | T = True ^ S = False). Draw lots of random samples and count: Nc = the number of samples in which T = True and S = False; Ns = the number of samples in which R = True, T = True and S = False; N = the total number of samples. As N → ∞: Nc/N → P(T ^ ~S), Ns/N → P(R ^ T ^ ~S), and therefore Ns/Nc → P(R | T ^ ~S).
General Stochastic Simulation
To estimate any P(E1 | E2): draw many samples from the net; let Nc = the number matching E2 and Ns = the number matching E1 and E2; the estimate is Ns/Nc. Problem: if E2 is rare, almost all samples are rejected, and Nc is too small for a reliable estimate.
Likelihood weighting
A fix for the rejection problem. Instead of sampling the evidence variables in E2 and throwing away mismatches, clamp them to their observed values and sample only the remaining variables in topological order as before. Each run accumulates a weight: start with w = 1 and, whenever a clamped evidence variable is reached, multiply w by the probability of its observed value given its (sampled or clamped) parents. Then estimate
P(E1 | E2) ≈ (sum of weights of runs matching E1) / (sum of all weights).
A sketch is below.
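A sketch of likelihood weighting for P(R | T ^ ~S) on this net: S and T are clamped, and each run is weighted by the probability of the evidence given its parents:

```python
import random

P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}

def lw_estimate(n=100_000):
    num = den = 0.0
    for _ in range(n):
        s = 0                       # evidence ~S: clamped, not sampled
        w = 1 - 0.3                 # weight *= P(S = 0)
        m = random.random() < 0.6   # non-evidence nodes sampled as usual
        l = random.random() < P_L[(m, s)]
        r = random.random() < (0.3 if m else 0.6)
        w *= 0.3 if l else 0.8      # evidence T: weight *= P(T = 1 | L)
        den += w
        if r:
            num += w
    return num / den

print(lw_estimate())                # approaches the exact P(R | T ^ ~S)
```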
Case Study I
Pathfinder system (Heckerman 1991, Probabilistic Similarity Networks, MIT Press): a diagnostic Bayes net for lymph-node diseases, with 60 diseases, about 100 symptoms and test results, and roughly 14,000 probabilities. Experts were consulted for about 8 hours to determine the variables, 35 hours for the net topology, and 40 hours for the probability table values; they found it quite natural to specify causal links and probabilities. Pathfinder ended up outperforming world experts in diagnosis.
Questions
What you should know
- The meanings and importance of independence and conditional independence
- The definition of a Bayes net
- Computing probabilities of assignments of variables (i.e. members of the joint p.d.f.) with a Bayes net
- The slow (exponential) method for computing arbitrary conditional probabilities
- The stochastic simulation method and likelihood weighting
Content
Neural Networks
- Advantages: prediction accuracy is generally high; robust (works when training examples contain errors); output may be discrete, real-valued, or a vector of several attributes; fast evaluation of the learned function.
- Criticism: long training time; the learned function (a set of weights) is difficult to understand; not easy to incorporate domain knowledge.
A Neuron
The n-dimensional input vector x is mapped to the output y by means of the scalar product and a nonlinear activation function: the unit computes the weighted sum Σi wi xi, subtracts the bias θk, and applies the activation function f:
y = f(Σi wi xi - θk)
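That mapping in a few lines of Python (sigmoid chosen as the activation f; the values are illustrative):

```python
import math

def neuron(x, w, theta):
    """y = f(sum_i w_i * x_i - theta), with a sigmoid activation f."""
    net = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 / (1 + math.exp(-net))

print(neuron(x=[1.0, 0.5, -1.0], w=[0.2, 0.4, 0.1], theta=0.3))
```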
Multi-Layer Perceptron
The input vector x feeds the input nodes; hidden nodes combine the weighted inputs (weights wij) and pass them through their activation functions; output nodes do the same to produce the output vector.
Network Training
- The ultimate objective of training: obtain a set of weights that makes almost all the tuples in the training data classified correctly.
- Steps: initialize the weights with random values; feed the input tuples into the network one by one; for each unit, compute the net input as a linear combination of the unit's inputs, then compute the output value via the activation function; compute the error, and update the weights and biases working backwards from the output layer (backpropagation). A compact sketch follows below.
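A compact numpy sketch of those steps for a single hidden layer (random init, forward pass, error, gradient-style weight updates; the sizes, data, and learning rate are illustrative, and biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 3))                                    # 8 training tuples, 3 inputs
y = (X.sum(axis=1, keepdims=True) > 1.5).astype(float)    # toy target labels

W1 = rng.normal(size=(3, 4))                              # input -> hidden weights
W2 = rng.normal(size=(4, 1))                              # hidden -> output weights
sig = lambda z: 1 / (1 + np.exp(-z))

for _ in range(2000):
    h = sig(X @ W1)                    # forward pass: hidden activations
    out = sig(h @ W2)                  # forward pass: network output
    err = y - out                      # error at the output
    d_out = err * out * (1 - out)      # backpropagate the error...
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 += 0.5 * h.T @ d_out            # ...and update the weights
    W1 += 0.5 * X.T @ d_h

print(np.round(out.T, 2))              # should be close to y after training
```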
Network Pruning and Rule Extraction
- Network pruning: a fully connected network is hard to articulate; N input nodes, h hidden nodes and m output nodes lead to h(m + N) weights. Pruning removes links whose absence does not affect the classification accuracy of the network.
- Extracting rules from a trained network: discretize the activation values; replace individual activation values by their cluster averages while maintaining network accuracy; enumerate the outputs for the discretized activation values to find rules between activation values and outputs; find the relationship between inputs and activation values; combine the two to obtain rules relating outputs to inputs.
Content
Linear Support Vector Machines
Binary classification: each training point carries value -1 (e.g. does not buy a computer) or +1 (e.g. buys a computer). A linear SVM looks for the separating hyperplane with the widest margin between the two classes.
Linear Support Vector Machines
Among all separating hyperplanes, prefer the one with the large margin over one with a small margin; the training points lying on the margin boundaries are the support vectors.
Linear Support Vector Machines
Put the separating hyperplane at w·x + b = 0 and scale w and b so the two margin boundaries are w·x + b = -1 and w·x + b = 1. Points with w·x + b >= 1 are classified +1 and points with w·x + b <= -1 are classified -1; the margin between the boundaries is M = 2 / ||w||.
Linear Support Vector Machines
The margin M can be read off geometrically as the distance between the two boundary lines w·x + b = -1 and w·x + b = 1.
Linear Support Vector Machines
Maximizing the margin M = 2 / ||w|| subject to yi (w·xi + b) >= 1 for every training point (xi, yi) is equivalent to minimizing ||w||² / 2 under the same constraints: a quadratic programming problem whose solution depends only on the support vectors.
SVM – Cont.
What if the data are not linearly separable? E.g. three points on a line at -1, 0, +1 with labels +, -, + cannot be split by any threshold. Map them into a higher-dimensional feature space: sending -1 → (1,0), 0 → (0,0), +1 → (0,1) makes the two classes linearly separable.
Non-Linear SVM
Classification using SVM (w, b): in the linear case, check the sign of w·x + b. In the non-linear case, first map x into a high-dimensional feature space via Φ and check the sign of w·Φ(x) + b. The kernel K(xi, xj) = Φ(xi)·Φ(xj) can be thought of as doing the dot product in some high-dimensional space without ever computing Φ explicitly.
Example of Non-linear SVM
Results
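In practice the quadratic program and the kernel trick are rarely coded by hand; a sketch using scikit-learn (assumed installed; the data and parameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y_lin = (X[:, 0] + X[:, 1] > 0).astype(int)            # linearly separable trend
y_ring = (np.linalg.norm(X, axis=1) < 1).astype(int)   # not linearly separable

lin = SVC(kernel="linear", C=1.0).fit(X, y_lin)
print(len(lin.support_vectors_))    # the margin-defining training points

rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y_ring)      # kernel = implicit feature-space dot product
print(rbf.score(X, y_ring))
```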
SVM vs. Neural Network
- SVM: relatively new concept; deterministic algorithm; nice generalization properties; hard to learn (trained in batch mode using quadratic programming techniques); using kernels, can learn very complex functions.
- Neural Network: relatively old; nondeterministic algorithm; generalizes well but lacks an equally strong mathematical foundation; can easily be learned in incremental fashion; to learn complex functions, use a multilayer perceptron (not trivial).
SVM Related Links
- SVM website: http://www.kernel-machines.org/
- Representative implementations: LIBSVM (efficient, supports multi-class classification), SVM-light (simpler, binary classification only), SVMTorch (also supports regression)
Content
Bagging and Boosting
General idea: instead of learning a single classifier C from the training data with one classification method (CM), learn a series of classifiers C1, C2, ... from variants of the training data and combine their votes into one classifier C*.
Bagging
Given a set S of s samples, generate a bootstrap sample T from S (cases in S may not appear in T, or may appear more than once). Repeating the sampling gives k independent training sets, and the same classification algorithm builds a classifier Ci from each. To classify an unknown sample X, let each of the k classifiers vote; the bagged classifier C* assigns X the class with the most votes. A sketch follows below.
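A sketch of exactly this procedure, bootstrap sampling plus majority vote, with decision trees as the base classifiers (scikit-learn assumed available; 0/1 labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=15, seed=0):
    """Fit k trees, each on a bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample T of S
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # each Ci votes
    return (votes.mean(axis=0) > 0.5).astype(int)     # majority vote
```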
Boosting Technique: Algorithm
- Assign every example an equal weight 1/N.
- For t = 1, 2, ..., T: obtain a hypothesis (classifier) h_t under the current weights w_t; calculate the error of h_t and re-weight the examples based on it (misclassified examples gain weight, correctly classified ones lose weight); normalize w_{t+1} to sum to 1.
- Output a weighted combination of all the hypotheses, each weighted according to its accuracy on the training set. A sketch follows below.
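The reweighting loop in code: an AdaBoost-style sketch with decision stumps (scikit-learn assumed available; labels in {-1, +1}):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, T=20):
    """y in {-1, +1}. Returns (stumps, alphas) for a weighted-vote classifier."""
    w = np.full(len(X), 1 / len(X))               # equal weights 1/N
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum()                  # weighted error of h_t
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)            # misclassified gain weight
        w /= w.sum()                              # normalize to sum to 1
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def boost_predict(stumps, alphas, X):
    score = sum(a * h.predict(X) for h, a in zip(stumps, alphas))
    return np.sign(score)                         # accuracy-weighted vote
```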
Summary
- Classification is an extensively studied problem, mainly in statistics and machine learning.
- It is probably one of the most widely used data mining techniques, with many extensions.
- Scalability is still an important issue for database applications.
- Research directions: classification of non-relational data, e.g. text, spatial, and multimedia data.