Classification
modified by Donghui Zhang; integrated with slides from Prof. Andrew W. Moore, http://www.cs.cmu.edu/~awm/tutorials
Content
- Overview: classification vs. prediction
- Decision tree induction
- Bayesian classification
- Bayesian networks
- Neural networks
- Support vector machines (SVM)
- Bagging and boosting
Classification vs. Prediction
- Classification: predicts categorical class labels; constructs a model from a training set whose records carry known class labels, then uses the model to classify new data.
- Prediction: models continuous-valued functions, i.e., predicts unknown or missing numeric values.
- Typical applications: credit approval, target marketing, medical diagnosis.
Classification: A Two-Step Process
- Step 1, model construction: describe a set of predetermined classes. Each tuple is assumed to belong to a predefined class given by its class label attribute; the set of tuples used for building the model is the training set; the model is represented as classification rules, a decision tree, or a mathematical formula.
- Step 2, model usage: classify future or unknown objects. First estimate the accuracy of the model on an independent test set (known labels of test samples are compared with the model's predictions); if the accuracy is acceptable, apply the model to unseen data.
Classification Process (1): Model Construction
Training data are fed to a classification algorithm, which outputs a classifier (the model), e.g. the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
Classification Process (2): Use the Model in Prediction
The classifier is first run against testing data to estimate accuracy, then applied to unseen data, e.g. (Jeff, Professor, 4) → Tenured?
Supervised vs. Unsupervised Learning
- Supervised learning (classification): the training data are accompanied by labels indicating the class of each observation; new data are classified based on the training set.
- Unsupervised learning (clustering): the class labels of the training data are unknown; given a set of observations, the aim is to establish the existence of classes or clusters in the data.
Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability: time to construct the model and time to use it
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases
- Interpretability: how much understanding and insight the model provides
- Goodness of rules: e.g. decision tree size, compactness of classification rules
Content
Training Dataset. This follows an example from Quinlan's ID3.

age      income  student  credit_rating  buys_computer
<=30     high    no       fair           no
<=30     high    no       excellent      no
31..40   high    no       fair           yes
>40      medium  no       fair           yes
>40      low     yes      fair           yes
>40      low     yes      excellent      no
31..40   low     yes      excellent      yes
<=30     medium  no       fair           no
<=30     low     yes      fair           yes
>40      medium  yes      fair           yes
<=30     medium  yes      excellent      yes
31..40   medium  no       excellent      yes
31..40   high    yes      fair           yes
>40      medium  no       excellent      no
Output: A Decision Tree for "buys_computer"

age?
- <=30: student? (no → no; yes → yes)
- 30..40: yes
- >40: credit_rating? (excellent → no; fair → yes)
Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules: one rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction; the leaf node holds the class prediction. Rules are easier for humans to understand.
- Example:
IF age = '<=30' AND student = 'no' THEN buys_computer = 'no'
IF age = '<=30' AND student = 'yes' THEN buys_computer = 'yes'
IF age = '31..40' THEN buys_computer = 'yes'
IF age = '>40' AND credit_rating = 'excellent' THEN buys_computer = 'no'
IF age = '>40' AND credit_rating = 'fair' THEN buys_computer = 'yes'
Algorithm for Decision Tree Induction
- Basic algorithm (greedy): the tree is constructed top-down, recursively, divide-and-conquer style. At the start all training examples are at the root; attributes are categorical (continuous ones are discretized in advance); examples are partitioned recursively based on selected attributes, chosen by a heuristic or statistical measure such as information gain.
- Conditions for stopping the partitioning: all samples at a node belong to the same class; there are no remaining attributes (majority voting labels the leaf); there are no samples left. A minimal sketch of the procedure appears below.
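A minimal Python sketch of this greedy procedure (a toy ID3 for categorical attributes; the helper names are mine, not part of the original algorithm descriptions):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p*log2(p) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs, target):
    """Pick the attribute with the lowest conditional entropy H(target | attr),
    i.e. the highest information gain."""
    def cond_entropy(a):
        values = Counter(r[a] for r in rows)
        return sum((cnt / len(rows)) *
                   entropy([r[target] for r in rows if r[a] == v])
                   for v, cnt in values.items())
    return min(attrs, key=cond_entropy)

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # all samples in one class: leaf
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, attrs, target)
    rest = [x for x in attrs if x != a]
    return {a: {v: id3([r for r in rows if r[a] == v], rest, target)
                for v in set(r[a] for r in rows)}}
```

Run on the buys_computer table above (rows as dicts keyed by attribute name), it selects age at the root and reproduces the tree shown two slides earlier.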
Information gain slides adapted from Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email_address] 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials:  http://www.cs.cmu.edu/~awm/tutorials  . Comments and corrections gratefully received.
Bits
You are watching a set of independent random samples of X, which has four possible values, each equally likely: P(X=A) = P(X=B) = P(X=C) = P(X=D) = 1/4. An obvious code transmits 2 bits per symbol, e.g. A = 00, B = 01, C = 10, D = 11.
Fewer Bits
Now suppose the distribution is skewed: P(X=A) = 1/2, P(X=B) = 1/4, P(X=C) = 1/8, P(X=D) = 1/8. It is possible to invent a coding that uses only 1.75 bits per symbol on average.
Fewer Bits
Such a code assigns short codewords to frequent values: A = 0, B = 10, C = 110, D = 111. The expected length is 1/2·1 + 1/4·2 + 1/8·3 + 1/8·3 = 1.75 bits per symbol.
Fewer Bits
Suppose instead there are three equally likely values: P(X=A) = P(X=B) = P(X=C) = 1/3. A naive coding (A = 00, B = 01, C = 10) uses 2 bits per symbol; in theory it is possible to get down to log2(3) ≈ 1.585 bits per symbol.
General Case
Suppose X can take one of m values V1 ... Vm, with P(X=V1) = p1, ..., P(X=Vm) = pm. The smallest possible number of bits, on average, needed to transmit a stream of symbols drawn from X's distribution is the entropy of X:
H(X) = -p1 log2 p1 - p2 log2 p2 - ... - pm log2 pm = -Σj pj log2 pj
General Case
- High entropy: X is "as random as it can get"; a histogram of the frequency distribution of X's values would be flat, and so the values sampled from it would be all over the place.
- Low entropy: X comes from a peaks-and-valleys distribution; the histogram would have many lows and one or two highs, and so the sampled values would be more predictable.
Entropy in a nutshell
- High entropy: the values (locations of soup) are unpredictable, almost uniformly sampled throughout the dining room.
- Low entropy: the values (locations of soup) are sampled entirely from within the soup bowl.
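As a quick check of the numbers above, a short Python snippet (the function name is mine):

```python
import math

def H(ps):
    """Entropy in bits: H = -sum p*log2(p); terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(H([0.25] * 4))                  # 2.0 bits: four equally likely values
print(H([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits: the "fewer bits" example
print(H([1/3] * 3))                   # ~1.585 bits = log2(3)
```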
Exercise: compute H(X) for a few distributions of your own choosing, and verify that the uniform distribution gives the highest entropy.
Specific Conditional Entropy
Suppose I'm trying to predict output Y and I have input X. Here X = College Major and Y = Likes "Gladiator":

X        Y
Math     Yes
History  No
CS       Yes
Math     No
Math     No
CS       Yes
History  No
Math     Yes
Specific Conditional Entropy
Definition: H(Y | X=v) = the entropy of Y among only those records in which X has value v.
Specific Conditional Entropy
Example, from the table above:
H(Y | X=Math) = 1 (two Yes, two No)
H(Y | X=History) = 0 (all No)
H(Y | X=CS) = 0 (all Yes)
Conditional Entropy
Definition: H(Y | X) = the average specific conditional entropy of Y = if you choose a record at random, the conditional entropy of Y conditioned on that record's value of X = the expected number of bits needed to transmit Y if both sides know the value of X = Σj P(X=vj) H(Y | X=vj).
Conditional Entropy
Example, from the table above:

vj       P(X=vj)  H(Y | X=vj)
Math     0.5      1
History  0.25     0
CS       0.25     0

H(Y | X) = 0.5·1 + 0.25·0 + 0.25·0 = 0.5
Information Gain
Definition: IG(Y | X) = how many bits, on average, it would save me when transmitting Y if both ends of the line knew X: IG(Y | X) = H(Y) - H(Y | X).
Example, from the table above: H(Y) = 1, H(Y | X) = 0.5, so IG(Y | X) = 1 - 0.5 = 0.5. A small code check follows below.
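The whole chain H(Y), H(Y|X), IG(Y|X) for the Gladiator table, as a minimal Python check:

```python
import math
from collections import Counter

data = [("Math", "Yes"), ("History", "No"), ("CS", "Yes"), ("Math", "No"),
        ("Math", "No"), ("CS", "Yes"), ("History", "No"), ("Math", "Yes")]

def H(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

ys = [y for _, y in data]
h_y = H(ys)                                       # H(Y) = 1.0
h_y_given_x = sum(
    (sum(1 for x, _ in data if x == v) / len(data))
    * H([y for x, y in data if x == v])
    for v in set(x for x, _ in data))             # H(Y|X) = 0.5
print(h_y, h_y_given_x, h_y - h_y_given_x)        # IG(Y|X) = 0.5
```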
What is Information Gain used for?
Suppose you are trying to predict some output attribute from historical data. The attributes with the highest information gain relative to the output are its best predictors, so decision tree induction tests, at each node, the attribute with the highest information gain (equivalently, the lowest conditional entropy).
Conditional entropy H(C|age)
For the buys_computer data, splitting on age gives: age <= 30: 5 records (2 yes, 3 no), H = 0.971; age 30..40: 4 records (all yes), H = 0; age > 40: 5 records (3 yes, 2 no), H = 0.971. So H(C | age) = 5/14·0.971 + 4/14·0 + 5/14·0.971 = 0.694.
Select the attribute with lowest conditional entropy
Comparing the candidate splits on the buys_computer data, H(C | age) = 0.694 is lower than the conditional entropies for income, student, and credit_rating, so age is selected at the root. Each branch is then grown recursively the same way, yielding the tree shown earlier: age <= 30 splits on student, age 30..40 is a pure 'yes' leaf, and age > 40 splits on credit_rating.
Goodness in Decision Tree Induction
Scalable Decision Tree Induction Methods in Data Mining Studies
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning, stopping tree construction early
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates scalability aspects from the criteria that determine tree quality; builds an AVC-list (attribute, value, class label)
Visualization of a Decision Tree in SGI/MineSet 3.0
Content
Bayesian Classification: Why?
- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
- Standard: even when computationally intractable, Bayesian methods provide a standard of optimal decision making against which other methods can be measured
Bayesian Classification
Given a sample X with unknown class label, the task is to determine the class Ci that maximizes the posterior probability P(Ci | X): informally, the most probable class given what we observe. The machinery for this is Bayes' theorem.
Bayesian Theorem
Given training data D, the posterior probability of a hypothesis h follows Bayes' theorem:
P(h | D) = P(D | h) P(h) / P(D)
MAP (maximum a posteriori) hypothesis: h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h).
Practical difficulty: this requires initial knowledge of many probabilities, at significant computational cost.
Basic Idea
To classify a sample X, compute the posterior P(Ci | X) for each class Ci and assign X to the class with the highest posterior. By Bayes' theorem, P(Ci | X) = P(X | Ci) P(Ci) / P(X); since P(X) is the same for every class, it suffices to maximize P(X | Ci) P(Ci). Both P(Ci) and P(X | Ci) are estimated from the training data.
Naïve Bayes Classifier
A simplifying assumption: the attributes are conditionally independent given the class, so
P(X | Ci) = Πk P(xk | Ci)
This greatly reduces the computation cost: only the class distribution and the per-attribute conditionals P(xk | Ci) need to be estimated, each by simple counting over the training set.
Sample quiz questions
Naïve Bayesian Classifier: Example
Classify X = (age <= 30, income = medium, student = yes, credit_rating = fair) using the buys_computer training data.
Priors: P(yes) = 9/14, P(no) = 5/14.
Conditionals: P(age<=30 | yes) = 2/9, P(income=medium | yes) = 4/9, P(student=yes | yes) = 6/9, P(credit=fair | yes) = 6/9; P(age<=30 | no) = 3/5, P(income=medium | no) = 2/5, P(student=yes | no) = 1/5, P(credit=fair | no) = 2/5.
P(X | yes) P(yes) = 2/9 · 4/9 · 6/9 · 6/9 · 9/14 ≈ 0.028; P(X | no) P(no) = 3/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.007.
So X is classified as buys_computer = yes. Pitfall: don't forget to multiply by the prior P(Ci).
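A sketch that recomputes this example directly from the training table (attribute order: age, income, student, credit_rating):

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer): the Quinlan/Han table
rows = [
    ("<=30","high","no","fair","no"),      ("<=30","high","no","excellent","no"),
    ("31..40","high","no","fair","yes"),   (">40","medium","no","fair","yes"),
    (">40","low","yes","fair","yes"),      (">40","low","yes","excellent","no"),
    ("31..40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
    ("<=30","low","yes","fair","yes"),     (">40","medium","yes","fair","yes"),
    ("<=30","medium","yes","excellent","yes"), ("31..40","medium","no","excellent","yes"),
    ("31..40","high","yes","fair","yes"),  (">40","medium","no","excellent","no"),
]

def naive_bayes(x):
    """Score each class by P(C) * prod_k P(x_k | C)."""
    scores = {}
    for c, nc in Counter(r[-1] for r in rows).items():
        in_c = [r for r in rows if r[-1] == c]
        p = nc / len(rows)                 # the prior P(C): don't forget it
        for k, v in enumerate(x):          # conditional independence assumption
            p *= sum(1 for r in in_c if r[k] == v) / nc
        scores[c] = p
    return scores

print(naive_bayes(("<=30", "medium", "yes", "fair")))
# {'no': ~0.007, 'yes': ~0.028} -> predict buys_computer = yes
```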
Naïve Bayesian Classifier: Comments
- Advantages: easy to implement; good results obtained in most cases
- Disadvantages: the assumption of class-conditional independence causes a loss of accuracy, because in practice dependencies exist among variables (e.g., in medical data: symptoms and diseases) that a naïve Bayesian classifier cannot model
- How to deal with these dependencies? Bayesian belief networks (next section)
Content
Bayesian Networks slides adapted from Andrew W. Moore Associate Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm [email_address] 412-268-7599 Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
What we'll discuss
- The benefits of joint distributions for describing uncertain worlds
- The problem with using joint distributions directly
- How Bayes net methodology lets us build joint distributions in manageable chunks
- The lurking problem that remains: the computational cost of inference
Why this matters
Bayes nets are one of the most important technologies to emerge from machine learning research: a compact, principled way to represent and reason with uncertain knowledge. Typical uses include anomaly detection, inference (diagnosis and prediction from partial observations), and active data collection (deciding which measurement to take next).
Ways to deal with Uncertainty
- Three-valued logic: True / False / Maybe
- Fuzzy logic (truth values between 0 and 1)
- Non-monotonic reasoning (explicitly represent defaults and exceptions)
- Certainty factors (as in rule-based expert systems)
- Probability: the approach taken here
Discrete Random Variables
- A is a Boolean-valued random variable if A denotes an event and there is some degree of uncertainty as to whether A occurs.
- Examples: A = you wake up tomorrow with a headache; A = you have the flu.
- Later we generalize to multivalued random variables.
Probabilities
We write P(A) for "the fraction of possible worlds in which A is true". The next slide makes this picture precise.
Visualizing A
Event space of all possible worlds; its area is 1. The worlds in which A is true form an oval region, and P(A) = the area of that region; the remaining worlds are those in which A is false.
Interpreting the axioms
The axioms of probability: 0 <= P(A) <= 1; P(True) = 1; P(False) = 0; P(A or B) = P(A) + P(B) - P(A and B).
First, 0 <= P(A): the area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.
Interpreting the axioms
P(A) <= 1: the area of A can't get any bigger than 1, and an area of 1 would mean all worlds have A true.
Interpreting the axioms
P(A or B) = P(A) + P(B) - P(A and B): drawing A and B as overlapping ovals, the area of the union is the two areas added, minus the overlap that would otherwise be counted twice. Simple addition and subtraction.
These Axioms are Not to be Trifled With
There have been many attempts to construct alternative methodologies for uncertainty, but the axioms of probability have a unique defense: if you gamble according to beliefs that violate them, a clever opponent can construct a set of bets you will accept that guarantee you lose money (de Finetti, 1931). No other system of degrees of belief has this protection.
Theorems from the Axioms
From the axioms (0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B)) we can prove, for example: P(not A) = P(~A) = 1 - P(A).
Another important theorem
Also from the axioms: P(A) = P(A ^ B) + P(A ^ ~B), i.e. A's probability splits over the two cases for B.
Conditional Probability
P(A|B) = the fraction of worlds in which B is true that also have A true.
H = "Have a headache"; F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2.
"Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
Conditional Probability
P(H|F) = fraction of flu-inflicted worlds in which you have a headache
= (# worlds with flu and headache) / (# worlds with flu)
= (area of "H and F" region) / (area of "F" region)
= P(H ^ F) / P(F)
Definition of Conditional Probability
P(A|B) = P(A ^ B) / P(B)
Corollary, the Chain Rule: P(A ^ B) = P(A|B) P(B)
Bayes Rule
Combining the definition of conditional probability with the chain rule gives Bayes' rule:
P(A|B) = P(A ^ B) / P(B) = P(B|A) P(A) / P(B)
Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
Using Bayes Rule to Gamble
The "Win" envelope has a dollar and four beads in it ($1.00; beads R R B B). The "Lose" envelope has three beads (R B B) and no money. Trivial question: someone draws an envelope at random and offers to sell it to you; how much should you pay? (The expected value is $0.50.)
Using Bayes Rule to Gamble
Interesting question: before deciding, you are allowed to see one bead drawn from the chosen envelope. Suppose it's red: how much should you pay now? Suppose it's black: how much? (The red case is worked out below.)
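A worked application of Bayes' rule to the red-bead case (assuming, per the slide, Win holds beads R R B B and Lose holds R B B):

```python
p_win = p_lose = 0.5                  # envelopes drawn at random
p_red_win, p_red_lose = 2 / 4, 1 / 3  # chance of drawing a red bead in each

# P(red) via the theorem P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_red = p_red_win * p_win + p_red_lose * p_lose

# Bayes rule: P(Win | red) = P(red | Win) P(Win) / P(red)
p_win_given_red = p_red_win * p_win / p_red
print(p_win_given_red)                # 0.6 -> the envelope is worth $0.60
```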
Another Example
Multivalued Random Variables
Suppose A can take on more than two values: A is a random variable with arity k if it can take on exactly one value out of {v1, v2, ..., vk}. Thus P(A = vi ^ A = vj) = 0 if i ≠ j, and P(A = v1 or A = v2 or ... or A = vk) = 1.
An easy fact about Multivalued Random Variables
Using the axioms of probability and the facts on the previous slide, it is easy to prove that
P(A = v1 or A = v2 or ... or A = vi) = Σ(j=1..i) P(A = vj)
and therefore, taking i = k:
Σ(j=1..k) P(A = vj) = 1
Another fact about Multivalued Random Variables
Using the same axioms and facts, it is easy to prove that
P(B ^ [A = v1 or A = v2 or ... or A = vi]) = Σ(j=1..i) P(B ^ A = vj)
and therefore, taking i = k:
P(B) = Σ(j=1..k) P(B ^ A = vj)
More General Forms of Bayes Rule
P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|~A) P(~A)]
Conditioning everything on an extra event X: P(A | B ^ X) = P(B | A ^ X) P(A | X) / P(B | X)
For a multivalued A: P(A = vi | B) = P(B | A = vi) P(A = vi) / Σ(j=1..k) P(B | A = vj) P(A = vj)
Useful Easy-to-prove facts
P(A|B) + P(~A|B) = 1
Σ(k=1..nA) P(A = vk | B) = 1
From Probability to Bayesian Net
With these tools in hand, we now build up from joint distributions to Bayesian networks.
The Joint Distribution
Recipe for making a joint distribution of M variables (example: Boolean variables A, B, C):
1. Make a truth table listing all combinations of values of your variables (M Boolean variables give 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

A    B    C    Prob
0    0    0    0.30
0    0    1    0.05
0    1    0    0.10
0    1    1    0.05
1    0    0    0.05
1    0    1    0.10
1    1    0    0.25
1    1    1    0.10
Using the Joint
Once you have the joint distribution you can ask for the probability of any logical expression E involving your attributes: P(E) = the sum of P(row) over all rows matching E.
Using the Joint
Example (from a joint over gender, hours worked, and wealth): P(Poor ^ Male) = 0.4654.
Using the Joint P(Poor) = 0.7604
Inference with the Joint
P(E1 | E2) = P(E1 ^ E2) / P(E2) = (sum of P(row) over rows matching E1 and E2) / (sum of P(row) over rows matching E2)
Inference with the Joint
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
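The same recipe on the small A, B, C joint above, as a sketch:

```python
# joint[(a, b, c)] = P(A=a ^ B=b ^ C=c), the eight truth-table entries
joint = {(0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
         (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10}

def P(match):
    """Probability of a logical expression: sum the rows where it holds."""
    return sum(p for row, p in joint.items() if match(*row))

print(P(lambda a, b, c: a == 1))                          # P(A)   = 0.50
print(P(lambda a, b, c: a and b) / P(lambda a, b, c: b))  # P(A|B) = 0.70
```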
Joint distributions
- Good news: once you have a joint distribution, you can answer any probabilistic question about the domain, however much uncertainty is involved.
- Bad news: it is impossible to create one for more than about ten attributes, because of the sheer number of probabilities you would need to write down (2^M rows for M Boolean variables).
Using fewer numbers
Suppose there are two events: M (Manuela teaches the class, otherwise Andrew does) and S (it is sunny). The joint p.d.f. for these events contains four entries; to build it we would have to invent those four numbers. Can we describe the distribution with fewer? What extra assumption can you make?
Independence
Assume "the sunshine levels and who teaches the class are independent of each other". Formally: P(S|M) = P(S). From that, the axioms of probability imply: P(~S|M) = P(~S), P(M|S) = P(M), P(M ^ S) = P(M) P(S), P(M ^ ~S) = P(M) P(~S). And in general: P(M=u ^ S=v) = P(M=u) P(S=v) for each of the four combinations of u = True/False, v = True/False.
Independence
So the whole joint needs only two numbers, say P(M) and P(S): from the independence statements we can derive the full joint p.d.f. (each of its four rows is a product P(M=u) P(S=v)). And since we now have the joint p.d.f., we can make any queries we like.
A more interesting case
Now three events: M (Manuela teaches, otherwise Andrew), S (it is sunny), and L (the lecturer arrives slightly late). Assume both lecturers are sometimes delayed by bad weather, and Andrew is more likely to arrive late than Manuela. Knowledge we're happy to write down directly: P(S|M) = P(S) (who teaches is independent of the weather), P(S) = 0.3, P(M) = 0.6. Lateness, however, is not independent of the weather and not independent of the lecturer, so we must specify P(L | ...) for each of the four combinations of M and S.
A more interesting case
P(S|M) = P(S); P(S) = 0.3; P(M) = 0.6;
P(L | M ^ S) = 0.05; P(L | M ^ ~S) = 0.1; P(L | ~M ^ S) = 0.1; P(L | ~M ^ ~S) = 0.2.
Now we can derive a full joint p.d.f. with a "mere" six numbers instead of seven (the savings are larger for larger numbers of variables).
A more interesting case
Question: express P(L=x ^ M=y ^ S=z) in terms that only need the above expressions, where x, y and z may each be True or False. Answer: P(L=x ^ M=y ^ S=z) = P(L=x | M=y ^ S=z) P(M=y) P(S=z), by the chain rule and the independence of S and M.
A bit of notation
Draw S and M as parentless nodes with an arrow from each into L: S → L ← M, with P(S) = 0.3 and P(M) = 0.6 attached to S and M, and the table P(L | M ^ S) = 0.05, P(L | M ^ ~S) = 0.1, P(L | ~M ^ S) = 0.1, P(L | ~M ^ ~S) = 0.2 attached to L.
Read the absence of an arrow between S and M to mean "it would not help me predict M if I knew the value of S". Read the two arrows into L to mean that if I want to know the value of L it may help me to know M and to know S. This kind of thing will be thoroughly formalized later.
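Given the network's six numbers, every joint entry follows from the factorization on the previous slide; a sketch:

```python
from itertools import product

P_S, P_M = 0.3, 0.6
P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}  # P(L=1 | M, S)

def joint(l, m, s):
    """P(L=l ^ M=m ^ S=s) = P(L=l | M=m ^ S=s) * P(M=m) * P(S=s)."""
    pl = P_L[(m, s)] if l else 1 - P_L[(m, s)]
    return pl * (P_M if m else 1 - P_M) * (P_S if s else 1 - P_S)

print(sum(joint(*row) for row in product([0, 1], repeat=3)))  # 1.0: a valid joint
```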
An even cuter trick
Suppose we have these three events: M (lecture taught by Manuela), L (lecturer arrives late), R (lecture concerns robots). Suppose Andrew has a higher chance of being late than Manuela, and Andrew has a higher chance of giving robotics lectures. What kind of independence can we find? Is P(L|R) = P(L)? No: L and R are not independent. But they become independent once we know who is teaching.
Conditional independence
Once you know who the lecturer is, whether they arrive late gives no extra information about whether the lecture concerns robots. Formally: P(R | M, L) = P(R | M) and P(R | ~M, L) = P(R | ~M). Given knowledge of M, knowing anything else in the diagram won't help us with L, etc. This is notated by the diagram L ← M → R.
Conditional Independence formalized
R and L are conditionally independent given M if, for all x, y, z in {T, F}:
P(R=x | M=y ^ L=z) = P(R=x | M=y)
More generally: sets of variables S1 and S2 are conditionally independent given S3 if, for all assignments of values to the variables in the sets,
P(S1's assignments | S2's assignments ^ S3's assignments) = P(S1's assignments | S3's assignments).
Example:
"Shoe-size is conditionally independent of Glove-size given height, weight and age" means: for all s, g, h, w, a:
P(ShoeSize=s | Height=h, Weight=w, Age=a) = P(ShoeSize=s | Height=h, Weight=w, Age=a, GloveSize=g)
Example:
"Shoe-size is conditionally independent of Glove-size given height, weight and age" does not mean: for all s, g, h:
P(ShoeSize=s | Height=h) = P(ShoeSize=s | Height=h, GloveSize=g)
Conditional independence
With the diagram L ← M → R we can write down P(M). Then, since L is only directly influenced by M, we can write down P(L|M) and P(L|~M) and know we have fully specified L's behaviour; ditto for R. This is what "R and L conditionally independent given M" buys us.
Conditional independence
Given the five numbers P(M), P(L|M), P(L|~M), P(R|M), P(R|~M), conditional independence (P(R | M, L) = P(R | M) and P(R | ~M, L) = P(R | ~M)) lets us obtain any member of the joint distribution we desire:
P(L=x ^ R=y ^ M=z) = P(L=x | M=z) P(R=y | M=z) P(M=z)
Assume five variables
T: the lecture started by 10:35; L: the lecturer arrives late; R: the lecture concerns robots; M: the lecturer is Manuela; S: it is sunny.
Assumed (conditional) independencies: T is only directly influenced by L; L is only directly influenced by M and S; R is only directly influenced by M; M and S are independent.
Making a Bayes net
Step one: add the variables (S, M, R, L, T). Just choose the variables you'd like to be included in the net.
Step two: add links. The link structure must be acyclic. Giving node X parents Q1..Qm promises that X is conditionally independent of every non-descendant of X, given Q1..Qm. Here: S → L, M → L, M → R, L → T.
Making a Bayes net
Step three: add a probability table for each node, giving P(node | each combination of parent values):
P(S) = 0.3; P(M) = 0.6; P(R|M) = 0.3, P(R|~M) = 0.6; P(T|L) = 0.3, P(T|~L) = 0.8; P(L|M^S) = 0.05, P(L|M^~S) = 0.1, P(L|~M^S) = 0.1, P(L|~M^~S) = 0.2.
Making a Bayes net
Step four: you're done. Two unconnected variables may still be correlated, but each node is conditionally independent of all non-descendants, given its parents. The net (S, M, R, L, T with the tables above) compactly represents the full joint distribution over all five variables.
Bayes Nets Formalized
A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by a pair (V, E) where V is a set of vertices and E a set of directed edges joining vertices; no loops of any length are allowed. Each vertex in V contains the name of a random variable and a probability table indicating how that variable's probabilities depend on all possible combinations of its parents' values.
Building a Bayes Net
1. Choose a set of relevant variables.
2. Choose an ordering for them, X1 ... Xm.
3. For i = 1 to m: add node Xi to the network; set Parents(Xi) to be a minimal subset of {X1 ... Xi-1} such that Xi is conditionally independent of all other members of {X1 ... Xi-1} given Parents(Xi); define the probability table P(Xi = k | assignments of Parents(Xi)).
Computing a Joint Entry
How to compute an entry in the joint distribution, e.g. P(S ^ ~M ^ L ^ ~R ^ T)? Use the chain rule along the network structure:
P(S ^ ~M ^ L ^ ~R ^ T) = P(S) P(~M) P(L | S ^ ~M) P(~R | ~M) P(T | L) = 0.3 · 0.4 · 0.1 · 0.4 · 0.3 = 0.00144.
Computing with Bayes Net
The same pattern works for any row: multiply together, for each variable, its table entry conditioned on its parents' values in that row. With the tables P(S) = 0.3, P(M) = 0.6, P(R|M) = 0.3, P(R|~M) = 0.6, P(T|L) = 0.3, P(T|~L) = 0.8, and the four P(L | M, S) entries, every one of the 32 joint entries is computable.
The general case
P(X1=x1 ^ X2=x2 ^ ... ^ Xn=xn)
= Π(i=1..n) P(Xi=xi | Xi-1=xi-1 ^ ... ^ X1=x1)  (chain rule)
= Π(i=1..n) P(Xi=xi | assignments of Parents(Xi))  (conditional independence)
So any entry in the joint pdf table can be computed. And so any conditional probability can be computed.
Where are we now?
We have a methodology for building Bayes nets, and no exponential storage is needed: the number of table entries is exponential only in each node's number of parents. Queries are answered through the joint.
E.g., what could we do to compute P(R | T, ~S)?
Step 1: compute P(R ^ T ^ ~S): the sum of all rows of the joint that match R ^ T ^ ~S (4 joint computes, each obtained by the "computing a joint entry" method of the earlier slides).
Step 2: compute P(T ^ ~S): the sum of all rows that match T ^ ~S (8 joint computes).
Step 3: return P(R ^ T ^ ~S) / P(T ^ ~S). A sketch of the full computation follows below.
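Putting the recipe together for the five-variable net, as a sketch (this enumerates all 32 joint entries, i.e. exactly the exponential method the next slides worry about):

```python
from itertools import product

P_S, P_M = 0.3, 0.6
P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}  # P(L=1 | M, S)
P_R = {1: 0.3, 0: 0.6}                                          # P(R=1 | M)
P_T = {1: 0.3, 0: 0.8}                                          # P(T=1 | L)

def bern(p, v):
    """P(X=v) when P(X=1) = p."""
    return p if v else 1 - p

def joint(s, m, l, r, t):
    """Product of each node's table entry given its parents."""
    return (bern(P_S, s) * bern(P_M, m) * bern(P_L[(m, s)], l)
            * bern(P_R[m], r) * bern(P_T[l], t))

def prob(match):
    return sum(joint(*row) for row in product([0, 1], repeat=5) if match(*row))

# P(R | T ^ ~S) = P(R ^ T ^ ~S) / P(T ^ ~S)
num = prob(lambda s, m, l, r, t: r and t and not s)
den = prob(lambda s, m, l, r, t: t and not s)
print(num / den)
```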
The good news
We can do inference: we can compute any conditional probability from the Bayes net,
P(E1 | E2) = (sum of joint entries matching E1 and E2) / (sum of joint entries matching E2).
Suppose you have m binary-valued variables and the evidence E2 mentions k of them. Summing the matching rows takes O(2^(m-k)) joint-entry computations; each is cheap, but the count is still exponential in m.
The sad, bad news
Conducting inference by enumeration, as on the previous slides, is hideously expensive for large nets. Perhaps there are faster ways of querying Bayes nets? Indeed, when doing inference by hand one finds many tricks that save time, and these can be programmed (e.g. variable elimination, junction trees). The sadder and worse news: general querying of Bayes nets is NP-complete, so no algorithm can be fast on all nets and all queries.
Bayes nets inference algorithms
A poly tree is a directed acyclic graph in which no two nodes have more than one undirected path between them (our S, M, L, R, T net is one). If the net is a poly tree, exact inference can be done in time linear in the number of nodes (belief propagation). If it is not a poly tree (but still a legal Bayes net, e.g. X1 and X2 both parents of X3 and X4, which are both parents of X5), one can cluster nodes into a poly tree of super-nodes (join-tree methods) or fall back on approximate methods such as stochastic simulation.
Sampling from the Joint Distribution
It is easy to generate random samples from the joint distribution a Bayes net represents: sample each variable in an order where parents come first, conditioning on the values already drawn. For our net:
1. Randomly sample S according to P(S) = 0.3.
2. Sample M according to P(M) = 0.6.
3. Sample L according to P(L | M, S), using the sampled values of M and S.
4. Sample R according to P(R | M).
5. Sample T according to P(T | L).
A general sampling algorithm
Let X1 ... Xn be an ordering of the variables in which every node's parents precede it (a topological order). For i = 1 to n: look up Xi's parents Xp(1) ... Xp(k) and the values xp(1) ... xp(k) already sampled for them, then sample xi from P(Xi | Xp(1) = xp(1), ..., Xp(k) = xp(k)). A sketch for our net is below.
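The topological-order sampler for the five-variable net (a sketch, reusing the table numbers above):

```python
import random

P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}

def sample_net():
    """One joint sample (s, m, l, r, t), parents sampled before children."""
    s = random.random() < 0.3
    m = random.random() < 0.6
    l = random.random() < P_L[(m, s)]
    r = random.random() < (0.3 if m else 0.6)
    t = random.random() < (0.3 if l else 0.8)
    return s, m, l, r, t

n = 100_000
print(sum(sample_net()[4] for _ in range(n)) / n)  # Monte Carlo estimate of P(T)
```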
Stochastic Simulation Example
Someone wants to know P(R = True | T = True ^ S = False). Draw lots of random samples and count: Nc = the number of samples in which T = True and S = False; Ns = the number of samples in which R = True, T = True and S = False; N = the total number of samples. As N → ∞: Nc/N → P(T ^ ~S), Ns/N → P(R ^ T ^ ~S), and therefore Ns/Nc → P(R | T ^ ~S).
General Stochastic Simulation
To estimate any P(E1 | E2): draw many samples from the net; let Nc = the number matching E2 and Ns = the number matching E1 and E2; the estimate is Ns/Nc. Problem: if E2 is rare, almost all samples are rejected, and Nc is too small for a reliable estimate.
Likelihood weighting
A fix for the rejection problem. Instead of sampling the evidence variables in E2 and throwing away mismatches, clamp them to their observed values and sample only the remaining variables in topological order as before. Each run accumulates a weight: start with w = 1 and, whenever a clamped evidence variable is reached, multiply w by the probability of its observed value given its (sampled or clamped) parents. Then estimate
P(E1 | E2) ≈ (sum of weights of runs matching E1) / (sum of all weights).
A sketch is below.
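A sketch of likelihood weighting for P(R | T ^ ~S) on this net: S and T are clamped, and each run is weighted by the probability of the evidence given its parents:

```python
import random

P_L = {(1, 1): 0.05, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.20}

def lw_estimate(n=100_000):
    num = den = 0.0
    for _ in range(n):
        s = 0                       # evidence ~S: clamped, not sampled
        w = 1 - 0.3                 # weight *= P(S = 0)
        m = random.random() < 0.6   # non-evidence nodes sampled as usual
        l = random.random() < P_L[(m, s)]
        r = random.random() < (0.3 if m else 0.6)
        w *= 0.3 if l else 0.8      # evidence T: weight *= P(T = 1 | L)
        den += w
        if r:
            num += w
    return num / den

print(lw_estimate())                # approaches the exact P(R | T ^ ~S)
```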
Case Study I
Pathfinder system (Heckerman 1991, Probabilistic Similarity Networks, MIT Press): a diagnostic Bayes net for lymph-node diseases, with 60 diseases, about 100 symptoms and test results, and roughly 14,000 probabilities. Experts were consulted for about 8 hours to determine the variables, 35 hours for the net topology, and 40 hours for the probability table values; they found it quite natural to specify causal links and probabilities. Pathfinder ended up outperforming world experts in diagnosis.
Questions
What you should know
- The meanings and importance of independence and conditional independence
- The definition of a Bayes net
- Computing probabilities of assignments of variables (i.e. members of the joint p.d.f.) with a Bayes net
- The slow (exponential) method for computing arbitrary conditional probabilities
- The stochastic simulation method and likelihood weighting
Content
Neural Networks
- Advantages: prediction accuracy is generally high; robust (works when training examples contain errors); output may be discrete, real-valued, or a vector of several attributes; fast evaluation of the learned function.
- Criticism: long training time; the learned function (a set of weights) is difficult to understand; not easy to incorporate domain knowledge.
A Neuron
The n-dimensional input vector x is mapped to the output y by means of the scalar product and a nonlinear activation function: the unit computes the weighted sum Σi wi xi, subtracts the bias θk, and applies the activation function f:
y = f(Σi wi xi - θk)
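That mapping in a few lines of Python (sigmoid chosen as the activation f; the values are illustrative):

```python
import math

def neuron(x, w, theta):
    """y = f(sum_i w_i * x_i - theta), with a sigmoid activation f."""
    net = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 / (1 + math.exp(-net))

print(neuron(x=[1.0, 0.5, -1.0], w=[0.2, 0.4, 0.1], theta=0.3))
```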
Multi-Layer Perceptron
The input vector x feeds the input nodes; hidden nodes combine the weighted inputs (weights wij) and pass them through their activation functions; output nodes do the same to produce the output vector.
Network Training
- The ultimate objective of training: obtain a set of weights that makes almost all the tuples in the training data classified correctly.
- Steps: initialize the weights with random values; feed the input tuples into the network one by one; for each unit, compute the net input as a linear combination of the unit's inputs, then compute the output value via the activation function; compute the error, and update the weights and biases working backwards from the output layer (backpropagation). A compact sketch follows below.
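A compact numpy sketch of those steps for a single hidden layer (random init, forward pass, error, gradient-style weight updates; the sizes, data, and learning rate are illustrative, and biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 3))                                    # 8 training tuples, 3 inputs
y = (X.sum(axis=1, keepdims=True) > 1.5).astype(float)    # toy target labels

W1 = rng.normal(size=(3, 4))                              # input -> hidden weights
W2 = rng.normal(size=(4, 1))                              # hidden -> output weights
sig = lambda z: 1 / (1 + np.exp(-z))

for _ in range(2000):
    h = sig(X @ W1)                    # forward pass: hidden activations
    out = sig(h @ W2)                  # forward pass: network output
    err = y - out                      # error at the output
    d_out = err * out * (1 - out)      # backpropagate the error...
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 += 0.5 * h.T @ d_out            # ...and update the weights
    W1 += 0.5 * X.T @ d_h

print(np.round(out.T, 2))              # should be close to y after training
```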
Network Pruning and Rule Extraction
- Network pruning: a fully connected network is hard to articulate; N input nodes, h hidden nodes and m output nodes lead to h(m + N) weights. Pruning removes links whose absence does not affect the classification accuracy of the network.
- Extracting rules from a trained network: discretize the activation values; replace individual activation values by their cluster averages while maintaining network accuracy; enumerate the outputs for the discretized activation values to find rules between activation values and outputs; find the relationship between inputs and activation values; combine the two to obtain rules relating outputs to inputs.
Content
Linear Support Vector Machines
Binary classification: each training point carries value -1 (e.g. does not buy a computer) or +1 (e.g. buys a computer). A linear SVM looks for the separating hyperplane with the widest margin between the two classes.
Linear Support Vector Machines
Among all separating hyperplanes, prefer the one with the large margin over one with a small margin; the training points lying on the margin boundaries are the support vectors.
Linear Support Vector Machines
Put the separating hyperplane at w·x + b = 0 and scale w and b so the two margin boundaries are w·x + b = -1 and w·x + b = 1. Points with w·x + b >= 1 are classified +1 and points with w·x + b <= -1 are classified -1; the margin between the boundaries is M = 2 / ||w||.
Linear Support Vector Machines
The margin M can be read off geometrically as the distance between the two boundary lines w·x + b = -1 and w·x + b = 1.
Linear Support Vector Machines
Maximizing the margin M = 2 / ||w|| subject to yi (w·xi + b) >= 1 for every training point (xi, yi) is equivalent to minimizing ||w||² / 2 under the same constraints: a quadratic programming problem whose solution depends only on the support vectors.
SVM – Cont.
What if the data are not linearly separable? E.g. three points on a line at -1, 0, +1 with labels +, -, + cannot be split by any threshold. Map them into a higher-dimensional feature space: sending -1 → (1,0), 0 → (0,0), +1 → (0,1) makes the two classes linearly separable.
Non-Linear SVM
Classification using SVM (w, b): in the linear case, check the sign of w·x + b. In the non-linear case, first map x into a high-dimensional feature space via Φ and check the sign of w·Φ(x) + b. The kernel K(xi, xj) = Φ(xi)·Φ(xj) can be thought of as doing the dot product in some high-dimensional space without ever computing Φ explicitly.
Example of Non-linear SVM
Results
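In practice the quadratic program and the kernel trick are rarely coded by hand; a sketch using scikit-learn (assumed installed; the data and parameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y_lin = (X[:, 0] + X[:, 1] > 0).astype(int)            # linearly separable trend
y_ring = (np.linalg.norm(X, axis=1) < 1).astype(int)   # not linearly separable

lin = SVC(kernel="linear", C=1.0).fit(X, y_lin)
print(len(lin.support_vectors_))    # the margin-defining training points

rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y_ring)      # kernel = implicit feature-space dot product
print(rbf.score(X, y_ring))
```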
SVM vs. Neural Network
- SVM: relatively new concept; deterministic algorithm; nice generalization properties; hard to learn (trained in batch mode using quadratic programming techniques); using kernels, can learn very complex functions.
- Neural Network: relatively old; nondeterministic algorithm; generalizes well but lacks an equally strong mathematical foundation; can easily be learned in incremental fashion; to learn complex functions, use a multilayer perceptron (not trivial).
SVM Related Links
- SVM website: http://www.kernel-machines.org/
- Representative implementations: LIBSVM (efficient, supports multi-class classification), SVM-light (simpler, binary classification only), SVMTorch (also supports regression)
Content
Bagging and Boosting
General idea: instead of learning a single classifier C from the training data with one classification method (CM), learn a series of classifiers C1, C2, ... from variants of the training data and combine their votes into one classifier C*.
Bagging
Given a set S of s samples, generate a bootstrap sample T from S (cases in S may not appear in T, or may appear more than once). Repeating the sampling gives k independent training sets, and the same classification algorithm builds a classifier Ci from each. To classify an unknown sample X, let each of the k classifiers vote; the bagged classifier C* assigns X the class with the most votes. A sketch follows below.
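A sketch of exactly this procedure, bootstrap sampling plus majority vote, with decision trees as the base classifiers (scikit-learn assumed available; 0/1 labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=15, seed=0):
    """Fit k trees, each on a bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample T of S
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # each Ci votes
    return (votes.mean(axis=0) > 0.5).astype(int)     # majority vote
```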
Boosting Technique: Algorithm
- Assign every example an equal weight 1/N.
- For t = 1, 2, ..., T: obtain a hypothesis (classifier) h_t under the current weights w_t; calculate the error of h_t and re-weight the examples based on it (misclassified examples gain weight, correctly classified ones lose weight); normalize w_{t+1} to sum to 1.
- Output a weighted combination of all the hypotheses, each weighted according to its accuracy on the training set. A sketch follows below.
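The reweighting loop in code: an AdaBoost-style sketch with decision stumps (scikit-learn assumed available; labels in {-1, +1}):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, T=20):
    """y in {-1, +1}. Returns (stumps, alphas) for a weighted-vote classifier."""
    w = np.full(len(X), 1 / len(X))               # equal weights 1/N
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum()                  # weighted error of h_t
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)            # misclassified gain weight
        w /= w.sum()                              # normalize to sum to 1
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def boost_predict(stumps, alphas, X):
    score = sum(a * h.predict(X) for h, a in zip(stumps, alphas))
    return np.sign(score)                         # accuracy-weighted vote
```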
Summary
- Classification is an extensively studied problem, mainly in statistics and machine learning.
- It is probably one of the most widely used data mining techniques, with many extensions.
- Scalability is still an important issue for database applications.
- Research directions: classification of non-relational data, e.g. text, spatial, and multimedia data.