Module 1.pdf

  1. • PS: This file is for reference only; do not rely on it as your sole source for the content. It supplements your textbook. It is recommended to go through the suggested readings/textbook for detailed knowledge of the content.
  2. 1. Introduction
  3. Definition • In 1959, Arthur Samuel, a pioneer in the field of machine learning (ML), defined it as the “field of study that gives computers the ability to learn without being explicitly programmed”.
  4. Definition “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Features of a well-defined learning problem: • The learning task • The measure of performance • The task experience • Types of learning tasks
  5. [Figure]
  6. What is the Learning Problem? • Learning = improving with experience at some task • Improve over task T, • with respect to performance measure P, • based on experience E.
  7. What is the Learning Problem? • E.g., learn to play checkers – T: play checkers – P: % of games won in the world tournament – E: opportunity to play games against itself
  8. Learning to Play Checkers • E.g., learn to play checkers – T: play checkers – P: % of games won in the world tournament • What experience should the system have? • What exactly should be learned? • How shall it be represented? • What specific algorithm should be used to learn it?
  9. Designing a Learning System • Consider designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament.
  10. Designing a Learning System • Consider designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament • Performance measure: the percentage of games it wins in this tournament. • Requires the following design choices: – Choosing the Training Experience – Choosing the Target Function – Choosing the Representation of the Target Function – Choosing the Function Approximation Algorithm
  11. Choosing the Training Experience 1. What training experience should the system have? – A design choice with great impact on the outcome. 2. What amount of interaction should there be between the system and the supervisor? 3. Which training examples?
  12. Choosing the Training Experience 1. What training experience should the system have? – A design choice with great impact on the outcome. • Will the training experience provide direct or indirect feedback? – Direct feedback: the system learns from examples of individual checkers board states and the correct move for each, i.e., a collection of board states, each paired with its correct move.
  13. Choosing the Training Experience • [Figure: direct feedback]
  14. Choosing the Training Experience 1. What training experience should the system have? – A design choice with great impact on the outcome. • Will the training experience provide direct or indirect feedback? – Direct feedback: the system learns from examples of individual checkers board states and the correct move for each, i.e., a collection of board states, each paired with its correct move. – Indirect feedback: a collection of recorded games, where the correctness of the moves is inferred from the result of the game. • Credit assignment problem: the value of early states must be inferred from the outcome. Direct feedback is easier to learn from.
  15. Choosing the Training Experience 2. What amount of interaction should there be between the system and the supervisor? – Choice #1: No freedom. The supervisor provides all training examples. – Choice #2: Semi-free. The supervisor provides training examples, and the system also constructs its own examples and asks questions to the supervisor in cases of doubt. – Choice #3: Total freedom. The system learns to play completely unsupervised. • How “daring” should the system be in exploring new board states?
  16. Choosing the Training Experience 3. Which training examples? – There is a huge number of possible games. – There is no time to try all possible games. – The system should learn from examples like those it will encounter in the future. – For example, if the goal is to beat humans, it should be able to do well in situations that humans encounter when they play (this is hard to achieve in practice).
  17. Choosing the Training Experience – If the checkers program trains only on games played against itself, it may never encounter crucial board states that are likely to be played by the human checkers champion. – Most machine learning theory rests on the assumption that the distribution of training examples is identical to the distribution of test examples.
  18. Partial Design of Checkers Learning Program • A checkers learning problem: – Task T: playing checkers – Performance measure P: percent of games won in the world tournament – Training experience E: games played against itself • Remaining choices: – The exact type of knowledge to be learned – A representation for this target knowledge – A learning mechanism
  19. Choosing the Target Function • What should be learned exactly? • The computer program already knows the legal moves; it should learn how to choose the best move from among them. • The computer should learn a ‘hidden’ function. – Target function: ChooseMove : B → M – B: the set of legal board states, M: the set of legal moves • ChooseMove is difficult to learn given only indirect training experience
  20. Choosing the Target Function • What should be learned exactly?
  21. Choosing the Target Function • So, our alternative target function: – An evaluation function that assigns a numerical score to any given board state – V : B → ℝ (where ℝ is the set of real numbers) • V(b) for an arbitrary board state b in B: – if b is a final board state that is won, then V(b) = 100 – if b is a final board state that is lost, then V(b) = -100 – if b is a final board state that is drawn, then V(b) = 0 – if b is not a final state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game
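For reference, the four cases above can be written as a single piecewise definition; this is just the slide's rules restated in mathematical notation:

```latex
V(b) =
\begin{cases}
100   & \text{if } b \text{ is a final board state that is won} \\
-100  & \text{if } b \text{ is a final board state that is lost} \\
0     & \text{if } b \text{ is a final board state that is drawn} \\
V(b') & \text{otherwise, where } b' \text{ is the best final state reachable from } b \text{ by playing optimally}
\end{cases}
```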
  22. Choosing the Target Function • V(b) gives a recursive definition for board state b – Not usable because it is not efficient to compute, except in the first three trivial cases – a nonoperational definition • The goal of learning is to discover an operational description of V • Learning the target function is often called function approximation – The learned approximation is referred to as V̂
  23. Choosing a Representation for the Target Function • The choice of representation involves trade-offs – Pick a very expressive representation to allow a close approximation to the ideal target function V – The more expressive the representation, the more training data is required to choose among the alternative hypotheses • Use a linear combination of the following board features: – x1: the number of black pieces on the board – x2: the number of red pieces on the board – x3: the number of black kings on the board – x4: the number of red kings on the board – x5: the number of black pieces threatened by red (i.e. which can be captured on red's next turn) – x6: the number of red pieces threatened by black • V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
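As a minimal sketch, the linear representation can be coded directly; the helper name v_hat and the numeric values below are illustrative assumptions, not values from the slides:

```python
# V_hat(b) = w0 + w1*x1 + ... + w6*x6, a linear combination of the six board features.

def v_hat(weights, features):
    """weights: [w0, w1, ..., w6]; features: [x1, ..., x6] for a board state b."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # w0..w6 (arbitrary illustrative values)
board_features = [3, 0, 1, 0, 0, 0]                # x1..x6 for some board state b
print(v_hat(weights, board_features))              # numerical score assigned to b
```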
  24. [Figure]
  25. Partial Design of Checkers Learning Program • A checkers learning problem: – Task T: playing checkers – Performance measure P: percent of games won in the world tournament – Training experience E: games played against itself – Target function: V : Board → ℝ – Target function representation: V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
  26. Choosing a Function Approximation Algorithm • To learn V̂ we require a set of training examples, each describing a board state b and its training value Vtrain(b) – Each training example is an ordered pair ⟨b, Vtrain(b)⟩ – Example: ⟨⟨x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0⟩, +100⟩ (a won position: no red pieces remain) – x1: the number of black pieces on the board, x2: the number of red pieces on the board, x3: the number of black kings on the board, x4: the number of red kings on the board, x5: the number of black pieces threatened by red (i.e. which can be captured on red's next turn), x6: the number of red pieces threatened by black
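A small sketch of how such a training example might be represented in code; the tuple layout is an assumption, chosen to match the slide's feature ordering:

```python
# One training example <b, Vtrain(b)>: the board b is encoded by its six feature
# values x1..x6 and paired with its training value.
training_example = ([3, 0, 1, 0, 0, 0], 100)   # no red pieces left, so Vtrain(b) = +100

features, v_train = training_example
print(features, v_train)
```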
  27. Choosing a Function Approximation Algorithm • We need a procedure that first derives such training examples from the indirect training experience, and then adjusts the weights wi to best fit these training examples.
  28. Estimating Training Values • Need to assign specific scores to intermediate board states • Approximate the training value of an intermediate board state b using the learner's current approximation V̂ applied to the next board state following b: Vtrain(b) ← V̂(Successor(b)) – A simple and successful approach – More accurate for states closer to end states
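A minimal sketch of this estimation rule, assuming a hypothetical successor_features(b) helper that returns the x1..x6 features of the next board state following b in the recorded game:

```python
# Vtrain(b) <- V_hat(Successor(b)): score the successor state with the current weights.

def estimate_training_value(weights, b, successor_features):
    x = successor_features(b)   # features x1..x6 of Successor(b); helper is hypothetical
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))   # V_hat(Successor(b))
```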
  29. Adjusting the Weights • Choose the weights wi to best fit the set of training examples • Minimize the squared error E between the training values and the values predicted by the hypothesis: E = Σ (Vtrain(b) − V̂(b))², summed over all training examples ⟨b, Vtrain(b)⟩ • Require an algorithm that – will incrementally refine the weights as new training examples become available – will be robust to errors in these estimated training values • Least Mean Squares (LMS) is one such algorithm
  30. LMS Weight Update Rule • For each training example ⟨b, Vtrain(b)⟩ – Use the current weights to calculate V̂(b) – For each weight wi, update it as wi ← wi + η (Vtrain(b) − V̂(b)) xi – where η is a small constant (e.g. 0.1) that moderates the size of the weight update
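Putting the pieces together, here is a short sketch of one LMS pass over the training examples. It repeats the illustrative v_hat helper so the snippet is self-contained, and treating w0 as having a constant feature of 1 is a common convention rather than something stated on the slide:

```python
ETA = 0.1   # small constant from the slide, moderating the size of each update

def v_hat(weights, features):
    """Linear evaluation function: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, training_examples, eta=ETA):
    """One pass of the LMS rule: wi <- wi + eta * (Vtrain(b) - V_hat(b)) * xi."""
    for features, v_train in training_examples:
        error = v_train - v_hat(weights, features)   # Vtrain(b) - V_hat(b)
        weights[0] += eta * error                    # w0: constant feature of 1 (convention)
        for i, x_i in enumerate(features, start=1):
            weights[i] += eta * error * x_i
    return weights

weights = [0.0] * 7                                          # w0..w6, illustrative start
weights = lms_update(weights, [([3, 0, 1, 0, 0, 0], 100)])   # the example from slide 26
print(weights)
```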
  31. Summary of Design Choices
  32. Suggested Readings • “Machine Learning” by Tom Mitchell, McGraw-Hill, Chapter 1