3. AI is anything capable of mimicking human behavior
Machine learning algorithms apply statistical methodologies to identify patterns in past human behavior and
make decisions.
DL techniques can adapt on their own, uncovering features in the data that they were never explicitly programmed
to find; in this sense we say they learn on their own.
4. Machine Learning - Andrew Ng
“If a typical person can do a mental task with less than one second of thought, we
can probably automate it using AI either now or in the near future.”
5. Learning - Definition
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience
E.
To have a well-defined learning problem
● Three features must be identified
○ The class of tasks, the measure of performance to be improved, and the source of experience
6. A checkers learning problem
Objective : Designing a program to learn to play Checkers
Checkers
Video : https://youtu.be/ScKIdStgAfU
How to : https://www.wikihow.com/Play-Checkers
7. A checkers learning problem
● Task T: playing checkers
● Performance measure P: percent of games won against opponents
● Training experience E: playing practice games against itself
9. A handwriting recognition learning problem
● Task T: recognizing and classifying handwritten words within images
● Performance measure P: percent of words correctly classified
● Training experience E: a database of handwritten words with given classifications
11. A robot driving learning problem
● Task T: driving on public four-lane highways using vision sensors
● Performance measure P: average distance traveled before an error (as judged by human overseer)
● Training experience E: a sequence of images and steering commands recorded while observing a
human driver
12. Designing a Learning System
● Choosing the Training Experience
● Choosing the Target Function
● Choosing a Representation for the Target function
● Choosing a Function Approximation Algorithm
○ Estimating Training Values
○ Adjusting Weights
● The Final Design
13. Choosing the Training Experience
First Design Choice
Choose the type of training experience from which our system will learn
The type of training experience has a significant impact on the success or failure of the learner
14. Training Experience - Attributes 1
Type of Training Data
● Direct
○ Individual board states together with the correct move for each
● Indirect
○ Move sequences and final outcome of the various games played
○ The correctness of specific moves early in the game must be inferred indirectly - from whether the game was won or lost
○ Need to assign credit - determining the degree to which each move in the sequence deserves credit or blame
for the final outcome
○ Credit assignment is a difficult problem - a game can be lost even when the early moves are optimal, if they are
followed later by poor moves
15. Training Experience - Attributes 2
The degree to which the learner controls the sequence of training examples
● Teacher selects informative board states and provides the correct moves
● Learner proposes board states it finds confusing and asks the teacher for the correct move
● Learner may have complete control
○ When it learns by playing against itself with no teacher, the learner may choose between experimenting with novel board
states or honing its skill by playing minor variations of promising lines of play
16. Training Experience - Attributes 3
How well E represents the distribution of examples over which the final system performance P must be measured
● learning is most reliable when the training examples follow a distribution similar to that of future test
examples.
17. Attributes 3 - checkers learning scenario
● The performance metric P is the percent of games the system wins in the world tournament.
● If its training experience E consists only of games played against itself, there is an obvious danger that
this training experience might not be fully representative of the distribution of situations over which
it will later be tested.
● For example, the learner might never encounter certain crucial board states that are very likely to be
played by the human checkers champion.
18. Attribute 3 (Contd…)
In practice, it is often necessary to learn from a distribution of examples that is somewhat different from
those on which the final system will be evaluated
Such situations are problematic because mastery of one distribution of examples will not necessarily lead to
strong performance over some other distribution
19. Design of Learning System
Needs to choose
● the exact type of knowledge to be learned
● a representation for this target knowledge
● a learning mechanism
20. Choosing the Target Function
Determine what type of knowledge will be learned
Assume a checkers-playing program
● Can generate the legal moves from any board state
● Need to learn how to choose the best move from these legal moves
● This learning task is representative of a large class of tasks for which the legal moves that define
some large search space are known a priori, but for which the best search strategy is not known.
21. Choosing the Target Function contd..
The type of knowledge to be learned is a program that chooses the best move for any given board state
● ChooseMove : B → M
● where B is the set of legal board states
● M is the set of legal moves
Learning ChooseMove directly is very difficult when only indirect training experience is available
The problem of improving performance P at task T
Reduces to
Learning a target function such as ChooseMove
THE CHOICE OF THE TARGET FUNCTION IS THE KEY DESIGN DECISION
22. Alternate Target Function
An evaluation function that assigns a numerical
score to any given board state
Should assign a higher score to better board states
V : B → ℛ
● where B is the set of legal board states
● ℛ denotes the set of real numbers
23. Alternate Target Function (Contd…)
If the system can learn V
● It can select the best move from any current board position
○ Generate the successor board state for every legal move
○ Use V to choose the move whose successor scores highest
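The selection procedure above can be sketched as follows. This is a hedged toy example: `legal_moves`, `successor`, and the integer "boards" are hypothetical stand-ins, not a real checkers interface.

```python
# Sketch of selecting the best move with a learned evaluation function V:
# generate every legal successor board and keep the move V rates highest.

def choose_best_move(board, legal_moves, successor, V):
    """Return the legal move whose successor board V rates highest."""
    return max(legal_moves(board), key=lambda m: V(successor(board, m)))

# Toy example: boards are integers, a move adds an offset, and V simply
# prefers larger numbers, so the move +2 is chosen.
moves = lambda b: [-1, +1, +2]
succ = lambda b, m: b + m
V = lambda b: float(b)
best = choose_best_move(0, moves, succ, V)
```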
24. Possible Definition for Target function
● if b is a final board state that is won, then V(b) = 100
● If b is a final board state that is lost, then V(b) = -100
● if b is a final board state that is drawn, then V(b) = 0
● if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can
be achieved starting from b and playing optimally until the end of the game
This definition cannot be efficiently computed - it is a NON-OPERATIONAL definition
The goal of learning is to discover an operational description of V - one that can evaluate states and select moves within realistic
time bounds
25. Approximation to Target Function
The problem of improving P at task T
Reduces to
Learning a target function such as ChooseMove
Reduces to
Learning an operational description of the ideal target function V
It is difficult to learn an operational form of V perfectly
We instead acquire an approximation
The process of learning the target function by approximation is called Function Approximation
The function that is actually learned by our program is denoted V̂, to distinguish it from the ideal target function V
26. Choosing a Representation for the Target
Function
Represent as
● A large table with all board states and a value for each board state
● A collection of rules that match against features of the board state
● A quadratic polynomial of predefined board features
● An ANN
27. A Simple Representation of V̂ - a Linear Function of Board Features
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
● x1: the number of black pieces on the board
● x2: the number of red pieces on the board
● x3: the number of black kings on the board
● x4: the number of red kings on the board
● x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
● x6: the number of red pieces threatened by black
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm
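As a minimal sketch of this linear representation (feature extraction from a real board is omitted, and the weight values are arbitrary illustrative choices, not learned ones):

```python
# Linear evaluation function V_hat(b) = w0 + w1*x1 + ... + w6*x6,
# operating directly on a feature vector (x1..x6) rather than a board.

def v_hat(features, weights):
    """features: (x1..x6); weights: (w0..w6). Returns the linear score."""
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

# Example: black has 3 pieces and 1 king, red has none - a won position.
features = (3, 0, 1, 0, 0, 0)                          # x1..x6
weights = (0.0, 10.0, -10.0, 20.0, -20.0, -5.0, 5.0)   # w0..w6 (arbitrary)
score = v_hat(features, weights)                       # 10*3 + 20*1 = 50.0
```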
28. Partial Design - Checkers Learning Program
Specification of the learning task:
● Task T: playing checkers
● Performance measure P: percent of games won in the world tournament
● Training experience E: games played against itself
Design choices for the implementation:
● Target function: V : Board → ℛ
● Target function representation: V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
29. Partial Design (Contd…)
The net effect of this set of design choices is to reduce
The problem of learning a checkers strategy
to
The problem of learning values for the coefficients w0 through w6 in the target function representation
30. Choosing a Function Approximation
Algorithm
To learn the target function V̂, training examples of the form (b, Vtrain(b)) are needed
For example,
((x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0), +100) describes a board state b in which black has won the game (x2 = 0: red has no remaining pieces)
Steps involved
Estimating training values from the indirect training experience available
Adjusting the weights to best fit the training examples
31. Estimating Training Values
● The only training information available to our learner is whether the game was eventually won or
lost.
● We require training examples that assign specific scores to specific board states.
● It is easy to assign a value to board states that correspond to the end of the game, it is less obvious
how to assign training values to the more numerous intermediate board states that occur before the
game's end.
● The game was eventually won or lost does not necessarily indicate that every board state along the
game path was necessarily good or bad
● Even if the program loses the game, it may still be the case that board states occurring early in the
game should be rated very highly and that the cause of the loss was a subsequent poor move.
32. Estimating Training Values
One simple approach has been found to be surprisingly successful: set the training value of any intermediate board state b to the value the current approximation assigns to its successor
Vtrain(b) ← V̂(Successor(b))
where Successor(b) is the next board state at which it is again the program's turn to move
It may seem strange to use the current version of V̂ to estimate training values that will be used to refine this very same function.
This makes sense if V̂ tends to be more accurate for board states closer to the game's end.
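A sketch of this estimation rule, under the assumption that a game is represented simply as a list of board states in play order; the two-plies-ahead indexing stands in for the real Successor function, and `v_hat` is any current evaluation function:

```python
# Training-value rule Vtrain(b) <- V_hat(Successor(b)): each board is
# scored by applying the current V_hat to the board two plies later
# (the next state where it is again the program's turn to move); boards
# with no such successor in the history receive the final outcome value.

def estimate_training_values(history, v_hat, outcome):
    """history: boards in play order; outcome: +100, -100, or 0."""
    examples = []
    for i, b in enumerate(history):
        if i + 2 < len(history):
            examples.append((b, v_hat(history[i + 2])))
        else:
            examples.append((b, float(outcome)))
    return examples

# Toy run: "boards" are integers and v_hat just echoes the board value.
examples = estimate_training_values([1, 2, 3, 4], lambda b: float(b), 100)
```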
33. Adjusting Weights - Common Approach
Define the best hypothesis, or set of weights, as the one that minimizes the squared error E between the
training values and the values predicted by the hypothesis:
E ≡ Σ (Vtrain(b) − V̂(b))², summed over all training examples (b, Vtrain(b))
34. Least Mean Squares (LMS) Training Rule
For each training example (b, Vtrain(b)):
● Use the current weights to calculate V̂(b)
● For each weight wi, update it as
wi ← wi + η (Vtrain(b) − V̂(b)) xi
η is a small constant (e.g., 0.1) that moderates the size of the weight update
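One LMS step can be sketched as follows; a hedged toy version in which feature vectors stand in for boards, and the constant feature 1 is paired with w0:

```python
# One LMS update: compute the error (Vtrain(b) - V_hat(b)) and nudge
# each weight in proportion to its feature value, scaled by eta.

def lms_update(weights, features, v_train, eta=0.1):
    """weights: [w0..w6]; features: (x1..x6). Returns updated weights."""
    w0, *ws = weights
    v_hat = w0 + sum(w * x for w, x in zip(ws, features))
    error = v_train - v_hat
    # w0's feature is implicitly 1; every other wi moves by eta*error*xi.
    return [w0 + eta * error] + [w + eta * error * x
                                 for w, x in zip(ws, features)]

# One step from all-zero weights toward a training value of +100:
# only w0 and the weight of the single nonzero feature change.
weights = lms_update([0.0] * 7, (1, 0, 0, 0, 0, 0), v_train=100.0)
```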
36. Final Design - Four Modules
● Performance System
○ The module that must solve the given performance task - playing checkers, using the learned target
function(s)
○ Input : a new game
○ Output : the history (trace) of the game
○ Selects its next move as determined by the learned evaluation function V̂
○ The system's performance is expected to improve as V̂ becomes increasingly accurate
● Critic
○ Input : history of the game
○ Output : a set of training examples (b, Vtrain)
37. Final Design - Four Modules Contd...
● Generalizer
○ Input : Training Examples
○ Output : Hypothesis - its estimate V̂ of the target function
○ Generalizes from specific examples
○ Our case : the LMS algorithm together with the linear representation of V̂
● Experiment Generator
○ Input : Current Hypothesis
○ Output : A new problem
○ Picks a new practice problem that will maximize the learning rate
○ E.g. the standard initial game board, or board positions designed to explore particular regions of the state space
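The interaction of the four modules can be sketched end to end with a deliberately tiny stand-in "game" (predicting the sign of a number with a one-weight hypothesis), since wiring up real checkers would obscure the loop; every function and name here is hypothetical:

```python
# Toy closed loop: Experiment Generator -> Performance System -> Critic
# -> Generalizer, repeated. The "hypothesis" is a single weight w,
# refined by an LMS-style update.
import random

def experiment_generator(rng):
    return rng.uniform(-1, 1)                 # a new practice "board"

def performance_system(board, w):
    move = "pos" if w * board > 0 else "neg"  # play using hypothesis w
    return (move, board)                      # the game "history"

def critic(history):
    _, board = history                        # turn the history into
    return [(board, 1.0 if board > 0 else -1.0)]  # training examples

def generalizer(w, examples, eta=0.5):
    for x, v_train in examples:               # LMS-style weight update
        w += eta * (v_train - w * x) * x
    return w

rng = random.Random(0)
w = 0.0
for _ in range(200):
    history = performance_system(experiment_generator(rng), w)
    w = generalizer(w, critic(history))
# After training, w is positive, so positive boards are labeled "pos".
```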
39. Issues in Machine Learning
➔ What algorithms exist for learning general target functions? Which algorithms perform best for which
types of problems and representations?
➔ How much training data is sufficient?
➔ When and how can prior knowledge held by the learner guide the process of generalizing from
examples?
➔ …