University of Wisconsin – Madison
Computer Sciences Department
CS 760 - Machine Learning
Fall 2001
Exam
7:15-9:15pm, December 13, 2001
Room 1240 CS & Stats
CLOSED BOOK
(one sheet of notes and a calculator allowed)
Write your answers on these pages and show your work. If you feel that a question is not fully
specified, state any assumptions you need to make in order to solve the problem. You may use
the backs of these sheets for scratch work.
Write your name on this and all other pages of this exam. Make sure your exam contains
6 problems on 10 pages.
Name ________________________________________________________________
Student ID ________________________________________________________________
Problem Score Max Score
1 ______ 24
2 ______ 15
3 ______ 16
4 ______ 14
5 ______ 7
6 ______ 24
TOTAL ______ 100
Name: _______________________________________
Problem 1 – Learning from Labelled Examples (24 points)
Imagine that you are given the following set of training examples.
Each feature can take on one of three nominal values: a, b, or c.
F1 F2 F3 Category
a c a +
c a c +
a a c –
b c a –
c c b –
a) How would a Naive Bayes system classify the following test example?
Be sure to show your work.
F1 = a F2 = c F3 = b
b) Describe how a 3-nearest-neighbor algorithm would classify Part a’s test example.
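For self-study readers, a minimal Python sketch of both computations on the table above; it assumes unsmoothed maximum-likelihood estimates for Naive Bayes and Hamming distance (the count of mismatched features) for 3-NN, so a course solution using Laplace smoothing or a different distance would differ in the details.

# Sketch of Parts a and b on the training set above.
# Assumptions: unsmoothed probability estimates; Hamming distance for 3-NN.
from collections import Counter

train = [  # (F1, F2, F3, Category)
    ("a", "c", "a", "+"),
    ("c", "a", "c", "+"),
    ("a", "a", "c", "-"),
    ("b", "c", "a", "-"),
    ("c", "c", "b", "-"),
]
test = ("a", "c", "b")

# Part a: Naive Bayes scores P(cat) * product_i P(Fi = test_i | cat).
for cat in ("+", "-"):
    rows = [r for r in train if r[3] == cat]
    score = len(rows) / len(train)                         # prior P(cat)
    for i, v in enumerate(test):
        score *= sum(r[i] == v for r in rows) / len(rows)  # P(Fi = v | cat)
    print("Naive Bayes score for", cat, "=", score)

# Part b: 3-NN takes a majority vote among the three closest examples.
def hamming(r):
    return sum(a != b for a, b in zip(r[:3], test))

nearest = sorted(train, key=hamming)[:3]
print("3-NN neighbors:", nearest)
print("3-NN vote:", Counter(r[3] for r in nearest).most_common(1)[0][0])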
Page 2 of 10
Name: _______________________________________
c) Show the calculations that ID3 would perform to determine the root node of a decision tree
using the above training examples.
d) Now consider augmenting the standard ID3 algorithm so that it also considers tests like
the value of feature X = the value of feature Y
for all pairs of features X and Y where X ≠ Y. Show what this variant of ID3 would choose as
a root node for the training set above.
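A sketch of the bookkeeping for Parts c and d, using the standard ID3 definitions (entropy in bits, information gain as the entropy reduction of a split); the Part d equality tests are encoded here as derived Boolean features, which is one natural reading of the augmentation.

# Information gain of each candidate root test, for Parts c and d.
# H(S) = -sum_c p_c * log2(p_c);  gain(test) = H(S) - sum_v (|S_v|/|S|) H(S_v)
from math import log2
from collections import Counter

train = [
    ("a", "c", "a", "+"),
    ("c", "a", "c", "+"),
    ("a", "a", "c", "-"),
    ("b", "c", "a", "-"),
    ("c", "c", "b", "-"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(test_fn):
    groups = {}
    for r in train:
        groups.setdefault(test_fn(r), []).append(r[3])  # split by test outcome
    n = len(train)
    return entropy([r[3] for r in train]) - sum(
        len(g) / n * entropy(g) for g in groups.values())

# Part c: the three ordinary feature tests.
for i, name in enumerate(("F1", "F2", "F3")):
    print(name, gain(lambda r, i=i: r[i]))

# Part d: add the Boolean tests Fx = Fy for all pairs x != y.
for i, j in ((0, 1), (0, 2), (1, 2)):
    print(f"F{i+1}=F{j+1}", gain(lambda r, i=i, j=j: r[i] == r[j]))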
Page 3 of 10
Name: _______________________________________
Problem 2 – Weight Space and Neural Networks (15 points)
Assume that you wish to train a perceptron on the simple training set below.
F1 Category
1 +
8 +
2 –
4 –
a) Draw the weight space for this task, assuming that the perceptron’s threshold is always set at
4. Also assume that the perceptron’s output is 1 (i.e., category = +) when the perceptron’s
weighted sum meets or exceeds its threshold; otherwise its output is 0. (Since the threshold
is constant, you need not draw its dimension in weight space. Also, do not normalize the
values of F1.)
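Part a asks for a drawing; as a numeric companion, here is a minimal sketch (the particular grid of w values is arbitrary) showing how classification accuracy varies across the one-dimensional weight space when the threshold is pinned at 4.

# The threshold is fixed at 4, so weight space is one-dimensional.
# Output is + exactly when w * F1 >= 4; sweep candidate weights w and
# count correctly classified training examples.
data = [(1, "+"), (8, "+"), (2, "-"), (4, "-")]
for w in (0.0, 0.5, 1.0, 2.0, 4.0, 5.0):
    preds = ["+" if w * f >= 4 else "-" for f, _ in data]
    correct = sum(p == c for p, (_, c) in zip(preds, data))
    print(f"w = {w}: {preds}, {correct}/4 correct")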
b) Assuming that we initially set the weight on the link between F1 and the output node to the
value 5, state the range of final weight settings that could result from applying
backpropagation training. Be sure to explain your answer. (Do not train the threshold in this
part; hold it constant at the value of 4. Assume that the step function of Part a is replaced with a very
steep sigmoidal activation function, so that the activation function is technically differentiable.)
c) Starting from the initial state of Part b and using a learning rate of 0.1, draw the perceptron
before and after training with (just) the last example in the training set above. For this part,
you do need to train the threshold.
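As a numeric check on Part c, a minimal sketch of one update step, under the common convention that the trainable threshold behaves like a weight on a constant -1 input; that convention, and the encoding of category "-" as 0, are assumptions, so a course solution with a different sign convention would land on correspondingly different numbers.

# One perceptron update on the last training example (F1 = 4, category -),
# starting from w = 5, threshold = 4, learning rate 0.1 (Part c's setup).
# Assumed convention: the threshold is a weight on a constant -1 input.
w, theta, eta = 5.0, 4.0, 0.1
f1, target = 4, 0                      # "-" encoded as 0, "+" as 1

out = 1 if w * f1 >= theta else 0      # before: 20 >= 4, so output = 1
w += eta * (target - out) * f1         # 5.0 + 0.1 * (0 - 1) * 4  = 4.6
theta += eta * (target - out) * (-1)   # 4.0 + 0.1 * (0 - 1) * -1 = 4.1
print("after one step: w =", w, " threshold =", theta)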
Page 4 of 10
Name: _______________________________________
Problem 3 – Overfitting Avoidance (16 points)
For each of the following learning methods, briefly describe and motivate one (1) commonly
used technique for overfitting avoidance.
a) Nearest-neighbor learning
Brief Description (of an overfitting-avoidance technique):
Motivation (of why it might reduce overfitting):
b) Naïve Bayesian learning
Brief Description:
Motivation:
c) Decision-tree induction
Brief Description:
Motivation:
d) Neural network training
Brief Description:
Motivation:
Page 5 of 10
Name: _______________________________________
Problem 4 – Reinforcement Learning (14 points)
Consider the deterministic reinforcement environment drawn below. The numbers on the arcs
indicate the immediate rewards. Let the discount rate equal 0.9.
[Figure: deterministic state-action graph with states start, a, b, and c plus a terminal end state; the arcs carry immediate rewards of 10, 5, -10, -10, -10, 5, and 0.]
a) What is the best route for going from start to end? Why?
b) Represent the Q table by placing Q values on the arcs of the environment's state-action
graph; initialize all of the Q values to 2, except give each arc directly involving
node a a Q value of -1. For Step 1, do exploitation. Show on the graph below the
full Q table after Step 1. Specify the action chosen and display the calculations involved in
altering the Q table.
[Blank copy of the state-action graph (start, a, b, c, end) for recording the Q table.]
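For reference, the update exercised in Parts b and c is one-step Q-learning in a deterministic world: Q(s, a) <- r(s, a) + gamma * max over a' of Q(s', a'). Since the figure's exact arcs did not survive extraction, the transition table in this sketch is a hypothetical placeholder to be replaced with the arcs from the exam's graph.

# One-step Q-learning in a deterministic environment:
#   Q(s, a) <- r(s, a) + gamma * max_a' Q(s', a')
# The rewards below are HYPOTHETICAL placeholders standing in for the
# exam's figure; fill in the real arcs before relying on the numbers.
gamma = 0.9
R = {("start", "a"): 10, ("a", "b"): 5}   # immediate rewards (placeholder)
Q = {arc: 2.0 for arc in R}               # initialize per Part b's scheme

def update(s, a):
    # In this graph, an action is a move along an arc, so we identify the
    # action with its destination state (an assumption of this sketch).
    s_next = a
    successors = [q for (s2, _), q in Q.items() if s2 == s_next]
    Q[(s, a)] = R[(s, a)] + gamma * max(successors, default=0.0)

update("start", "a")                      # exploit: take the best-looking arc
print(Q)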
Page 6 of 10
Name: _______________________________________
c) Assume that after Step 1, the RL agent is magically transported back to the start state. Show
the resulting Q table after the learner takes its second step from the starting state. Step 2
should be exploration. Be sure to again state the action chosen and display your calculations.
[Another blank copy of the state-action graph (start, a, b, c, end) for recording the Q table.]
d) Explain one (1) major advantage and one (1) major disadvantage of using a Q network
instead of a Q table in reinforcement learning.
advantage:
disadvantage:
Page 7 of 10
Name: _______________________________________
Problem 5 – Inductive Logic Programming (7 points)
Assume that we tell FOIL that P(a) and P(b) are positive instances of P(?X) and that P(c) and
P(d) are negative instances (where ?X is a variable, while a, b, c, and d are constants).
We also give the following background knowledge to FOIL:
Q(a) ¬Q(b) Q(c) ¬Q(d)
R(a) ¬R(b) R(c) R(d)
(where “¬” means “not”).
Show the calculations that FOIL would go through in order to choose its first rule for P(?X).
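For readers checking their work, a sketch of FOIL's information gain for each candidate first literal. The formula is the standard FOIL gain; the positive-binding count t coincides with the positive-instance count here because the rule head has a single variable.

# FOIL's gain for each candidate first literal of  P(?X) :- ...
#   gain(L) = t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) )
# where (p0, n0) count pos/neg instances before adding L, (p1, n1) after,
# and t is the number of positive instances still covered.
from math import log2

pos, neg = {"a", "b"}, {"c", "d"}
Q, R = {"a", "c"}, {"a", "c", "d"}       # background facts from above

def gain(covered):
    p0, n0 = len(pos), len(neg)
    p1, n1 = len(pos & covered), len(neg & covered)
    if p1 == 0:
        return float("-inf")             # literal covers no positives
    t = p1                               # positives still covered
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

every = pos | neg
for name, s in (("Q(?X)", Q), ("not Q(?X)", every - Q),
                ("R(?X)", R), ("not R(?X)", every - R)):
    print(name, gain(s))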
Page 8 of 10
Name: _______________________________________
Problem 6 – Short Discussion Questions (24 points)
a) Why might it make sense to learn a "world model" when learning from reinforcements?
b) What is the major advantage that FOIL has over ID3? Explain your answer.
c) Would you expect ensemble methods to work better for decision-tree induction or for Naïve
Bayes classifiers? Why?
d) Assume that we want to empirically compare the accuracies of two learning algorithms on a
given dataset. What experimental methodology should we use?
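One standard answer is k-fold cross-validation with a paired statistical test on the per-fold accuracies. A sketch of that methodology follows; the libraries, the stand-in dataset, and the choice of k = 10 are illustrative assumptions, not something the exam prescribes.

# 10-fold cross-validation plus a paired t-test on per-fold accuracies.
# The dataset and the two learners are illustrative stand-ins.
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from scipy.stats import ttest_rel

X, y = load_iris(return_X_y=True)              # stand-in dataset
folds = KFold(n_splits=10, shuffle=True, random_state=0)

acc_a = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=folds)
acc_b = cross_val_score(GaussianNB(), X, y, cv=folds)

t, p = ttest_rel(acc_a, acc_b)                 # same folds -> paired test
print(acc_a.mean(), acc_b.mean(), p)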
Page 9 of 10
Name: _______________________________________
e) Assuming one has linearly separable data, what is the key difference between standard
perceptron training and Support Vector Machines?
f) Briefly explain one (1) connection between the Minimal Description Length principle and
Support Vector Machines.
g) What role does the VC Dimension play in machine learning?
h) Why does one need both tuning and testing sets in machine learning?
Have a good vacation!
Page 10 of 10