This document describes a course recommender system that aims to recommend courses to students based on their past performance. It does this by using a machine learning algorithm to predict students' grades in courses based on their grades in previous courses. The algorithm trains on a dataset containing students' grades in various courses. It learns weights representing the importance of each previous course on performance in the target course. These weights are used to calculate predicted grades. The algorithm is tested on masked data and iteratively updated to reduce error between predictions and actual grades. The goal is to predict grades with under 10% error to provide good course recommendations. Alternate methods like course-specific and student-specific regression are also described.
The Problem
The aim of this project is to recommend courses based on each student's preferences. Recommendations are generated from the courses the student has already taken, by finding similarities between courses and predicting the student's future performance.
Students face a constant dilemma when choosing among elective subjects. If we can predict the student's grade in each elective, we can sort the results and recommend the best electives.
Approach To The Problem
Dataset:
We were given 5 CSV files containing information about the students and the courses offered by DigiPen. All data is anonymized:
1) StudGPAInfo: Info about 937 students, their cumulative GPA, and last attendance. (Covers students who graduated from DigiPen; the student IDs do not match the other files, which makes this file unusable for the application.)
2) StudGradesInfo: Info about 4102 students and their mid-semester/final grade in each subject.
3) StudChangeMajorInfo: Info about 448 students who changed their major.
4) ProgramCourses: Info about all courses offered from 2011 to 2017 (credits, core, semester). There are 478 unique courses offered by DigiPen.
5) CourseRequisites: All courses that are prerequisites for a given course.
Currently the program uses only the StudGradesInfo and ProgramCourses files to make predictions.
The first step is selecting the relevant features and setting up the dataset to train our algorithm. We use Python to extract the features and create a sparse matrix of students and the subjects they took.
              Student 1   Student 2   ...   Student 4102
Subject 1       .98         -1        ...      .88
Subject 2       -1          .77       ...      -1
...
Subject 478     .52         .73       ...      -1
Each number in the matrix is the grade the student achieved at the end of the semester in a particular subject: 0 represents 0% and 1 represents 100%. If the student did not take the subject, the cell is -1. Hence we have a sparse matrix of dimension 478x4102. We export this matrix as a text file to train our algorithm.
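The matrix-building step might be sketched as follows; the records below are made-up stand-ins for rows parsed out of StudGradesInfo, not real data:

```python
import numpy as np

# Illustrative (subject, student, grade) triples; grades are on a 0-1 scale.
n_subjects, n_students = 478, 4102
records = [(0, 0, 0.98), (477, 0, 0.52), (1, 1, 0.77),
           (477, 1, 0.73), (0, 4101, 0.88)]

grades = np.full((n_subjects, n_students), -1.0)  # -1 = subject not taken
for subject, student, grade in records:
    grades[subject, student] = grade

# Export as a text file to feed the training program, as described above.
np.savetxt("grades_matrix.txt", grades, fmt="%.4f")
```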
Creating Training Set and Test Set:
In the conventional data-splitting stage we would divide the dataset 70/30, taking the first 70% of the values as the training set and the remaining 30% as the test set.
We cannot apply that method here, since we must train on every subject and every student. Instead, we mask the data: in the table above we select 30% of the cells that contain grade values at random and move them to the test set. During training each masked cell is treated as -1, as if the student had not taken that course, and we try to predict its grade.
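The masking step could be sketched like this; the matrix, fill rate, and seed are synthetic, chosen only to illustrate the 30% hold-out:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy grade matrix: -1 means "not taken", other cells hold grades in [0, 1].
grades = np.full((478, 4102), -1.0)
observed = rng.random(grades.shape) < 0.05        # ~5% of cells have grades
grades[observed] = rng.random(observed.sum())

# Select 30% of the observed cells at random as the test set...
rows, cols = np.nonzero(grades >= 0.0)
pick = rng.choice(rows.size, size=int(0.3 * rows.size), replace=False)
test_set = [(r, c, grades[r, c]) for r, c in zip(rows[pick], cols[pick])]

# ...and mask them in the training copy as if those courses were never taken.
train = grades.copy()
train[rows[pick], cols[pick]] = -1.0
```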
              Student 1   Student 2   ...   Student 4102
Subject 1      ? (-1)       -1        ...      .88
Subject 2       -1          .77       ...      -1
...
Subject 478     .52        ? (-1)     ...      -1
After training our algorithm we try to predict the ? values. If the predicted value for Student 1 and Subject 1 is .90 against an actual grade of .98, the error is .08, i.e. 8.16% of the actual grade. If we can predict grades with an average error below 10%, we can estimate how a student will perform in future elective subjects based on past performance.
Learning:
The grade that a student receives depends on:
● Difficulty of subject.
● Performance of student.
Every subject has requirements that must be fulfilled in order to pass it, such as homework, assignments, a midterm, a final exam, etc. A student's overall performance depends on his/her performance on each requirement.
Thus we model each subject and each student as a vector of 7 traits.
Subject : [x1, x2, x3, x4, x5, x6, x7]
Student : [y1, y2, y3, y4, y5, y6, y7]
We are trying to learn the individual weights of each vector. Each variable in the subject vector represents the weight of one requirement, and each variable in the student vector represents the student's performance on that requirement.
The final grade is the dot product of the two vectors:
Grade = x1*y1 + x2*y2 + x3*y3 + x4*y4 + x5*y5 + x6*y6 + x7*y7
(Note: We chose 7 elements per vector after testing sizes from 2 to 10; the 7-variable vector gave the best accuracy. We will experiment with other sizes in the future if it affects accuracy.)
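A minimal sketch of the dot-product prediction; the vector values below are made up for illustration, not learned weights:

```python
import numpy as np

# Hypothetical 7-trait vectors: per-requirement weights for the subject and
# the student's performance on each requirement (both illustrative only).
subject = np.array([0.2, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1])  # weights sum to 1.0
student = np.array([0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9])

grade = float(subject @ student)  # predicted grade on a 0-1 scale
```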
The application uses a Q-Learning algorithm to adjust the weights, using the error term and the actual score the student received in that subject.
Pseudocode for our algorithm (LearningSubjectVector):

for each course in all courses:
    for each weight in the course vector:
        totalError = 0
        for each student in all students:
            if the student has taken the course:
                prediction = dot product of the subject and student vectors
                error = prediction - Stud_Actual_Score
                totalError += error
        discountFactor = discountFactor * Stud_Actual_Score
        newWeight = Stud_Actual_Score + learning_rate * (totalError - discountFactor)
This process is repeated 2000 times to learn the subject and student weights; the same algorithm is used to learn the student vector. After every 100 iterations the weights are saved into two separate text files, and these weights can be loaded back into the application when making a new prediction.
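As a concrete, simplified sketch, the alternating update can be written as a plain stochastic-gradient step on a small synthetic matrix. This is a standard matrix-factorization update, not the exact Q-Learning rule above; the sizes, learning rate, and data are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_students, n_traits = 10, 20, 7

# Synthetic sparse grade matrix: -1 means "not taken".
grades = np.full((n_subjects, n_students), -1.0)
taken = rng.random(grades.shape) < 0.4
grades[taken] = rng.random(taken.sum())

subj = rng.normal(0.0, 0.1, (n_subjects, n_traits))   # subject vectors
stud = rng.normal(0.0, 0.1, (n_students, n_traits))   # student vectors
lr = 0.01

for iteration in range(2000):
    for s, u in zip(*np.nonzero(taken)):
        error = subj[s] @ stud[u] - grades[s, u]
        subj[s], stud[u] = (subj[s] - lr * error * stud[u],
                            stud[u] - lr * error * subj[s])
    # (The report saves both weight files every 100 iterations; omitted here.)

mae = np.abs((subj @ stud.T)[taken] - grades[taken]).mean()
```

After enough passes the mean absolute error on the observed cells drops well below the 10% target on this toy data; real data is much harder, as the Performance section reports.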
Performance
The algorithm starts at 59% error and after 2000 iterations reaches 27-28% error.
The next steps for improvement: when predicting a student's grade in a CS class, the prediction should be based only on CS and MAT classes instead of all classes, which should improve accuracy. We will also consider batch training instead of training on all the data at once, and experiment with different learning rates and discount-factor values.
We will also change the main algorithm: instead of using Q-Learning to reach the optimal values, I would implement different variations of gradient descent (vanilla, SGD, Adam) and compare the results.
If the error drops below 10%, I will store the weights and predict the best elective recommendations for a particular student after the first year of performance data.
Alternate Methods
1) Course Specific Regression (CSR)
Undergraduate courses are structured so that the courses a student takes at the beginning prepare them for future classes.
This method assumes that a student's performance in previous classes directly impacts their performance in future classes.
Example: the grade in CS300 depends on the student's performance in MAT100 and CS200.
Hence, if we can calculate the weight/contribution of each previous class and multiply it by the student's performance in that class, we can calculate the final grade.
Future_Grade_For_Class_CS300 =
Grade_In_ClassMAT100 * Contribution_MAT100 +
Grade_In_ClassCS200 * Contribution_CS200
Suppose:
Grade_In_ClassMAT100 = 80%
Grade_In_ClassCS200 = 70%
Contribution_MAT100 = 30%
Contribution_CS200 = 70%
Future_Grade_For_Class_CS300 = (0.8)(0.3) + (0.7)(0.7)
Future_Grade_For_Class_CS300 = 0.73
Thus we get the formula:
Future_Grade = [Grades_In_Previous_Class]^T * [Weights_Each_Class]
How do we get the matrix Weights_Each_Class?
We create a matrix including only the rows of students who have actually taken the target class, with the grade of each subject they took before attempting that course.
Y : Actual grades received in the target subject
1 : Vector of 1s
b0 : Bias value
W : Vector of weights (what we are trying to learn)
G : Matrix of grades of other students similar to the target student
λ1, λ2 : Regularization parameters to control overfitting

W_hat = argmin_W ||Y - 1*b0 - G*W||^2 + λ1*|b0|^2 + λ2*||W||^2
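The regularized least-squares objective above has a closed-form solution. A small sketch with synthetic data, reusing the made-up contributions of 0.3 and 0.7 from the CS300 example (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data: grades of 50 earlier students in two
# prerequisites (columns of G) and in the target course (Y).
G = rng.random((50, 2))
Y = G @ np.array([0.3, 0.7])             # noiseless grades for the sketch

lam1, lam2 = 0.1, 0.1
A = np.hstack([np.ones((50, 1)), G])     # prepend the bias column (the 1s)
reg = np.diag([lam1, lam2, lam2])        # λ1 penalizes b0, λ2 the weights
sol = np.linalg.solve(A.T @ A + reg, A.T @ Y)
b0, W = sol[0], sol[1:]

# Predicted grade for a new student with 80% in MAT100 and 70% in CS200.
future_grade = b0 + np.array([0.8, 0.7]) @ W
```

On this noiseless toy data the recovered weights land close to the contributions of 0.3 and 0.7, and the prediction close to the 0.73 worked out earlier.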
2) Student Specific Regression (SSR)
The downside of the CSR (Course Specific Regression) method is that there is sometimes too much flexibility in the selection of subjects: the order in which students take subjects varies widely.
An alternative method solves this problem: Student Specific Regression (SSR).
We compare our target student (who has taken N courses so far) with every other student and check whether they have taken at least K subjects in common with the target student (the K subjects are a subset of the N, so K < N).
If so, we include the other student in our training data; otherwise we exclude them from the dataset while training, to improve accuracy.
We also remove from the other students' data the subjects the target student has not taken, so we never consider weights for subjects the target student never took.
We then apply the same formula for learning the weights of each subject and predicting the grade.
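The peer-selection step might be sketched as follows; the matrix and K are toy values, and -1 again means "not taken":

```python
import numpy as np

def select_peers(grades, target_col, k):
    """Return columns of students sharing >= k taken subjects with the target."""
    taken_by_target = grades[:, target_col] >= 0.0
    peers = []
    for col in range(grades.shape[1]):
        if col == target_col:
            continue
        common = int(((grades[:, col] >= 0.0) & taken_by_target).sum())
        if common >= k:
            peers.append(col)
    return peers

# Toy matrix: 3 subjects x 3 students; the target student is column 0.
grades = np.array([[0.9,  0.8, -1.0],
                   [0.7, -1.0,  0.6],
                   [-1.0, 0.5,  0.4]])
peers = select_peers(grades, target_col=0, k=1)
# Rows for subjects the target never took are then dropped before training.
```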