Slides presenting preliminary overview of thesis work presented at the International Conference on Electronic Learning in the Workplace at Columbia University on June 11, 2010.
1. The Design and Development of an
Expert System Prototype for Enhancing
Exam Quality
Presenter: Paul DeCarlo
Professor: Nouhad Rizk
Date: 6/11/2010
DeCarlo, P., & Rizk, N. (2010). The Design and Development of an Expert
System Prototype for Enhancing Exam Quality. International Conference on
Electronic Learning in the Workplace, Columbia University.
2. Introduction
Data mining or knowledge discovery in databases (KDD) is the
automatic extraction of implicit and interesting patterns from
large data collections (Klosgen & Zytkow, 2002).
Rule discovery is one of the most popular data mining
techniques, especially in EDM (Educational Data Mining),
because it shows the teacher information that has been
discovered and does so in an intuitive way (Romero & Ventura,
2004).
Conventional rule-based expert systems use human expert
knowledge to solve real-world problems that normally would
require human intelligence. Expert knowledge is often
represented in the form of rules or as data within the computer.
Rule-based expert systems have played an important role in
modern intelligent systems and their applications in strategic
goal setting, planning, design, scheduling, fault monitoring,
diagnosis and so on (Abraham, 2005).
3. Study Purpose
Course evaluations are typically done once per semester at colleges and
universities. Furthermore, students who drop a course are usually
not considered in these evaluations.
Course evaluations also typically do not assess the efficacy of course
materials, including assignments, textbooks, review materials, etc.
Evaluations done at intervals would be able to capture issues as
they happen and include students who are intending to drop.
This could inform educators of what is actually happening in their
course, instead of providing information after the fact.
Most current data mining tools are too complex for educators
to use, so a system that automates this process and dynamically
creates human-readable evaluations is essential if a system of this
type is to be adopted by educators.
4. What is Exam Quality?
Exam quality refers to how well an examination of learned material
reflects the information provided in course learning materials. Think of
the way validity is defined in research methods.
5. What is Association Rule
Learning?
Association rules are a data mining technique used to discover relations
between variables in large example sets.
Support is the probability that a randomly chosen example from the total
set of responses contains a subset X. The support of an association rule
'A => B' is defined as the support of (A union B).
Confidence measures how likely a transaction containing A is to also
contain B. The confidence of an association rule 'A => B' is defined as the
probability that an example contains both A and B, divided by the
probability that an example contains A. This is the same as the
support of (A union B) divided by the support of (A).
The algorithm was developed by Rakesh Agrawal et al. (1993):
1. Minimum support is applied to find all frequent itemsets in a database.
2. These frequent itemsets and the minimum confidence constraint are used
to form rules.
The rules this technique produces can be read and interpreted directly.
For example, a typical rule may take the form {studies daily} =>
{has a high GPA}, meaning that examples containing the feature
{studies daily} tend to also contain {has a high GPA}.
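The support and confidence definitions and the two-step algorithm above can be sketched in a few lines of Python. This is a brute-force enumeration for illustration, not RapidMiner's optimized implementation, and the feature names, thresholds, and toy data are invented, not the actual survey responses:

```python
from itertools import combinations

# Toy survey responses: each example is the set of binomial features
# that are true for one student (feature names are hypothetical).
examples = [
    {"studies daily", "has a high GPA"},
    {"studies daily", "has a high GPA"},
    {"studies daily"},
    {"owns textbook"},
    {"studies daily", "owns textbook", "has a high GPA"},
]

def support(itemset):
    """Probability that a randomly chosen example contains the whole itemset."""
    return sum(1 for ex in examples if itemset <= ex) / len(examples)

def confidence(antecedent, consequent):
    """support(A union B) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

min_support, min_confidence = 0.4, 0.7

# Step 1: minimum support filters candidate itemsets down to the frequent ones.
items = sorted(set().union(*examples))
frequent = [frozenset(c)
            for k in (1, 2)
            for c in combinations(items, k)
            if support(frozenset(c)) >= min_support]

# Step 2: frequent itemsets plus the minimum confidence constraint form rules.
rules = []
for itemset in (s for s in frequent if len(s) > 1):
    for item in itemset:
        antecedent, consequent = itemset - {item}, frozenset({item})
        if confidence(antecedent, consequent) >= min_confidence:
            rules.append((antecedent, consequent))
            print(set(antecedent), "=>", set(consequent))
```

On this toy data the sketch recovers the rule from the slide, {studies daily} => {has a high GPA}, because 3 of the 4 daily studiers also have a high GPA (confidence 0.75).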
6. Data Collection Driven by W-CAT
model
We must have a set of data from which to mine our rules. Research has been
done applying data mining techniques to CMS logs (e.g., Moodle).
Our system seeks to include subjective data and requires a survey interface
to collect current data. Some of the information we ask in the survey could
be gathered from a CMS (homework / exam review completion).
To drive our data collection process and the overall workflow of our system,
we used the Witty Cat model developed by Dr. Nouhad Rizk at the
University of Houston to guide the creation of our survey.
The Witty Cat Model
7. Example Survey Questions &
Results
• Data gathered using the open-source LimeSurvey online survey software.
• The responses can be considered valid, as invitations to the survey are distributed using a secure token system.
8. The RapidMiner Process Tree
• We used the open-source data mining tool RapidMiner to generate our rulesets. This tool allows for visualization and for handling remote databases.
• In our initial study (DeCarlo & Rizk, 2010), the survey data was cleansed by converting the numerical grades to nominal values A-F. These were then converted to binomial data.
• Questions which used a 5-point ranking scale were discretized into bins and processed as binomial data.
• Frequent itemsets were generated, and we then produced our association rules using a built-in implementation of Agrawal's Apriori algorithm.
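The cleansing steps above can be sketched as follows. The grade cut-offs, bin edges, and feature names are assumptions for illustration, not the exact values used in the study:

```python
# Hypothetical preprocessing mirroring the described cleansing pipeline:
# numeric grade -> nominal A-F -> binomial features; Likert rank -> bins.

def grade_to_nominal(score):
    """Map a 0-100 numeric grade to a nominal letter grade (assumed cut-offs)."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"

def nominal_to_binomial(letter):
    """One true/false feature per letter grade."""
    return {f"grade={v}": letter == v for v in "ABCDF"}

def discretize_rank(rank):
    """Bin a 1-5 ranking-scale answer into low/neutral/high binomial features."""
    return {"rank=low": rank <= 2, "rank=neutral": rank == 3, "rank=high": rank >= 4}

# One student's row: an 85 on the exam and a 4 on a survey question.
row = nominal_to_binomial(grade_to_nominal(85))
row.update(discretize_rank(4))
```

Each resulting row is a set of binomial attributes, which is the input format the frequent-itemset step expects.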
9. Rules generated in our case
study
Results from 50 students enrolled in a college-level Computer Organization
and Design course at the University of Houston, Fall 2009.
10. Impact of Pilot Study on Instructor
Methodology
Our study showed that 62% of students owned the course textbook. Of
that 62%, only 8% found it useful. This information allowed the instructor
to consider teaching more from the text, removing the text completely,
or adopting an alternative text. Further inquiry revealed that the
students were in favor of a better textbook, specifically one which
offered more coverage of MIPS programming. This was corroborated by
62% of students supporting an increase in programming exercises.
Our rules implied that {viewed the review video} => {expected to do
well on exam 2} and {studied primarily using the review video} =>
{received F}. We can consider the pairing of these rules to imply
that the review video instills false confidence in students. I now
personally inform my students that there is a review video, but our
findings indicate it may lead to a failing grade if used on its own.
From these examples we can see that a system of this type may be
beneficial to both students and instructors.
11. Development Issues raised in Pilot
Study
We need to implement safeguards to protect against meaningless or
contradictory rule generation. Seeking pre-defined rules can cut the
computational resources needed to generate our rules and address both
of these issues while remaining adaptive.
Other techniques may prove more useful for achieving a dynamic
assessment, for example, neural networks. A neural network trained on
Moodle data, combined with information on the students' final
evaluations, has been used to obtain models that predict which students
are likely to pass a course (Calvo-Florez, 2006). Other researchers
suggest using a combination of techniques to achieve more interesting
results (Romero, 2007).
12. Current state of WittyCat
We have created a stand-alone data collection system which does not
rely on LimeSurvey.
We are implementing the automation of the Apriori algorithm on
collected data for one-click rule generation.
We are currently developing an inference engine to provide
backward-chaining explanations of its conclusions in a human-readable
format.
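A backward-chaining engine of the kind described above can be sketched in a few lines. The rule base and fact names here are invented for illustration; the real system would derive its rules from the mined association rules:

```python
# Minimal backward-chaining sketch: work backward from a goal, proving each
# premise in turn, and record the chain of reasoning as a readable trace.

rules = {
    # conclusion: premises that must all hold for the conclusion to follow
    "review video instills false confidence": [
        "studied primarily using review video",
        "received F on exam 2",
    ],
    "received F on exam 2": ["scored below 60 on exam 2"],
}
facts = {"studied primarily using review video", "scored below 60 on exam 2"}

def prove(goal, trace, depth=0):
    """Try to establish `goal`, appending a human-readable explanation to trace."""
    if goal in facts:
        trace.append("  " * depth + f"{goal} (observed)")
        return True
    premises = rules.get(goal, [])
    if premises and all(prove(p, trace, depth + 1) for p in premises):
        trace.append("  " * depth + f"{goal} (all premises hold)")
        return True
    return False

trace = []
if prove("review video instills false confidence", trace):
    print("\n".join(trace))
```

The trace is the explanation shown to the educator: each conclusion is justified either by an observed fact or by the premises of the rule that produced it.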
13. Overview of the Desired Final
Product
A dynamic, adaptive, specified, intelligent assessment tool with expert
adaptation.
14. How you can contribute
If you are teaching an online course with examinations given at intervals,
we can use your data to generate feedback. We are also interested in
your subjective critiques of the W-CAT analysis.
To use the system simply visit wittycat.volatileassertion.com and
register. Next, watch the instructional video and import your students
and course materials as outlined in your syllabus.
Your participation can help us determine the subjective validity of the
assessments produced by our tool.
You may contact Paul DeCarlo at pjdecarlo@uh.edu or Dr. Nouhad Rizk
at njrizk@uh.edu for further information.