The use of e-assessment methods to facilitate and evaluate learning is a growing trend in the higher education space. In particular, the use of online tests has increased rapidly concomitant with the expansion of digital technologies for teaching purposes. Online tests, in the context of this presentation, refer to computer-assisted assessment where the deployment and marking are automated, typically involving objective types of questions such as multiple choice questions (MCQs), true/false questions, matching questions and predetermined short answer questions. The growing sophistication of Learning Management Systems (LMSs) such as Blackboard provides an increasing capacity for different types of online tests to be deployed, administered and marked efficiently. Additionally, most major textbook publishers and authors in certain disciplines provide online question banks that can easily integrate with LMSs, meaning less time is spent on creating tests from scratch.
With these trends in mind, questions arise around the efficacy of online tests in higher education.
In this presentation we will share findings of a study investigating practices around online tests. First, we will explore what the literature reveals about the role of online tests in higher education, and particularly how online tests can lead to student learning through formative assessment processes and feedback practices. Secondly, the presentation will review the practices around online tests at the Charles Darwin University Business School and discuss emerging issues. Thirdly, the presentation will distil some preliminary guiding principles around designing, developing, administering and reviewing online tests for effective learning and assessment. Finally, ongoing and further research by the team on the topic of online tests will be highlighted.
Online Tests: Can we do them better? | Bopelo Boitshwarelo, Jyoti Vemuri, Hannah Reedy & Anna Stack (CDU) | TLCANZ17
1. Online Tests: Can we do them better?
Charles Darwin University
Bopelo Boitshwarelo
Jyoti Vemuri
Anna Stack
Hannah Reedy
2.
Outline of the presentation
• Why a study on online tests?
• What the literature says
• Practices @ CDU (Business School) around online tests
– Data from LMS practices and trends
– A practical experience (Jyoti)
• Further research work
• Conclusion
• Discussions
3.
Why a study on online tests?
• High usage of online tests in HE
• In 2016, an analytics report found 5000 tests in CDU units
• Over 40% of these tests were in the Business School
4.
What have we learnt from the literature?
Boitshwarelo, B., Reedy, A. K., & Billany, T. (2017). Envisioning the use of online tests in assessing twenty-first century learning: a literature review. Research and Practice in Technology Enhanced Learning, 12(1), 16.
5.
Defining online tests
We use the term ‘online tests’ to specify a particular type of ICT-based assessment, or e-assessment, that can be used for diagnostic, formative and summative purposes. While e-assessment can be used to broadly refer to any practice where technology is used to enhance or support assessment and feedback activities, online tests specifically refer to computer-assisted assessment where the deployment and marking is automated (Davies, 2010; Gipps, 2005). (Boitshwarelo et al., 2017, p. 3)
6.
What have we learnt from the literature?
REASONS/RATIONALE
– Efficiency
– Breadth of content, particularly foundational knowledge
– Versatility (e.g. test banks and LMS integration)
– Reliability (objectivity)
CONTEXT OF USE
– Should assess appropriate learning outcomes
– Best used as part of a whole learning experience including scaffolding, feedback practices and learner self-regulation
• E.g. confidence-based marking, EVS and defence
– Can assess depth of understanding if blended with other assessment methods
QUESTION TYPES AND COGNITIVE LEVELS
– MCQs are the most common form of question
– Online tests are used mostly to test knowledge, and less so as you go up Bloom’s taxonomy
– The synthesis/creation level is the most difficult to assess through online tests, MCQs in particular
7.
What have we learnt from the literature?
FORMATIVE LEARNING
– Help learners prepare for summative assessment
– Effective as formative assessment if:
• Students are motivated and engaged, e.g. through regular low-stakes tests
• There are opportunities for multiple attempts
• Feedback (beyond just right/wrong) is provided either as part of the online test or as part of the teaching process
STUDENT ATTITUDES
– Online tests as strategies for formative learning mostly benefit students if they:
• are motivated to achieve high performance
• make an effort to engage and participate with learning
• are not constrained by circumstances such as time or access to technologies
CHALLENGES/ISSUES
– Cheating
– Feedback (lack of, when to give it, how?)
– Targeting low cognitive levels
– Frequency (how frequent is too frequent?)
– Student diversity (is it for everyone?)
8.
Practices @ CDU
Business School
Ongoing investigations
– Survey of tests in units in
Learnline (Blackboard) and
accreditation information
– Practical experience
9.
Extent of use
2016–2017 (2 semesters)
Number of units: 78
Total number of tests available: 490
Graded: 228
Ungraded: 262
Average graded tests per unit: 2.92
10.
Distribution across disciplines

Discipline                            No of units   No of tests   Graded tests per unit
Accounting                            20            84            4.2
Economics                             13            14            1.1
Management                            20            121           6.1
Marketing                             8             4             0.5
Law                                   7             2             0.3
Other (Research/Placement/Honours)    10            3             0.3
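The usage figures above can be cross-checked with a few lines of arithmetic. Below is a minimal sketch; the data are transcribed from the slides, and the variable names and script are ours for illustration, not part of the study:

```python
# Cross-check the extent-of-use figures reported on the slides.
# Figures transcribed from the presentation; names are illustrative.

units_total = 78
tests_graded = 228
tests_ungraded = 262

assert tests_graded + tests_ungraded == 490  # total tests available
print(f"Average graded tests per unit: {tests_graded / units_total:.2f}")  # 2.92

# (units, graded tests) per discipline, from the table above
disciplines = {
    "Accounting": (20, 84),
    "Economics": (13, 14),
    "Management": (20, 121),
    "Marketing": (8, 4),
    "Law": (7, 2),
    "Other (Research/Placement/Honours)": (10, 3),
}

# Per-discipline graded-test counts should sum to the school total
assert sum(t for _, t in disciplines.values()) == tests_graded

for name, (units, tests) in disciplines.items():
    print(f"{name}: {tests / units:.2f} graded tests per unit")
```

The per-discipline ratios reproduce the table within rounding (e.g. Management is 121/20 = 6.05, shown on the slide as 6.1).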
[Chart: Number of units per discipline]
[Chart: Average number of graded tests per unit, by discipline]
12.
Weighting
[Bar chart: Test weighting per unit distributed by level (Level 100 to Level 500); legend: weighting bands 1-10%, 11-20%, 21-30%, 31-40%, 41-50%, over 50%]
[Pie chart: Weighting of tests per unit across the Business School: 1-10% band 22%, 11-20% band 59%, 21-30% band 19%]
14.
Frequency of question types within School of Business units
[Chart: MCQ 100%, T/F 46%, Fill in the blank 9%, Short answer 10%, Multiple answer 2%, Essay 5%, Formula 2%, Jumbled sentence 2%, Matching 2%, Order 2%]
16.
Test options: attempts, availability, randomisation
[Chart: Availability ranges for all tests within the CDU Business School: <1 day 18%, 2-4 days 12%, 5-7 days 25%, 8-20 days 3%, 20+ days 1%, no restrictions 37%, unspecified 4%]
[Chart: Units with multiple attempts allowed in graded tests: single attempt 96%, multiple attempts 4%]
[Chart: Question order in all tests: randomised 91%, set order 9%]
[Chart: Presentation of questions by unit (graded and ungraded tests): one at a time 68%, all at once 23%, varies within unit 9%]
17.
Feedback
[Chart: When feedback is provided, by unit: after submission 93%, delayed 7%]
[Chart: Qualitative feedback: provided 8%, not provided 92%]
“Lack of feedback can have negative memorial consequences on student learning particularly when multiple choice questions (MCQ) are used. MCQs expose students to answers that are incorrect and this can reinforce incorrect understandings and influence students to learn false facts if feedback is not given (Fazio, Agarwal, Marsh & Roediger, 2010; Roediger & Marsh, 2005). This negative impact of MCQs is reduced when immediate or delayed feedback is provided (Butler & Roediger, 2008).” (Boitshwarelo et al., 2017, p. 15)
19.
Practical experience
Online tests as formative/summative assessment
• Used in accounting courses
• Problem-solving units lend themselves well to multiple choice or true/false questions
• Give instantaneous feedback to students
• Are time-efficient with large cohorts
20.
Practical experience
Online tests as summative assessment
• Used in a first year accounting unit
• Quick test of students’ understanding of concepts
• Administered weekly, which encourages students to read the material before attempting the test
• The tests are graded, so students are motivated to do them in a timely fashion
21.
Practical experience
Online tests as formative assessment
• Used in a second year finance unit
• Quick test of students’ understanding of concepts
• Administered weekly, to motivate students to read the material and practise taking the test
• The tests are ungraded, so students are not motivated to do them in a timely fashion, which defeats the objective
23.
Further work
Further research will be done through:
– Student survey
– Staff survey
– Staff interviews
– Intervention
24.
Online tests: Can we do them better?
Yes, if we consider:
- Pedagogy
- Curriculum mapping
- Practical considerations
- Technical capabilities of the LMS (Blackboard)
We are going to discuss online tests, in particular whether we can improve the way we do them.
Our presentation team is multi-dimensional and by that I mean we have different roles that we play at the university.
I work for a central office called the OLT and I do academic development or educational development work i.e. I provide support around the area of curriculum development and learning design.
Jyoti is a lecturer in the Business School and teaches accounting units.
Hannah is an accreditation officer, so she supports the Business School around the course accreditation process.
Anna is one of the administrators of our learning management system but also has a strong background in teaching. We do hope that these different roles will add richness to our discussion today around this topic of tests.
To start off the discussion, why did we decide to study online tests? Well we realised that with the growing versatility of educational technologies there has been an increase in the use of online tests in higher education in general and at CDU in particular.
This was evidenced when we ran some analytics on our LMS in 2016: we discovered that there were well over 5000 tests (this includes undeployed tests and other tests that have accumulated over the last few years, but it is an indication of how widespread tests are).
Of these, over 40% were from the Business School, and that prompted us to explore the practices at the school in terms of why tests are used so extensively and how they are used. However, before we delved into this investigation we obviously needed to find out whether there was literature around this topic and, if so, what it was saying about, for example, the pedagogical merits of tests or lack thereof.
So what did we learn from the literature? While the literature was not exhaustive, it was indicative of current practices in online tests. Our findings are summarised in this paper, which was authored by myself and two colleagues.
In this paper we limited our definition of online tests to any form of e-assessment where the deployment and marking or grading is automated or largely automated.
Therefore, although the test tool in Blackboard allows for question types such as essays and file responses, our definition only included the objective kind of tests where you have a definite answer. That was our way of scoping our study and it is quite consistent with most of the literature we have read.
So what have we learnt from the literature? In reviewing the literature we identified a number of themes or categories. Most of these would be obvious to some people and probably not so obvious to others, so we think it is useful to go through them.
Firstly, the reasons for using online tests are varied: they are mostly used for efficiency purposes, especially where large student groups are concerned. Related to this point, online tests can cover broad areas of content within a short time, usually the foundational concepts in a unit or module. Another reason for the widespread use of online tests is that the tools are versatile: for example, publisher test banks integrate seamlessly with LMSs, so there is ease of use.
Another thing we learnt from the literature concerns the context within which tests are used or should be used. They should assess appropriate learning outcomes, implying that there are contexts or situations where online tests are inappropriate. The other thing is that online tests should not be used in isolation; they should be part of a whole learning experience which includes the use of scaffolding, feedback practices and other assessment methods.
The most common type of test question by far is the MCQ, and MCQs are predominantly used to test the knowledge level in Bloom’s taxonomy of cognitive levels. Comprehension and application questions are pretty common too, and some analysis-level questions, although limited, are not uncommon. Essentially, the higher you go up Bloom’s taxonomy the less appropriate online tests become as a method of assessment, with the synthesis/create level the hardest to assess.
We have found that online tests play a very important role in formative learning in that they can be effectively used to help learners prepare for summative assessment, either by ensuring students master the basic concepts before they apply them through other forms of assessment, and/or by providing practice and building mastery of content before high-stakes tests or exams.
The formative learning role is enhanced if students are motivated and engaged and the engagement can be through regular low stakes tests.
For formative tests to be effective, students should have opportunities for multiple attempts at a test or have practice tests, with feedback provided not just in terms of marks but also as qualitative comments.
The effectiveness of online tests as formative learning strategies is also dependent on student attitudes. The online tests would normally benefit students who are motivated to achieve high performance and who make an effort to engage and participate with learning.
Sometimes even students who are motivated are constrained by circumstances, such as time pressures or technology-related issues, from engaging fully with online tests and benefiting from them.
We have also identified a number of issues and/or challenges:
Cheating: online tests can generally be taken anywhere, anytime, increasing the chances of cheating, especially if the tests are high stakes. Cheating is often deterred by utilizing a variety of test settings in an LMS, for example randomization of questions and responses, single-question delivery on each screen, no backtracking to previous questions, and/or setting very tight time frames in which to answer the questions (i.e. limited availability of the test), as well as online or face-to-face proctoring.
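The randomisation deterrents just described can be made concrete with a small, LMS-agnostic sketch. This is not Blackboard's API: the question pool, the `build_test` function and the per-student seeding are hypothetical, shown only to illustrate how per-student randomisation of questions and answer options works in principle.

```python
import random

# Hypothetical question pool; in practice this would come from the LMS
# or a publisher test bank.
POOL = [
    {"q": f"Question {i}", "options": ["A", "B", "C", "D"]}
    for i in range(1, 11)
]

def build_test(student_id: str, n_questions: int = 5) -> list:
    """Build one student's test: a random subset of the pool, with both
    question order and answer-option order shuffled per student."""
    rng = random.Random(student_id)  # seeded per student: reproducible on re-entry
    chosen = rng.sample(POOL, n_questions)  # random subset in random order
    test = []
    for item in chosen:
        options = item["options"][:]  # copy so the shared pool is not mutated
        rng.shuffle(options)
        test.append({"q": item["q"], "options": options})
    return test

# The same student always sees the same test on re-entry, while different
# students receive independently randomised selections and orderings.
assert build_test("s1001") == build_test("s1001")
```

Tight time limits, no backtracking and one-question-per-screen delivery are enforced by LMS delivery settings rather than by question construction, so they are not modelled here.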
Feedback: there are issues around what kind of feedback, how much, when it should be given, and whether it is being used. The key issue is usually deciding on efficient and/or effective feedback strategies.
There is the issue of how much depth you can really assess with online tests, especially multiple choice questions.
Frequency of tests: how many is too many, and how few is too few?
Are online tests for everybody? Some students may prefer them; others may not.
This is what the literature is telling us, but what are the practices at CDU, especially in the Business School? Exploring this question is part of a broader ongoing investigation. So far we have surveyed tests in all the Business units in Learnline (our Blackboard environment) as well as looking at the key unit accreditation documentation related to those online tests. So we would like to share some of the trends with you and how we think they relate to the literature. Hannah and Anna will lead us in that discussion.
After they have taken us through the data, Jyoti will share her practical experience of how she has used online tests over a number of years.
Here we see the distribution across the CDU Business School disciplines. We have categorised the disciplines across the school: Accounting, Economics, Management, Marketing, Law and Other (which includes Research/ Placement and Honours units).
There is evidence in the literature that online tests are used across the various disciplines of Business; however, it is not clear which disciplines use them the most.
At the CDU Business School, Accounting and Management units had the highest relative use of online tests (i.e. well above a 1:3 ratio). Law, Marketing and others were on the lower end of the spectrum, with an average of well under one test per unit.
The literature identified that online tests are commonly used in foundational units, usually in the first year of university study, and less and less as you go up the levels.
So far in our research we are not seeing any obvious trend in our case, except perhaps for the Accounting discipline.
On average there are slightly more tests in 100 level units.
There are a number of possible reasons, including:
Generally there are fewer units at the 100 level than at the 200 and/or 300 level.
Some foundational concepts may be introduced at the 200 level.
Numerical subjects, e.g. in Economics, do lend themselves to online tests even at a higher level.
The level 500 units are for conversion masters programs, meaning they include a whole array of units, including foundational units.
However, further investigation is warranted.
The literature indicates that student learning is generally enhanced when online tests are regular (but not too overwhelming) and are assigned low stakes credit (low weighting).
In our data, over 80% of units that have weighted tests use weightings between 1% and 20%, with 20% being the most common. This could be considered low stakes, given that the 20% is usually divided across a few tests.
Preliminary data
Online tests are used mostly to test knowledge, but also comprehension and application.
Analysis is sometimes tested, with less and less use as you go up the cognitive levels.
The synthesis/creation level is the most difficult to assess through online tests, MCQs in particular.
The cognitive levels were determined on the basis of our interpretation of the learning outcomes (LOs), with the assumption that the LOs were mapped to the assessment items correctly.
So it was a subjective exercise. However, assuming there is a level of accuracy in the learning outcomes and in our interpretation of them, the chart would be a fair representation of what is currently the case. It does not quite map to what the literature seems to be showing: generally we would have expected the bars to go down as you go up the levels.
However, there are possible explanations for some of the observations:
The lower number of tests assessing Remember could be because there are a limited number of unit learning outcomes at the Remember level. Generally at AQF level 7 (Bachelor Degree) and AQF level 9 (Masters Degree), learning outcomes are at a higher skill level than simply Remember.
The level 500 units are skewing the results. If they are taken out, the expected pattern largely holds except at the tail ends.
Note that the level 500 units are for conversion masters programs, i.e. masters-level courses that assume no prior discipline knowledge from students' undergraduate degrees; therefore they include a whole array of units, including foundational units.
The interesting observation is the substantial number of units assessing Create at the 200 level. This is unexpected and warrants further investigation, especially into the nature of the tests and the type of questions being asked. It is possible that the learning outcomes have not been mapped correctly, or perhaps our interpretation of the unit learning outcomes was incorrect.
The distribution across question types was more or less as anticipated and reflected what we had found in the literature: typically, online tests involve multiple choice questions (MCQs), true/false questions, matching questions, as well as predetermined short answer questions. Of these, MCQs are the most commonly used question type (Davies, 2010; Nicol, 2007; Simkin & Kuechler, 2005).
There are, though, pockets of “innovation” (close to 20%) in terms of other question types. While this relates to the wide range of cognitive levels being tested, we also found that learning outcomes were covered by multiple assessments, not just tests, and that, anecdotally, tests help students to master concepts and apply them subsequently. This is an area of interest that we will look to interrogate further through our staff interviews.
Here we see the range of test frequency by unit, with over half falling into the 1-2 tests per unit category; this reflects the prevalence of “mid-semester tests”.
Interestingly, the literature points to the benefit of regular, low-stakes online tests designed to prevent students from falling behind. It will be interesting to look into student responses to our survey and whether they agree with this point.
While this points to the frequency of graded tests, there were instances of ungraded ‘practice tests’ in units, which is common in the literature as a method to familiarise students with the functionality and requirements of summative tests.
While regular online tests and multiple attempts are recommended, there is a risk that a high frequency of tests can become overwhelming for both staff and students. Therefore, a balance needs to be struck between optimising student learning and staff and student workload: efficiency versus effectiveness. The mid-range of 3-6 tests is the least common, with only 19%; perhaps this should be the dominant range, as it best meets the “regular but not too overwhelming” criterion.
This depiction of availability settings for tests includes data for all tests, both graded and ungraded, to show the wide range of implementation across the Business School. This is clearly a test option that is well utilised. As expected the majority of tests with restrictions were those that were graded, which similarly tended to utilise single attempts. As we anticipated, there was less opportunity for multiple attempts when tests are graded. Anecdotally, those with multiple attempts looked to employ random blocks and test pools. Randomisation of question order was used extensively, as was the presentation of questions one at a time.
This looks to reflect the challenge of academic integrity identified in the literature, where cheating is often deterred by utilizing LMS control features, for example randomization of questions and responses, single-question delivery on each screen, no backtracking to previous questions, and/or setting very tight time frames in which to answer the questions.
The top four deterrents to cheating, in order of effectiveness: using multiple versions of a test so that students do not all receive the same questions, randomizing question and response order, not using identical questions from previous semesters, and proctor vigilance.
Single-attempt tests are the common practice in the school, with feedback in the form of a score indicating what is wrong or right.
The literature indicates that student learning is enhanced when there is opportunity for multiple attempts at a test accompanied by qualitative feedback. In addition, practice tests play a role in building confidence in test-taking.
Publisher textbook Q&As -
On feedback, we see that despite the multiple functionalities and options available through Learnline, there is limited variation from the defaults. This links with anecdotal evidence of misunderstanding of the feedback settings, which could be addressed at the system administration level, i.e. updating the default settings in keeping with recommendations within the literature.
Our review did, however, see the use of qualitative feedback directing students to chapters or learning objectives, as well as examples of the correct calculations to arrive at the correct multiple choice answer.
This is something we will need to review at greater length in determining where on this matrix of efficiency and effectiveness online tests will best sit.
Jyoti has shared some of her experiences, but moving forward we will be doing a student survey, starting next week, followed by a staff survey and staff interviews to dig deeper into the nature of these practices and experiences across the whole school.
Our presentation topic is a question: can we do online tests better? And the answer is a resounding yes. However, as we have hopefully shown in this discussion, it is not just a matter of grabbing questions from a textbook test bank. It is about thinking about a number of things:
Pedagogically, what do we want to achieve? The tests should not just be a grade-generating exercise but a tool that promotes feedback practices, learner self-regulation and engagement, as the literature suggests.
We should also think about how the curriculum mapping or design informs our practices. The way we do our mapping can constrain or enhance good practices: decisions such as the weighting we give to tests, what learning outcomes we assess with them can affect how we implement tests. We should map the right things at the right level and give them the right value.
There are also practical considerations such as your student cohorts, study mode, student needs and interests, academic integrity issues, etc.
Finally, what is possible within our LMS: there are a number of settings that can help you enhance your practices, some of which were covered earlier in the presentation.