Computer adaptive testing (CAT) is a form of computer-based test that adapts to the examinee's ability level by selecting subsequent test items based on the correctness of previous responses. CATs require fewer items than traditional tests to estimate a test-taker's ability level accurately. Key components of CAT include an item pool, entry level, item selection rule, scoring method, and termination criteria. Major advantages of CAT include increased precision, shorter test length, and a more positive experience for examinees. Many standardized tests now use CAT formats.
4. What is Computer Adaptive Testing?
Computerized adaptive testing (CAT) is a form of computer-based test that adapts to the
examinee's ability level.
It has also been called tailored testing.
In other words, it is a form of computer-administered test in which the next item or set of items
selected to be administered depends on the correctness of the test taker's responses to the
most recent items administered.
5. Origin of CAT
Generally, computerized testing originated in the early 1970s (Drasgow, 2002; Wainer, 1990).
Specifically, the first Computer Adaptive Test (CAT) was created by Larson and Madsen (1985) at
Brigham Young University, in the USA.
After Larson and Madsen (1985), several scholars (e.g., Kaya-Carton, Carton & Dandonoli,
1991; Burston & Monville-Burston, 1995; Brown & Iwashita, 1996; Young, Shermis, Brutten &
Perkins, 1996) were motivated to construct and develop more computer-adaptive tests throughout
the 1990s.
In 1998, the Test of English as a Foreign Language (TOEFL) began to use a computer-adaptive
testing format.
6. How Does CAT Work?
CAT successively selects questions for the purpose of maximizing the precision of the exam based on
what is known about the examinee from previous questions.
From the examinee's perspective, the difficulty of the exam seems to tailor itself to their level of
ability. For example, if an examinee performs well on an item of intermediate difficulty, they will then
be presented with a more difficult question. If they perform poorly, they will be presented
with a simpler question.
Computer-adaptive tests require fewer test items to arrive at equally accurate scores (Weiss &
Kingsbury, 1984).
7. The basic computer-adaptive testing method is an iterative algorithm with the following steps:
(1) The pool of available items is searched for the optimal item, based on the current estimate of the
examinee's ability.
(2) The chosen item is presented to the examinee, who then answers it correctly or incorrectly.
(3) The ability estimate is updated, based upon all prior answers.
(4) Steps 1–3 are repeated until a termination criterion is met
(Thissen & Mislevy, 2000).
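The iterative steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production algorithm. It assumes a Rasch (one-parameter) model, picks the unused item whose difficulty is closest to the current estimate (under the Rasch model this is the most informative item), re-estimates ability by a grid-search maximum likelihood over all answers so far, and stops after a fixed number of items. The names `run_cat`, `estimate_theta`, and `p_correct` are hypothetical.

```python
import math

def p_correct(theta, b):
    """Probability of a correct answer under a Rasch (1PL) model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, grid):
    """Grid-search maximum-likelihood estimate from ALL answers so far."""
    def log_likelihood(theta):
        total = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(grid, key=log_likelihood)

def run_cat(pool, answer, max_items=10, start_theta=0.0):
    """pool: list of item difficulties; answer: callable(difficulty) -> bool."""
    grid = [x / 10.0 for x in range(-40, 41)]  # candidate abilities -4.0 .. 4.0
    theta = start_theta
    remaining = list(pool)
    responses = []
    # Step 4: repeat until the termination criterion (here: a fixed length) is met
    while remaining and len(responses) < max_items:
        # Step 1: search the pool for the best item at the current estimate
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        # Step 2: present the item and record a correct/incorrect answer
        correct = answer(item)
        responses.append((item, correct))
        # Step 3: update the ability estimate from all prior answers
        theta = estimate_theta(responses, grid)
    return theta, responses

# Simulated examinee who answers correctly whenever difficulty <= 1.0
pool = [x * 0.5 for x in range(-6, 7)]  # difficulties -3.0 .. 3.0
theta, responses = run_cat(pool, lambda b: b <= 1.0)
print(round(theta, 1), len(responses))
```

The simulated examinee is deterministic only to keep the example reproducible; a real examinee's answers are probabilistic.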
8. Item Response Theory
Item response theory is a probabilistic model that attempts to explain the response of a person to an
item.
In its simplest form, item response theory posits that the probability of a random person j with ability θj
answering a random item i with difficulty bi correctly is conditioned upon the ability of the person and
the difficulty of the item. In other words, if a person has a high ability in a particular field, he or she will
probably get an easy item correct. Conversely, if a person has a low ability and the item is difficult, he or
she will probably get the item wrong.
9. Item Response Theory (Cont.)
For example, we can expect someone with a large vocabulary to respond that they know easy words like
'smile' and 'beautiful', but we should not expect someone with a small vocabulary to know words like
'subsidy' or 'dissipate'.
When we analyze item responses, we are trying to answer the question,
“What is the probability of a person with a given ability responding correctly to an item with a given
difficulty?”
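One common way to answer this question in the simplest case is the Rasch (one-parameter logistic) model, in which the probability depends only on the difference between ability and difficulty. The sketch below assumes that model; the function name `rasch_probability` and the ability and difficulty values are illustrative only.

```python
import math

def rasch_probability(theta, b):
    """P(correct) for a person of ability theta on an item of difficulty b,
    assuming the Rasch model: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(round(rasch_probability(1.5, -1.5), 2))   # high ability, easy item -> 0.95
print(round(rasch_probability(-1.5, 1.5), 2))   # low ability, hard item  -> 0.05
print(round(rasch_probability(0.0, 0.0), 2))    # ability equals difficulty -> 0.5
```

When ability and difficulty are equal, the model gives an even chance of a correct response.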
10. Components of CAT
A CAT procedure consists of the following components:
(a) a calibrated item pool,
(b) an entry level,
(c) an item selection rule,
(d) a scoring method, and
(e) a termination criterion.
A number of options are available for each of these components in the implementation of CAT for a
specific purpose.
11. Calibrated Item Pool
A pool of items must be available for the CAT to choose from. Such items can be created in the traditional
way (i.e., manually) or through Automatic Item Generation. The pool must be calibrated with a psychometric
model, which is used as a basis for the remaining four components.
Typically, item response theory is employed as the psychometric model. There are no specific guidelines for
the appropriate size of item pools; pools of 100 items can provide satisfactory results. What is essential to all
applications of CAT is that the difficulty levels of the items in the pool must span the full range of trait levels
in the population. The most efficient measurement results from using items with high discrimination.
12. Entry Level
In adaptive testing, it is possible to begin with items at different levels of difficulty for different students.
For example, if a student's achievement level is thought to be high, testing can begin with a relatively
difficult item. Because the level of difficulty of the items that are chosen will move to the student's trait
level as the test progresses, an erroneous entry level will not seriously affect the results, but accurate entry
levels will reduce the number of items required to achieve precise measurement. It will be shown later that
the choice of entry level should depend on the purpose of the test.
13. Item Selection Procedure
Two efficient item selection procedures are available: maximum information (Weiss, 1982) and Bayesian
(Owen, 1969, 1975). Both procedures involve searching the entire pool of unadministered items for a single
item. In maximum information item selection, it is the one that provides the maximum amount of item
information at the examinee's last trait level (θ) estimate, and in Bayesian item selection, it is the one that
minimizes the expected posterior variance of the θ estimate. Because information and Bayesian posterior
variance are related functions, in many cases a similar subset of items will be selected for a given individual
(Sympson, Weiss, & Ree, 1982).
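Maximum information selection can be illustrated as follows. The sketch assumes a two-parameter logistic (2PL) model, for which item information at ability theta is a^2 * P * (1 - P), where a is the item's discrimination and P the probability of a correct response. The pool layout (a list of `(a, b)` tuples) and the function names are assumptions made for this example. Bayesian selection is not shown.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_max_info(theta, pool, administered):
    """Return the index of the not-yet-administered item with maximum information."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

# Hypothetical calibrated pool: (discrimination, difficulty) per item
pool = [(1.0, -1.0), (1.0, 0.0), (2.0, 0.2), (1.0, 1.5)]

print(select_max_info(0.0, pool, administered=set()))   # -> 2
print(select_max_info(0.0, pool, administered={2}))     # -> 1
```

In the first call the highly discriminating item near theta = 0 wins even though another item matches the ability estimate exactly, which is why items with high discrimination make measurement more efficient.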
14. Scoring Method
After an item is administered, the CAT updates its estimate of the examinee's ability level. If the examinee
answered the item correctly, the CAT will likely estimate their ability to be somewhat higher, and vice versa.
This is done by using the item response function to obtain a likelihood function of the examinee's ability.
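As a rough illustration of this update, the sketch below multiplies the item response functions of all answered items into a likelihood and combines it with a standard normal prior to obtain an expected a posteriori (EAP) estimate. The Rasch model, the prior, the grid of ability values, and the names `likelihood` and `eap_estimate` are all assumptions of this example. Maximum likelihood is another common choice.

```python
import math

GRID = [x / 10.0 for x in range(-40, 41)]  # candidate abilities -4.0 .. 4.0

def likelihood(theta, responses):
    """Likelihood of the observed answers at ability theta (Rasch model assumed).
    responses: list of (difficulty, correct) pairs."""
    value = 1.0
    for b, correct in responses:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        value *= p if correct else (1.0 - p)
    return value

def eap_estimate(responses):
    """Posterior mean and standard deviation of ability under a N(0, 1) prior."""
    weights = [math.exp(-0.5 * t * t) * likelihood(t, responses) for t in GRID]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

print(eap_estimate([]))               # before any item: estimate near 0
print(eap_estimate([(0.0, True)]))    # correct answer: estimate moves up
print(eap_estimate([(0.0, False)]))   # incorrect answer: estimate moves down
```

The returned standard deviation shrinks as more answers accumulate, which is what a precision-based stopping rule can monitor.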
15. Termination Criteria
The termination criterion depends primarily on the purpose of the test: point estimation, classification, or measuring change.
Point estimation: we want an accurate score for each student.
Classification: we do NOT need an accurate score, just a classification into pass/fail, etc.
Change: we want to see whether a score has gone up or down by a certain amount.
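The first two purposes could translate into stopping rules along the following lines. This is only a sketch under assumed conventions: point estimation stops once the standard error of the ability estimate falls below a target, classification stops once a confidence interval around the estimate no longer contains the cut score, and a maximum test length acts as a safety net. The function name, parameter names, and default values are hypothetical, and the change purpose is not covered.

```python
def should_stop(purpose, theta, se, n_items, *,
                se_target=0.3, cut_score=0.0, z=1.96, max_items=50):
    """Decide whether the CAT can end.

    purpose: "point" or "classification"
    theta, se: current ability estimate and its standard error
    n_items: number of items administered so far
    """
    if n_items >= max_items:
        return True                      # safety net: never exceed the maximum length
    if purpose == "point":
        return se <= se_target           # the score is precise enough
    if purpose == "classification":
        # the confidence interval theta +/- z*se lies entirely on one side of the cut
        return abs(theta - cut_score) > z * se
    raise ValueError(f"unknown purpose: {purpose!r}")

print(should_stop("point", 0.2, 0.25, 12))            # True
print(should_stop("point", 0.2, 0.50, 12))            # False
print(should_stop("classification", 1.0, 0.40, 12))   # True
print(should_stop("classification", 0.3, 0.40, 12))   # False
```

With these rules a classification test can end while the score itself is still imprecise, as long as the examinee is clearly above or below the cut.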
16. Advantages of CAT
Frequent retesting: the high number of permutations also enables more frequent retesting. If an examinee receives a few weeks of
instruction, by the time they take a CAT again their ability has increased somewhat, and they will receive a
completely different test.
More precision: because CATs are more efficient, the organization has the option to design the CAT to
actually be more precise than a conventional test while still using fewer items.
Shorter tests: CAT can reduce testing time by 50% or more. This can obviously translate into huge financial
benefits.
Examinee experience: A CAT will provide an appropriate challenge for each examinee. Low-ability examinees are not
discouraged or intimidated. High-ability examinees enjoy receiving difficult items.
17. Advantages (Cont.)
Increased motivation: because of the better experience, there is likely an increase in examinee motivation.
Low-ability examinees feel better, and high-ability examinees feel challenged. Both will try harder than with a
conventional test.
Equiprecision: CATs can be designed so that examinees are all measured with the same level of precision,
even though they all potentially see different items. This makes the test extremely fair from a psychometric
perspective.
Security: because the CAT algorithm is very flexible and can adapt with potentially millions of
permutations, there is much greater security than if everyone were administered the same set of 200 items.
18. Disadvantages of CAT
Recovery of poor starts: CATs are susceptible to issues with examinee test anxiety, as the elimination of
item review prevents someone from going back to the first few items. If they answered all those items
incorrectly due to severe test anxiety, the test cannot correct itself.
Requirements: CAT requires large sample sizes and extensive expertise.
Item exposure: CATs are designed
to select the best items in the bank, and these items often become overexposed if a control algorithm is
not implemented.
No review: CATs rarely allow examinees to return to items already administered, because the CAT has since
adapted and it cannot be updated.
Public relations: because of the complexity and the departure from the familiarity of the traditional exam
paradigm, an organization must put forth more effort into public relations, explaining CAT and the
reasons for using it.
19. Tests based on CAT
Test of English as a Foreign Language (TOEFL)
Graduate Record Examinations (GRE)
Graduate Management Admission Test (GMAT)
Scholastic Aptitude Test (SAT)
Microsoft’s qualifications
Computerized Adaptive Assessment of Personality Disorder: Introducing the CAT–PD Project
20. Who is Using CAT?
The U.S. military has pioneered basic and applied research in CATs. One step in this research program is the
development of a computerized version of the Armed Services Vocational Aptitude Battery
(ASVAB). Administered to roughly a half million applicants each year, the paper-and-pencil version of
the ASVAB takes three hours to complete while the experimental CAT version takes about 90 minutes.
Another test developed by military research laboratories -- the Computerized Adaptive Screening
Test (CAST) -- was implemented in 1984. CAST was the first nationwide use of CAT. This 15-minute
screening test gives prospects a quick but accurate estimate of their chances of passing the full ASVAB
and of qualifying for enlistment bonuses.
21. Who is Using CAT?
Two public school systems are forerunners in using CATs in the educational arena.
In Portland (OR) Public Schools, CATs have been well received by examinees, test administrators, and
test users.
Montgomery County (MD) Public Schools has asked for approval from the State Board of Education to
make its mathematics and reading CATs available to students as an alternative to the state-sponsored
high school graduation examinations.
22. Who is Using CAT?
Smarter Balanced Assessment Consortium
http://www.smarterbalanced.org/
Partnership for Assessment of Readiness for College and Careers (PARCC)
https://parcc-assessment.org/
23. Businesses Involved in CAT
Assessment Systems Corporation markets the MicroCAT system, which runs on IBM-PCs and compatibles.
MicroCAT is a complete authoring and administration system and includes routines for item analysis and
item-pool development. The Montgomery County Schools CAT program is based on MicroCAT.
WICAT markets software to support CAT developers, a battery of 45 tests, and custom CAT computer
systems. Schools use WICAT's battery of CATs to screen and identify gifted and talented students.
The American Institutes for Research recently completed a major revision of the Army's Computerized
Adaptive Screening Test (CAST). The CAST item pool was expanded, fairness analyses were conducted, item
selection procedures were modified to increase accuracy at key points, and the feedback provided to
examinees and recruiters was significantly improved.
24. Businesses Involved in CAT (Cont.)
The Psychological Corporation markets a CAT version of the popular Differential Aptitude Test (DAT) to
junior and senior high schools. It has versions for the IBM-PC and Apple II computers.
American College Testing Program (ACT) is working on several computerized adaptive tests. ACT is
developing training CATs for the Marine Corps and for college placement mathematics. It is also
researching the development of a multidimensional CAT.
The Educational Testing Service is working with the College Entrance Examination Board to develop and
refine a CAT to aid in college placement. An initial version of the system is being used by about 20
colleges across the country.
25. CAT In Pakistan
Measuring risk literacy: The Berlin Numeracy Test
Cokely, E.T., Galesic, M., Schulz, E., Ghazal, S., & Garcia-Retamero, R. (2012).
Measuring risk literacy: The Berlin Numeracy Test. Judgment and Decision
Making, 7, 25-47.
26. Resources
International Association for Computerized Adaptive Testing
http://www.iacat.org/
Concerto: Open-source CAT Platform
https://www.psychometrics.cam.ac.uk/newconcerto
27. References
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive
testing to educational problems. Journal of Educational Measurement, 21(4),
361–375. doi:10.1111/j.1745-3984.1984.tb01040.x
Thissen, D., & Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.),
Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum
Associates.
Kaya-Carton, E., Carton, A. S., & Dandonoli, P. (1991). Developing a computer-adaptive
test of French reading proficiency. In P. Dunkel (Ed.), Computer-assisted
language learning and testing: Research issues and practice (pp. 259-284).
New York: Newbury House.
Burston, J., & Monville-Burston, M. (1995). Practical design and implementation
considerations of a computer-adaptive foreign language test: The Monash/Melbourne
French CAT. CALICO Journal, 13(1), 26-46.