CAPTCHA is an effective and widely used solution for preventing computer programs (i.e., bots) from performing automated but often malicious actions, such as registering thousands of free email accounts or posting advertisements on blogs. To make CAPTCHAs robust to automatic character recognition techniques, the text in the tests is often distorted, blurred, and obscured. At the same time, such robust tests may prevent genuine users from reading the text easily, and thus distribute the cost of crime prevention among all users. We therefore face a dilemma: a CAPTCHA should be robust enough that it cannot be broken by programs, but also easy enough that users need not repeatedly retake tests because of wrong guesses.
In this article, we attempt to resolve the dilemma by proposing a human computation game for quantifying the usability of CAPTCHAs. In our game, DevilTyper, players try to defeat as many devils as possible by solving CAPTCHAs, and player behavior in completing each CAPTCHA is recorded at the same time. We can therefore evaluate CAPTCHA usability by analyzing the collected player inputs. Because DevilTyper is itself entertaining, we can conduct a large-scale study of CAPTCHA usability without the resource overhead required by traditional survey-based studies. In addition, we propose a consistent and reliable metric for assessing usability. Our evaluation results show that DevilTyper provides a fun and efficient platform for CAPTCHA designers to assess the usability of their CAPTCHAs and thus improve CAPTCHA design.
DevilTyper: A Game for CAPTCHA Usability Evaluation
1. Chien-Ju Ho1, Chen-Chi Wu2,
Kuan-Ta Chen1, Chin-Kuang Lai2
Presenter: Derec Wu1
1Institute of Information Science, Academia Sinica
2Department of Electrical Engineering, National Taiwan University
4. Acronym for Completely Automated Public
Turing test to tell Computers and Humans
Apart
Challenge-Response test
Requires users to type letters or digits from a
distorted image to distinguish humans from
computers
5. CAPTCHA tests must be
Secure
▪ Hard for computers
▪ Prevent computer programs from performing automated
malicious tasks
Usable
▪ Easy for human beings?
7. Determine the difficulty of the CAPTCHA test
for human beings
Traditional approach
human survey
▪ costs a lot of money
▪ difficult to scale up
8. A human computation game for CAPTCHA
usability evaluation
Players are engaged to solve the problem for
us while having fun themselves
Lower monetary cost and easier to scale up
10. Each devil carries a CAPTCHA test
Players are required to solve the test
correctly to win the game
Player behaviors are recorded and are used to
evaluate the CAPTCHA usability
11. Players must solve the CAPTCHA before the
devil from the top reaches the bottom
Get scores by solving CAPTCHAs
Lose HP if the devil reaches the bottom
12. High score lists are maintained to encourage
players to play more
14. The following player behaviors for solving
each CAPTCHA test are collected
Finish time
Rate of typing error
Rate of giving up the test
Rate of repeated typing
Rate of failing to solve the test within time limit
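The five behaviors listed above can be turned into per-CAPTCHA numbers with a few lines of Python. This is only a minimal sketch: the `Attempt` record and its field names are a hypothetical log schema, not the format DevilTyper actually stores.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    """One player's encounter with one CAPTCHA (hypothetical log schema)."""
    captcha_type: str
    seconds: float     # time from display to the final outcome
    typed_wrong: bool  # at least one wrong submission
    gave_up: bool      # player skipped the test
    retyped: bool      # player cleared the input and typed again
    timed_out: bool    # devil reached the bottom first


def usability_metrics(attempts):
    """Return the five slide metrics for a non-empty list of attempts."""
    n = len(attempts)
    return {
        "finish_time": mean(a.seconds for a in attempts),
        "error_rate": sum(a.typed_wrong for a in attempts) / n,
        "give_up_rate": sum(a.gave_up for a in attempts) / n,
        "retype_rate": sum(a.retyped for a in attempts) / n,
        "timeout_rate": sum(a.timed_out for a in attempts) / n,
    }
```

Calling `usability_metrics` once per CAPTCHA type gives one row of numbers per type, which is the form needed for comparing types A-F.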
16. We announced the game on PTT, a popular social
network, and held a four-week campaign
Total cost: US$ 30
Total number of games played: 6,500
Total number of CAPTCHAs solved: 1,407,055
18. The results of different metrics are consistent
* A-F: different types of CAPTCHAs
* The results are normalized to the range 0 to 1 for comparison
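The slides do not say which normalization was used. One common choice that maps values onto 0 to 1 is min-max scaling, sketched below as an assumption rather than as the authors' method; the function name and the input layout are illustrative.

```python
def normalize(values):
    """Min-max scale a dict of {captcha_type: raw_value} onto [0, 1].

    Assumed normalization; the slides only state that results are
    normalized to 0 to 1.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All types scored the same; avoid dividing by zero.
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}
```

Applying the same scaling to each metric separately puts finish time and the various rates on a common axis, so their rankings of types A-F can be compared directly.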
19. The DevilTyper results are consistent with the
traditional survey method (Mechanical Turk)
* A-F: different types of CAPTCHAs
* The results are normalized to the range 0 to 1 for comparison
DevilTyper provides an open platform
for evaluating CAPTCHA usability
22. Three strategies for text distortion in
CoolCAPTCHA
Character Distance, X-Axis Wave, Y-Axis Wave
23. Three strategies for noise addition in
TgCAPTCHA
Long Arcs Noise, Short Arcs Noise, Short Lines Noise
24. The difficulty of recognizing each character in
different CAPTCHA types can be determined
“i” is hardly recognizable
in TgCAPTCHA
“i” is easier to recognize
in CoolCAPTCHA
Q V
C
T
25. We proposed a human computation game,
DevilTyper, for evaluating CAPTCHA usability
Monetary cost is much lower than traditional
surveys
Evaluation is easier to scale up
We show how this open platform can be used
to help CAPTCHA designers design
more user-friendly CAPTCHAs
To ensure that the response is not generated by a computer
The common procedures for generating such images often include distortion, overlapping, clipping, and noise addition. These procedures are performed to make image recognition algorithms unable to resolve the text in the images. However, the distortion of the text should be kept at a reasonable level so that humans can still read the text clearly.
The most intuitive way to assess the usability of CAPTCHAs is to ask numerous human subjects to solve assigned CAPTCHAs repeatedly. However, such surveys are cost-prohibitive if a large-scale study is required and the investigated CAPTCHAs are constantly updated. For example, investigating how different background noises affect user perception would require a large number of user inputs, and collecting them through user studies requires significant monetary investment.
Character distance stands for the distance between characters. In our experiment, we randomly set the character distances between 0.8 and 1.3, where a larger value corresponds to a tighter character arrangement.

X-axis wave controls the degree of sine-wave distortion of characters along the x-axis. In the experiment, this parameter is randomly set within the range from 0.5 to 1.2, where a larger magnitude corresponds to stronger distortion. The x-axis wave distortion does not have a systematic influence on users' error rate, which implies that this type of distortion does not harm the CAPTCHA's usability.

Y-axis wave controls the degree of sine-wave distortion of characters along the y-axis, which we set within the range from 0.5 to 1.2 in our experiments. The y-axis distortions have a much more significant impact on CAPTCHA usability than x-axis distortions. Therefore, CAPTCHA designers should be careful in choosing the appropriate degree for this type of distortion when adopting such CAPTCHAs in real use.
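To make the wave parameters concrete, the sketch below samples one parameter set from the ranges quoted above and applies a vertical sine shift to a tiny bitmap. The parameter names, the `period` argument, and the way an amplitude maps to whole-pixel offsets are all assumptions for illustration; the notes do not describe how CoolCAPTCHA renders its distortions.

```python
import math
import random


def sample_distortion(rng):
    """Draw one parameter set using the ranges quoted in the notes."""
    return {
        "char_distance": rng.uniform(0.8, 1.3),
        "x_wave": rng.uniform(0.5, 1.2),
        "y_wave": rng.uniform(0.5, 1.2),
    }


def y_wave_shift(image, amplitude, period=12.0):
    """Shift each column vertically by a sine offset (assumed wave model).

    image is a list of rows. amplitude is in pixels, which is a
    hypothetical scaling of the y-wave parameter. Pixels shifted outside
    the image are dropped and vacated cells are filled with 0.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        dy = round(amplitude * math.sin(2 * math.pi * x / period))
        for y in range(height):
            ny = y + dy
            if 0 <= ny < height:
                out[ny][x] = image[y][x]
    return out
```

Recording the sampled parameters alongside each player's answer is what makes it possible to relate error rate to distortion strength afterwards.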
We use TgCAPTCHA, which is similar to the previous Microsoft CAPTCHA scheme, to demonstrate how such analysis is done using the traces produced by DevilTyper.

Long Arcs
The long arcs parameter controls the number of long arcs overlaid on the image, where the position, length, and curvature of the arcs are randomly chosen. In the experiment, we set this parameter between 0 and 5. We can see that long arcs do not influence the usability of the CAPTCHAs significantly, even when 5 long arcs are added.

Short Arcs
Similar to long arcs, the short arcs parameter controls the number of short arcs overlaid on the image. In our experiment, the number of short arcs is randomly drawn from the range 0 to 20. Interestingly, while long arcs do not impact the CAPTCHA's usability, short arcs do, as shown in Figure 13(b). We believe this is because the length of short arcs is similar to that of the character strokes, so short arcs are more likely to interfere with the distorted text and increase the difficulty of text recognition.

Short Lines
The short lines parameter controls the number of short lines overlaid on the rendered CAPTCHA. As with long and short arcs, the position, length, and direction of each segment are randomly decided. Our results show that users' average error rates increase slightly but steadily with more short lines, as shown in Figure 13(c). However, the impact of short lines is slightly smaller than that of short arcs, which is reasonable because arcs look more like the strokes of distorted text and therefore interfere more with readers' recognition.
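The per-parameter comparisons described in these notes amount to grouping traces by noise level and computing an error rate for each group. A minimal sketch, assuming each trace is a hypothetical `(noise_count, solved_correctly)` pair:

```python
from collections import defaultdict


def error_rate_by_level(traces):
    """Group (noise_count, solved_correctly) pairs and return the
    fraction of wrong answers at each noise level, in ascending order."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for level, correct in traces:
        totals[level] += 1
        if not correct:
            errors[level] += 1
    return {level: errors[level] / totals[level] for level in sorted(totals)}
```

A flat curve over the noise levels would correspond to the long-arcs finding, and a rising one to the short-arcs and short-lines findings.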
Each CAPTCHA scheme has its own obscuration algorithm to distort the text, which may have different impacts on the recognition difficulty of different characters. We believe such results provide helpful information when designing and applying CAPTCHAs. One obvious application is that, if a user correctly solves all the characters except a 'C' character with the SecurImage scheme, we may allow the user to pass the test, as the 'C' character is really difficult to recognize with that scheme.
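The lenient check suggested in this note could look like the sketch below. The function name and the `hard_chars` set are hypothetical; in practice the set would come from per-character error rates measured for the scheme in question.

```python
def lenient_match(answer, typed, hard_chars):
    """Accept typed if it differs from answer only at positions where
    the answer holds a character known to be hard for this scheme."""
    if len(answer) != len(typed):
        return False
    return all(a == t or a in hard_chars for a, t in zip(answer, typed))
```

For example, `lenient_match("ACB7", "AGB7", {"C"})` accepts the answer because the only mismatch falls on the 'C', while a mismatch on any other character is still rejected.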