WebBots
1. Fighting the WebBots
• A webbot is a program that visits web
sites for all kinds of purposes.
• For example, Google webbots make
copies of all web sites for their search
engines.
• The challenge is to stop malicious
webbots.
6/7/2011 ITS102-12, Third Class 1
2. Webbots and Spam
• Spammers send webbots to get e-mail
accounts from sites that offer them for
free.
• How can you tell that someone who
asks for an e-mail account is a person
or a webbot?
3. Are you a person or a bot?
• We know that there are certain things
that computers cannot do.
• Ask the “applicant” to do something that
computers cannot do.
• Cook a meal?
• Read something impossible for
computers to read!
4. CAPTCHA
• Completely
Automated
Public
Turing test to tell
Computers and
Humans
Apart
5. CAPTCHA
• CAPTCHA does not have to be text, but
“computer unreadable” text is
convenient.
• Alternatives include pictures.
• For example, ask if a person in a
picture is smiling or not. What is wrong
with such a CAPTCHA?
6. How Computers Read
Optical Character Recognition (OCR)
• Step 1: Separate print (usually dark) from
background (usually light).
• Step 2: Pick up individual characters (groups
of dark pixels).
• Step 3: Identify their shape by looking for
strokes, loops, corners, etc.
• Step 4: Use rules to classify. For example, an
H has two vertical strokes and a short
horizontal stroke.
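Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular OCR product works: the image is a small hand-made grid of gray levels (0 = black, 255 = white), the cutoff of 128 is an arbitrary choice, and the names `binarize`, `find_characters` and `IMAGE` are invented for this example.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Step 1: mark dark pixels (print) as True and light pixels (background) as False."""
    return [[pixel < threshold for pixel in row] for row in gray]


def find_characters(mask):
    """Step 2: group dark pixels that touch (up, down, left, right) into blobs."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Flood fill from this pixel to collect one whole blob.
                blob, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs


# Hypothetical toy image: an "H" and an "I" in dark ink on a light background.
D, L = 0, 255
IMAGE = [
    [L, L, L, L, L, L, L, L, L],
    [L, D, L, D, L, L, D, L, L],
    [L, D, D, D, L, L, D, L, L],
    [L, D, L, D, L, L, D, L, L],
    [L, L, L, L, L, L, L, L, L],
]

characters = find_characters(binarize(IMAGE))
print(len(characters), "characters found")
```

Each blob is a list of pixel positions. Steps 3 and 4 would then examine the shape of each blob, which is the part a real OCR system spends most of its effort on.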
7. Frustrating OCR
• Step 1 (separate background from print): Use a messy background.
• Step 2 (pick up individual characters): Have them blend with each other.
• Step 3 (find strokes, loops, etc.): Make the letters “wiggly”.
• Step 4 (apply classification rules): It should be hopeless by this point.
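The first two tricks can be tried on a toy example. The sketch below is an assumption-laden illustration rather than a real CAPTCHA generator: the three pictures are drawn by hand with `#` for dark pixels, and `count_blobs` and `parse` are names invented here. It counts the groups of touching dark pixels, which is what a naive "pick up individual characters" step would report.

```python
def count_blobs(mask):
    """Count groups of dark (True) pixels that touch up, down, left or right."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                count += 1
                stack = [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


def parse(picture):
    """Turn a list of strings into a grid of booleans ('#' means dark)."""
    return [[ch == "#" for ch in line] for line in picture]


# Hypothetical hand-drawn pictures of the letters "H" and "I".
CLEAN = [
    ".........",
    ".#.#..#..",
    ".###..#..",
    ".#.#..#..",
    ".........",
]
BLENDED = [  # the two letters now touch each other
    ".........",
    ".#.#..#..",
    ".######..",
    ".#.#..#..",
    ".........",
]
MESSY = [  # same letters, with dark specks added to the background
    "#.......#",
    ".#.#..#..",
    ".###..#..",
    ".#.#..#..",
    "....#...#",
]

for name, picture in (("clean", CLEAN), ("blended", BLENDED), ("messy", MESSY)):
    print(name, count_blobs(parse(picture)))
```

On the clean picture the count matches the two letters. Blending merges them into one group, and the specks add four extra groups, so in both cases the character count is already wrong before any shape analysis starts.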
8. Make your own CAPTCHA
• A web site that offers you the means:
• www.codeproject.com/KB/aspnet/CaptchaImage.aspx
• For a general tutorial see:
• www.theopavlidis.com/technology/captcha/tutorial.htm
9. Some Weak CAPTCHAs
From PayPal
From Yahoo’s Briefcase
10. Some CAPTCHAs that may be too hard for people
From Yahoo:
From Passport:
13. Non-Text CAPTCHAs
• Use pictures as CAPTCHAs
• Plus: They are very tough to break
• Minus:
– Need to label a huge number of pictures.
– If we use few pictures the webbot can just
keep guessing.
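The "keep guessing" problem can be put into numbers. The sketch below rests on assumptions that are not in the slides: the webbot picks one of the possible answers uniformly at random, each attempt is independent, and the site does not limit retries. The function name `pass_probability` is invented for this example.

```python
def pass_probability(choices, attempts):
    """Chance that at least one of `attempts` random guesses among `choices` answers is right."""
    return 1 - (1 - 1 / choices) ** attempts


# Few possible answers: ten blind tries almost always get through.
print(pass_probability(choices=2, attempts=10))

# Many possible answers: ten blind tries rarely get through.
print(pass_probability(choices=100, attempts=10))
```

With only two possible answers, ten tries succeed more than 99.9% of the time. With a hundred possible answers, the same ten tries succeed less than 10% of the time, which is why a large pool of labeled pictures is needed.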
14. Synthetic Pictures
(an idea by M. Kaplan)
Please click on or enter each
letter corresponding to the
following list in the field
below. You must enter them
in the exact sequence listed.
C K