1. A Method for Automated Detection of Phishing
Websites: Through Both Site Characteristics and
Image Analysis
Joshua S. White
Jeanna N. Matthews, PhD
2. Outline
• Problem
• Method
– Image Analysis (in detail)
• Method Verification
• Results
• Conclusion
• References
3. Problem
• Phishing site detection
– A largely manual process
• Requires human visual review of each site to
eliminate false positives and negatives
– URLs come from actual phishing attempts
• Email and other user-reported URLs
– Analysis is reactive, not proactive
5. Method
• For rapid proof of concept
– Data collected using the 140Dev PHP script
and MySQL schema
• Page characteristics extracted using PHP
DOM parsing
– Links, Images, Forms, Iframes, Meta Tags
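The original tooling used PHP DOM parsing. As a rough analogue only, the following Python sketch uses the standard-library `html.parser` module to count the tracked elements (links, images, forms, iframes, meta tags) and record link targets and form actions. The class and function names are hypothetical, and exactly which attributes the authors stored is not stated on the slides.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags matching the page characteristics listed on the slide.
TRACKED_TAGS = {"a", "img", "form", "iframe", "meta"}


class CharacteristicParser(HTMLParser):
    """Hypothetical parser: counts tracked tags, records hrefs and form actions."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.links = []
        self.form_actions = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; self-closing tags also land here.
        if tag not in TRACKED_TAGS:
            return
        self.counts[tag] += 1
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])
        elif tag == "form":
            # A missing action means the form posts back to the same page.
            self.form_actions.append(attr_map.get("action") or "")


def extract_characteristics(html):
    """Return tag counts plus link targets and form actions for one page."""
    parser = CharacteristicParser()
    parser.feed(html)
    parser.close()
    return {
        "counts": {tag: parser.counts[tag] for tag in sorted(TRACKED_TAGS)},
        "links": parser.links,
        "form_actions": parser.form_actions,
    }
```

Feature vectors like these, taken from a suspect page and from the legitimate site it imitates, can then be compared field by field.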
6. Image Analysis
• Screenshots collected using a headless web browser
– CutyCapt, xvfb-run
• Hashing of resulting images
– MD5, SHA-512, pHash
• Final choice was pHash (perceptual hash)
– Uses the discrete cosine transform (DCT)
» Keeps only low-frequency image content
• Hamming distance used to compare hash values
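A short Python sketch contrasting the two kinds of hash listed above. It uses the standard-library `hashlib` module, whose MD5 and SHA-512 digests match what `md5sum` and `sha512sum` print for the same bytes. Representing a perceptual hash as a plain 64-bit integer is an assumption for illustration, and the function names are hypothetical.

```python
import hashlib


def crypto_digests(data: bytes) -> dict:
    """MD5 and SHA-512 hex digests of raw bytes (e.g. a screenshot file)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which two perceptual hashes differ."""
    # XOR leaves a 1 wherever the bits disagree; count those 1s.
    return bin(hash_a ^ hash_b).count("1")
```

Changing a single byte of the input yields an unrelated MD5 or SHA-512 digest, so exact-match hashes only detect byte-identical screenshots. A plausible reason for settling on pHash is that visually similar pages produce hashes only a few bits apart, and the Hamming distance quantifies how many.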
8. Image Analysis
• Process:
– Reduce the image to 32 x 32 pixels
– Convert the colors to greyscale
– Compute the DCT (yields a matrix of frequency coefficients)
– Keep only the top-left 8 x 8 block of the DCT (lowest frequencies)
– Compute the average of these DCT values, then set each bit to 1 or 0
depending on whether the coefficient lies above or below that average
– Combine the 64 bits into the hash
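A minimal pure-Python sketch of these steps, assuming the screenshot has already been decoded into rows of `(r, g, b)` tuples (image decoding is out of scope). The nearest-neighbour resize, the BT.601 greyscale weights, and leaving the DC coefficient out of the average are assumptions drawn from common descriptions of DCT-based hashing; the pHash library itself differs in details (for example, it filters the image first), so hashes from this sketch are not interchangeable with the library's.

```python
import math

IMG_SIZE = 32   # images are first reduced to 32 x 32 pixels
HASH_SIZE = 8   # an 8 x 8 block of low-frequency coefficients -> 64 bits


def resize_nearest(pixels, size=IMG_SIZE):
    """Nearest-neighbour resize of a 2D list (crude stand-in for real resampling)."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def to_greyscale(rgb_pixels):
    """Convert rows of (r, g, b) tuples to luma values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_pixels]


def dct_1d(vector):
    """Unnormalised DCT-II of a 1D sequence."""
    n = len(vector)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(vector))
            for k in range(n)]


def dct_2d(matrix):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash(rgb_pixels):
    """Return a 64-bit DCT-based perceptual hash as an int (sketch)."""
    grey = to_greyscale(resize_nearest(rgb_pixels))
    coeffs = dct_2d(grey)
    # The top-left 8 x 8 block holds the lowest spatial frequencies.
    flat = [c for row in coeffs[:HASH_SIZE] for c in row[:HASH_SIZE]]
    # Assumption: the DC term flat[0] (overall brightness) is left out of the average.
    avg = sum(flat[1:]) / (len(flat) - 1)
    bits = 0
    for c in flat:
        bits = (bits << 1) | (1 if c > avg else 0)
    return bits
```

Two screenshots are then compared by the Hamming distance between their hashes, with a small distance suggesting visual similarity. The slides do not give a match threshold, so any cut-off would have to be chosen empirically.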
10. Results
• After our method was verified, we concentrated
on the five most-spoofed sites:
• Some false characteristic matches:
11. Conclusion
• Posting of phishing URLs on social media networks
is a growing problem
• We have developed a tool that quickly and
effectively detects matches between legitimate
and spoofed sites
• Future work includes:
– Integration of our characteristic mapping and
image analysis techniques into our social
media analytics toolkit