SlideShare a Scribd company logo
1 of 14
A Method for Automated D etection of P hishing
Websites: Through B oth S ite Characteristics and
Image Analysis



Joshua S. White
Jeanna N. Matthews, PhD
Outline
 • Problem
 • Method
    – Image Analysis (in detail)
 • Method Verification
 • Results
 • Conclusion
 • References
P roblem
 • Phishing site detection
   – A largely manual process
      • Requires human visual review of site to
        eliminate false positives / negatives
   – URL's comes from actual phishing attempts
      • Email, and other user report URL's
   – Analysis is responsive, not proactive
Method (Overview)
Method
 • For rapid proof of concept
   – Data collected using the 140Dev php script
     and MySQL schema




 • Page characteristics collected using PHP for
   DOM object parsing
   – Links, Images, Forms, Iframes, Meta Tags
Image Analysis
 • Collected using headless web-browser
   – CutyCapt, XVFB-RUN
 • Hashing of resultant images
   – MD5Sum, SHA512, PHash
      • Final choice was PHash (Perceptual Hash)
         – Uses descrete cosign transformation
           » Reduces Sampling Frequency
 • Hamming Distance used to compare
  each hash value
Image Analysis
Image Analysis
• Process:
  – Reduce the size of the image 32 x 32
  – Reduce the color to greyscale
  – Calculate the DCT (creates frequency scalars)
  – Reduce the DCT to 8 x 8 pixels
  – Second DCT reduction, set bits to 1 or 0 depending on
    placement above or below average DCT
  – Take Hash
Method Verification
R esults
 • After our method was verified we concentrated
   on the top 5 most spoofed sites:




 • Some False Characteristic Matches:
Conclusion
 • Phishing URL posting on social media networks
   is a growing problem
 • We have developed a tool that quickly and
   effectively detects matches between legitimate
   and spoofed sites
 • Future work includes:
   – Integration of our characteristic mapping and
     image analysis technique into our social
     media analytics toolkit
Questions




            ?
R eferences
R eferences

More Related Content

Similar to Phishing spie 2012 presentation - jsw - d2

The high resolution web
The high resolution webThe high resolution web
The high resolution webPatric Lanhed
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problemGrokking VN
 
Web image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawlingWeb image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawlingREVEAL - Social Media Verification
 
Web image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawlingWeb image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawlingKaterina Andreadou
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and ConsumptionBrian Greig
 
Data quality is more important than you think
Data quality is more important than you thinkData quality is more important than you think
Data quality is more important than you thinkAmine Bendahmane
 
Design Recommender systems from scratch
Design Recommender systems from scratchDesign Recommender systems from scratch
Design Recommender systems from scratchDr. Amit Sachan
 
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfPhishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfVaralakshmiKC
 
How to Develop a Genealogical Website
How to Develop a Genealogical WebsiteHow to Develop a Genealogical Website
How to Develop a Genealogical Websitewebhostingguy
 
How to eliminate ideas as soon as possible
How to eliminate ideas as soon as possibleHow to eliminate ideas as soon as possible
How to eliminate ideas as soon as possibleRoman Zykov
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseSoftServe
 
30thSep2014
30thSep201430thSep2014
30thSep2014Mia liu
 
Splunk Digital Intelligence
Splunk Digital IntelligenceSplunk Digital Intelligence
Splunk Digital IntelligenceDmitry Anoshin
 
Emotion Recognition from Frontal Facial Image
Emotion Recognition from Frontal Facial ImageEmotion Recognition from Frontal Facial Image
Emotion Recognition from Frontal Facial ImageSakib Hussain
 
WhyR? Analiza sentymentu
WhyR? Analiza sentymentuWhyR? Analiza sentymentu
WhyR? Analiza sentymentuŁukasz Grala
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern PresentationDaniel Cahall
 
Crowdsourcing the Semantic Web
Crowdsourcing the Semantic WebCrowdsourcing the Semantic Web
Crowdsourcing the Semantic WebElena Simperl
 

Similar to Phishing spie 2012 presentation - jsw - d2 (20)

The high resolution web
The high resolution webThe high resolution web
The high resolution web
 
Knowledge Discovery
Knowledge DiscoveryKnowledge Discovery
Knowledge Discovery
 
Grokking Techtalk #37: Data intensive problem
 Grokking Techtalk #37: Data intensive problem Grokking Techtalk #37: Data intensive problem
Grokking Techtalk #37: Data intensive problem
 
Web image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawlingWeb image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawling
 
Web image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawlingWeb image size prediction for efficient focused image crawling
Web image size prediction for efficient focused image crawling
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and Consumption
 
Data quality is more important than you think
Data quality is more important than you thinkData quality is more important than you think
Data quality is more important than you think
 
Design Recommender systems from scratch
Design Recommender systems from scratchDesign Recommender systems from scratch
Design Recommender systems from scratch
 
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfPhishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
 
How to Develop a Genealogical Website
How to Develop a Genealogical WebsiteHow to Develop a Genealogical Website
How to Develop a Genealogical Website
 
How to eliminate ideas as soon as possible
How to eliminate ideas as soon as possibleHow to eliminate ideas as soon as possible
How to eliminate ideas as soon as possible
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
ICIECA 2014 Paper 05
ICIECA 2014 Paper 05ICIECA 2014 Paper 05
ICIECA 2014 Paper 05
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Splunk Digital Intelligence
Splunk Digital IntelligenceSplunk Digital Intelligence
Splunk Digital Intelligence
 
Emotion Recognition from Frontal Facial Image
Emotion Recognition from Frontal Facial ImageEmotion Recognition from Frontal Facial Image
Emotion Recognition from Frontal Facial Image
 
WhyR? Analiza sentymentu
WhyR? Analiza sentymentuWhyR? Analiza sentymentu
WhyR? Analiza sentymentu
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
 
Advanced Analytics Overview
Advanced Analytics OverviewAdvanced Analytics Overview
Advanced Analytics Overview
 
Crowdsourcing the Semantic Web
Crowdsourcing the Semantic WebCrowdsourcing the Semantic Web
Crowdsourcing the Semantic Web
 

More from Joshua S. White, PhD josh@securemind.org

Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...
Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...
Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...Joshua S. White, PhD josh@securemind.org
 
Presentation - Social Relevance Toward Understanding the Impact of the Indivi...
Presentation - Social Relevance Toward Understanding the Impact of the Indivi...Presentation - Social Relevance Toward Understanding the Impact of the Indivi...
Presentation - Social Relevance Toward Understanding the Impact of the Indivi...Joshua S. White, PhD josh@securemind.org
 
Presentation - Application of Actor Level Social Characteristic Indicator Sel...
Presentation - Application of Actor Level Social Characteristic Indicator Sel...Presentation - Application of Actor Level Social Characteristic Indicator Sel...
Presentation - Application of Actor Level Social Characteristic Indicator Sel...Joshua S. White, PhD josh@securemind.org
 
Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...
Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...
Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...Joshua S. White, PhD josh@securemind.org
 

More from Joshua S. White, PhD josh@securemind.org (12)

Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...
Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...
Presentation - Hybrid Sentiment Analysis Utilizing Multiple Indicators To Det...
 
Presentation - Social Relevance Toward Understanding the Impact of the Indivi...
Presentation - Social Relevance Toward Understanding the Impact of the Indivi...Presentation - Social Relevance Toward Understanding the Impact of the Indivi...
Presentation - Social Relevance Toward Understanding the Impact of the Indivi...
 
Presentation - Application of Actor Level Social Characteristic Indicator Sel...
Presentation - Application of Actor Level Social Characteristic Indicator Sel...Presentation - Application of Actor Level Social Characteristic Indicator Sel...
Presentation - Application of Actor Level Social Characteristic Indicator Sel...
 
Supraja_SMS_presentation
Supraja_SMS_presentationSupraja_SMS_presentation
Supraja_SMS_presentation
 
ase-social-informatics (6)
ase-social-informatics (6)ase-social-informatics (6)
ase-social-informatics (6)
 
Social Network Analysis Applications and Approach
Social Network Analysis Applications and ApproachSocial Network Analysis Applications and Approach
Social Network Analysis Applications and Approach
 
Clarkson joshua white - ids testing - spie 2013 presentation - jsw - d1
Clarkson   joshua white - ids testing - spie 2013 presentation - jsw - d1Clarkson   joshua white - ids testing - spie 2013 presentation - jsw - d1
Clarkson joshua white - ids testing - spie 2013 presentation - jsw - d1
 
Malware bek slides 20131023 final
Malware bek slides 20131023 finalMalware bek slides 20131023 final
Malware bek slides 20131023 final
 
CSIAC - Social Media Analysis and Privacy
CSIAC - Social Media Analysis and PrivacyCSIAC - Social Media Analysis and Privacy
CSIAC - Social Media Analysis and Privacy
 
Clarkson - Joshua White - Research Proposal Presentation
Clarkson - Joshua White - Research Proposal PresentationClarkson - Joshua White - Research Proposal Presentation
Clarkson - Joshua White - Research Proposal Presentation
 
Coalmine spie 2012 presentation - jsw -d3
Coalmine   spie 2012 presentation - jsw -d3Coalmine   spie 2012 presentation - jsw -d3
Coalmine spie 2012 presentation - jsw -d3
 
Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...
Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...
Physical Layer Optical Network Security Thesis Presentation To The CNY ISSA C...
 

Recently uploaded

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Phishing spie 2012 presentation - jsw - d2

  • 1. A Method for Automated D etection of P hishing Websites: Through B oth S ite Characteristics and Image Analysis Joshua S. White Jeanna N. Matthews, PhD
  • 2. Outline • Problem • Method – Image Analysis (in detail) • Method Verification • Results • Conclusion • References
  • 3. P roblem • Phishing site detection – A largely manual process • Requires human visual review of site to eliminate false positives / negatives – URL's comes from actual phishing attempts • Email, and other user report URL's – Analysis is responsive, not proactive
  • 5. Method • For rapid proof of concept – Data collected using the 140Dev php script and MySQL schema • Page characteristics collected using PHP for DOM object parsing – Links, Images, Forms, Iframes, Meta Tags
  • 6. Image Analysis • Collected using headless web-browser – CutyCapt, XVFB-RUN • Hashing of resultant images – MD5Sum, SHA512, PHash • Final choice was PHash (Perceptual Hash) – Uses descrete cosign transformation » Reduces Sampling Frequency • Hamming Distance used to compare each hash value
  • 8. Image Analysis • Process: – Reduce the size of the image 32 x 32 – Reduce the color to greyscale – Calculate the DCT (creates frequency scalars) – Reduce the DCT to 8 x 8 pixels – Second DCT reduction, set bits to 1 or 0 depending on placement above or below average DCT – Take Hash
  • 10. R esults • After our method was verified we concentrated on the top 5 most spoofed sites: • Some False Characteristic Matches:
  • 11. Conclusion • Phishing URL posting on social media networks is a growing problem • We have developed a tool that quickly and effectively detects matches between legitimate and spoofed sites • Future work includes: – Integration of our characteristic mapping and image analysis technique into our social media analytics toolkit