CANTINA
A Content-Based Approach to Detecting Phishing Web Sites
ABSTRACT
• CANTINA is a content-based approach to detecting phishing web sites.
• It examines a page's content to decide whether the page is legitimate.
• It detects phishing URLs and links.
INTRODUCTION
• Phishing
A kind of attack in which victims are tricked by spoofed emails and fraudulent web sites into giving up personal information
• How many phishing sites are there?
9,255 unique phishing sites were reported in June 2006 alone
• How much does phishing cost each year?
$1 billion to $2.8 billion per year
EXISTING SYSTEM
• NetCraft (surface characteristics)
• SpoofGuard (surface characteristics and blacklist)
• Cloudmark (blacklist)
PROPOSED SYSTEM
• Detects phishing websites.
• Examines text-based content along with surface characteristics.
• Heuristics used alongside the text-based content include:
- Age of domain
- Known images
- Suspicious URL
- Suspicious links
• Detects phishing links in users' email.
TF-IDF ALGORITHM
• Term Frequency (TF)
– The number of times a given term appears in a specific document
– A measure of the importance of the term within that document
• Inverse Document Frequency (IDF)
– A measure of how rare a term is across an entire collection of documents; terms that are common across the collection get a low IDF
• A high TF-IDF weight means a high TF in the given document and a low document frequency across the collection
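A minimal sketch of the weighting described on this slide, assuming the common variant TF-IDF = TF × log(N / DF); the slides do not state which formula is used. The toy corpus and the function names are hypothetical and only serve the illustration.

```python
import math

def term_frequency(term, document):
    """Number of times `term` appears in `document` (a list of words)."""
    return document.count(term)

def inverse_document_frequency(term, corpus):
    """log(N / DF): N documents in the corpus, DF of them contain `term`."""
    df = sum(1 for document in corpus if term in document)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, document, corpus):
    """Weight of `term` in `document` relative to the whole `corpus`."""
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)

# Hypothetical toy corpus; each document is already split into lowercase words.
corpus = [
    "ebay sign in to your ebay account".split(),
    "sign in to your bank account".split(),
    "sign in to read the news".split(),
]
page = corpus[0]

print(tf_idf("ebay", page, corpus))  # frequent on this page, absent elsewhere
print(tf_idf("sign", page, corpus))  # present in every document, so the weight is 0.0
```

The word "sign" occurs on the page but also in every other document, so its IDF is log(3/3) = 0 and it carries no weight, while "ebay" is both frequent on the page and rare in the collection.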
REAL EBAY WEBPAGE (screenshot)
FAKE EBAY WEBPAGE (screenshot)
MODULES
• Parsing the web pages
• Generating the lexical signature
• Testing Process
• Report Generation
Parsing the web pages
• Links, anchor tags, form tags and attachments in the web page are turned into corresponding items such as Text Link and HTML Link.
• This is done by parsing the page text.
• Uses the HTML Parser API, which extracts information from HTML code.
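The parsing step can be sketched as follows. This is an illustration only: Python's standard-library `html.parser` stands in for the HTML Parser API named on the slide, and the class name `PageParser`, its fields and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects link targets, form actions and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.links = []         # href values of <a> tags
        self.form_actions = []  # action values of <form> tags
        self.text = []          # visible text fragments
        self._skip = 0          # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.form_actions.append(attrs["action"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if not self._skip and data.strip():
            self.text.append(data.strip())

sample_html = """<html><body><h1>Sign in</h1>
<a href="http://example.com/help">Help</a>
<form action="http://example.com/login"><input name="user"></form>
</body></html>"""

parser = PageParser()
parser.feed(sample_html)
print(parser.links)         # ['http://example.com/help']
print(parser.form_actions)  # ['http://example.com/login']
print(parser.text)          # ['Sign in', 'Help']
```

The collected text feeds the lexical-signature step, while the links and form actions are the raw material for link-based checks.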
Generating the lexical signature
• The TF-IDF algorithm is used to generate lexical signatures.
• Calculate the TF-IDF value for each word in the document.
• Select the words with the highest values.
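The three steps above can be sketched as one function. The slides do not say how many words make up a signature, so the `size` parameter and its default of 5 are assumptions, as are the tokenizer, the add-one smoothing and the sample texts.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def lexical_signature(page_text, corpus_texts, size=5):
    """Return the `size` words of `page_text` with the highest TF-IDF weight."""
    tf = Counter(tokenize(page_text))
    corpus_tokens = [set(tokenize(text)) for text in corpus_texts]
    n = len(corpus_tokens)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for tokens in corpus_tokens if term in tokens)
        # Add-one smoothing (an assumption) keeps words that are missing
        # from the corpus from causing a division by zero.
        weights[term] = count * math.log((1 + n) / (1 + df))
    # Highest weight first; ties are broken alphabetically for stable output.
    ranked = sorted(weights, key=lambda term: (-weights[term], term))
    return ranked[:size]

# Hypothetical reference corpus and page text.
corpus = ["sign in to your account", "read the news today", "your account settings"]
page = "eBay sign in: welcome to eBay, enter your eBay user id"

print(lexical_signature(page, corpus, size=3))  # ['ebay', 'enter', 'id']
```

Words such as "your" and "sign" also appear in the reference corpus, so they rank below the page-specific words.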
Testing Process
• Feed the lexical signature to a search engine.
• Check whether the domain name of the current web page matches the domain name of any of the top N search results.
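A minimal sketch of the domain check, assuming the search results have already been fetched as a list of URLs; no search engine is called here. The helper names, the default `n=10` and the host comparison (exact host name with a leading "www." removed) are simplifications for illustration, not the deck's stated method.

```python
from urllib.parse import urlparse

def host_of(url):
    """Lowercased host name of `url`, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def domain_in_top_results(page_url, result_urls, n=10):
    """True if the page's host matches the host of any of the top `n` results."""
    page_host = host_of(page_url)
    return bool(page_host) and any(host_of(url) == page_host for url in result_urls[:n])

# Hypothetical search results for the signature of an eBay sign-in page.
results = ["https://www.ebay.com/", "https://en.wikipedia.org/wiki/EBay"]

print(domain_in_top_results("https://www.ebay.com/signin", results))              # True
print(domain_in_top_results("http://ebay.secure-login.example/signin", results))  # False
```

A page whose host appears among the top results is treated as a match; a look-alike host on another domain is not.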
Report Generation
• If a page is legitimate, it returns “legitimate”.
• If a page is phishing, it returns “phishing”.
APPLICATIONS
• Detects fraudulent websites and emails.
• Protects users from giving up personal information such as credit card numbers, bank details and account passwords.
• Detects suspicious links in email.
CONCLUSION
• A content-based approach for detecting phishing websites.
• A user-friendly interface.
• An anti-phishing website that protects users from giving away their personal information.
