1. © 2014 Imperva, Inc. All rights reserved.
The Anatomy of Comment Spam
Shelly Hershkovitz, Sr. Security Research Engineer, Imperva
2. Agenda
§ Comment Spam - What & Why?
§ Comment Spam Attacks
§ Data Analysis
§ Mitigation Techniques
§ Case Studies
§ Conclusion
§ Q&A
3. Shelly Hershkovitz, Sr. Security Research Engineer, Imperva
§ Leads the efforts to capture and analyze hacking activities
• Authored several Hacker Intelligence Initiative (HII) Reports
§ Experienced in machine learning and computer vision
§ Holds a BA in Computer Science and an M.Sc. degree in Bio-Medical Engineering
4. Comment Spam - What & Why?
§ What?
• Wikipedia: “Comment spam is a term used to refer to a broad category of spam bot postings which abuse web-based forms to post unsolicited advertisements as comments on forums, blogs, wikis and online guest books.”
§ Why?
• Search engine optimization
• Advertisements
• Malware distribution
• Click fraud
5. Search Engine Optimization
[Diagram: MyWebSite.com receiving backlinks from other sites (OtherWebSite.com, OtherBlog.com, OtherNewsWebSite.com)]
6. Comment Spam Attack
§ Target Acquisition
§ Comment Generation
§ Posting
§ Verification
7. Comment Spam in Practice
§ Success relies on large scale
§ Automated tools are used
§ Inputs
• The site to be promoted
• Relevant keywords
8. Target Acquisition
§ URL Harvesting
• Locate relevant websites
• Locate suitable URLs for commenting
§ An alternative – buy ‘Quality URLs’ lists
• A typical price is $40 for ~13,000 URLs
9. Selecting the Targets
• Relevance: relevance to the promoted site
• Quality: the URL’s own search engine ranking
• Difficulty: the difficulty of posting comments (CAPTCHA)
• Policy: the site’s policy regarding search engines (follow/nofollow attribute)
10. Target Acquisition in Action
11. Comment Generation
§ Verbal comments attached to the promoted site
• Input keywords
12. Comment Generation in Action
13. Posting
§ Post comments on many URLs
§ Handling of authentication, CAPTCHA, or user details
15. Verification
§ Collect feedback on whether or not the comments were posted
18. Data Analysis
§ 17% of the attackers generated 58% of comment spam traffic
19. Data Analysis
§ 80% of comment spam traffic is generated by 28% of attackers
[Chart: source IPs, 28.00% highlighted]
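The concentration figures on these slides can be reproduced from a request log. The following is a minimal sketch, assuming the log is available as a list with one source IP per comment spam request; the function name, the sample data, and the 0.8 default are illustrative and not taken from the report.

```python
from collections import Counter


def source_share_for_traffic(request_sources, traffic_share=0.8):
    """Return the fraction of distinct sources that, counted from the most
    active one downwards, account for at least `traffic_share` of requests."""
    counts = Counter(request_sources)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = 0
    needed = 0
    for _, hits in counts.most_common():
        covered += hits
        needed += 1
        if covered >= traffic_share * total:
            break
    return needed / len(counts)


# Hypothetical log: one entry per comment spam request, keyed by source IP.
sample_log = (
    ["192.0.2.1"] * 6 + ["192.0.2.2"] * 3 + ["192.0.2.3", "192.0.2.4"]
)
print(source_share_for_traffic(sample_log))
```

In the sample log, two of the four sources cover 9 of the 11 requests, so the function reports that half of the sources produce at least 80% of the traffic.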
20. Mitigation Techniques
§ Content inspection
§ Source reputation
§ Anti-automation
§ Demotivation
§ Manual inspection
21. Mitigation Techniques: Content Inspection
§ Inspecting the content of the posted comments
§ Rule based
• Large number of links
• Logical sentences not related to the subject
§ Akismet
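The "large number of links" rule can be sketched as follows. This is an illustration only: the URL pattern is deliberately simple, and the threshold of two links is an assumed value, not one given in the talk.

```python
import re

# Matches http(s) URLs; deliberately simple, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def count_links(comment_text):
    """Count the http(s) URLs that appear in a comment."""
    return len(URL_PATTERN.findall(comment_text))


def has_too_many_links(comment_text, max_links=2):
    """Flag a comment whose link count exceeds the (assumed) threshold."""
    return count_links(comment_text) > max_links


spammy = "Nice post! http://a.example https://b.example/x http://c.example"
print(count_links(spammy), has_too_many_links(spammy))
```

A rule like this would normally be one signal among several, since legitimate comments can also contain links.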
22. Mitigation Techniques: Source Reputation
§ Based on the reputation of the poster
§ Online repositories based on crowdsourcing
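A reputation check can be as simple as testing the poster's address against a list of known spam sources. The sketch below assumes the list has already been downloaded from some repository as plain IP addresses or CIDR ranges; the function names and the sample entries are hypothetical, and no particular feed format is implied.

```python
import ipaddress


def load_reputation_list(entries):
    """Parse IP addresses or CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_known_spam_source(client_ip, reputation_list):
    """Return True if the client address falls inside any listed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in reputation_list)


# Hypothetical reputation entries (documentation address ranges).
reputation = load_reputation_list(["192.0.2.0/24", "198.51.100.7"])
print(is_known_spam_source("192.0.2.55", reputation))
```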
23. Mitigation Techniques: Anti-Automation
§ Anti-automation tools
• CAPTCHA
• Check-box for posting the comment
• Client type classification
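The check-box control can be enforced on the server side by rejecting submissions that arrive without the box ticked. This is a minimal sketch, assuming the form fields have already been parsed into a dictionary; the field name `confirm_post` is hypothetical. Browsers submit the value `on` for a ticked check-box that has no explicit value attribute.

```python
def passes_checkbox_test(form_fields, checkbox_name="confirm_post"):
    """Accept the submission only if the check-box field was sent as checked."""
    return form_fields.get(checkbox_name) == "on"


print(passes_checkbox_test({"comment": "Hello", "confirm_post": "on"}))
```

A tool that replays only the fields it already knows about would omit the check-box and be rejected, although a tool adapted to the specific form would still pass.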
24. Mitigation Techniques: Demotivation
§ Make comment spam useless
§ Follow/nofollow value of the rel attribute of an HTML anchor <A>
• Specifies whether a link should be followed by search engines
§ Penguin update for Google search engine algorithms
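One way to apply the nofollow policy is to mark every link in user-submitted comments when the page is rendered. The sketch below shows the idea for a single link; the function name is illustrative, and a real application would typically do this inside its templating or HTML sanitizing layer.

```python
from html import escape


def render_comment_link(url, link_text):
    """Build an anchor for a user-submitted link, marked rel="nofollow"."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(link_text)
    )


print(render_comment_link("http://example.com/?a=1&b=2", "my site"))
```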
25. Mitigation Techniques: Manual Inspection
§ Effective but not scalable
§ Effective against manual comment spam
26. Case Studies
§ Attack Target: Specific Victim
§ Attack Source: Specific Attacking IP
§ Google App Engine
27. Specific Victim
§ A non-profit organization
§ A single host with many URLs
§ Our theory associates popular phrases within the URL address and page content with the attack rate
[Chart: number of attacks]
28. Specific Victim
§ 52% of source IPs produce 80% of the traffic
[Chart: source IPs, 52% highlighted]
29. Specific Attacking IP
§ Comment spam posting from a specific IP
§ Rapid response (IP reputation feed) would have significantly reduced the impact of the attack
[Chart: number of attacks]
30. Specific Attacking IP
§ Five target websites were attacked from this source
§ Most had suffered a relatively high number of comment spam attacks
Percentage of Traffic per Target
• Target 1: 41%
• Target 2: 25%
• Target 3: 21%
• Target 4: 11%
• Target 5: 2%
31. Specific Attacking IP
§ Hyperlinks in a single request are for different websites
§ Consecutive requests have similar hyperlinks
§ Using different URLs for the same website avoids bad reputation
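Because the spammer varies the URL while promoting the same website, a defender may prefer to track promoted links by host rather than by full URL. The following is a sketch of that idea under the assumption that the hyperlinks have already been extracted from the posted comments; the function name and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def promoted_hosts(urls):
    """Count how often each host is promoted, ignoring path and query."""
    hosts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased by urlsplit
        if host:
            hosts[host] += 1
    return hosts


# Hypothetical links taken from consecutive requests.
links = [
    "http://shop.example/a",
    "http://SHOP.example/b?x=1",
    "https://other.example/",
]
print(promoted_hosts(links).most_common())
```

Grouping by host means that two different URLs on the same promoted site contribute to one count, so rotating paths no longer resets the site's reputation.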
32. Case Studies: Google App Engine
§ Google App Engine can be used to spread comment spam through proxy services
§ This technique can be used to bypass IP-based mitigations
33. Conclusion
§ Comment spam is a prosperous industry
• Many tools and services are available for comment spam generation and distribution
§ Identifying the attacker as a comment spammer early on and blocking its requests prevents most of the malicious activity
• Reputation-based controls are effective (IP / source application)
§ Reputation-based controls must be combined with some content-based controls to avoid false positives
§ Anti-automation and bot-detection controls can reduce the likelihood of an application becoming a target
34. Webinar Materials
§ Post-Webinar Discussions
§ Answers to Attendee Questions
§ Webinar Recording Link
Join Imperva LinkedIn Group,
Imperva Data Security Direct, for…