httpscreenshot is a tool developed internally over the past year and a half. It has become one of our go-to tools for the reconnaissance phase of every penetration test. The tool takes a list of addresses, domains, and URLs, visits each in a browser, parses SSL certificates to add new hosts, and captures a screenshot and the HTML of the browser instance. Similar tools exist, but none met our needs with regard to speed (threaded), features (JavaScript support, SSL auto-detection and certificate scraping), and reliability.
The cluster portion of the tool will go through and group "similar" websites together, where "similar" is determined by a fuzzy matching metric.
This tool can be used by both blue and red teams. Blue teams can use it to quickly create an inventory of the applications and devices running in their environments. This inventory lets them quickly spot anything they did not know about that should be secured or, in many cases, removed.
Red teams can use it to quickly create the same inventory as part of their reconnaissance, which is often very effective in identifying potential target assets.
2. Outline
• Who we are
• The problem
• Our solution
• Demo
• How we’ve used it
• Q&A
3. Who We Are – Steve Breen
• Senior penetration tester
• Former “Enterprise” developer
– Current hacky script developer
• Vulnerability and exploit development hobbyist
by night
@breenmachine
breenmachine.blogspot.com
4. Who we are – Justin Kennedy
• Lifelong security hobbyist, actively for ~15 years
• Intern -> Computer Tech -> Help Desk -> SOC ->
Network Security -> Junior PT -> Senior PT ->
Team Lead
• Terrible at making slides look pretty… if you came
to see pretty slides, you may be in the wrong talk.
If you came to see an awesome tool that is
available (and is OSS) as of today, you’re in the
right place.
• @jstnkndy / juken (freenode)
6. Blue Team challenge
Let’s start off with a question…
Can you account for every device or application
on your network?
Why not?
7. Blue Team challenge reasons
1. You’ve inherited an infrastructure that you didn’t
build and (of course) not everything was documented by
your predecessor.
2. You work in an environment where business units
don't necessarily communicate as much as they
should and another business unit has spun up some
demo or test application without telling you.
3. You forgot about that old NT4 or Tomcat box that no
one has touched in the past 10 years.
4. Or someone just plugged some shit into a network
jack.
8. Red Team challenge
1. We are constantly attempting to compromise
organizations that we don't know anything
about (besides our recon).
2. It's our job to identify what the target attack
surface looks like.
3. Anyone in here ever masscan a /8 for
common web ports?
4. Let’s face it, we don’t always have as much
time as we’d like on an assessment.
15. Our Solution
HTTPScreenshot/Cluster
• HTTPScreenshot: A Python script to
screenshot thousands of websites really
quickly (and reliably)
• Cluster: A script to do “fuzzy matching” on
HTML pages. Produce immediately usable
output with “similar” pages grouped together.
16. HTTPScreenshot.py
• Goals: Fast, Thorough, Automagic
• Challenges: Code was hacked together during
assessments – needs some TLC
• Fun features:
– Input is nmap/masscan output
– JavaScript parsed and executed
– SSL autodetect
– SSL Certificate domain scraping for vhosts
– Headless (configurable fail-over to Firefox)
– Threaded
– Saves PNG and HTML (good for grep’ing)
– Attempts TLS 1.0 and falls back to SSLv3 when necessary
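To make the "input is nmap output" and "SSL autodetect" items concrete, here is a minimal sketch of how scan output could be turned into URLs to visit. It is not the tool's actual code: `parse_gnmap` and `candidate_urls` are hypothetical helpers, the input is assumed to be nmap greppable (`-oG`) output, and probing HTTPS before HTTP is only one plausible way to autodetect SSL.

```python
import re

# Matches one open TCP port entry in the "Ports:" field of nmap -oG output,
# e.g. "443/open/tcp//https///" -> ("443", "https").
PORT_RE = re.compile(r"(\d+)/open/tcp//([^/]*)/")


def parse_gnmap(lines):
    """Hypothetical helper: return (host, port, service) for every open TCP port."""
    targets = []
    for line in lines:
        # Only "Host: ... Ports: ..." lines carry port information.
        if not line.startswith("Host:") or "Ports:" not in line:
            continue
        host = line.split()[1]
        ports_field = line.split("Ports:", 1)[1]
        for port, service in PORT_RE.findall(ports_field):
            targets.append((host, int(port), service))
    return targets


def candidate_urls(host, port):
    """Hypothetical helper: URLs to probe in order (assumption: HTTPS first, then HTTP)."""
    return ["https://%s:%d/" % (host, port), "http://%s:%d/" % (host, port)]


sample = (
    "Host: 10.0.0.5 (web01.example.com)\t"
    "Ports: 80/open/tcp//http///, 443/open/tcp//https///, 22/closed/tcp//ssh///\t"
    "Ignored State: filtered (997)"
)
targets = parse_gnmap([sample])
```

Each resulting target would then be handed to a worker thread that tries the candidate URLs in order and screenshots the first one that responds.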
17. Cluster.py
• Identify “similar” websites and group them together
• Displays the resulting groups in a useful way (HTML output with JS
“hoverzoom”)
• Algorithm - Reduces to DBSCAN:
– Needed a clustering algorithm that didn’t require definition of “k”
– Uses HTML tag/attr values – computes a “similarity” score for two
sites
• href, name, src, id, class, title, h1
– Works fairly well – could DEFINITELY be improved upon
• Supports “diff” reports: sites that have been
changed, added, or removed since the last scan
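The grouping idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Cluster.py itself: the similarity score here is a plain Jaccard index over (attribute, value) pairs plus h1 text, the 0.5 threshold is arbitrary, and the grouping is a connected-components pass, which is what DBSCAN degenerates to when every page counts as a core point (a minimum cluster size of 1). No "k" has to be chosen up front.

```python
from html.parser import HTMLParser

# Attributes named on the slide; h1 text is collected separately.
ATTRS = {"href", "name", "src", "id", "class", "title"}


class FeatureParser(HTMLParser):
    """Collects (attribute, value) pairs and h1 text from a page."""

    def __init__(self):
        super().__init__()
        self.features = set()
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        for name, value in attrs:
            if name in ATTRS and value:
                self.features.add((name, value))

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and data.strip():
            self.features.add(("h1", data.strip()))


def features(html):
    parser = FeatureParser()
    parser.feed(html)
    parser.close()
    return parser.features


def similarity(a, b):
    """Jaccard index of two feature sets (assumed metric, not the tool's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(pages, threshold=0.5):
    """Group page names whose pages are linked by similarity >= threshold."""
    feats = {name: features(html) for name, html in pages.items()}
    unvisited = set(pages)
    groups = []
    for name in pages:
        if name not in unvisited:
            continue
        unvisited.discard(name)
        group, stack = [], [name]
        while stack:
            current = stack.pop()
            group.append(current)
            near = [m for m in unvisited
                    if similarity(feats[current], feats[m]) >= threshold]
            for m in near:
                unvisited.discard(m)
                stack.append(m)
        groups.append(sorted(group))
    return groups


pages = {
    "a": '<h1>Login</h1><a href="/login" class="btn">x</a><img src="/logo.png">',
    "b": '<h1>Login</h1><a href="/login" class="btn">x</a><img src="/logo2.png">',
    "c": "<h1>It works!</h1>",
}
groups = cluster(pages)
```

Here the two login pages share three of five distinct features and end up in one group, while the default page stays on its own. A real run would feed in the HTML files saved by the screenshot step.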