The document introduces Shatter, a semi-automatic tool for mapping and scanning dark web sites. It allows researchers to manually crawl sites, mark targets, and generate target maps that drive repeatable scans. Two case studies are presented where Shatter was used to automatically solve CAPTCHAs and breach the protections of fake credit card and darknet markets. The tool remains free under AGPL-3, and the authors aim to make it not only useful but essential for vulnerability assessment and security research.
2. WHO WE ARE
▸ Ken-ya YOSHIMURA (@ad3liae)
▸ Takahiro YOSHIMURA (@alterakey)
▸ Security researchers
▸ Monolith Works Inc. CEO/CTO
https://moonlithworks.co.jp/
3. WHAT WE DO
▸ Security research and development
▸ iOS/Android Apps
→Financial, Games, IoT related, etc. (>200)
→trueseeing: Non-decompiling Android Application Vulnerability Scanner [2017]
▸ Windows/Mac/Web/HTML5 Apps
→POS, RAD tools etc.
▸ Network/Web penetration testing
→PCI-DSS etc.
▸ Search engine reconnaissance (a.k.a. Google Hacking)
▸ Whitebox testing
▸ Forensic analysis
▸ Research
→Clairvoyance: concurrent lip reader [2019]
4. WHAT WE DO
▸ CTF
▸ Enemy10, Sutegoma2
▸ METI CTFCJ 2012 Qual.: 1st
▸ METI CTFCJ 2012: 3rd
▸ DEF CON 21 CTF: 6th
▸ DEF CON 22 OpenCTF: 4th
▸ Talks:
DEF CON 25 Demo Labs
CODE BLUE 2017
DEF CON 27 AI Village etc.
DEFCON 2016 by Wiyre Media on flickr, CC-BY 2.0
5. RELATED WORKS
▸ Web application vulnerability scanners
▸ Manual: Burp Suite, ZAP etc.
▸ Automatic: WebInspect etc.
6. WHAT IS THE DARK WEB?
▸ Anonymized Web on (mostly) Tor
▸ Pure freedom and anarchism
▸ Users are hard(-ish) to identify
→ CAPTCHAs are often deployed
▸ Traffic routes are randomized
→ Rather high latency (RTTs)
Onions by Mike Mozart on flickr, CC-BY 2.0
9. PREPARATION - TRADITIONAL
▸ Manual
▸ Crawl and build data flows:
Tedious, error-prone, and not repeatable
▸ Automatic
▸ Spider:
Not comprehensive enough: insufficient coverage
10. SHATTER: THE IN-BETWEEN BEAUTY
▸ Our answer: Shatter
▸ Semi-automatic
▸ Repeatable
▸ Comprehensive
Shattering by chiaralily on flickr, CC-BY-NC 2.0
11. PREPARATION - SHATTER
▸ Manually crawl, mark, and map
→ “Target maps”
▸ Edit target maps and go
▸ Target maps describe scans
▸ Marked requests are recognized as “targets”
▸ Data flows are mostly deduced automatically, hence semi-automatic
▸ The same map gives the same scan, hence repeatable
Planning by Jeremy Keith on flickr, CC-BY 2.0
12. SHATTER TARGET MAP
▸ Terse, readable YAML files
▸ Composed of:
▸ Analyses: what we should do
▸ Sessions: how we should do it
▸ Identities: who we should be
▸ Targets: whom we approach
▸ Flows: how we deduce parameters (optional)
▸ Exploits: what we should do upon findings
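The section list above can be mirrored as a small data model. The sketch below is a hypothetical illustration only: it assumes the YAML has already been parsed into a dict, uses lowercase section names taken from the slide, and treats every section except `flows` as required (the slide marks only Flows as optional). Shatter's real schema is not shown in these slides and may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical section names mirroring the slide; Shatter's real schema may differ.
REQUIRED = ("analyses", "sessions", "identities", "targets", "exploits")
OPTIONAL = ("flows",)


@dataclass
class TargetMap:
    analyses: List[Any]    # what we should do
    sessions: List[Any]    # how we should do it
    identities: List[Any]  # who we should be
    targets: List[Any]     # whom we approach
    exploits: List[Any]    # what to do upon findings
    flows: List[Any] = field(default_factory=list)  # optional: how parameters are deduced


def load_target_map(doc: Dict[str, Any]) -> TargetMap:
    """Validate an already-parsed YAML document (a plain dict) and wrap it."""
    missing = [k for k in REQUIRED if k not in doc]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    unknown = sorted(set(doc) - set(REQUIRED) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unknown sections: {unknown}")
    return TargetMap(**{k: doc[k] for k in REQUIRED + OPTIONAL if k in doc})


# Hypothetical example map; the .onion URL is a placeholder.
example = {
    "analyses": ["sqli"],
    "sessions": [{"proxy": "socks5h://127.0.0.1:9050"}],  # Tor's default SOCKS port
    "identities": [{"user": "alice"}],
    "targets": [{"method": "POST", "url": "http://example.onion/login"}],
    "exploits": [],
}
```

Because the map is plain data, loading the same map twice yields the same `TargetMap`, which is the property that makes scans repeatable.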
13. ATTACK PLAN / EXECUTE
▸ Data flow map
▸ Flows are wholly deduced
▸ Massively parallel scan
→combats high latency (RTTs)
▸ Scanner is ZAP-compatible
(for now)
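One common way to make a high-latency network tolerable is to keep many requests in flight at once. The following is a minimal asyncio sketch of such a bounded-concurrency scan, not Shatter's scanner: `probe` is a hypothetical stand-in that only sleeps for a simulated round trip instead of sending a request over Tor, and `limit` is an assumed knob for capping concurrency.

```python
import asyncio
from typing import List, Tuple


async def probe(target: str, rtt: float) -> str:
    """Hypothetical stand-in for one scan request; only simulates the network round trip."""
    await asyncio.sleep(rtt)
    return f"scanned {target}"


async def scan_all(targets: List[str], limit: int = 50, rtt: float = 0.01) -> Tuple[List[str], int]:
    """Probe every target with at most `limit` requests in flight.

    Returns the results (in input order) and the peak number of concurrent requests.
    """
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def bounded(target: str) -> str:
        nonlocal in_flight, peak
        async with sem:  # wait here while `limit` probes are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await probe(target, rtt)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(t) for t in targets))
    return list(results), peak
```

With `limit=20` and 200 targets, the run takes roughly 10 simulated round trips instead of 200; with real Tor RTTs of seconds, that difference dominates total scan time.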
15. AFTERMATH
▸ Insanely old middleware
→An automatic exploitation attempt returned HTTP 500
▸ Operator identity:
“Evgenij Sokolov”,
“Bertrand Rasse”, possibly others
omerta.sup@gmail.com
▸ Operator works:
http://omerta.wf/ etc.
▸ cf. omerta (n)
1: a code of silence practiced by the Mafia; a refusal
to give evidence to the police about criminal activities
19. PREPARATION - SHATTER
▸ CAPTCHA
▸ Parameters can be deduced with code blocks
→ NN-based solvers can be attached!
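A rough picture of what "attaching a solver" could look like: request parameters that cannot be replayed verbatim are filled in by registered code blocks. Everything here (`deducer`, `fill_params`, the `captcha_image` context key, and the stub `solve_captcha`) is hypothetical and only illustrates the idea; it is not Shatter's plugin API.

```python
from typing import Callable, Dict

# Hypothetical hook registry; not Shatter's actual API.
Deducer = Callable[[dict], str]
DEDUCERS: Dict[str, Deducer] = {}


def deducer(param: str) -> Callable[[Deducer], Deducer]:
    """Register the decorated function as the code block that computes `param`."""
    def register(fn: Deducer) -> Deducer:
        DEDUCERS[param] = fn
        return fn
    return register


def fill_params(template: Dict[str, str], context: dict) -> Dict[str, str]:
    """Fill parameters that have a registered deducer; keep recorded values otherwise."""
    return {k: DEDUCERS[k](context) if k in DEDUCERS else v for k, v in template.items()}


def solve_captcha(image: bytes) -> str:
    # Placeholder for an NN-based solver; a real one would segment and classify glyphs.
    return "AB12C"


@deducer("captcha")
def deduce_captcha(context: dict) -> str:
    return solve_captcha(context["captcha_image"])
```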
20. CAPTCHA 102
▸ Recognizing glyphs in an image
▸ Hard to solve algorithmically
▸ 3-dimensional distortion
▸ Noise
21. LEARN TO RECOGNIZE
▸ Image classification problem
▸ CNN
Convolutional Neural Networks
▸ Supervised learning model
▸ Similar to visual cortex
▸ Good at spatial pattern recog.
▸ Robust against distortions and shifts
Typical CNN architecture by Aphex34 on Wikipedia, CC-BY-SA 4.0
22. LEARN TO RECOGNIZE
▸ For 5 characters:
$(10+26)^5 = 36^5 \approx 6 \times 10^7$ patterns
▸ Too many to solve at once
▸ So: per-glyph classifiers (36 classes each)
Typical CNN architecture by Aphex34 on Wikipedia, CC-BY-SA 4.0
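The arithmetic behind splitting the problem, as a quick check: one label per whole 5-character string would mean tens of millions of classes, while a per-glyph classifier needs only 36. This assumes the [A-Z0-9] charset used later in the deck.

```python
DIGITS, LETTERS, LENGTH = 10, 26, 5
CHARSET = DIGITS + LETTERS                  # 36 symbols: [A-Z0-9]

whole_captcha_classes = CHARSET ** LENGTH   # one class per possible 5-char string
per_glyph_classes = CHARSET                 # one class per glyph

print(f"{whole_captcha_classes:,} vs {LENGTH} x {per_glyph_classes}")
# prints: 60,466,176 vs 5 x 36
```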
23. DIVIDE AND CONQUER
▸ OpenCV2
▸ De-speckling
▸ Extracting glyphs
▸ Errors when glyphs touch (lack of spacing)
→ignored for now
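Glyph extraction can be sketched as connected-component labelling on a binarized image, with tiny components discarded as speckles. This pure-Python version is a stand-in for OpenCV routines such as `cv2.connectedComponentsWithStats` or contour finding; the 8-connectivity and the `min_area` threshold are assumptions, not values from the talk. Touching glyphs come out as a single component, which is the failure mode noted above.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def extract_glyphs(img: List[List[int]], min_area: int = 4) -> List[Box]:
    """Label 8-connected ink regions (1 = ink) and return their boxes, left to right."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(y, x)])
            seen[y][x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(ys) >= min_area:  # de-speckling: drop components smaller than min_area
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return sorted(boxes)  # sort by x_min so boxes follow reading order
```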
24. BREACH PLAN
▸ OpenCV2
▸ Glyph extraction
▸ CNN
▸ Glyph classification
Chess Teacher by JB Kilpatrick on flickr, CC-BY 2.0
25. BREACH PLAN?
▸ What should we learn?
▸ Synthesized with generators
(tag=parameters)
▸ Gathered truths
(tag=pre-coordinated truths)
Question by Florence Ivy on flickr, CC-BY-ND 2.0
26. HUMANS TO SAVE US
▸ Anti-Captcha
▸ CAPTCHA recognition service run by
humans
▸ Gathered images and tags
→Now we can learn
▸ Human-powered…? But:
▸ Reconnoitering the CAPTCHA generators is tedious
▸ Of course, Shatter can use Anti-Captcha directly
27. GRAB THEM OUT
▸ Let’s gather CAPTCHAs
▸ We need ~2000
▸ High RTT!
(2 sec. or more)
Grab by Rutger Tuller on flickr, CC-BY 2.0
28. GRAB THEM OUT!
▸ asyncio super-parallel grabber
→No mercy
▸ 2000 imgs / ~48s
(24ms/img)
▸ Throughput is not too bad
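Back-of-the-envelope numbers for the grab, assuming every fetch costs roughly the 2 s RTT from the previous slide. Little's law (mean items in flight = completion rate × time each item spends in flight) gives the average concurrency the grabber must have sustained.

```python
def throughput(items: int, elapsed_s: float, rtt_s: float) -> dict:
    rate = items / elapsed_s  # items completed per second
    return {
        "ms_per_item": elapsed_s / items * 1000,
        "items_per_s": rate,
        # Little's law: mean requests in flight = completion rate x time each spends in flight
        "mean_in_flight": rate * rtt_s,
        "sequential_s": items * rtt_s,  # total time if fetched one at a time
    }


stats = throughput(2000, 48, 2.0)
```

For the slide's figures (2000 images, ~48 s), this gives 24 ms per image, about 42 images/s, roughly 83 requests in flight on average, and about 4000 s (over an hour) if the images were fetched one at a time.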
29. READ THEM OUT
▸ Read 2000 CAPTCHAs
▸ Out-of-charset reads
▸ Inaccurate glyph extractions
▸ Take only good reads!
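A minimal sketch of the "take only good reads" filter, assuming each read arrives as a text label plus the glyph boxes extracted from the same image. It drops a whole CAPTCHA when the label falls outside the 5-character [A-Z0-9] charset or when the number of extracted glyphs disagrees with the label length, and otherwise emits one (character, box) training sample per glyph. The function name and input shape are hypothetical.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple

Box = Tuple[int, int, int, int]
VALID_LABEL = re.compile(r"[A-Z0-9]{5}")


def glyph_samples(reads: Iterable[Tuple[str, Sequence[Box]]]) -> Iterator[Tuple[str, Box]]:
    for label, boxes in reads:
        if not VALID_LABEL.fullmatch(label):
            continue  # out-of-charset (or wrong-length) read: drop the whole CAPTCHA
        if len(boxes) != len(label):
            continue  # glyphs merged or split during extraction: drop it too
        yield from zip(label, boxes)  # one (character, box) sample per glyph
```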
31. DIVIDE AND CONQUER
▸ Samples: 6305
▸ Should be around 10000 (2000 × 5 glyphs)… but
▸ Dropping glyph mis-extractions
▸ Dropping CAPTCHA mis-reads
32. RELENTLESS LEARNER
▸ CNN on Keras
▸ N×32×32×1 → 36 classes ([A-Z0-9])
▸ Preprocessing
▸ resize and thresholding
▸ Normalization: [0.0f .. 1.0f]
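The preprocessing steps listed above, sketched in pure Python on a grayscale glyph given as a 2-D list of 0..255 values. Nearest-neighbour resizing and a fixed threshold of 128 stand in for whatever OpenCV resize and threshold settings the authors actually used; those choices are assumptions.

```python
from typing import List


def preprocess(gray: List[List[int]], size: int = 32, threshold: int = 128) -> List[List[float]]:
    """Resize a cropped glyph to size x size, binarize it, and scale to [0.0, 1.0]."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(size):
        sy = y * h // size  # nearest-neighbour source row
        row = []
        for x in range(size):
            sx = x * w // size  # nearest-neighbour source column
            binary = 255 if gray[sy][sx] >= threshold else 0  # fixed binary threshold
            row.append(binary / 255.0)  # normalize to [0.0, 1.0]
        out.append(row)
    return out
```

To match the N×32×32×1 input, each 32×32 result still needs a trailing channel axis (for example via a NumPy reshape) before being stacked into a batch.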
33. RELENTLESS LEARNER
▸ Keeping learning effective
▸ Small input: 32×32×1
▸ AMSGrad (a modified Adam)
▸ Test dataset
▸ 10% of original dataset
▸ Store the model in HDF5 format
→for continuous learning
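For reference, the core idea of AMSGrad: it keeps Adam's moment estimates but divides by the running maximum of the second-moment estimate, so the effective step size can never grow back. The scalar sketch below omits Adam's bias correction for brevity, so it is a simplified illustration rather than Keras's exact implementation; in Keras the variant is enabled with `amsgrad=True` on the `Adam` optimizer.

```python
import math
from typing import Callable


def amsgrad_minimize(grad: Callable[[float], float], x0: float, lr: float = 0.1,
                     beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                     steps: int = 500) -> float:
    """Minimize a 1-D function from its gradient with the AMSGrad update (no bias correction)."""
    x, m, v, v_max = x0, 0.0, 0.0, 0.0
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment, as in Adam
        v = beta2 * v + (1 - beta2) * g * g  # second moment, as in Adam
        v_max = max(v_max, v)                # AMSGrad: the denominator never shrinks
        x -= lr * m / (math.sqrt(v_max) + eps)
    return x
```

For example, `amsgrad_minimize(lambda x: 2 * (x - 3), 0.0)` drives x toward the minimum of (x - 3)^2 at 3.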
34. LEARN TO BREAK
▸ 50 epochs → 30 min.
TensorFlow 2.0 on a 2017 MacBook Pro
▸ GPU?
▸ Keras uses it automatically
▸ CUDA only: the MBP falls short :(
Early Learner by Aaron Freimark on flickr, CC-BY-ND 2.0
35. LEARN TO BREAK!
▸ 99% acc. (even on other datasets)
→Excellent
▸ Recognizes even CAPTCHAs that Anti-Captcha failed on
▸ CNN: should need 500..1000 samples/class
▸ 175.1/class in reality (6305 / 36)
▸ Small dataset :(
Early Learner by Aaron Freimark on flickr, CC-BY-ND 2.0
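What 99% might mean per CAPTCHA: the slides don't say whether the figure is per glyph or per image. If it is per-glyph accuracy and glyph errors are independent (both assumptions), a 5-character CAPTCHA is read fully correctly with probability 0.99^5, about 95%, before any glyph-extraction failures are counted.

```python
def captcha_solve_rate(per_glyph_acc: float, length: int = 5) -> float:
    """Probability that every glyph is right, assuming independent per-glyph errors."""
    return per_glyph_acc ** length


rate = captcha_solve_rate(0.99)  # about 0.951
```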
40. AFTERMATH (2)
▸ We have breached CAPTCHA protection for
Nightmare
(again)
▸ Their CAPTCHAs are rather weak
(again)
No lock 2 by Jens Eilers Bischoff on flickr, CC-BY 2.0
41. FREE AS IN FREEDOM
▸ http://sha.tter.io/
(GitHub repos will be announced there)
▸ AGPL-3: It remains free for good
▸ Currently under heavy work on fixes and ..
▸ We are striving to make it not only useful but
also essential
Freedom by Mochamad Arief on flickr, CC-BY-NC-ND 2.0
42. CONCLUSION
▸ The dark web
▸ Anonymized Web
▸ Hard to name attackers
▸ CAPTCHAs are often deployed but _not_
effective!
▸ Related works are not sufficient
▸ Automatic: non-comprehensive
▸ Manual: non-repeatable
IMG_2988s by 不憂照相館 on flickr, CC-BY-NC-ND 2.0
43. CONCLUSION
▸ Our answer: Shatter
▸ Semi-automatic
Crawl, mark, map, edit — you do
Scan — we do
▸ Repeatable
Same map gives the same scan
▸ Comprehensive
Because you crawl
▸ Beauty lies in “semi-autonomy”
Shattering by chiaralily on flickr, CC-BY-NC 2.0
44. CONCLUSION
▸ Shatter can…
▸ Deduce params automatically, or with some
code
(solving CAPTCHAs, 2FAs, …)
▸ Fingerprint and stage attacks
▸ Actively exploit vulnerabilities
▸ Cooperate with other toolchains for deeper analysis/exploitation
Mise en scène nocturne by Jean-François Renaud on flickr, CC-BY-ND 2.0
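As an example of a 2FA parameter deduced "with some code": if a test identity uses standard RFC 6238 TOTP and the tester holds its shared secret, the one-time code can be computed on the fly. This is a generic TOTP sketch (HMAC-SHA1, 30-second step), not something shown in the talk; hooking it into Shatter would go through whatever deducer mechanism the tool actually provides.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the number of `step`-second intervals since the Unix epoch."""
    t = time.time() if now is None else now
    return hotp(secret, int(t // step), digits)
```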
45. CONCLUSION
▸ Shatter is
▸ At: http://sha.tter.io/
(GitHub repos will be announced there)
▸ Under AGPL-3: Free as in freedom, for good
▸ Stay tuned!
▸ Under heavy work on fixes and ..
▸ Should be available on 12/24/2019
Freedom by Mochamad Arief on flickr, CC-BY-NC-ND 2.0
46. CONCLUSION
▸ For hidden service operators:
▸ CAPTCHAs are not effective
▸ Better update your stack
▸ If you do bad things, you must be prepared
to be exposed
Menace by Kilworth Simmonds on flickr, CC-BY-ND 2.0