3. Yet another talk on PDF from me?
- this one is high-level
- awareness without the hardcore details
- a new kind of leak happened in the wild recently
- it's still worth spreading the knowledge!
8. Text
- explicitly spelled out in the data
- can be
  - invisible
    - white, invisible style, covered
  - forbidden to copy/paste
    - but this can be disabled instantly
  - mapped to some weird Unicode
  but still technically there!
- it can still be extracted, often automatically
  pdftotext -layout ...
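To make "still technically there" concrete, here is a minimal Python sketch. The content stream is a hand-made, hypothetical example (not taken from the talk): the second string is drawn in white, so it would be invisible on a white page, yet a naive scan for the `Tj` text-showing operator recovers it. Real files usually compress their streams and use more operators, so this is only an illustration, not what pdftotext does internally.

```python
import re

# Hypothetical, uncompressed page content stream (an assumption for illustration).
# "1 1 1 rg" sets the fill colour to white, so the second string would be
# invisible on a white page, but it is still spelled out in the data.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 720 Td 0 0 0 rg (Public report) Tj ET
BT /F1 12 Tf 72 700 Td 1 1 1 rg (Hidden remark) Tj ET
"""

# Literal string followed by the Tj operator.
# Simplification: ignores escapes, nested parentheses, hex strings and TJ arrays.
TJ_STRING = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_shown_strings(stream):
    """Return every literal string shown with Tj, whatever its colour."""
    return [raw.decode("latin-1") for raw in TJ_STRING.findall(stream)]
```

Calling `extract_shown_strings(CONTENT_STREAM)` returns both strings: the colour operator changes how the text is painted, not whether it is stored.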
10. Even if the image is not used (displayed),
the image object (and content) may still be present.
11. Images
- embedded as a dedicated object
  - can be automatically extracted
    pdfimages -j ...
- then referenced in pages' contents
  - handy when the same image is used several times
- images can be present (and extracted)
  even if not used
12. Images
- JPEG files are stored as-is (the complete file)
Extra risk: leak via thumbnail, EXIF, RDF
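Because a JPEG is stored as the complete file, it can be carved straight out of the raw bytes. The sketch below is a rough illustration under stated assumptions: the sample "PDF" and its tiny fake JPEG payload are hypothetical, and the carver simply looks for the JPEG start (SOI) and end (EOI) markers. It does not parse the PDF at all, which is exactly why an unreferenced image is still found.

```python
JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the 0xFF of the next marker
JPEG_END = b"\xff\xd9"        # EOI marker


def carve_jpegs(pdf_bytes):
    """Naively carve JPEG files out of raw PDF bytes.

    Simplification: each carve stops at the first EOI after its SOI, so a JPEG
    that embeds a thumbnail (which has its own EOI) would be cut short.
    """
    found = []
    pos = 0
    while True:
        start = pdf_bytes.find(JPEG_START, pos)
        if start == -1:
            break
        end = pdf_bytes.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        found.append(pdf_bytes[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found


# Hypothetical sample: a fake JPEG inside an image object that no page references.
FAKE_JPEG = b"\xff\xd8\xff\xe0" + b"not a real picture" + b"\xff\xd9"
SAMPLE_PDF = (
    b"%PDF-1.4\n7 0 obj\n<< /Subtype /Image /Filter /DCTDecode >>\nstream\n"
    + FAKE_JPEG
    + b"\nendstream\nendobj\n"
)
```

Here `carve_jpegs(SAMPLE_PDF)` returns the fake JPEG unchanged, EXIF and all in a real case, even though nothing displays it.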
14. Drawings
(rectangles, lines...)
- the information is not trivial to extract
- can still be modified without any problem
  - remove covering layers (censorship)
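As a rough illustration of removing a covering layer, the sketch below strips filled rectangles from a content stream. Everything in it is an assumption for the example: the stream is hand-written and uncompressed, and the black box is drawn with the plain `re` (rectangle) and `f` (fill) operators. Real redactions may use other path operators, annotations or images.

```python
import re

# Hypothetical content stream: some text, then a black rectangle painted over it.
CONTENT_STREAM = b"""BT /F1 12 Tf 72 700 Td (Name: John Doe) Tj ET
0 0 0 rg
70 695 200 20 re
f
"""

# Four numbers, the "re" operator, then a fill operator ("f" or "f*").
# Simplification: only catches a rectangle that is filled immediately.
FILLED_RECT = re.compile(rb"(?:[-+]?\d*\.?\d+\s+){4}re\s+f\*?\s*")


def strip_filled_rectangles(stream):
    """Drop every 'x y w h re f' sequence, leaving the rest of the stream as-is."""
    return FILLED_RECT.sub(b"", stream)
```

After `strip_filled_rectangles(CONTENT_STREAM)`, the text operators are untouched and the box is gone, so the covered name shows again.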
17. So you get a new document, showing only what you wanted...
(cropme.pdf is much smaller because it was hand-written, while cropped.pdf is bloated)
$ du -b cropme.pdf cropped.pdf
595 cropme.pdf
10203 cropped.pdf
19. If you remove the "CropBox", you get back the original content.
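A minimal sketch of that removal, assuming the page dictionary is stored uncompressed and the `/CropBox` value is a direct array (both are assumptions; the sample dictionary is hypothetical). The entry is overwritten with spaces rather than deleted, so the file length and the byte offsets in the cross-reference table stay valid.

```python
import re

# "/CropBox" followed by a direct array such as [100 100 300 200].
CROPBOX = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def blank_cropbox(pdf_bytes):
    """Overwrite every /CropBox entry with spaces of the same length."""
    return CROPBOX.sub(lambda m: b" " * len(m.group(0)), pdf_bytes)


# Hypothetical page dictionary for illustration.
SAMPLE_PAGE = (
    b"<< /Type /Page /MediaBox [0 0 612 792] "
    b"/CropBox [100 100 300 200] /Contents 4 0 R >>"
)
```

With the `/CropBox` blanked, a viewer falls back to the `/MediaBox`, i.e. the full original page.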
20. Importing
- Copy/paste from OS X Preview
- Import via LaTeX
- ...?
What it actually does:
1/ imports the whole doc
(to prevent incompatibilities)
2/ adds a limiting view
Risk: the original content is still there!
21. Incremental updates
updates (even deletions) are appended,
like in Microsoft Office, etc.
- "save as..." a new document to prevent it
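Since updates are appended, an earlier revision can often be recovered by cutting the file at an earlier end-of-file marker. The sketch below is a simplified illustration with a hypothetical two-revision sample: it counts `%%EOF` markers and keeps everything up to the first one. Treat the count as a hint only, since linearized files also contain more than one marker and the byte sequence could in principle appear inside stream data.

```python
EOF_MARKER = b"%%EOF"


def count_eof_markers(pdf_bytes):
    """More than one marker suggests the file may carry appended updates."""
    return pdf_bytes.count(EOF_MARKER)


def first_revision(pdf_bytes):
    """Return the bytes up to and including the first %%EOF marker."""
    end = pdf_bytes.find(EOF_MARKER)
    if end == -1:
        raise ValueError("no %%EOF marker found")
    return pdf_bytes[:end + len(EOF_MARKER)]


# Hypothetical sample: an original revision plus one appended update.
ORIGINAL = b"%PDF-1.4\n1 0 obj\n(Draft with secret)\nendobj\nstartxref\n9\n%%EOF\n"
UPDATE = b"1 0 obj\n(Clean version)\nendobj\nstartxref\n70\n%%EOF\n"
SAMPLE_PDF = ORIGINAL + UPDATE
```

On the sample, `first_revision(SAMPLE_PDF)` gives back the draft, including the string the update was meant to replace.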
23. Forms
- Time saver:
  - type (copy/paste) your info in the doc, then print!
  - you can even save the info in the doc
- this info is not stored like standard text
Risk:
you spread an updated document
containing private info!
25. Forms
- Forms are not always supported
  - you won't even get a warning!
- Content is not stored like standard text
  - not as easy to extract, but still there!
Bigger risk:
Just opening the file to double-check
may not be enough!
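Saved form data lives in the field dictionaries rather than in the page text, so a viewer without form support can show an apparently empty page while the values sit in the file. A minimal sketch, assuming uncompressed field dictionaries with literal-string `/T` (field name) and `/V` (value) entries in that order; the two sample fields are hypothetical.

```python
import re

# Field name /T (...) followed, inside the same dictionary, by value /V (...).
# Simplification: literal strings only, /T before /V, no nested dictionaries.
FIELD = re.compile(rb"/T\s*\(([^()]*)\)[^>]*?/V\s*\(([^()]*)\)")


def extract_form_values(pdf_bytes):
    """Return a {field name: saved value} mapping found in the raw bytes."""
    return {
        name.decode("latin-1"): value.decode("latin-1")
        for name, value in FIELD.findall(pdf_bytes)
    }


# Hypothetical form fields for illustration.
SAMPLE_PDF = (
    b"5 0 obj\n<< /FT /Tx /T (Full name) /V (Jane Doe) >>\nendobj\n"
    b"6 0 obj\n<< /FT /Tx /T (Account) /V (12345) >>\nendobj\n"
)
```

`extract_form_values(SAMPLE_PDF)` lists both saved values without rendering anything, which is why a visual double-check is not a reliable test.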
26. The only fully reliable way?
(the one the *NSA* uses...)
27. Convert pages to pictures!
Just use ImageMagick convert
then import to a new PDF
Damn ugly, but fully reliable.
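The two steps can be sketched as a small Python wrapper around ImageMagick. This is an assumption-laden sketch, not the talk's exact recipe: it assumes an ImageMagick 6-style `convert` binary on the PATH (ImageMagick 7 names it `magick`), that PDF reading is enabled in its policy, and the function names, 150 dpi default and `page-NNN.png` naming are all hypothetical choices.

```python
import glob
import subprocess


def rasterize_command(src_pdf, page_pattern, dpi=150):
    """Command that renders each page of src_pdf to its own image file."""
    return ["convert", "-density", str(dpi), src_pdf, page_pattern]


def rebuild_command(page_images, dst_pdf):
    """Command that packs the page images into a brand-new PDF."""
    return ["convert", *page_images, dst_pdf]


def flatten_pdf(src_pdf, dst_pdf, prefix="page", dpi=150):
    """Rasterize src_pdf, then rebuild dst_pdf from the pictures only."""
    subprocess.run(rasterize_command(src_pdf, prefix + "-%03d.png", dpi), check=True)
    pages = sorted(glob.glob(prefix + "-*.png"))
    subprocess.run(rebuild_command(pages, dst_pdf), check=True)
```

The new file contains only pixels: no text objects, no hidden images, no form fields, no earlier revisions. The temporary page images still hold the visible content, so they should be deleted afterwards.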
29. PDF sucks at preventing leaks
PDF has a monstrous attack surface
(and metadata embedding)
No free PDF "dissector"
because we only focus on malware
- No solution anytime soon
(Btw, how much is the map of a petroleum reservoir worth?)