SlideShare ist ein Scribd-Unternehmen logo
1 von 57
Downloaden Sie, um offline zu lesen
PDF:
Myths vs Facts
a Digital Preservation Coalition online event
Preserving Documents Forever: When is a PDF not a PDF?
Ange Albertini
Oxford University, 15th July 2015
Ange Albertini
reverse engineering &
visual documentation
@angealbertini
ange@corkami.com
http://www.corkami.com
Disclaimer:
this is my first digipres event
I come here with a very different perspective:
I might sound pessimistic (or provocative/killjoy)…
Give me hope, give me peace on earth ;)
I might be entirely wrong - please let me know!
I used to think:
“PDF is perfect”
Complex documents,
yet uniform rendering on any system
(no wonder it’s omnipresent)
⇒ I believed the myth...
Professionally,
I analyse PDFs
Malware, security
(It originally happened by “accident”,
but I’ve been doing it since then…)
I created fact sheets
about PDF
I gave
presentations
about PDF
Personally,
I play with PDF
proactive, and fun
Yes, I write PDFs by hand...
[...and I open them in hex editors]
%PDF-1.
1 0 obj
<< /Kids [<<
/Parent 1 0 R
/Resources <<>>
/Contents 2 0 R
>>]
>>
2 0 obj
<<>>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
trailer <<
/Root << /Pages 1 0 R >>
>>
...like this one
truncated signature
missing parent /Type
/Kids should be indirect
missing /Font
missing kid /Type
missing /Count
missing endobj
missing /Length
missing xref
/Root should be indirect, missing /Size, missing root /Type
missing startxref, %%EOF
%PDF-1.
1 0 obj
<< /Kids [<<
/Parent 1 0 R
/Resources <<>>
/Contents 2 0 R
>>]
>>
2 0 obj
<<>>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
trailer <<
/Root << /Pages 1 0 R >>
>>
It’s not
standard...
INVALID?
...but it works
exactly as planned!
(without any reported error)
ACCEPTED!
Binary art
PDF + creativity = … ?
the slides for my talk at 44Con
are distributed as a file that is
simultaneously
a PDF and a PE (a PDF viewer)
so that the slides can view themselves
(oh, and it’s also HTML + Java)...
PDF slides
PDF viewer
...and it’s also schizophrenic (PDF documents appear different with different readers)
(Also available in PDF/A flavour)
NES Music
Super NES
Megadrive
What you see is not always what you print - when you use Layers [Optional Content Groups]!
Fun fact: you can’t change the printing output with Adobe Reader ;)
JPEG + ZIP + PDF Chimera (3 headers but only 1 image data)
PDFLaTeX quine (the document is its own source)
JPEG-encoded JavaScript
(deprecated)
script == picture
PoC||GTFO
International Journal of Proof-of-Concept or Get The F*** Out
the “new” 2600 / Phrack...
Distributed as PDF ⇒ each issue is a PoC
MBR (bootable)
+ PDF + ZIP
raw audio +
JPG + AES(PNG)
+ PDF + ZIP
TrueCrypt
+ PDF + ZIP
Flash + bootable ISO + PDF + ZIP
$ unzip -l pocorgtfo06.pdf
Archive: pocorgtfo06.pdf
warning [pocorgtfo06.pdf]: 10672929 extra bytes at...
(attempting to process anyway)
Length Date Time Name
--------- ---------- ----- ----
4095 11/24/2014 23:44 64k.txt
818941 08/18/2014 23:28 acsac13_zaddach.pdf
4564 10/05/2014 00:06 burn.txt
342232 11/24/2014 23:44 davinci.tgz.dvs
3785 11/24/2014 23:44 davinci.txt
5111 09/28/2014 21:05 declare.txt
0 08/23/2014 19:21 ecb2/
TAR + PDF + ZIP
$ tar -tvf pocorgtfo06.pdf
-rw-r--r-- Manul/Laphroaig 0 2014-10-06 21:33 %PDF-1.5
-rw-r--r-- Manul/Laphroaig 525849 2014-10-06 21:33 1.png
-rw-r--r-- Manul/Laphroaig 273658 2014-10-06 21:33 2.bmp
$ unzip -l pocorgtfo07.pdf
Archive: pocorgtfo07.pdf
******* PWNED ********
dumping credentials...
**********************
Length EAs ACLs Date Time Name
-------- --- ---- ---- ---- ----
6325 0 0 02/02/15 20:56 500miles.txt
0 0 0 19/03/15 15:51 abusing_file_formats/
370375 0 0 06/03/15 21:51 abusing_file_formats/3in1.png
512 0 0 06/03/15 21:51 abusing_file_formats/abstract.tar
BPG + HTML (incl. a BPG viewer in JS)
+ PDF + ZIP
$ unzip -l pocorgtfo08.pdf
Archive: pocorgtfo08.pdf
Length EAs ACLs Date Time Name
-------- --- ---- ---- ---- ----
988446 0 0 08/06/15 22:46 ECCpolyglots.pdf
440648 0 0 09/06/15 20:36 airtel-injection.tar.bz2
522633 0 0 09/06/15 19:18 airtel.png
1546 0 0 08/06/15 22:46 alexander.txt
118696 0 0 08/06/15 22:46 browsersec.zip
31337 0 0 08/06/15 22:46 exploit2.txt
38109 0 0 08/06/15 22:46 geer.langsec.21v15.txt
303926 0 0 08/06/15 22:46 ifthisgoeson.txt
160225 0 0 08/06/15 22:46 jt65.pdf
3149 0 0 08/06/15 22:46 leehseinloong.cpp
2244652 0 0 08/06/15 22:46 madelinek.wav
Shell script + PDF + ZIP
$ echo "terrible raccoons achieve their escapades" | ./pocorgtfo08.pdf -d 4321
good neighbors secure their communications
… and others
Bootable quine in assembly,
2 switchable PDFs via ROT13,
hash collisions,
GameBoy + Sega Master System...
You get the idea...
The worst case for preservation?
I explore corner cases, before attackers do it
How is it possible?
● signature offset not enforced
● stream object (containing anything)
● comments can contain binary data
● appended data
● objects tolerated between XREF and startxref
and a few specific abuses (some are fixed now)
What is PDF ?
I asked online...
...and I wasn’t disappointed :)
Postscript Derived Failure
Practically Destructive File
Paper Dimensions Fixed
Polyglot (Definition|Deployment|Delivery) Framework
Posterity Depends on Forensics
Please Don't Fail / Again
Proven Dysfunctional Format
POC||GTFO Demonstration Format
Penile Dysfunction Format
Postscript Didn't Fit
Pants-Down Format
Pathetic & Dangerous Format
Posthoc Depression Format
Proprietary Document Fee
Public Domain Farce
Penetrate Dodgy Firewall
Pretty Demented Format
Payload Deployment File
Perpetually Disagreeable Format
Potential Disaster Forever
Perversely Designed Format
PDF is a Disaster for the Future
Preservation Dooming Format
Preserving Document Forever
More seriously...
(from my personal point of view)
A miracle?
Fonts are embedded in the document
Rendering is following complex rules
(overly-complex, from a security standpoint)
An open format?
ISO $pec$ = 200$
These specs only cover the main part :(
They are unclear - no formal guarantee :(
http://www.iso.org/iso/catalogue_detail.htm?csnumber=51502
A strict format ?
No reader completely enforces the specs
⇒ recovery mode (sometimes ‘explicit’)
signature, stream length, XREF…
Many possible malformations handled specifically by each reader (high level)...
standard structure
(each object should be distinct)
non-standard but tolerated structure
(inlined objects)
Many possible abuses
signature endobj /Count
text
operators
/Font font use xref /Resources trailer
Adobe
Reader
MuPDF
PDF.js
PDFium
Poppler
… different readers have different tolerances ...
follows the specs
corruption tolerated
absence tolerated
...so a PDF specifically crafted for one reader, may fail with all other readers.
A uniform format?
Many free readers, but…
● Many (useful) features only available in Adobe
Reader:
forms, signature, layers…
(it’s Adobe’s business model)
● Other readers just aim to support “standard”
PDFs
A beautiful mess!
(an artist's interpretation)
A consistent format?
Adobe Reader is closing security issues.
This is good, but...
⇒ Some features are not supported anymore
⇒ Potential lack of backward compatibility
It’s a complex patchwork!
JPGs are stored entirely as-is, but PNG have to be converted to raw
Forms as XML
PostScript Transfer function
Web (Flash, JavaScript...)
3D objects
A coherent format?
- text + line comments, yet binary
- unusual whitespace, binary also in comments
- different escaping
- read forward+no separator and object reference
- hex as nibbles and odd-numbered
- bottom up but also possibly top down (who wins?)
- corrupted ZLIB still tolerated
- image compression for non-images
What if...
...Adobe would stop supporting PDF ?
We’re just left with the ‘specs’ ?
After all...
...Flash is being killed for security reasons,
after becoming progressively redundant.
PDF could be converted to something else.
PDF & preservation
● JPG + OCR’ed text = simple
...so simple that we wouldn’t need PDF ?
other PDFs = complex (Adobe-dependent)
Is PDF/A the solution? more $pec$
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38920
“Backward compatibility”
...is a beautiful utopia!
And it leads to saying
“we've always done it this way"
even after several generations :(
“Backward compatibility”
...can be incompatible with security fixes
JPEG-encoded JavaScript
PDF polyglots
Brace yourself...
PDF 2.0 is coming!
It’s not improving stability and preservability
Will Adobe adhere to it ? Since it’s distinct now…
*https://www.youtube.com/watch?v=wGmcTf-uMrE
Conclusion
“a complex puzzle because the original picture is messy”
Conclusion
● PDF is very useful - omnipresent for a reason
● it’s still involved in computer security
○ recent complete takeover of Windows 8.1 by @j00ru
● it’s quite a monster
○ I’m merely scratching the surface
○ its specs were messy from the beginning
● it’s far from perfect
○ “if only Adobe Reader was open”
*https://www.youtube.com/watch?v=FVBSvjYQgq8
ACK
Paul Wheatley
@doegox @pdfkunfoo @newsoft
@internot @insertscript @avlidienbrunn
@foxgrrl @chrisjohnriley @travisgoodspeed
and everybody for the PDF suggestions :)
PDFs:
myths vs facts
corkami.com
@angealbertini
Hail to the king, baby!

Weitere ähnliche Inhalte

Was ist angesagt?

Lisp in a Startup: the Good, the Bad, and the Ugly
Lisp in a Startup: the Good, the Bad, and the UglyLisp in a Startup: the Good, the Bad, and the Ugly
Lisp in a Startup: the Good, the Bad, and the UglyVsevolod Dyomkin
 
Code quality. Patch quality
Code quality. Patch qualityCode quality. Patch quality
Code quality. Patch qualitymalcolmt
 
Multilingual sites in plone
Multilingual sites in ploneMultilingual sites in plone
Multilingual sites in ploneRamon Navarro
 
The Go features I can't live without, 2nd round
The Go features I can't live without, 2nd roundThe Go features I can't live without, 2nd round
The Go features I can't live without, 2nd roundRodolfo Carvalho
 
Why I Love Python
Why I Love PythonWhy I Love Python
Why I Love Pythondidip
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to pythonMohamed Hegazy
 
The Go features I can't live without
The Go features I can't live withoutThe Go features I can't live without
The Go features I can't live withoutRodolfo Carvalho
 
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++David Beazley (Dabeaz LLC)
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)wesley chun
 
Learning Python from Data
Learning Python from DataLearning Python from Data
Learning Python from DataMosky Liu
 
Intro to Python Workshop San Diego, CA (January 19, 2013)
Intro to Python Workshop San Diego, CA (January 19, 2013)Intro to Python Workshop San Diego, CA (January 19, 2013)
Intro to Python Workshop San Diego, CA (January 19, 2013)Kendall
 
Packer Genetics: The selfish code
Packer Genetics: The selfish codePacker Genetics: The selfish code
Packer Genetics: The selfish codejduart
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caringsporst
 
Introduction to programming with python
Introduction to programming with pythonIntroduction to programming with python
Introduction to programming with pythonPorimol Chandro
 

Was ist angesagt? (20)

Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
 
NLP Project Full Circle
NLP Project Full CircleNLP Project Full Circle
NLP Project Full Circle
 
Lisp in a Startup: the Good, the Bad, and the Ugly
Lisp in a Startup: the Good, the Bad, and the UglyLisp in a Startup: the Good, the Bad, and the Ugly
Lisp in a Startup: the Good, the Bad, and the Ugly
 
Code quality. Patch quality
Code quality. Patch qualityCode quality. Patch quality
Code quality. Patch quality
 
HackIM 2012 CTF Walkthrough
HackIM 2012 CTF WalkthroughHackIM 2012 CTF Walkthrough
HackIM 2012 CTF Walkthrough
 
Multilingual sites in plone
Multilingual sites in ploneMultilingual sites in plone
Multilingual sites in plone
 
The Go features I can't live without, 2nd round
The Go features I can't live without, 2nd roundThe Go features I can't live without, 2nd round
The Go features I can't live without, 2nd round
 
Why I Love Python
Why I Love PythonWhy I Love Python
Why I Love Python
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to python
 
The Go features I can't live without
The Go features I can't live withoutThe Go features I can't live without
The Go features I can't live without
 
A Tour of Go - Workshop
A Tour of Go - WorkshopA Tour of Go - Workshop
A Tour of Go - Workshop
 
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
SWIG : An Easy to Use Tool for Integrating Scripting Languages with C and C++
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)
 
Start dart
Start dartStart dart
Start dart
 
Learning Python from Data
Learning Python from DataLearning Python from Data
Learning Python from Data
 
Intro to Python Workshop San Diego, CA (January 19, 2013)
Intro to Python Workshop San Diego, CA (January 19, 2013)Intro to Python Workshop San Diego, CA (January 19, 2013)
Intro to Python Workshop San Diego, CA (January 19, 2013)
 
Dart
DartDart
Dart
 
Packer Genetics: The selfish code
Packer Genetics: The selfish codePacker Genetics: The selfish code
Packer Genetics: The selfish code
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 
Introduction to programming with python
Introduction to programming with pythonIntroduction to programming with python
Introduction to programming with python
 

Ähnlich wie PDF: myths vs facts

The challenges of file formats
The challenges of file formatsThe challenges of file formats
The challenges of file formatsAnge Albertini
 
Schizophrenic files v2
Schizophrenic files v2Schizophrenic files v2
Schizophrenic files v2Ange Albertini
 
Improving file formats
Improving file formatsImproving file formats
Improving file formatsAnge Albertini
 
Messing with binary formats
Messing with binary formatsMessing with binary formats
Messing with binary formatsAnge Albertini
 
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...Area41
 
PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01dumbfuckery
 
ePubs-RollYourOwn(for_supercon2012)
ePubs-RollYourOwn(for_supercon2012)ePubs-RollYourOwn(for_supercon2012)
ePubs-RollYourOwn(for_supercon2012)windsordi
 
File Polyglottery; or This Proof of Concept is Also a Picture of Cats
File Polyglottery; or This Proof of Concept is Also a Picture of CatsFile Polyglottery; or This Proof of Concept is Also a Picture of Cats
File Polyglottery; or This Proof of Concept is Also a Picture of CatsEvan Sultanik
 
Infrastructure as code might be literally impossible
Infrastructure as code might be literally impossibleInfrastructure as code might be literally impossible
Infrastructure as code might be literally impossibleice799
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2ice799
 
What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...Sławomir Zborowski
 
Ruby in the Browser - RubyConf 2011
Ruby in the Browser - RubyConf 2011Ruby in the Browser - RubyConf 2011
Ruby in the Browser - RubyConf 2011Ilya Grigorik
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerTreasure Data, Inc.
 
Flink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data ConstellationFlink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data ConstellationMatthew Ring
 
Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?Docker, Inc.
 
Multimedia on the web - HTML5 video and audio
Multimedia on the web - HTML5 video and audioMultimedia on the web - HTML5 video and audio
Multimedia on the web - HTML5 video and audioChristian Heilmann
 
JS Fest 2019. Ryan Dahl. Deno, a new way to JavaScript
JS Fest 2019. Ryan Dahl. Deno, a new way to JavaScriptJS Fest 2019. Ryan Dahl. Deno, a new way to JavaScript
JS Fest 2019. Ryan Dahl. Deno, a new way to JavaScriptJSFestUA
 
Design and Evolution of cyber-dojo
Design and Evolution of cyber-dojoDesign and Evolution of cyber-dojo
Design and Evolution of cyber-dojoJon Jagger
 

Ähnlich wie PDF: myths vs facts (20)

The challenges of file formats
The challenges of file formatsThe challenges of file formats
The challenges of file formats
 
Schizophrenic files v2
Schizophrenic files v2Schizophrenic files v2
Schizophrenic files v2
 
Improving file formats
Improving file formatsImproving file formats
Improving file formats
 
Messing with binary formats
Messing with binary formatsMessing with binary formats
Messing with binary formats
 
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
 
Schizophrenic files
Schizophrenic filesSchizophrenic files
Schizophrenic files
 
PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01
 
ePubs-RollYourOwn(for_supercon2012)
ePubs-RollYourOwn(for_supercon2012)ePubs-RollYourOwn(for_supercon2012)
ePubs-RollYourOwn(for_supercon2012)
 
File Polyglottery; or This Proof of Concept is Also a Picture of Cats
File Polyglottery; or This Proof of Concept is Also a Picture of CatsFile Polyglottery; or This Proof of Concept is Also a Picture of Cats
File Polyglottery; or This Proof of Concept is Also a Picture of Cats
 
Infrastructure as code might be literally impossible
Infrastructure as code might be literally impossibleInfrastructure as code might be literally impossible
Infrastructure as code might be literally impossible
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2
 
What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...
 
Ruby in the Browser - RubyConf 2011
Ruby in the Browser - RubyConf 2011Ruby in the Browser - RubyConf 2011
Ruby in the Browser - RubyConf 2011
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
 
Flink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data ConstellationFlink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data Constellation
 
Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?Deploying PHP on PaaS: Why and How?
Deploying PHP on PaaS: Why and How?
 
Multimedia on the web - HTML5 video and audio
Multimedia on the web - HTML5 video and audioMultimedia on the web - HTML5 video and audio
Multimedia on the web - HTML5 video and audio
 
JS Fest 2019. Ryan Dahl. Deno, a new way to JavaScript
JS Fest 2019. Ryan Dahl. Deno, a new way to JavaScriptJS Fest 2019. Ryan Dahl. Deno, a new way to JavaScript
JS Fest 2019. Ryan Dahl. Deno, a new way to JavaScript
 
Introduction To JSFL
Introduction To JSFLIntroduction To JSFL
Introduction To JSFL
 
Design and Evolution of cyber-dojo
Design and Evolution of cyber-dojoDesign and Evolution of cyber-dojo
Design and Evolution of cyber-dojo
 

Mehr von Ange Albertini

Technical challenges with file formats
Technical challenges with file formatsTechnical challenges with file formats
Technical challenges with file formatsAnge Albertini
 
Relations between archive formats
Relations between archive formatsRelations between archive formats
Relations between archive formatsAnge Albertini
 
Abusing archive file formats
Abusing archive file formatsAbusing archive file formats
Abusing archive file formatsAnge Albertini
 
You are *not* an idiot
You are *not* an idiotYou are *not* an idiot
You are *not* an idiotAnge Albertini
 
An introduction to inkscape
An introduction to inkscapeAn introduction to inkscape
An introduction to inkscapeAnge Albertini
 
Exploiting hash collisions
Exploiting hash collisionsExploiting hash collisions
Exploiting hash collisionsAnge Albertini
 
Connecting communities
Connecting communitiesConnecting communities
Connecting communitiesAnge Albertini
 
TASBot - the perfectionist
TASBot - the perfectionistTASBot - the perfectionist
TASBot - the perfectionistAnge Albertini
 
Preserving arcade games - 31c3
Preserving arcade games -  31c3Preserving arcade games -  31c3
Preserving arcade games - 31c3Ange Albertini
 
Preserving arcade games
Preserving arcade gamesPreserving arcade games
Preserving arcade gamesAnge Albertini
 
Hide Android applications in images
Hide Android applications in imagesHide Android applications in images
Hide Android applications in imagesAnge Albertini
 
Let's play with crypto! v2
Let's play with crypto! v2Let's play with crypto! v2
Let's play with crypto! v2Ange Albertini
 
Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1Ange Albertini
 

Mehr von Ange Albertini (20)

Technical challenges with file formats
Technical challenges with file formatsTechnical challenges with file formats
Technical challenges with file formats
 
Relations between archive formats
Relations between archive formatsRelations between archive formats
Relations between archive formats
 
Abusing archive file formats
Abusing archive file formatsAbusing archive file formats
Abusing archive file formats
 
TimeCryption
TimeCryptionTimeCryption
TimeCryption
 
You are *not* an idiot
You are *not* an idiotYou are *not* an idiot
You are *not* an idiot
 
KILL MD5
KILL MD5KILL MD5
KILL MD5
 
No more dumb hex!
No more dumb hex!No more dumb hex!
No more dumb hex!
 
Beyond your studies
Beyond your studiesBeyond your studies
Beyond your studies
 
An introduction to inkscape
An introduction to inkscapeAn introduction to inkscape
An introduction to inkscape
 
Exploiting hash collisions
Exploiting hash collisionsExploiting hash collisions
Exploiting hash collisions
 
Infosec & failures
Infosec & failuresInfosec & failures
Infosec & failures
 
Connecting communities
Connecting communitiesConnecting communities
Connecting communities
 
TASBot - the perfectionist
TASBot - the perfectionistTASBot - the perfectionist
TASBot - the perfectionist
 
Hacks in video games
Hacks in video gamesHacks in video games
Hacks in video games
 
Preserving arcade games - 31c3
Preserving arcade games -  31c3Preserving arcade games -  31c3
Preserving arcade games - 31c3
 
Preserving arcade games
Preserving arcade gamesPreserving arcade games
Preserving arcade games
 
Let's talk about...
Let's talk about...Let's talk about...
Let's talk about...
 
Hide Android applications in images
Hide Android applications in imagesHide Android applications in images
Hide Android applications in images
 
Let's play with crypto! v2
Let's play with crypto! v2Let's play with crypto! v2
Let's play with crypto! v2
 
Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1
 

Kürzlich hochgeladen

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

PDF: myths vs facts

  • 1. PDF: Myths vs Facts a Digital Preservation Coalition online event Preserving Documents Forever: When is a PDF not a PDF? Ange Albertini Oxford University, 15th July 2015
  • 2. Ange Albertini reverse engineering & visual documentation @angealbertini ange@corkami.com http://www.corkami.com
  • 3. Disclaimer: this is my first digipres event I come here with a very different perspective: I might sound pessimistic (or provocative/killjoy)… Give me hope, give me peace on earth ;) I might be entirely wrong - please let me know!
  • 4. I used to think: “PDF is perfect” Complex documents, yet uniform rendering on any system (no wonder it’s omnipresent) ⇒ I believed the myth...
  • 5. Professionally, I analyse PDFs Malware, security (It originally happened by “accident”, but I’ve been doing it since then…)
  • 6. I created fact sheets about PDF
  • 8. Personally, I play with PDF proactive, and fun
  • 9. Yes, I write PDFs by hand... [...and I open them in hex editors]
  • 10. %PDF-1. 1 0 obj << /Kids [<< /Parent 1 0 R /Resources <<>> /Contents 2 0 R >>] >> 2 0 obj <<>> stream BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET endstream endobj trailer << /Root << /Pages 1 0 R >> >> ...like this one
  • 11. truncated signature missing parent /Type /Kids should be indirect missing /Font missing kid /Type missing /Count missing endobj missing /Length missing xref /Root should be indirect, missing /Size, missing root /Type missing startxref, %%EOF %PDF-1. 1 0 obj << /Kids [<< /Parent 1 0 R /Resources <<>> /Contents 2 0 R >>] >> 2 0 obj <<>> stream BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET endstream endobj trailer << /Root << /Pages 1 0 R >> >> It’s not standard... INVALID?
  • 12. ...but it works exactly as planned! (without any reported error) ACCEPTED!
  • 13. Binary art PDF + creativity = … ?
  • 14. the slides for my talk at 44Con are distributed as a file that is simultaneously a PDF and a PE (a PDF viewer) so that the slides can view themselves (oh, and it’s also HTML + Java)... PDF slides PDF viewer
  • 15. ...and it’s also schizophrenic (PDF documents appear different with different readers)
  • 16. (Also available in PDF/A flavour)
  • 19. What you see is not always what you print - when you use Layers [Optional Content Groups]! Fun fact: you can’t change the printing output with Adobe Reader ;)
  • 20. JPEG + ZIP + PDF Chimera (3 headers but only 1 image data)
  • 21. PDFLaTeX quine (the document is its own source)
  • 23. PoC||GTFO International Journal of Proof-of-Concept or Get The F*** Out the “new” 2600 / Phrack... Distributed as PDF ⇒ each issue is a PoC
  • 25. raw audio + JPG + AES(PNG) + PDF + ZIP
  • 27. Flash + bootable ISO + PDF + ZIP
  • 28. $ unzip -l pocorgtfo06.pdf Archive: pocorgtfo06.pdf warning [pocorgtfo06.pdf]: 10672929 extra bytes at... (attempting to process anyway) Length Date Time Name --------- ---------- ----- ---- 4095 11/24/2014 23:44 64k.txt 818941 08/18/2014 23:28 acsac13_zaddach.pdf 4564 10/05/2014 00:06 burn.txt 342232 11/24/2014 23:44 davinci.tgz.dvs 3785 11/24/2014 23:44 davinci.txt 5111 09/28/2014 21:05 declare.txt 0 08/23/2014 19:21 ecb2/ TAR + PDF + ZIP $ tar -tvf pocorgtfo06.pdf -rw-r--r-- Manul/Laphroaig 0 2014-10-06 21:33 %PDF-1.5 -rw-r--r-- Manul/Laphroaig 525849 2014-10-06 21:33 1.png -rw-r--r-- Manul/Laphroaig 273658 2014-10-06 21:33 2.bmp
  • 29. $ unzip -l pocorgtfo07.pdf Archive: pocorgtfo07.pdf ******* PWNED ******** dumping credentials... ********************** Length EAs ACLs Date Time Name -------- --- ---- ---- ---- ---- 6325 0 0 02/02/15 20:56 500miles.txt 0 0 0 19/03/15 15:51 abusing_file_formats/ 370375 0 0 06/03/15 21:51 abusing_file_formats/3in1.png 512 0 0 06/03/15 21:51 abusing_file_formats/abstract.tar BPG + HTML (incl. a BPG viewer in JS) + PDF + ZIP
  • 30. $ unzip -l pocorgtfo08.pdf Archive: pocorgtfo08.pdf Length EAs ACLs Date Time Name -------- --- ---- ---- ---- ---- 988446 0 0 08/06/15 22:46 ECCpolyglots.pdf 440648 0 0 09/06/15 20:36 airtel-injection.tar.bz2 522633 0 0 09/06/15 19:18 airtel.png 1546 0 0 08/06/15 22:46 alexander.txt 118696 0 0 08/06/15 22:46 browsersec.zip 31337 0 0 08/06/15 22:46 exploit2.txt 38109 0 0 08/06/15 22:46 geer.langsec.21v15.txt 303926 0 0 08/06/15 22:46 ifthisgoeson.txt 160225 0 0 08/06/15 22:46 jt65.pdf 3149 0 0 08/06/15 22:46 leehseinloong.cpp 2244652 0 0 08/06/15 22:46 madelinek.wav Shell script + PDF + ZIP $ echo "terrible raccoons achieve their escapades" | ./pocorgtfo08.pdf -d 4321 good neighbors secure their communications
  • 31. … and others Bootable quine in assembly, 2 switchable PDFs via ROT13, hash collisions, GameBoy + Sega Master System...
  • 32. You get the idea... The worst case for preservation? I explore corner cases, before attackers do it
  • 33. How is it possible? ● signature offset not enforced ● stream object (containing anything) ● comments can contain binary data ● appended data ● objects tolerated between XREF and startxref and a few specific abuses (some are fixed now)
  • 34. What is PDF ? I asked online...
  • 35. ...and I wasn’t disappointed :) Postscript Derived Failure Practically Destructive File Paper Dimensions Fixed Polyglot (Definition|Deployment|Delivery) Framework Posterity Depends on Forensics Please Don't Fail / Again Proven Dysfunctional Format POC||GTFO Demonstration Format Penile Dysfunction Format Postscript Didn't Fit Pants-Down Format Pathetic & Dangerous Format Posthoc Depression Format Proprietary Document Fee Public Domain Farce Penetrate Dodgy Firewall Pretty Demented Format Payload Deployment File Perpetually Disagreeable Format Potential Disaster Forever Perversely Designed Format PDF is a Disaster for the Future Preservation Dooming Format Preserving Document Forever
  • 36. More seriously... (from my personal point of view)
  • 37. A miracle? Fonts are embedded in the document Rendering is following complex rules (overly-complex, from a security standpoint)
  • 38. An open format? ISO $pec$ = 200$ These specs only cover the main part :( They are unclear - no formal guarantee :( http://www.iso.org/iso/catalogue_detail.htm?csnumber=51502
  • 39. A strict format ? No reader completely enforces the specs ⇒ recovery mode (sometimes ‘explicit’) signature, stream length, XREF…
  • 40. Many possible malformations handled specifically by each reader (high level)... standard structure (each object should be distinct) non-standard but tolerated structure (inlined objects)
  • 41. Many possible abuses signature endobj /Count text operators /Font font use xref /Resources trailer Adobe Reader MuPDF PDF.js PDFium Poppler … different readers have different tolerances ... follows the specs corruption tolerated absence tolerated
  • 42. ...so a PDF specifically crafted for one reader, may fail with all other readers.
  • 43. A uniform format? Many free readers, but… ● Many (useful) features only available in Adobe Reader: forms, signature, layers… (it’s Adobe’s business model) ● Other readers just aim to support “standard” PDFs
  • 44. A beautiful mess! (an artist's interpretation)
  • 45. A consistent format? Adobe Reader is closing security issues. This is good, but... ⇒ Some features are not supported anymore ⇒ Potential lack of backward compatibility
  • 46. It’s a complex patchwork! JPGs are stored entirely as-is, but PNG have to be converted to raw Forms as XML PostScript Transfer function Web (Flash, JavaScript...) 3D objects
  • 47. A coherent format? - text + line comments, yet binary - unusual whitespace, binary also in comments - different escaping - read forward+no separator and object reference - hex as nibbles and odd-numbered - bottom up but also possibly top down (who wins?) - corrupted ZLIB still tolerated - image compression for non-images
  • 48. What if... ...Adobe would stop supporting PDF ? We’re just left with the ‘specs’ ?
  • 49. After all... ...Flash is being killed for security reasons, after becoming progressively redundant. PDF could be converted to something else.
  • 50. PDF & preservation ● JPG + OCR’ed text = simple ...so simple that we wouldn’t need PDF ? other PDFs = complex (Adobe-dependent) Is PDF/A the solution? more $pec$ http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38920
  • 51. “Backward compatibility” ...is a beautiful utopia! And it leads to saying “we've always done it this way" even after several generations :(
  • 52. “Backward compatibility” ...can be incompatible with security fixes JPEG-encoded JavaScript PDF polyglots
  • 53. Brace yourself... PDF 2.0 is coming! It’s not improving stability and preservability Will Adobe adhere to it ? Since it’s distinct now… *https://www.youtube.com/watch?v=wGmcTf-uMrE
  • 54. Conclusion “a complex puzzle because the original picture is messy”
  • 55. Conclusion ● PDF is very useful - omnipresent for a reason ● it’s still involved in computer security ○ recent complete takeover of Windows 8.1 by @j00ru ● it’s quite a monster ○ I’m merely scratching the surface ○ its specs were messy from the beginning ● it’s far from perfect ○ “if only Adobe Reader was open” *https://www.youtube.com/watch?v=FVBSvjYQgq8
  • 56. ACK Paul Wheatley @doegox @pdfkunfoo @newsoft @internot @insertscript @avlidienbrunn @foxgrrl @chrisjohnriley @travisgoodspeed and everybody for the PDF suggestions :)