This document summarizes a presentation about PDF.js, an open source JavaScript library for rendering PDF files in web browsers without using native code plugins. It discusses how PDF files are structured and processed, including extracting data, transforming images, loading fonts, and executing drawing commands. It also covers the project's goals of security, building a web-specific viewer, driving innovation on the web platform, and improving performance. The presenter demonstrates a live demo and discusses opportunities for contributing to the project on GitHub.
2. Overview
5 • What is PDF.JS about
10 • How PDF is structured & processing in PDF.JS
15 • “Why are you doing this?”
5 • Firefox Integration
5 • What’s next?
15 • Demo
5 • Q&A
3. About me
[Timeline figure: Bespin, Skywriter, Ace; ETH Zurich (Physics); Firefox DevTools; PDF.JS; ?]
5. What is PDF.JS
• building faithful & efficient PDF viewer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source (Github)
6. What is PDF.JS
• Not Firefox-Specific - all modern browsers
• 1.3 MB uncompressed JS
• ~33,000 lines of code
• viewer in different languages
• async API
7. How PDF is structured
• Header: PDF version
• Body: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
• xRef Table: mapping objID ➟ byte offset
• Trailer: root objID, xRef byte offset
• PDF file root obj = ref to pages catalog
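A minimal sketch of how a reader could use the trailer and xRef table described on this slide. It assumes a classic, uncompressed xRef table with a single subsection and an ASCII-only file, so that string indices equal byte offsets. The sample file and the helper names `findXrefOffset` and `parseXrefTable` are made up for illustration and are not PDF.JS functions.

```javascript
// Find the byte offset of the xRef table. It is stored after the
// "startxref" keyword near the end of the file.
function findXrefOffset(pdfText) {
  const marker = pdfText.lastIndexOf("startxref");
  if (marker < 0) throw new Error("startxref not found");
  const match = /startxref\s+(\d+)/.exec(pdfText.slice(marker));
  if (!match) throw new Error("xRef offset missing");
  return parseInt(match[1], 10);
}

// Read one xRef subsection into a Map: objID -> byte offset.
// Only in-use entries (marked "n") are kept; free entries ("f") are skipped.
function parseXrefTable(pdfText, xrefOffset) {
  const lines = pdfText.slice(xrefOffset).split(/\r?\n/);
  if (lines[0].trim() !== "xref") throw new Error("not an xRef table");
  const [firstId, count] = lines[1].trim().split(/\s+/).map(Number);
  const offsets = new Map();
  for (let i = 0; i < count; i++) {
    const [offset, , kind] = lines[2 + i].trim().split(/\s+/);
    if (kind === "n") offsets.set(firstId + i, parseInt(offset, 10));
  }
  return offsets;
}

// Hypothetical one-object file, assembled so that the offsets are consistent.
const body = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n";
const xref =
  "xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n" +
  "trailer\n<< /Size 2 /Root 1 0 R >>\n";
const sample = body + xref + "startxref\n" + body.length + "\n%%EOF\n";

const table = parseXrefTable(sample, findXrefOffset(sample));
```

With the table in hand, the root objID named in the trailer can be looked up and the object read directly at its byte offset, without scanning the whole file.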
9. Processing in PDF.JS
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
• read & convert all PDF cmds ➟ Operation List (OL) [PartialEvaluator]
• load required objects (fonts, images)
• graphics.executeOperatorList(OL) [CanvasGraphics]
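The last three steps can be sketched as two small functions, one playing the evaluator role and one playing the graphics role. This is an illustration under stated assumptions, not PDF.JS code: `buildOperatorList`, `executeOperatorList`, `RecordingGraphics` and the `{ ref: id }` placeholder format are hypothetical, and a recording object stands in for the canvas so the sketch runs anywhere.

```javascript
// Evaluator role: resolve references and turn parsed PDF commands
// into a flat operator list (parallel arrays of names and arguments).
function buildOperatorList(commands, objects) {
  const fnArray = [];
  const argsArray = [];
  for (const cmd of commands) {
    fnArray.push(cmd.op);
    // Replace { ref: id } placeholders with the referenced object's value.
    argsArray.push(
      cmd.args.map(a => (a && a.ref !== undefined ? objects[a.ref] : a))
    );
  }
  return { fnArray, argsArray };
}

// Graphics role: dispatch each operator to the method of the same name.
function executeOperatorList(graphics, operatorList) {
  const { fnArray, argsArray } = operatorList;
  for (let i = 0; i < fnArray.length; i++) {
    graphics[fnArray[i]](...argsArray[i]);
  }
}

// Stand-in for a canvas-backed graphics object: it only records calls.
class RecordingGraphics {
  constructor() { this.log = []; }
  setFont(name, size) { this.log.push(`font ${name} ${size}`); }
  draw(text, x, y) { this.log.push(`draw ${text} at ${x},${y}`); }
}

const objects = { 3: "foo" };
const commands = [
  { op: "setFont", args: ["font0", 12] },
  { op: "draw", args: [{ ref: 3 }, 20, 30] },
];
const operatorList = buildOperatorList(commands, objects);
const graphics = new RecordingGraphics();
executeOperatorList(graphics, operatorList);
```

Keeping the list as plain arrays separates "what to draw" from "how to draw it", so the same list could be replayed on a different backend.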
10. Execution Example
[Diagram: “get page 2” goes to the PartialEvaluator, which asks Data for obj#3 (= “foo”) and for dict.x, dict.y (x = 20, y = 30), then builds the drawing cmds draw(obj#3, dict.x, dict.y). Graphics executes the drawing cmds and draws on canvas.]
11. Problem Processing
• Extracting data slow (compressed)
• Transform data (images) slow
• Sometimes a lot of objects on page
➡ Freezes UI
➡ Use WebWorker
➡ :( no direct memory access, only postMessage
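12. Main Thread | Web Worker
[Diagram: the Main Thread sends “get page 2” to the Web Worker. In the worker, the PartialEvaluator reads Data and builds draw(obj#3, dict.x, dict.y). The worker sends Operation List + Data back with the values resolved: draw(“foo”, 20, 30). On the Main Thread, Graphics executes the Operation List and draws on canvas.]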
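Because the worker shares no memory with the main thread, whatever it sends has to be plain data that survives copying. The sketch below imitates that boundary without starting a real worker. `copyMessage` and `workerBuildPage` are hypothetical helpers; `copyMessage` uses `structuredClone` where available (with a JSON round trip as a rough fallback) to mimic the copy that postMessage makes.

```javascript
// Stand-in for the copy that postMessage performs on its payload.
function copyMessage(value) {
  return typeof structuredClone === "function"
    ? structuredClone(value)
    : JSON.parse(JSON.stringify(value));
}

// "Worker" side: resolve every reference to a value, so the main
// thread can draw without looking anything up in the PDF data.
function workerBuildPage(pageData) {
  return {
    fnArray: ["draw"],
    argsArray: [[pageData.objects[3], pageData.dict.x, pageData.dict.y]],
  };
}

const pageData = { objects: { 3: "foo" }, dict: { x: 20, y: 30 } };
const built = workerBuildPage(pageData);

// "Main thread" side: receives an independent copy of the list.
const received = copyMessage(built);

// Later changes on the worker side do not reach the main thread.
built.argsArray[0][1] = 999;
```

This is why the list carries draw(“foo”, 20, 30) rather than draw(obj#3, dict.x, dict.y): the main thread cannot reach into the worker to resolve references.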
15. JPEG, but...
• no native support for JPEG 2000, CMYK
➡ use JS implementation
‣ works, not that performant but good enough
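To give a feel for the kind of per-pixel work a JS implementation takes on, here is a naive CMYK to RGB conversion. It is a textbook approximation, not the conversion PDF.JS uses, and it assumes each component is a byte where 255 means full ink. The function name `cmykToRgbNaive` is made up for this sketch.

```javascript
// Convert interleaved CMYK bytes (C, M, Y, K per pixel) to RGB bytes.
// Naive formula: channel = 255 * (1 - ink) * (1 - black).
function cmykToRgbNaive(cmyk) {
  const pixels = cmyk.length / 4;
  // Uint8ClampedArray rounds and clamps the computed values to 0..255.
  const rgb = new Uint8ClampedArray(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    const c = cmyk[i * 4] / 255;
    const m = cmyk[i * 4 + 1] / 255;
    const y = cmyk[i * 4 + 2] / 255;
    const k = cmyk[i * 4 + 3] / 255;
    rgb[i * 3] = 255 * (1 - c) * (1 - k);
    rgb[i * 3 + 1] = 255 * (1 - m) * (1 - k);
    rgb[i * 3 + 2] = 255 * (1 - y) * (1 - k);
  }
  return rgb;
}

// Three pixels: no ink, full black, full cyan.
const converted = cmykToRgbNaive(
  new Uint8Array([0, 0, 0, 0, 0, 0, 0, 255, 255, 0, 0, 0])
);
```

A loop like this touches every pixel of every image, which is part of why image transformation shows up as a cost in the processing slides.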
16. Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS for loading:
@font-face { font-family:'font0';
src:url(data:font/opentype;base64, ...); }
• Fonts are sanitized by browser
• Need to rebuild malformed fonts :/
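A rough sketch of how the @font-face rule above could be assembled from converted font bytes. The helpers `toBase64` and `fontFaceRule` are hypothetical, and the four sample bytes are only the "OTTO" tag that begins a CFF-flavored OpenType file, not a usable font.

```javascript
// Encode raw bytes as base64 (btoa in browsers, Buffer as a fallback).
function toBase64(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(bytes).toString("base64");
}

// Build a CSS rule that embeds the font as a data: URL.
function fontFaceRule(fontName, bytes) {
  const data = toBase64(bytes);
  return `@font-face { font-family:'${fontName}'; ` +
    `src:url(data:font/opentype;base64,${data}); }`;
}

// Placeholder bytes: "OTTO" in ASCII.
const rule = fontFaceRule("font0", new Uint8Array([0x4f, 0x54, 0x54, 0x4f]));
```

In a browser, a rule like this would be added to a style sheet so that text can then be drawn with font-family 'font0'. Loading through CSS is also what puts the font through the browser's sanitizer.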
17. “Why are you doing this?”
aka. “∃ C/C++ libraries, isn’t that faster?”
33. New API: Printing
• Printing very limited on the web right now
• no way to achieve native printing experience
• NEED: New API for printing
• mozPrintCallback
• define canvas content during printing
• send drawing commands directly to printer
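A sketch of how a page might hook into mozPrintCallback. It assumes the callback receives an object exposing a drawing `context` and a `done()` function, which is how the Firefox-prefixed API is understood here; check the browser documentation before relying on it. `attachPrintCallback` is a hypothetical helper, and plain objects stand in for the canvas and print state so the sketch runs outside a browser.

```javascript
// Register a callback that defines the canvas content at print time.
function attachPrintCallback(canvas, drawPage) {
  canvas.mozPrintCallback = function (printState) {
    drawPage(printState.context); // issue drawing commands for the printer
    printState.done();            // signal that this canvas is finished
  };
}

// Stand-ins for a real canvas and for the object the browser would pass in.
const calls = [];
const fakeCanvas = {};
attachPrintCallback(fakeCanvas, ctx => ctx.fillText("page 1", 10, 20));

fakeCanvas.mozPrintCallback({
  context: { fillText: (t, x, y) => calls.push(`fillText ${t} ${x} ${y}`) },
  done: () => calls.push("done"),
});
```

The point of the design is that drawing happens when printing starts, so the commands can target the printer directly instead of a screen-resolution bitmap.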
38. Firefox Integration
• PDF.JS as bundled Addon in Firefox Nightly
• Getting into the Release Channel is hard
• 400M users have expectations
• more testing coverage
• accessibility
• match UX expectations
• fallback if something is not working
39. Firefox Integration
• Try to make it by the Aurora Merge (6/5)
• Firefox Specific, BUT
• quality improvements are browser independent
• only small parts Firefox specific
40. What’s next
• Fix broken PDFs
• Improve performance
• Improve Text selection
• Text search
• Form support
• Printing support