Viewing PDFs with Open Web Standards

Julian Viereck

@jviereck
+julian.viereck

Overview
5 • What is PDF.JS about
10 • How PDF is structured & processing in PDF.JS
15 • “Why are you doing this?”
5 • Firefox Integration
5 • What’s next?
15 • Demo
5 • Q&A

About me

• Bespin / Skywriter / Ace
• Firefox DevTools
• ETH Zurich (Physics)
• PDF.JS
• ?

PDF Viewer using Open Web Standards

What is PDF.JS

• building a faithful & efficient PDF viewer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source (Github)

What is PDF.JS

• Not Firefox-specific - all modern browsers
• 1.3 MB uncompressed JS
• ~33,000 lines of code
• viewer in different languages
• async API

How PDF is structured
• Header: PDF version
• Body: sequence of objects
  ‣ [Objects] fonts, drawing cmds, images, words, bookmarks, form fields
• xRef Table: mapping objID ➟ byte offset
• Trailer: root objID, xRef byte offset
• PDF file root obj = ref to pages catalog
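The trailer is what lets a reader find the xRef table without scanning the whole file. A minimal sketch of that lookup, assuming the usual layout where the xRef byte offset follows a `startxref` keyword near the end of the file. The helper name `findStartXref` and the 1024-byte search window are assumptions for illustration, not PDF.JS code.

```javascript
// Hypothetical helper, not taken from PDF.JS: find the xRef byte offset
// that a PDF file records after the "startxref" keyword near its end.
function findStartXref(bytes) {
  // Assumption: the keyword sits within the last 1024 bytes of the file.
  const tailStart = Math.max(0, bytes.length - 1024);
  let tail = '';
  for (let i = tailStart; i < bytes.length; i++) {
    tail += String.fromCharCode(bytes[i]);
  }
  const keyword = 'startxref';
  const idx = tail.lastIndexOf(keyword);
  if (idx === -1) return -1; // keyword not found
  // The offset is the first run of digits after the keyword.
  const match = /^\s*(\d+)/.exec(tail.slice(idx + keyword.length));
  return match ? parseInt(match[1], 10) : -1;
}

// Usage with a tiny hand-made file tail (not a valid PDF).
const sample = '%PDF-1.4\n...\nstartxref\n416\n%%EOF\n';
const sampleBytes = Uint8Array.from(sample, (ch) => ch.charCodeAt(0));
const xrefOffset = findStartXref(sampleBytes);
```

With the offset in hand, a reader would jump to the xRef table and from there resolve the root object by its objID.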

Processing in PDF.JS
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
  ‣ read & convert all PDF cmds ➟ Operation List (OL) [PartialEvaluator]
  ‣ load required objects (fonts, images)
  ‣ graphics.executeOperatorList(OL) [CanvasGraphics]
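The first step, wrapping the downloaded bytes in a stream, can be sketched as follows. `ByteStream` and its methods are hypothetical names chosen for illustration and are not the PDF.JS Stream class; in a browser the buffer would come from an XHR with `responseType = 'arraybuffer'`.

```javascript
// Hypothetical minimal stream, not the PDF.JS Stream class.
class ByteStream {
  constructor(bytes) {
    this.bytes = bytes; // Uint8Array holding the raw file
    this.pos = 0;       // current read position
  }

  // Return the next byte, or -1 at the end of the data.
  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }

  // Read up to `length` bytes as a view on the same memory (no copy).
  getBytes(length) {
    const end = Math.min(this.pos + length, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return chunk;
  }
}

// A literal ArrayBuffer stands in for the XHR response here.
const buffer = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]).buffer;
const stream = new ByteStream(new Uint8Array(buffer));
const header = String.fromCharCode(...stream.getBytes(5));
```

The header bytes read this way are the start of the PDF version marker that the parser checks first.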

Execution Example
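• “get page 2” ➟ PartialEvaluator
• PartialEvaluator asks Data: obj#3? ➟ obj#3 = ”foo”
• PartialEvaluator asks Data: dict.x, .y? ➟ x = 20, y = 30
• PartialEvaluator builds drawing cmds: draw(obj#3, dict.x, dict.y)
• Graphics executes drawing cmds ➟ draw on canvas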
Problem Processing
• Extracting data slow (compressed)
• Transform data (images) slow
• Sometimes a lot of objects on page
➡ Freezes UI
➡ Use WebWorker
➡ :( no direct memory access, postMessage
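Because the worker shares no memory with the main thread, every request and result has to travel as a message. A rough sketch of the worker-side handling, written as a plain function so it can also run outside a browser. The message shapes (`type`, `pageNum`, `operatorList`) and the name `handleMessage` are assumptions for illustration, not the PDF.JS protocol.

```javascript
// Hypothetical worker-side handler; message shapes are assumptions.
// `doc` stands in for the parsed document held inside the worker.
function handleMessage(msg, doc) {
  if (msg.type === 'getPage') {
    const operatorList = doc.pages[msg.pageNum - 1];
    if (!operatorList) {
      return { type: 'error', reason: 'no such page: ' + msg.pageNum };
    }
    // The reply carries resolved values, since the main thread
    // cannot look into the worker's memory.
    return { type: 'page', pageNum: msg.pageNum, operatorList };
  }
  return { type: 'error', reason: 'unknown message: ' + msg.type };
}

// Inside a real worker this would be wired up roughly as:
//   self.onmessage = (e) => self.postMessage(handleMessage(e.data, doc));
const doc = { pages: [[], [{ fn: 'draw', args: ['foo', 20, 30] }]] };
const reply = handleMessage({ type: 'getPage', pageNum: 2 }, doc);
```

Note that the reply contains `'foo', 20, 30` rather than references such as obj#3, which is the price of having no direct memory access.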

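Main Thread ⇄ Web Worker

• Main Thread sends data and “get page 2” to the Web Worker
• Web Worker: PartialEvaluator reads Data and builds the Op List
• draw(obj#3, dict.x, dict.y) becomes draw(“foo”, 20, 30)
• Web Worker sends Operation List + Data back to the Main Thread
• Main Thread: Graphics executes the Operation List ➟ draw on canvas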

PDF stream + xRef, catalog, resources ➟ PartialEvaluator ➟ OL ➟ Graphics

5 0 obj
<<
/Length 8 0 R
>>
stream
/GS1 gs              ➟ setGState: [ LW: 10 ]
/F0 12 Tf            ➟ dependency: [ font0 ]
                       setFont: font0, 12
BT                   ➟ beginText
100 700 Td           ➟ moveText: 100, 700
(Hello World!) Tj    ➟ showText: “Hello World!”
ET                   ➟ endText
50 600 m             ➟ moveTo: 50, 600
400 600 l            ➟ lineTo: 400, 600
S                    ➟ stroke
endstream
endobj
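The mapping from content stream operators to operator list entries can be sketched with a toy evaluator. This is not the PDF.JS PartialEvaluator: it knows only the nine operators from the example, does not resolve names such as /GS1 or /F0 against the page resources, and the function names `buildOperatorList` and `executeOperatorList` are used here as free-standing illustrations.

```javascript
// Toy evaluator covering only the operators used in the example.
const OPERATORS = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', Td: 'moveText',
  Tj: 'showText', ET: 'endText', m: 'moveTo', l: 'lineTo', S: 'stroke',
};

function buildOperatorList(content) {
  // Tokens: (strings), /names, and bare words (numbers or operators).
  const tokens =
    content.match(/\((?:[^()\\]|\\.)*\)|\/[^\s()\/]+|[^\s()\/]+/g) || [];
  const list = [];
  let args = [];
  for (const tok of tokens) {
    if (Object.prototype.hasOwnProperty.call(OPERATORS, tok)) {
      // An operator consumes the operands collected before it.
      list.push({ fn: OPERATORS[tok], args });
      args = [];
    } else if (tok[0] === '(') {
      args.push(tok.slice(1, -1)); // string operand, escapes not decoded
    } else if (tok[0] === '/') {
      args.push(tok.slice(1));     // name operand, left unresolved
    } else {
      args.push(parseFloat(tok));  // assumed to be a numeric operand
    }
  }
  return list;
}

// Replay the list against any object that has one method per entry.
function executeOperatorList(list, graphics) {
  for (const { fn, args } of list) graphics[fn](...args);
}

const ol = buildOperatorList(
  '/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S'
);
```

Keeping the list as plain data (function name plus arguments) is what makes it possible to build it in one place and execute it in another.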

Images
• JPEG streams:
  ‣ DOMImg.src = 'data:image/jpeg;base64,'
    + window.btoa(bytesToString(bytes));
• If not JPEG stream:
  ‣ read bytes, convert to colorspace
  ‣ imgData = canvas.getImageData()
  ‣ fillWithPixelData(bytes, imgData)
  ‣ canvas.putImageData(imgData)
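Both image paths can be sketched in a few lines. The bodies of `bytesToString` and `fillWithPixelData` below are assumptions about what such helpers might look like (the slide only names them), `jpegDataUrl` is a hypothetical wrapper, and the non-JPEG path assumes the bytes are already plain RGB triples.

```javascript
// Concatenate raw bytes into a binary string, one char per byte.
function bytesToString(bytes) {
  let str = '';
  for (let i = 0; i < bytes.length; i++) {
    str += String.fromCharCode(bytes[i]);
  }
  return str;
}

// JPEG path: hand the stream to the browser's decoder via a data URL.
function jpegDataUrl(bytes) {
  return 'data:image/jpeg;base64,' + btoa(bytesToString(bytes));
}

// Non-JPEG path: copy RGB triples into the RGBA layout used by ImageData.
// `imgData` only needs a `data` array here, so a plain object works too.
function fillWithPixelData(rgb, imgData) {
  const out = imgData.data;
  for (let i = 0, j = 0; i + 2 < rgb.length; i += 3, j += 4) {
    out[j] = rgb[i];         // red
    out[j + 1] = rgb[i + 1]; // green
    out[j + 2] = rgb[i + 2]; // blue
    out[j + 3] = 255;        // alpha: fully opaque
  }
}

// Two pixels of RGB input fill eight bytes of RGBA output.
const imgData = { data: new Uint8ClampedArray(8) };
fillWithPixelData(new Uint8Array([1, 2, 3, 4, 5, 6]), imgData);
```

In a page, the data URL would be assigned to an image's `src`, and the filled `imgData` would be written back with the 2D context's `putImageData`.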

JPEG, but...

• no native support for JPEG 2000, CMYK
➡ use JS implementation
‣ works, not that performant but good enough
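As a flavor of what a JS implementation has to do for CMYK data, here is a naive conversion to RGB. It is a simplification that ignores color profiles and is not the PDF.JS code; `cmykToRgb` is a hypothetical name, and the convention that 0 means no ink and 255 means full ink is an assumption.

```javascript
// Naive CMYK -> RGB conversion, ignoring color profiles.
// Assumption: per channel, 0 means no ink and 255 means full ink.
function cmykToRgb(cmyk) {
  const pixels = Math.floor(cmyk.length / 4);
  const rgb = new Uint8ClampedArray(pixels * 3);
  for (let p = 0; p < pixels; p++) {
    const i = p * 4;
    const j = p * 3;
    const black = 1 - cmyk[i + 3] / 255; // remaining light after K
    rgb[j] = 255 * (1 - cmyk[i] / 255) * black;         // cyan removes red
    rgb[j + 1] = 255 * (1 - cmyk[i + 1] / 255) * black; // magenta removes green
    rgb[j + 2] = 255 * (1 - cmyk[i + 2] / 255) * black; // yellow removes blue
  }
  return rgb;
}

// No ink, full black, full cyan.
const converted = cmykToRgb(
  new Uint8Array([0, 0, 0, 0, 0, 0, 0, 255, 255, 0, 0, 0])
);
```

A per-pixel loop like this is where the "not that performant" cost comes from on large images.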

Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS for loading:
@font-face { font-family: 'font0';
  src: url(data:font/opentype;base64, ...); }
• Fonts are sanitized by browser
• Need to rebuild malformed fonts :/
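The CSS loading step can be sketched as a helper that turns converted font bytes into an `@font-face` rule. `fontFaceRule` is a hypothetical name, and the four sample bytes are only the `OTTO` signature that starts a CFF-flavored OpenType file, not a usable font.

```javascript
// Build an @font-face rule that embeds OpenType bytes as a data URL.
function fontFaceRule(name, bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  const url = 'data:font/opentype;base64,' + btoa(binary);
  return "@font-face { font-family: '" + name + "'; src: url(" + url + "); }";
}

// Sample bytes: just the 'OTTO' signature, standing in for a real font.
const rule = fontFaceRule('font0', new Uint8Array([0x4f, 0x54, 0x54, 0x4f]));

// In a browser the rule could then be added through a <style> element:
//   const style = document.createElement('style');
//   style.textContent = rule;
//   document.head.appendChild(style);
```

Since the browser sanitizes whatever arrives through this rule, a malformed font is rejected at this point, which is why rebuilding is needed beforehand.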

“Why are you doing this?”
aka. “∃ C/C++ libraries, aren’t those faster?”

“Performance is not the only measure”

Most vulnerable programs

Source: http://www.csis.dk/en/csis/news/3321

~25% of crashes in Firefox are plugin related

4. Speed
• Rendering slower than C/C++
• BUT
• Partial downloading
• Render page in background
• Make slow become faster
• Mostly: Good enough

New API: Printing
• Printing very limited on the web right now
• no way to achieve native printing experience
• NEED: New API for printing
• mozPrintCallback
• define canvas content during printing
• send drawing commands directly to printer
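A rough sketch of how a page might use this hook. It assumes the Firefox-only, non-standard shape of mozPrintCallback in which the callback receives an object with a drawing `context` and a `done()` method; `attachPrintCallback` and `drawPage` are hypothetical names for illustration.

```javascript
// Attach a print callback to a canvas (Firefox-only, non-standard API).
// Assumed shape: the callback receives { context, done }.
function attachPrintCallback(canvas, drawPage) {
  canvas.mozPrintCallback = function (obj) {
    drawPage(obj.context); // define the canvas content for printing
    obj.done();            // signal that this canvas is finished
  };
}

// Usage with stand-ins, so the flow can be followed without a browser.
const printed = [];
const fakeCanvas = {};
attachPrintCallback(fakeCanvas, (ctx) => ctx.fillText('Page 1', 10, 50));
fakeCanvas.mozPrintCallback({
  context: { fillText: (text) => printed.push(text) },
  done: () => printed.push('done'),
});
```

Calling `done()` only after drawing is what lets the browser wait for every canvas before it prints the page.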

Print
WebPage ➟ Single Pages (Page 2, ...)

• Find print canvas on page
• Execute printCallback
• All canvas done ➠ print page

Firefox Integration
• PDF.JS as bundled Addon in Firefox Nightly
• Getting in Release Channel is hard
• 400M users have expectations
• more testing coverage
• accessibility
• match UX expectation
• fallback if something is not working

Firefox Integration

• Try to make it by the Aurora merge (6/5)
• Firefox-specific, BUT
• quality improvements are browser independent
• only small parts are Firefox-specific

What’s next
• Fix broken PDFs
• Improve performance
• Improve Text selection
• Text search
• Form support
• Printing support

Contributing

• Lots of areas
• Translation
• Writing Code (embeddable viewer?)
• Testing (Firefox Auto-Update Addon)

Github (Readme, Issues, Wiki):
https://github.com/mozilla/pdf.js
Twitter:
@pdfjs
Mailing List:
https://groups.google.com/group/mozilla.dev.pdf-js/topics
IRC:
irc.mozilla.org #pdfjs
Engineering Weekly Call:
Thursday - 10:00am PDT
