5. But governments love PDF
Source:
http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/
Percentage of PDF files:
.org: 15%
.gov: 38%
.edu: 27%
6. Publication versus …
• No need to be self-contained
• May change over time
• Not all content produced by the author
• e.g. Advertisements
• Becoming more interactive
• e.g. Comments on a news article
10. An umbrella of standards
• PDF (Portable Document Format): first released by Adobe in 1993, ISO standard since 2008 (ISO 32000)
• PDF/X (graphic arts): since 2001, ISO 15930
• PDF/A (archive): since 2005, ISO 19005
• PDF/E (engineering): since 2008, ISO 24517
• PDF/VT (printing): since 2010, ISO 16612
• PDF/UA (accessibility): since 2012, ISO 14289
Related: XFDF (ISO), ECMAScript (ISO), PRC (ISO), PAdES (ETSI), ZUGFeRD
12. Image example
Image fox = new Image(ImageFactory.getImage(FOX));
Image dog = new Image(ImageFactory.getImage(DOG));
Paragraph p = new Paragraph("The quick brown ").add(fox)
.add(" jumps over the lazy ").add(dog);
document.add(p);
17. How do we read a spider chart?
[Spider chart. Axes: Risk Management; Structured Finance; Mergers & Acquisitions; Governance & Internal Control; Accounting Operations; Treasury Operations; Management Information & Business Decision Support; Business Planning & Strategy; Finance Contribution to IT Management; Commercial Activities; Taxation; Functional Leadership]
20. PDF/UA (part 1)
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
//Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
//Create XMP meta data
pdf.createXmpMetadata();
21. PDF/UA (part 2)
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
//PDF/UA: Set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
//PDF/UA: Set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();
24. PDF/A
• ISO 19005
– Long-term preservation of documents
– Approved parts will never become invalid
– Individual parts define new, useful features
• Obligations and restrictions
– Metadata: ISO 16684, eXtensible Metadata Platform (XMP)
– The document must be self-contained:
• All fonts need to be embedded
• No external movie, sound or other binary files
– No JavaScript allowed
– No encryption allowed
25. Three standards
• PDF/A-1 (2005)
– Based on PDF 1.4
– Level B (“basic”): visual appearance
– Level A (“accessible”): visual appearance + structural and semantic properties (Tagged PDF)
• PDF/A-2 (2011)
– Based on ISO 32000-1
– Features introduced in PDF 1.5, 1.6, and 1.7:
• Added support for JPEG 2000, Collections, object-level XMP, optional content
• Improved support for transparency, comment types and annotations, digital signatures
– Level U (“unicode”): visual appearance + all text is in Unicode
• PDF/A-3 (2012)
– Based on PDF/A-2 with only 1 difference: attachments do not need to be PDF/A
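The part/level combinations listed on this slide can be summarised in a few lines of code. The following is a minimal sketch in plain Java, covering only the three parts named here; the class `PdfAFlavors`, its `Level` enum and its method names are illustrative assumptions and are not part of iText.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: models which conformance levels each PDF/A part defines,
// as listed on the slide (Level U only exists from PDF/A-2 onwards).
class PdfAFlavors {
    enum Level { A, B, U }

    // Returns the conformance levels defined for the given part (1, 2 or 3).
    static Set<Level> levelsFor(int part) {
        switch (part) {
            case 1:
                return EnumSet.of(Level.A, Level.B);
            case 2:
            case 3:
                return EnumSet.of(Level.A, Level.B, Level.U);
            default:
                throw new IllegalArgumentException("Unknown PDF/A part: " + part);
        }
    }

    static boolean isValid(int part, Level level) {
        return levelsFor(part).contains(level);
    }

    // Builds the usual short name, for example "PDF/A-2u".
    static String name(int part, Level level) {
        if (!isValid(part, level)) {
            throw new IllegalArgumentException(
                "Level " + level + " is not defined for PDF/A-" + part);
        }
        return "PDF/A-" + part + level.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(name(1, Level.B));        // PDF/A-1b
        System.out.println(name(3, Level.A));        // PDF/A-3a
        System.out.println(isValid(1, Level.U));     // false
    }
}
```

A lookup like this is only a memory aid for the naming scheme; it does not check a document against any of the standards.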
27. PDF/A-1b example
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
PdfAConformanceLevel.PDF_A_1B, new PdfOutputIntent("Custom", "",
"http://www.color.org", "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
//Create XMP meta data
pdf.createXmpMetadata();
//Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();
29. PDF/A-1a example
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
PdfAConformanceLevel.PDF_A_1A, new PdfOutputIntent("Custom", "",
"http://www.color.org", "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
pdf.setTagged();
pdf.createXmpMetadata();
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();
33. United States example
part 1: initializations
PdfADocument pdf = new PdfADocument(
new PdfWriter(dest), PdfAConformanceLevel.PDF_A_3A,
new PdfOutputIntent("Custom", "", "http://www.color.org",
"sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
//Setting some required parameters
pdf.setTagged(); // PDF/UA and PDF/A Level a
pdf.getCatalog().setLang(new PdfString("en-US")); // PDF/UA
pdf.getCatalog().setViewerPreferences( // PDF/UA
new PdfViewerPreferences().setDisplayDocTitle(true)); // PDF/UA
PdfDocumentInfo info = pdf.getDocumentInfo(); // PDF/UA
info.setTitle("iText7 PDF/A-3 example"); // PDF/UA
//Create XMP meta data
pdf.createXmpMetadata(); // PDF/UA and PDF/A Level a
34. United States example
part 2: add attachment
//Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
pdf, Files.readAllBytes(Paths.get(DATA)), "united_states.csv",
"united_states.csv", new PdfName("text/csv"), parameters,
PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);
35. United States example
part 3: parse CSV file
PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Parse the CSV file and add its data to a table
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
// try-with-resources closes the reader even if parsing fails
try (BufferedReader br = new BufferedReader(new FileReader(DATA))) {
    // The first line holds the column headers
    String line = br.readLine();
    process(table, line, bold, true);
    while ((line = br.readLine()) != null) {
        process(table, line, font, false);
    }
}
document.add(table);
document.close();
36. United States example
part 4: process each line
public void process(Table table, String line,
        PdfFont font, boolean isHeader) {
    // Fields are separated by semicolons
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    while (tokenizer.hasMoreTokens()) {
        Cell cell = new Cell().add(
            new Paragraph(tokenizer.nextToken()).setFont(font));
        if (isHeader) {
            table.addHeaderCell(cell);
        } else {
            table.addCell(cell);
        }
    }
}
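One property of the tokenizing step is worth keeping in mind: `java.util.StringTokenizer` skips empty tokens, so a line with an empty field produces fewer cells than there are columns and the remaining cells shift left. The sketch below compares it with `String.split` using a limit of `-1`, which keeps empty fields. It assumes plain `;` separators without quoting; the class `CsvFields`, its methods and the sample line are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper comparing two ways to split a semicolon-separated line.
class CsvFields {
    // Same approach as the slide: consecutive delimiters yield no empty token.
    static List<String> withTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ";");
        while (tokenizer.hasMoreTokens()) {
            fields.add(tokenizer.nextToken());
        }
        return fields;
    }

    // Alternative: a negative limit keeps empty and trailing fields.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(";", -1));
    }

    public static void main(String[] args) {
        String line = "a;b;;d"; // made-up sample with an empty third field
        System.out.println(withTokenizer(line).size()); // 3
        System.out.println(withSplit(line).size());     // 4
    }
}
```

If the data file never contains empty fields, both approaches give the same cells and the tokenizer version is sufficient.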
45. New in iText 7: improved typography and support for Indic scripts
46. iText 5: missing links
Indic scripts:
•Only unsupported major script family
•Feature request #1
•Huge opportunity
•Limited support in most other PDF libraries
Other features:
•Optional ligatures in Latin script
•Vowel diacritics in Arabic
47. Indic scripts: problems
•Lack of expertise
•Unicode encodes 49 Indic scripts
•Complex scripts with unique features
•Glyph repositioning: ह + ि = हि
•Glyph substitution: ம + ு = மு
•Half-characters: त + ् + य = त्य
•Unsolvable issues for iText 5 font engine
•No dedicated Unicode points for half-characters
•No font lookups past '\uFFFF'
•Ligaturization is context-dependent (virama)
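Two of the points above can be observed with nothing but the Java standard library: a conjunct such as त्य is stored as three code points (consonant, virama, consonant) with no dedicated code point for the half-form, and any code point above U+FFFF occupies two `char` values. The following sketch is only an illustration; the class `CodePointDemo` and its `describe` method are hypothetical names, and the code says nothing about how either iText font engine works internally.

```java
// Hypothetical demo: inspects the code points behind the slide's examples.
class CodePointDemo {
    // Formats each code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Devanagari ta + virama + ya, written with escapes to avoid encoding issues
        String conjunct = "\u0924\u094D\u092F";
        System.out.println(describe(conjunct)); // U+0924 U+094D U+092F

        // A code point from the Brahmi block, which lies above U+FFFF
        String beyondBmp = new String(Character.toChars(0x11005));
        System.out.println(beyondBmp.length());                              // 2
        System.out.println(beyondBmp.codePointCount(0, beyondBmp.length())); // 1
        System.out.println(describe(beyondBmp));                             // U+11005
    }
}
```

Code that indexes glyphs by a single 16-bit `char` cannot address the second case, which is the kind of limit the last bullets refer to.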
48. Indic scripts: solutions
Writing a new font engine
• Automatic script recognition
• Based on Unicode ranges
• Flexibility = extensibility
• Generic Shaper class
• Separate module, only called when necessary
• Glyph replacement rules
• Different per writing system
• Alternate glyphs are font-dependent
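Script recognition based on Unicode ranges can be sketched with the JDK's own `Character.UnicodeScript`, which maps a code point to its script. This is an assumption-laden illustration, not iText's implementation: the class `ScriptDetector`, the method names and the set of scripts treated as needing shaping are all hypothetical.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: detect which scripts a text uses, then decide whether
// a separate shaping step would be needed at all.
class ScriptDetector {
    // Assumed for illustration: scripts that require glyph substitution or repositioning.
    private static final Set<UnicodeScript> COMPLEX =
        EnumSet.of(UnicodeScript.DEVANAGARI, UnicodeScript.TAMIL, UnicodeScript.ARABIC);

    // Collects the scripts in order of first appearance, ignoring shared
    // characters such as spaces and punctuation (COMMON, INHERITED).
    static Set<UnicodeScript> scriptsOf(String text) {
        Set<UnicodeScript> scripts = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    // True only if at least one detected script is in the COMPLEX set.
    static boolean needsShaping(String text) {
        for (UnicodeScript script : scriptsOf(text)) {
            if (COMPLEX.contains(script)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String devanagari = "\u0928\u092E\u0938\u094D\u0924\u0947"; // a Devanagari word
        System.out.println(scriptsOf(devanagari));        // [DEVANAGARI]
        System.out.println(needsShaping(devanagari));     // true
        System.out.println(needsShaping("Hello, world")); // false
    }
}
```

Checking the script first mirrors the "only called when necessary" idea: plain Latin text never reaches the shaping step.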
51. Status of advanced typography in iText 7
•Indic scripts
•We already support:
•Devanagari
•Tamil
•Coming soon:
•Telugu
•Others: based on customer demand
•Arabic
•Support for vocalized Arabic (diacritics) is in development
•Latin
•Optional ligatures are fully supported