Tag! Your PDF is It!
Alejandro Piñeiro and Joanmarie Diggs
GUADEC 2013
Tag! Your PDF is It! - Alejandro Piñeiro & Joanmarie Diggs - GUADEC 2013
Topics
● Tagged PDFs:
– What They Are
– Why We Want Them
– How to Make Them
● Current Status of the Project
● Getting the Code (and what you'll see when you do)
Tagged PDFs
Tagged PDF > PDF
• Meta-information about page content
• HTMLish tags and IDs for text spans
• Alternative text for images
• Replacement text for symbols
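As a rough illustration of the four items above, a tagged PDF can be pictured as a tree of structure elements laid over the page content. The sketch below is a toy model, not Poppler code: the `StructElem` class and `readable_text` function are hypothetical names, and the `alt` and `actual_text` fields stand in for the PDF `/Alt` and `/ActualText` entries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical stand-in for one node of a PDF structure tree."""
    tag: str                            # HTML-like structure type, e.g. "H1", "P", "Figure"
    text: str = ""                      # page content marked by this element
    alt: Optional[str] = None           # alternative text (images)
    actual_text: Optional[str] = None   # replacement text (symbols)
    elem_id: Optional[str] = None       # optional element ID
    children: List["StructElem"] = field(default_factory=list)


def readable_text(elem: StructElem) -> str:
    """Return what an assistive technology might read for this subtree."""
    # Replacement text wins, then alternative text, then the content itself.
    if elem.actual_text is not None:
        return elem.actual_text
    if elem.alt is not None:
        return elem.alt
    parts = [elem.text] if elem.text else []
    parts.extend(readable_text(child) for child in elem.children)
    return " ".join(p for p in parts if p)


doc = StructElem("Document", children=[
    StructElem("H1", text="Tagged PDFs", elem_id="title"),
    StructElem("P", children=[
        StructElem("Span", text="Accessibility"),
        StructElem("Span", text="\u2714", actual_text="check mark"),
    ]),
    StructElem("Figure", alt="GNOME foot logo"),
])

print(readable_text(doc))
```

Without the tags, a reader would get only the raw glyphs and no description of the figure at all.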
Why We Want Them
• Enhanced document accessibility
• Through exposure of structural and semantic
information associated with the tags
Thanks (again) Friends of GNOME!!!
Why We Want Them (cont.)
• Reflow functionality (e.g. for mobile devices)
• Export to other applications with format, layout,
font data, etc.
• Copy and paste to other applications with some
fundamental retention of content format
Making Tagged PDFs
✘ AbiWord: No
✘ Google Docs: No
✘ LaTeX: No
✘ Scribus: No
✘ PDF Studio: No
✘ python-pisa: No
✔ LibreOffice: Yes
(and it's easy!)
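For batch work, the same export can probably be scripted. The sketch below only builds the command line and does not run it. It assumes a recent LibreOffice (7.4 or later) whose `--convert-to` option accepts JSON filter options and whose PDF export filter has a `UseTaggedPDF` property; the function name `tagged_pdf_export_command` is made up for this example.

```python
import json


def tagged_pdf_export_command(source, outdir="."):
    """Build (but do not run) a headless LibreOffice export command.

    Assumption: LibreOffice 7.4+ with JSON filter options on --convert-to.
    """
    options = {"UseTaggedPDF": {"type": "boolean", "value": "true"}}
    convert_to = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return ["soffice", "--headless", "--convert-to", convert_to,
            "--outdir", outdir, source]


cmd = tagged_pdf_export_command("talk.odt")
print(cmd)
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where `soffice` is on the path.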
PDF/A-1a > Tagged PDF
• Objective: Search and repurpose document content
• Includes:
- PDF/A-1b: Reproduce document appearance
- Structure / Hierarchy
- Tagged PDF
- Unicode character maps
- Language specification
Current Status
Tagged PDF Support
✔ Parse the document structure tree: Poppler
✔ Expose the tree and attributes: Poppler GLib
✔ Provide tools to examine and verify result: Poppler
● Create parallel object tree with attributes: Evince
● (Expose object tree and attributes via ATK: Evince)
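To give an idea of what walking the exposed tree can look like, here is a minimal sketch using the Poppler GLib structure API through PyGObject. It assumes the API as it appears in later Poppler releases (`Poppler.StructureElementIter` and `Poppler.StructureElement`), which may differ from the branch described in this talk. `dump_structure` and `format_node` are names invented for the example.

```python
def format_node(kind_name, depth, text=""):
    """Format one tree node as an indented line."""
    label = "  " * depth + kind_name
    return f"{label}: {text}" if text else label


def _walk(it, depth, lines, Poppler):
    # Visit the element at the iterator, then its children, then its siblings.
    while True:
        element = it.get_element()
        kind = element.get_kind()
        name = getattr(kind, "value_nick", str(kind))
        text = ""
        if element.is_content():
            text = element.get_text(Poppler.StructureGetTextFlags.NONE) or ""
        lines.append(format_node(name, depth, text))
        child = it.get_child()
        if child is not None:
            _walk(child, depth + 1, lines, Poppler)
        if not it.next():
            break


def dump_structure(path):
    """Return the structure tree of a tagged PDF as indented lines."""
    # Imported here so the helper above works without PyGObject installed.
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Gio, Poppler

    uri = Gio.File.new_for_path(path).get_uri()
    document = Poppler.Document.new_from_file(uri, None)
    root = Poppler.StructureElementIter.new(document)
    lines = []
    if root is not None:  # None means the document has no structure tree
        _walk(root, 0, lines, Poppler)
    return lines
```

Calling `print("\n".join(dump_structure("tagged.pdf")))` on a tagged file should print one line per element; an untagged file yields an empty list.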
PDF/A-1a Support
? PDF/A-1b
✔ Tagged PDF
✔ Structure / Hierarchy
? Unicode character maps
✔ Language specification
What's Next?
• Create parallel object tree with attributes: Evince
• (Expose object tree and attributes via ATK: Evince)
? PDF/A-1b and Unicode character maps
? Adding support to LaTeX, et al.
Getting the Code
(and what you'll see when you do)
Credit Where Credit is Due
• Adrián Pérez: Document Parser Extraordinaire
• Carlos García Campos: Maintains Evince & Poppler
Thanks Guys!!!
Getting the Code
• git://git.freedesktop.org/git/poppler/poppler
• Today
- Branch: tagged-pdf
- Patches: fdo bugs 64816 and 67710
• Soon: master branch
Getting the Code (cont.)
• Poppler:
10 files changed, 2309 insertions(+), 17 deletions(-)
• Poppler GLib:
16 files changed, 3011 insertions(+)
• Utils:
3 files changed, 661 insertions(+), 2 deletions(-)
Associated Output Tools: Before
• pdfinfo: author, editor, etc.
• pdftotext: content (plain text)
• pdftohtml: content (barely formatted text)
Associated Output Tools: After
● pdfstructtohtml: like pdftohtml but preserves tags
● pdfinfo's new options:
- hierarchy
- hierarchy along with content of each element
● poppler-glib-demo: new option to display hierarchy
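The idea behind a tag-preserving HTML export can be sketched in a few lines. This is not the code or the output format of pdfstructtohtml. It is a toy converter over nested `(tag, content)` tuples, and the `HTML_FOR` mapping from PDF structure types to HTML elements is an assumption made for illustration.

```python
import html

# Hypothetical mapping from PDF structure types to HTML element names.
HTML_FOR = {
    "Document": "div", "H1": "h1", "H2": "h2", "P": "p", "Span": "span",
    "L": "ul", "LI": "li", "Table": "table", "TR": "tr", "TH": "th", "TD": "td",
}


def to_html(node):
    """Convert a (tag, content) node; content is text or a list of child nodes."""
    tag, content = node
    name = HTML_FOR.get(tag, "div")  # unknown structure types fall back to <div>
    if isinstance(content, str):
        inner = html.escape(content)
    else:
        inner = "".join(to_html(child) for child in content)
    return f"<{name}>{inner}</{name}>"


tree = ("Document", [
    ("H1", "Status & plans"),
    ("L", [("LI", "Poppler"), ("LI", "Evince")]),
])

print(to_html(tree))
```

Because headings and list items keep their roles, the result carries structure that a plain text dump of the page would lose.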
Questions?
