Tag! Your PDF is It!
Alejandro Piñeiro and Joanmarie Diggs
GUADEC 2013
Tag! Your PDF is It! - Alejandro Piñeiro & Joanmarie Diggs - GUADEC 2013
Topics
● Tagged PDFs:
– What They Are
– Why We Want Them
– How to Make Them
● Current Status of the Project
● Getting the Code (and what you'll see when you do)
Tagged PDFs
Tagged PDF > PDF
• Meta-information about page content
• HTMLish tags and IDs for text spans
• Alternative text for images
• Replacement text for symbols
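A minimal sketch of the kind of information the tags carry, assuming a simplified in-memory tree. The `StructElement` class, its field names, and the sample content are hypothetical and are not Poppler's API; they only illustrate how alternative and replacement text can stand in for what is drawn on the page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElement:
    """One node of a hypothetical tagged-PDF structure tree."""
    tag: str                              # HTML-like role, e.g. "H1", "P", "Figure"
    element_id: Optional[str] = None      # optional ID attached to the element
    text: str = ""                        # text drawn on the page for this element
    alt_text: Optional[str] = None        # alternative text (e.g. for images)
    actual_text: Optional[str] = None     # replacement text (e.g. for symbols)
    children: List["StructElement"] = field(default_factory=list)


def accessible_text(element: StructElement) -> str:
    """Return the text an assistive technology could present for an element.

    Replacement text wins over the drawn text, and alternative text stands in
    for content that has no text of its own.
    """
    if element.actual_text is not None:
        return element.actual_text
    if element.alt_text is not None:
        return element.alt_text
    parts = [element.text] if element.text else []
    parts.extend(accessible_text(child) for child in element.children)
    return " ".join(part for part in parts if part)


# Made-up sample document: a heading, a paragraph with a symbol, and an image.
document = StructElement("Document", children=[
    StructElement("H1", element_id="title", text="Quarterly report"),
    StructElement("P", children=[
        StructElement("Span", text="Sales rose"),
        StructElement("Span", text="\u2191", actual_text="up"),
    ]),
    StructElement("Figure", alt_text="Bar chart of sales per quarter"),
])

print(accessible_text(document))
```

The arrow symbol is read as "up" and the figure contributes its alternative text, which is the kind of result an untagged PDF cannot provide.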
Why We Want Them
• Enhanced document accessibility
• Through exposure of structural and semantic
information associated with the tags
Thanks (again) Friends of GNOME!!!
Why We Want Them (cont.)
• Reflow functionality (e.g. for mobile devices)
• Export to other applications with format, layout,
font data, etc.
• Copy and paste to other applications with some
fundamental retention of content format
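As a rough illustration of reflow, the sketch below assumes the structure tree has already been flattened into (tag, text) blocks in reading order. The `reflow` function and the block format are hypothetical; the point is only that knowing where each paragraph starts and ends lets the text be re-wrapped to any width.

```python
import textwrap
from typing import List, Tuple

# Hypothetical input: (tag, text) pairs in reading order. The text may still
# contain the hard line breaks of the original page layout.
Block = Tuple[str, str]


def reflow(blocks: List[Block], width: int) -> str:
    """Re-wrap every block to `width` columns, one blank line between blocks."""
    wrapped = []
    for _tag, text in blocks:
        # Collapse the layout's own line breaks and spacing before wrapping.
        normalized = " ".join(text.split())
        wrapped.append(textwrap.fill(normalized, width=width))
    return "\n\n".join(wrapped)


blocks = [
    ("H1", "Why We Want Them"),
    ("P", "Reflow functionality\n(e.g. for mobile devices)"),
]

print(reflow(blocks, width=20))
```

Without the tags, a viewer only sees positioned lines of text and has to guess which of them belong to the same paragraph.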
Making Tagged PDFs
✘ AbiWord: No
✘ Google Docs: No
✘ LaTeX: No
✘ Scribus: No
✘ PDF Studio: No
✘ python-pisa: No
✔ LibreOffice: Yes
(and it's easy!)
PDF/A-1a > Tagged PDF
• Objective: Search and repurpose document content
• Includes:
- PDF/A-1b: Reproduce document appearance
- Structure / Hierarchy
- Tagged PDF
- Unicode character maps
- Language specification
Current Status
Tagged PDF Support
✔ Parse the document structure tree: Poppler
✔ Expose the tree and attributes: Poppler GLib
✔ Provide tools to examine and verify result: Poppler
● Create parallel object tree with attributes: Evince
● (Expose object tree and attributes via ATK: Evince)
PDF/A-1a Support
? PDF/A-1b
✔ Tagged PDF
✔ Structure / Hierarchy
? Unicode character maps
✔ Language specification
What's Next?
• Create parallel object tree with attributes: Evince
• (Expose object tree and attributes via ATK: Evince)
? PDF/A-1b and Unicode character maps
? Adding support to LaTeX, et al.
Getting the Code
(and what you'll see when you do)
Credit Where Credit is Due
• Adrián Pérez: Document Parser Extraordinaire
• Carlos García Campos: Maintains Evince & Poppler
Thanks Guys!!!
Getting the Code
• git://git.freedesktop.org/git/poppler/poppler
• Today
- Branch: tagged-pdf
- Patches: fdo bugs 64816 and 67710
• Soon: master branch
Getting the Code (cont.)
• Poppler:
10 files changed, 2309 insertions(+), 17 deletions(-)
• Poppler GLib:
16 files changed, 3011 insertions(+)
• Utils:
3 files changed, 661 insertions(+), 2 deletions(-)
Associated Output Tools: Before
• pdfinfo: author, editor, etc.
• pdftotext: content (plain text)
• pdftohtml: content (barely formatted text)
Associated Output Tools: After
● pdfstructtohtml: like pdftohtml but preserves tags
● pdfinfo's new options:
- hierarchy
- hierarchy along with content of each element
● poppler-glib-demo: new option to display hierarchy
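The two kinds of output listed above can be sketched on a toy tree. Everything here is an assumption made for illustration: the `Node` class, the `dump_hierarchy` and `to_html` helpers, and the tag-to-HTML mapping are hypothetical, and the printed format is not the actual output of pdfinfo or pdfstructtohtml.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Node:
    """Hypothetical structure element: a tag, optional text, and children."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def dump_hierarchy(node: Node, with_text: bool = False, depth: int = 0) -> List[str]:
    """List one line per element, indented by depth, optionally with its text."""
    line = "  " * depth + node.tag
    if with_text and node.text:
        line += f' "{node.text}"'
    lines = [line]
    for child in node.children:
        lines.extend(dump_hierarchy(child, with_text, depth + 1))
    return lines


# Assumed mapping from structure tags to HTML elements, for illustration only.
HTML_TAGS = {"Document": "body", "H1": "h1", "P": "p", "Span": "span"}


def to_html(node: Node) -> str:
    """Serialize the tree to HTML, keeping one HTML element per structure tag."""
    name = HTML_TAGS.get(node.tag, "div")
    inner = escape(node.text) + "".join(to_html(child) for child in node.children)
    return f"<{name}>{inner}</{name}>"


tree = Node("Document", children=[
    Node("H1", "Tagged PDFs"),
    Node("P", "Tags & IDs for text spans"),
])

print("\n".join(dump_hierarchy(tree)))                  # hierarchy only
print("\n".join(dump_hierarchy(tree, with_text=True)))  # hierarchy with content
print(to_html(tree))                                    # tag-preserving HTML
```

The first two calls mirror the "hierarchy" and "hierarchy along with content" views; the last one shows why preserving tags gives better HTML than barely formatted text.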
Questions?