2. Agenda
1. Challenges of PDF conversion
2. Making convertible PDF from the start
3. About all those other PDFs out there…
4. Features of Datalogics PDF Alchemist
5. Summary and concluding thoughts
3. A bit about me
CTO at Datalogics
Worked with PDF for over 15 years
Board member of PDF Association
Active participant in the PDF standards
community
4. Challenges of PDF conversion
PDF was designed to convey an exact visual
representation of information to humans
PDF’s origins did not account for storing and
retrieving machine-understandable
information
PDF is page- and position-based and lacks the
notion of text flow and grouping
Many different PDFs in the wild – some easy
to interpret, some very complex
5. PDF designed to convey exact visual representation
Reliable visual representation, but
many potential ways to make
something that looks a certain way
The capability to tie semantic
information to content was added
to PDF later
Its use is increasing but still covers
far less than the majority of content
being produced
Most PDF generators still prefer
smaller files to PDF files that are
easier to repurpose
6. PDF designed for human consumption
At the time PDF was conceived as a PostScript replacement, reliable
rendering for human readers was an important issue…
Focus was on retrieving the information needed to display and print
pages for people’s use
Affordances for machine “reading” were bolt-ons to the format
Community has made great strides in allowing for machine
interpretation, but proper use requires expertise in the domain
Structure and semantics are optional – usage is still rare
This is NOT a PDF-specific issue
7. PDF is page- and position-based
Like a TIFF or raster image, marks on a PDF page are precisely
positioned and usually come in small discrete pieces
Humans automatically see a page flow that is not always present in the
PDF syntax
Contents of a PDF page can be specified in an order very different from
how we read
Words, images, and other elements on a page may have the marks that
constitute them spread throughout the page marking stream
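To make the gap between stream order and reading order concrete, here is a minimal Python sketch. It assumes a single-column page and text fragments that have already been extracted with their positions. The `Fragment` type, the `reading_order` function, and the `line_tolerance` value are all hypothetical names chosen for illustration; this is not how any particular product works.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points (PDF origin is at the bottom-left)


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then read top-to-bottom, left-to-right.

    Assumes a single text column; multi-column pages need extra layout analysis.
    """
    lines = []
    # Higher y means higher on the page, so sort by descending baseline.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)  # same visual line as the previous fragment
        else:
            lines.append([frag])  # start a new visual line
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


# Fragments listed in an arbitrary "marking stream" order.
stream_order = [
    Fragment("world", 60, 700),
    Fragment("Second", 10, 686),
    Fragment("Hello", 10, 700.5),
    Fragment("line", 60, 686),
]
print(reading_order(stream_order))  # ['Hello world', 'Second line']
```

Even this toy case needs a tolerance on the baseline, because fragments on the same visual line rarely share exactly the same coordinate.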
8. Avoid all this trouble at the start – if you can!
Creating Tagged PDF means you embed
the information for repurposing and
reflow directly into the PDF when it’s
created – at the right time!
Easy to convert Tagged PDF into other
formats
But not all Tagged PDF is the same, and
not all generators emit useful Tagged PDF!
9. But how about all those other PDFs out there?
Existing PDFs aren’t going to magically gain structure
semantics
Existing tools and workflows may not be upgradable in
the near future – or at all
Not all files converted to PDF contain enough information
for structure semantics in the first place
10. Is OCR the only way to handle these? No!
OCR is not always reliable in
converting pictures of text
back into actual text flows
Rasterizing PDFs to scan and
turn back into non-raster
form introduces multiple
chances for errors and
unexpected results
11. Conversion of PDF to HTML relies upon:
Seeing pages the way a human reads them
Figuring out the logical structure of the pages
Putting text back together into text flows
Outputting all these elements in the correct order
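The "text flows" step can be sketched in a few lines of Python, assuming the lines of a page have already been recovered in reading order together with their baselines. The function name `lines_to_paragraphs`, the `leading` and `gap_factor` parameters, and the hyphen-rejoining rule are illustrative assumptions only. In particular, the hyphen rule is deliberately naive and would wrongly merge genuinely hyphenated words.

```python
def lines_to_paragraphs(lines, leading=14.0, gap_factor=1.5):
    """Merge positioned lines into paragraphs.

    lines: list of (baseline_y, text) tuples in top-to-bottom order.
    A vertical gap larger than leading * gap_factor starts a new paragraph.
    """
    paragraphs = []
    current = ""
    prev_y = None
    for y, text in lines:
        starts_new = prev_y is not None and (prev_y - y) > leading * gap_factor
        if starts_new and current:
            paragraphs.append(current)
            current = ""
        if not current:
            current = text
        elif current.endswith("-"):
            # Naive rule: treat a trailing hyphen as a word split across lines.
            current = current[:-1] + text
        else:
            current += " " + text
        prev_y = y
    if current:
        paragraphs.append(current)
    return paragraphs


page_lines = [
    (700, "PDF places words at fixed posi-"),
    (686, "tions on a page."),
    (650, "HTML lets text reflow."),
]
print(lines_to_paragraphs(page_lines))
```

Real pages also call for handling of indentation, font changes, and columns, which this sketch leaves out.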
13. What does PDF Alchemist offer?
Works on untagged PDFs – handles existing PDFs, does not require
workflow changes or regenerating/reconstructing source PDFs
Turns placed words in PDFs back into text flows – reflowable text
Re-creates tables and lists from page content
Removes pagination artifacts such as page #s and running headers
Converts PDF into single-page HTML5 + CSS or into EPUB packages
Converts PDF forms into fixed-layout HTML forms for use in mobile
environments
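As an illustration of what removing pagination artifacts involves, the Python sketch below drops first and last lines that repeat across most pages once digits are masked. It is a simplified heuristic written for this example. The function `strip_pagination_artifacts` and its `min_share` threshold are hypothetical and do not describe PDF Alchemist's actual logic.

```python
import re
from collections import Counter


def strip_pagination_artifacts(pages, min_share=0.6):
    """Remove running headers/footers and page numbers.

    pages: list of pages, each a list of text lines in reading order.
    A first or last line is dropped when its digit-masked form appears
    at the edge of at least min_share of all pages.
    """
    def normalize(line):
        # Mask digits so "Page 1" and "Page 2" count as the same line.
        return re.sub(r"\d+", "#", line.strip())

    counts = Counter()
    for page in pages:
        if page:
            counts.update({normalize(page[0]), normalize(page[-1])})

    threshold = min_share * len(pages)
    repeated = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        last = len(page) - 1
        cleaned.append([
            line for i, line in enumerate(page)
            if not (i in (0, last) and normalize(line) in repeated)
        ])
    return cleaned


pages = [
    ["Annual Report", "Revenue grew in 2014.", "Page 1"],
    ["Annual Report", "Costs were flat.", "Page 2"],
    ["Annual Report", "Outlook is stable.", "Page 3"],
]
print(strip_pagination_artifacts(pages))
```

Only edge lines are considered, so body text that happens to contain numbers is left alone.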
15. Using PDF Alchemist
Available as a command line tool for server and workflow
integration
Or as a simple “C” API for integration into programs
16. Summary
Most PDFs are and will continue to be made without
regard to repurposing
Reconstructing the content and flow of PDF relies upon
advanced logic and mimicry of how humans read pages
PDF Alchemist offers this logic in an easy-to-use software
package
Good afternoon!
My name is Ching Yue.
I am the director of ebook mobile technologies at Datalogics.
Today I’d like to use this opportunity to introduce you to Datalogics, to talk briefly about the solutions we provide, and to show where we think the market expansions are in the ebook industry.
What I would like you to take away from this presentation is who we are and, if you are thinking of ebooks, that you should come talk to us. If you are already in the ebook business and are thinking of expanding, do look into the areas that we speak of. If you have questions, feel free to come talk to me or to Datalogics; we will be very happy to help you succeed in your current or any new endeavors.
Datalogics was founded in Chicago in 1967. For a software company, we have a pretty long history.
We have evolved a lot, of course. In the earlier years, the software we developed ran on room-size mainframe machines, and now a good part of our business is working with a computer that fits in your pocket.
One thing we have stayed true to, though, is being engineering-focused.
What that means is that we not only develop and sell software, but also develop tools and solutions to support developers, who can take what we have and develop it further into a product of their own.
This allows you to tailor the solution to your needs and to add your competitive advantage to your solutions.
We are the primary channel for Adobe ebook technologies, including Adobe Reader Mobile SDK, Adobe Content Server, and more.
We have a dedicated ebook team at Datalogics to promote, sell, and support these solutions.
We work closely with many of our customers in different phases of their ebook initiatives and integration.
The first thing that we look at is the market.
The ebook business has gone through a few phases. What we see now is the potential in bringing digital content and digital learning into classrooms.
If you are a library, you probably have some kind of ebook offering for your readers already. Your next phase is to think about how you can leverage the electronic platform to engage your patrons and become, yet again, their source of knowledge: virtually this time, without their physical presence in a library building.
Geographic expansion. Ebooks have a great advantage in breaking the geographic barrier. Ebook delivery, without oversimplifying the business process, makes content delivery across the globe much more feasible.
And lastly, the ebook platform can be very useful for people facing physical and learning challenges.
Software can be a great way to assist them in enjoying reading and learning in ways that they wouldn’t be able to with a printed book.
In conjunction with the market expansion, we see higher demand in hardware and software channels.