2. Contents
• Introduction to e-publishing
• HTML
• Difference between HTML and XML
• XML
• Data Conversion
• PDF to XML process
• E-Pub
3. What is e-publishing?
• Digital publishing, also known as electronic publishing or e-
publishing, is any type of publishing that involves
disseminating information or entertainment by digital means.
• Electronic publishing includes the digital publication of e-
books, digital magazines, and the development of digital
libraries and catalogues.
4. What is HTML?
• HTML is the standard markup language for creating Web pages.
• HTML stands for Hyper Text Markup Language
• HTML describes the structure of Web pages using markup
• HTML elements are the building blocks of HTML pages
• HTML elements are represented by tags
• HTML tags label pieces of content such as "heading", "paragraph",
"table", and so on
• Browsers do not display the HTML tags, but use them to render the
content of the page
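As a rough illustration of how tags label content without being displayed themselves, the sketch below uses Python's standard `html.parser` module to separate tag names from visible text. The sample page and the `TagCollector` class are made up for this example and are not part of the slides.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects start-tag names and visible text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag, e.g. <h1> or <p>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only chunks.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical sample page, not taken from the slides.
SAMPLE_PAGE = "<html><body><h1>My Heading</h1><p>My paragraph.</p></body></html>"

collector = TagCollector()
collector.feed(SAMPLE_PAGE)
collector.close()

print(collector.tags)  # ['html', 'body', 'h1', 'p']
print(collector.text)  # ['My Heading', 'My paragraph.']
```

A browser does something similar in spirit: it reads the tags to decide how to lay out the page and shows only the text.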
6. Difference Between HTML and XML
HTML | XML
HTML tags have a fixed meaning, and browsers know what they mean. | XML tags differ between applications, and users know what they mean.
HTML tags are used for display. | XML tags are used to describe documents and data.
HTML is not case sensitive. | XML is case sensitive.
7. What is XML?
• XML stands for Extensible Markup Language.
• A markup language is used to provide information about a document.
• Tags are added to the document to provide the extra information.
• HTML tags tell a browser how to display the document.
• XML tags give a reader some idea what some of the data means.
• XML tags are not predefined. You must define your own tags.
• XML uses a Document Type Definition (DTD) or an XML Schema to
describe the data.
• XML with a DTD or an XML Schema is designed to be self-descriptive.
8. XML Rules
• Tags are enclosed in angle brackets.
• Tags come in pairs with start-tags and end-tags.
• Tags must be properly nested.
<name><email>…</name></email> is not allowed.
<name><email>…</email></name> is allowed.
• Tags are case sensitive.
<address> is not the same as <Address>
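The nesting and case rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample strings built from the tag names used on this slide.
properly_nested = "<name><email>a@example.com</email></name>"
badly_nested = "<name><email>a@example.com</name></email>"
case_mismatch = "<address>somewhere</Address>"

print(is_well_formed(properly_nested))  # True
print(is_well_formed(badly_nested))     # False: tags overlap
print(is_well_formed(case_mismatch))    # False: <address> is closed by </Address>
```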
9. Document Type Definitions (DTD)
• A DTD describes the tree structure of a document and something
about its data.
• A DTD determines how many times a node may appear, and how
child nodes are ordered.
10. Schemas
• Schemas are themselves XML documents.
• They were standardized after DTDs and provide more
information about the document.
• They have a number of data types including string, decimal,
integer, boolean, date, and time.
• They divide elements into simple and complex types.
• They also determine the tree structure and how many children a
node may have.
11. Example of XML Document
<?xml version="1.0"?>
<address>
<name>Ganesh Koli</name>
<email>ganeshkoli7175@gmail.com</email>
<phone>7774817221</phone>
</address>
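A document shaped like this one can be read back into a program with a few lines of code. The sketch below uses Python's standard `xml.etree.ElementTree`; the field values are placeholders chosen for this example, not data from the slides.

```python
import xml.etree.ElementTree as ET

# Same structure as the slide's example, with placeholder values.
ADDRESS_XML = """<?xml version="1.0"?>
<address>
  <name>Example Name</name>
  <email>user@example.com</email>
  <phone>0000000000</phone>
</address>"""

root = ET.fromstring(ADDRESS_XML)

# Each child element becomes one key/value pair: tag name -> text content.
record = {child.tag: child.text for child in root}

print(root.tag)  # address
print(record)    # {'name': 'Example Name', 'email': 'user@example.com', 'phone': '0000000000'}
```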
12. What is data conversion?
• Data conversion is the conversion of computer data from one
format to another. Throughout a computer environment, data is
encoded in a variety of ways. Computer hardware, for instance, is
built to certain standards, which require that data contain,
for example, parity bit checks.
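One small instance of converting data from one format to another is turning CSV rows into XML elements. This is only a sketch using Python's standard library; the column names, the `books`/`book` tag names, and the function `csv_to_xml` are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input; the header row supplies the element names.
CSV_TEXT = "title,year\nFirst Book,2001\nSecond Book,2002\n"


def csv_to_xml(csv_text, root_tag="books", row_tag="book"):
    """Convert CSV text into an XML string, one element per row.

    Assumes every column name is also a valid XML element name.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = csv_to_xml(CSV_TEXT)
print(xml_text)
```

The same information is present before and after; only the encoding of its structure changes.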
13. PDF to XML Process
PDF to OCR (using ABBYY FineReader)
Word Replacement
Tagging for XML (software: Epsilon)
Validation (software: XMLSpy)
Quality Checking
Send to XML
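The middle steps of this flow (word replacement, tagging, validation) can be imitated in a few lines of Python. This is a toy sketch under stated assumptions: it does not use ABBYY FineReader, Epsilon, or XMLSpy, the OCR text and replacement table are invented, the `document` and `para` tag names are hypothetical, and "validation" here means a well-formedness check only, not validation against a DTD or schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical OCR output containing typical misread characters.
OCR_TEXT = "Chapter 0ne\n\nThe qu1ck brown fox & friends."
# Hypothetical correction table for the word replacement step.
REPLACEMENTS = {"0ne": "One", "qu1ck": "quick"}


def replace_words(text, replacements):
    """Word replacement: fix known OCR misreadings."""
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    return text


def tag_paragraphs(text):
    """Tagging: wrap each blank-line-separated block in a <para> element."""
    blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
    body = "".join(f"<para>{escape(block)}</para>" for block in blocks)
    return f"<document>{body}</document>"


def check_well_formed(xml_text):
    """Validation stand-in: raises ET.ParseError if the XML is not well-formed."""
    ET.fromstring(xml_text)
    return True


cleaned = replace_words(OCR_TEXT, REPLACEMENTS)
tagged = tag_paragraphs(cleaned)
check_well_formed(tagged)
print(tagged)
# <document><para>Chapter One</para><para>The quick brown fox &amp; friends.</para></document>
```

Escaping the text before tagging matters: a raw `&` from the OCR output would otherwise make the result fail the well-formedness check.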
14. What is e-pub?
• EPUB is an e-book file format with the extension .epub. EPUB files
can be read on devices such as smartphones, tablets, and computers.
• It is a technical standard published by the International Digital
Publishing Forum (IDPF). The term is short for electronic
publication and is sometimes styled epub.
15. Why use the .epub format?
• Because it's a completely open and free standard. The .epub is a
standard for eBooks created by the International Digital
Publishing Forum. It consists of basic XHTML for the book
content, XML for descriptions, and a renamed zip file to hold it
all in. Anyone can make these eBooks, and since they're
essentially just XHTML, anyone can read them.
16. What is XHTML?
• XHTML stands for EXtensible HyperText Markup Language
• XHTML is almost identical to HTML
• XHTML is stricter than HTML
• XHTML is HTML defined as an XML application
• XHTML is supported by all major browsers
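Because XHTML is HTML written as XML, an ordinary XML parser can be used to spot markup that HTML would tolerate but XHTML would not. The sketch below performs two simple checks (well-formedness and lower-case element names) with Python's standard library; the function name `xhtml_problems` and the sample fragments are assumptions, and this is far from a full XHTML validator.

```python
import xml.etree.ElementTree as ET


def xhtml_problems(markup):
    """Return a list of problems found by two simple XHTML-style checks."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]
    # Element names must be lower case in XHTML.
    return [
        f"tag is not lower case: {element.tag}"
        for element in root.iter()
        if element.tag != element.tag.lower()
    ]


# Hypothetical fragments: closed and lower case, upper case, and unclosed <br>.
print(xhtml_problems("<p>First line<br />Second line</p>"))  # []
print(xhtml_problems("<P>First line<br />Second line</P>"))  # ['tag is not lower case: P']
print(xhtml_problems("<p>First line<br>Second line</p>"))    # one "not well-formed XML" entry
```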
17. Difference Between HTML and XHTML
HTML | XHTML
HTML is Hypertext Markup Language. | XHTML is Extensible Hypertext Markup Language.
Tags aren't extensible. | Tags are extensible.
All the content can be put under the body element. | All the content has to be put in blocks (such as p) under the body element.
Tags are not case-sensitive. | Only lower-case tags are allowed.
Possible to leave off an ending tag like </body>. | Tags should appear in pairs.
Overlapping tags are possible. | No overlapping tags.
18. What is OCR?
• Optical Character Recognition, or OCR, is a technology that
enables you to convert different types of documents, such as
scanned paper documents, PDF files, or images captured by a
digital camera, into editable and searchable data. (Software
used for OCR: ABBYY FineReader)
22. • mimetype - tells a reader/operating system what's in here
• META-INF folder - This folder contains, at minimum, the
container.xml file, which tells the reader software where in the
container to find the book.
• OEBPS folder - Recommended location for the book's content. It
contains:
• Images folder - images go here
• ISBN number.opf - XML file that lists what's in the
container
• toc.ncx - This is the table of contents
• xhtml files - The book's contents are in these
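To show how container.xml points reading software at the book, the sketch below parses a small container.xml and extracts the path of the .opf file. It uses Python's standard `xml.etree.ElementTree`; the file name `OEBPS/book.opf` and the helper `find_opf_path` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml in the EPUB Open Container Format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical container.xml; "OEBPS/book.opf" is a made-up file name.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/book.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_opf_path(container_xml):
    """Return the full-path attribute of the first rootfile element."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


opf_path = find_opf_path(CONTAINER_XML)
print(opf_path)  # OEBPS/book.opf
```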
23. Making the Container
• Now we make the .epub container that all these files go in.
• Create an empty .zip file with whatever name you like
• Copy the mimetype file into the zip file (don't use compression on
this file)
• Copy the rest of the files and folders mentioned above into the zip
file
• Rename the .zip extension to .epub
• Then validate the .epub file online
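The packaging steps above can also be scripted. The following is a minimal sketch with Python's standard `zipfile` module: it writes the mimetype entry first and uncompressed, then adds everything else. The function name `build_epub` and the throwaway folder layout are assumptions, and the placeholder files are far too small to pass a real EPUB validator.

```python
import os
import tempfile
import zipfile


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with the mimetype entry first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes in first and is stored without compression.
        archive.write(
            os.path.join(source_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        for folder, _subfolders, files in os.walk(source_dir):
            for file_name in sorted(files):
                full_path = os.path.join(folder, file_name)
                arc_name = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arc_name == "mimetype":
                    continue  # already added above
                archive.write(full_path, arc_name, compress_type=zipfile.ZIP_DEFLATED)


# Usage with a throwaway, hypothetical book layout.
work_dir = tempfile.mkdtemp()
book_dir = os.path.join(work_dir, "book")
os.makedirs(os.path.join(book_dir, "META-INF"))
os.makedirs(os.path.join(book_dir, "OEBPS"))
with open(os.path.join(book_dir, "mimetype"), "w", encoding="ascii") as handle:
    handle.write("application/epub+zip")  # no trailing newline
with open(os.path.join(book_dir, "META-INF", "container.xml"), "w", encoding="utf-8") as handle:
    handle.write("<container/>")  # placeholder content only
with open(os.path.join(book_dir, "OEBPS", "chapter1.xhtml"), "w", encoding="utf-8") as handle:
    handle.write("<html/>")  # placeholder content only

epub_path = os.path.join(work_dir, "book.epub")
build_epub(book_dir, epub_path)

with zipfile.ZipFile(epub_path) as archive:
    entries = archive.infolist()
print(entries[0].filename, entries[0].compress_type == zipfile.ZIP_STORED)  # mimetype True
```

Writing directly to a file named .epub skips the manual rename step; the result is still an ordinary zip archive.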