Table of Contents
CHAPTER 1: Introduction
1.1 What is XML?
1.2 Advantages of XML?
1.3 Differences between XML and HTML
1.4 XML Related Technologies
CHAPTER 2: How XML can be used?
2.2 XML Benefits
2.3 Uses of XML
2.4 XML Tags
CHAPTER 3: XML Editors
3.1 EmEditor
3.2 XML Spy
3.3 XML Syntax Rules
3.4 XML Viewing
CHAPTER 4: XML Documents
4.1 Well Formed XML
4.2 Valid XML
4.3 XML Parser
4.4 Prolog
CHAPTER 5: Document Type Definition
5.1 DTD Elements
5.2 Types of Elements
5.3 Attributes
5.4 Entities
CHAPTER 6: Why we Need DTD?
6.1 Classification of DTD
6.2 Internal DTD
6.3 External DTD
6.4 Problems with DTD
6.5 Design Principles
6.6 XML Schema
1. Introduction
XML stands for Extensible Markup Language. XML was developed around 1996 and is a
subset of SGML (Standard Generalized Markup Language). XML was made less complicated
than SGML to enable its use on the web. XML is a set of rules for encoding documents
electronically. It is a markup language developed for the web, not a scripting or
programming language: it describes data rather than instructions.
XML is used for the exchange of data. The language makes it possible to define data in a
structured way. XML tags are not predefined like HTML tags. XML lets you create your own
unique tags that are meaningful for your data, hence the use of the term "extensible".
An XML document does not do anything by itself. It is just pure information wrapped in tags.
You have to write a piece of software to send, receive or display it. XML is a
World Wide Web Consortium (W3C) Recommendation. XML is a meta-language. A meta-language is a
language that is used to define other languages. XML has become popular for use with web
services.
1.1 What is XML?
XML stands for Extensible Markup Language.
XML is a markup language much like HTML.
XML is designed to carry and store data, not to display it.
XML tags are not predefined; we define our own tags.
XML is designed to be self-descriptive.
XML is a W3C Recommendation.
1.2 Advantages of XML
It is a simultaneously human and machine-readable format.
It supports Unicode, allowing almost any information in any written human
language to be communicated.
It can represent the most general computer science data structures: records, lists
and trees.
The strict syntax and parsing requirements make the necessary parsing
algorithms extremely simple, efficient, and consistent.
XML is heavily used as a format for document storage and processing, both
online and offline.
It is based on international standards.
The hierarchical structure is suitable for most types of documents.
It manifests as plain text files, which are less restrictive than proprietary
document formats.
It is platform-independent, thus relatively immune to changes in technology.
An XML document is plain text: human readable and easy to view and edit.
An XML document has a tree structure which is powerful enough to express
complex data and simple enough to understand.
XML documents are language neutral. For example, a Java program can generate an
XML document which can be parsed by a program written in C++ or Perl.
XML files are operating system independent.
1.3 Differences between XML and HTML
XML and HTML have different goals and are designed for different purposes. Some people
think that XML is an advanced version of HTML that has come to replace it. This is not the
case: both remain in use because they serve different purposes.
Some of the differences between XML and HTML:
XML (Extensible Markup Language)        HTML (Hyper Text Markup Language)
XML is designed to store data           HTML is designed to display data
XML focuses on what the data is         HTML focuses on how the data looks
XML allows us to define our own tags    HTML has a predefined set of tags
XML is used to transport data           HTML is used to format and display data
1.4 XML Related Technologies
DTD (Document Type Definition) and XML Schemas are used to define legal XML tags and
their attributes.
CSS (Cascading Style Sheets) describe how HTML or XML is presented in a browser.
XSLT (Extensible Stylesheet Language Transformations) and XPath are used to transform
XML from one form to another.
DOM (Document Object Model), SAX (Simple API for XML), and JAXP (Java API for XML
Processing) are all APIs for XML parsing.
2. How Can XML be used?
XML can be used in many aspects of web development, often to simplify data storage and
sharing.
XML Simplifies Data Sharing: XML data is stored in plain text format. This provides a
software and hardware independent way of storing data. This makes it much easier to
create data that different applications can share.
XML Simplifies Data Transport: One of the most time-consuming challenges for
developers is to exchange data between incompatible systems over the Internet.
Exchanging data as XML greatly reduces this complexity, since the data can be read by
different incompatible applications.
XML Simplifies Platform Changes: XML data is stored in text format. This makes it easier
to expand or upgrade to new operating systems, new applications, or new browsers,
without losing data.
XML Makes our Data More Available: Since xml is independent of hardware and software
applications, xml can make your data more available and useful.
XML is used to Create New Internet Languages: A lot of new Internet languages are
created with XML.
2.2 XML Benefits
XML improves the functionality of web technologies through the use of a more
flexible and adaptable means to identify information.
XML is a Meta language. That is, it is a language that describes other languages.
XML provides the facility to define tags and the structural relationship between
them.
The extensibility and structured nature of xml allows it to be used for
communication between different systems.
2.3 Uses of XML
Meta Content: To describe the contents of a document.
Messaging: Where applications or organizations exchange data between them.
Database: The data extracted from a database can be preserved with its original
information and can be used by more than one application in different ways.
2.4 XML Tags
XML tags look like HTML tags. They are formed by a name enclosed in angle brackets:
< > for an opening tag and </ > for a closing tag. The difference is that XML tags are
not predefined as they are in HTML.
<Composer> is an example of an opening tag. In XML all opening tags must have closing
tags; in this case the closing tag is </Composer>.
Start Tag
The beginning of every non-empty XML element is marked by a start-tag.
An example of a start-tag: <Composer>
End Tag
The end of every non-empty XML element is marked by an end-tag.
An example of an end-tag: </Composer>
Element Content
The text between the start-tag and end-tag is called the element's content.
For example, in <Composer>This is my home page</Composer> the element content is: This is my home page
Empty Element Tag
If an element is empty, it must be represented either by a start-tag immediately followed by
an end-tag or by an empty-element tag.
An empty-element tag takes a special form: <BR/>
The equivalent start-tag and end-tag pair is: <BR></BR>
Empty-element tags may be used for any element which has no content, whether or not it is
declared using the keyword EMPTY. For interoperability, the empty-element tag should be
used, and should only be used, for elements which are declared EMPTY.
A common convention is to write HTML tags in upper case and XML tags in lower case. Furthermore, XML
is case sensitive. Always remember that <Composer>, <composer> and <COMPOSER> are
different tags in XML.
Tag names should begin with either a letter, an underscore (_) or a colon (:), followed by some
combination of letters, numbers, periods (.), colons, underscores, or hyphens (-) but no
white space, with the exception that no tag name should begin with any form of "xml".
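The naming rule above can be checked mechanically. The following is a minimal Python sketch, not a full implementation of the XML specification: the function name is_acceptable_tag_name is hypothetical, and the pattern covers ASCII characters only, whereas real XML names may also contain many non-ASCII letters.

```python
import re

# ASCII-only approximation of the naming rule described in the text.
# Assumption: non-ASCII name characters allowed by XML are ignored here.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_acceptable_tag_name(name):
    """Return True if `name` follows the simplified naming rule."""
    if name.lower().startswith("xml"):
        # Names beginning with any form of "xml" are excluded by the rule.
        return False
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Composer", "_first-name.v2", "2ndComposer", "my tag", "XmlData"]:
    print(candidate, is_acceptable_tag_name(candidate))
```

A name that starts with a digit, contains white space, or begins with "xml" in any letter case is rejected by this check.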
3. XML Editor
An XML editor is a markup language editor with added functionality to facilitate the editing
of XML. XML can be edited in a plain text editor, with all the code visible, but XML editors
have added facilities like tag completion and menus and buttons for tasks that are common
in XML editing, based on data supplied with the document type definition (DTD) or the XML tree.
An XML editor should be able to:
Add closing tags to your opening tags automatically.
Force you to write valid xml.
Verify your xml against a DTD.
Verify your xml against a Schema.
Color code your XML syntax.
Here are Some xml Editors
EmEditor
XML Notepad
XML Cooktop
XML Pro
XML Spy
eNotepad
If you use Notepad for XML editing, you will soon run into problems. Notepad does not know
that you are writing XML, so it will not be able to assist you. You will create many errors,
and as your XML documents grow larger you will lose control. Today XML is an important
technology, and every day we can see XML playing a more and more critical role in new web
development.
When you start working with XML, you will soon find that it is better to edit XML
documents using a professional XML editor. Good XML editors will help you to write
error-free XML documents, validate your text against a DTD or a schema, force you to stick to
a valid XML structure, and add closing tags to your opening tags automatically.
3.1 EmEditor
Why is EmEditor Professional the Best Text Editor?
1. EmEditor can Launch very Quickly, Almost Instantaneously
You are going to view or edit a large number of files every day, and you don't want to wait
for many seconds just to view a file. Unfortunately, many programs, including word
processors and text editors, require you to wait several seconds before you can start using
them. A text editor is meant to increase productivity, and a long wait every time you open
a file defeats that purpose. You should not have to wait
more than one second. That's why EmEditor has been so popular for such a long time.
2. Extendable with Plug-ins!
EmEditor exposes many APIs, so programmers can easily write plug-ins that fit their needs.
Features such as Spelling, Word Count, Explorer, Web Preview, and Compare Files, etc. are
designed as plug-ins.
3. Powerful Macros with your Favorite Script Language!
You can write a macro to do almost whatever you want within EmEditor! The macros are
based on the Windows Scripting Host (WSH) engine, so you can use all of the powerful,
robust objects available under the Windows Scripting Host. You can program macros with
popular script languages including JavaScript and VBScript. You can even program with
PerlScript, Python, PHPScript, Ruby, and other Active Script languages as long as the script
engines you want to use are installed on your system.
4. Unicode Support!
EmEditor supports Unicode natively, and in fact, the whole program is built as a Unicode
application. EmEditor allows you to open a file with any encoding supported in the
Windows system, and you can easily convert from one encoding to another within
EmEditor. EmEditor allows you to open Unicode file names, and allows you to search for
Unicode characters. With EmEditor plug-ins, EmEditor allows you to convert a selected text
to HTML/XML Character Reference or Universal Character Names, and vice versa.
5. Easy and Intuitive Design with Tabbed Windows!
EmEditor is designed for Windows XP, thus frequently used shortcut keys are similar to
other Windows applications, such as Copy, Cut, Paste, Undo, and Redo. In addition,
EmEditor uses tabbed windows similar to Slim Browser, Internet Explorer, Firefox and
other tabbed browser applications. This allows you to open multiple documents in one
window and jump between them quickly and easily.
6. Other Features!
There are many other useful Features that are Worth Mentioning:
Keyword highlighting.
Regular expression search and highlighting.
External tools.
Plug-ins using custom bars.
Keyboard, toolbar, menu, font and color customization.
Drag and drop.
Auto save/backup.
Clickable URLs and e-mail addresses.
The window can be split into a maximum of 4 panes.
Can define multiple configurations and associate file extensions.
Can save backups to the recycle bin.
Can open recently used files from the tray icon on the taskbar.
Shortcut keys to insert accent marks and special characters.
Application error handler support.
64-bit edition available.
Windows Vista ready.
Fast e-mail support.
3.2 XML Spy
XML Spy is an integrated development environment for XML that covers the
major aspects of XML work in one powerful and easy-to-use product.
Easy to use.
Syntax coloring.
Automatic tag completion.
Automatic well-formed check.
Easy switching between text view and grid view.
Built in DTD and / or Schema validation.
Built in graphical xml Schema designer.
Powerful conversion utilities.
Database import and export.
Built in templates for most xml document types.
Built in XPath analyzer.
Full SOAP and WSDL capabilities.
Powerful project management.
3.3 XML Syntax Rules
XML as we have seen is a formal specification for markup languages. Every formal language
specification has an associated syntax.
XML Documents as we have seen Comprise two Basic Components.
Data: The actual content.
Markup: Meta-information about data that describes it.
The syntax rules of XML are very simple and logical. The rules are easy to learn, and easy to
use.
1. Every Element must have a Closing Tag
In HTML, some elements do not need a closing tag:
<p>This is a paragraph
<p>This is another paragraph
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
An XML document should begin with an XML declaration, which identifies the document as
XML and can specify some other optional attributes.
<?xml version="1.0"?>
The statement above declares the document as an XML document, which means it is expected
to comply with the XML syntax rules. The declaration has no closing tag because it is not
an element.
2. XML Tags are Case Sensitive
XML elements are defined using xml tags.
XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case.
3. XML Elements must be Properly Nested
In HTML, you might see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In xml, all elements must be properly nested within each other.
<b><i>This text is bold and italic</i></b>
4. XML Documents must have a Root Element
XML documents must contain one element that is the parent of all other elements. This
element is called the root element.
<?xml version="1.0"?>
<Root><Child><Subchild>.....</Subchild></Child></Root>
5. XML Attribute Values must be Quoted
XML elements can have attributes in name/value pairs just like in HTML. In XML, the
attribute values must always be quoted.
<?xml version="1.0" ?>
<Address><Bangalore>
<Name Nickname="12">Sumana</Name><Company>Testing</Company></Bangalore>
<Mysore><Name>Sumith</Name><Company EmpID="1675">Mac Studio</Company>
</Mysore></Address>
6. Entity References
Some characters have a special meaning in xml. If you place a character like "<" inside an
xml element, it will generate an error because the parser interprets it as the start of a new
element.
To avoid this error, replace the "<" character with an entity reference.
There are five predefined entity references in XML:
Entity reference    Character    Meaning
&lt;                <            less than
&gt;                >            greater than
&amp;               &            ampersand
&apos;              '            apostrophe
&quot;              "            quotation mark
Note: Only the characters "<" and "&" are strictly illegal in xml. The greater than character
is legal, but it is a good habit to replace it.
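The effect of entity references can be tried out with a short Python sketch. It uses the standard library helper xml.sax.saxutils.escape and the ElementTree parser; the element name condition and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if a < b && b > 0"

# escape() replaces &, < and > with &amp;, &lt; and &gt;
escaped_text = escape(raw_text)
document = "<condition>" + escaped_text + "</condition>"

# The parser turns the entity references back into the original characters.
parsed_text = ET.fromstring(document).text

# Without escaping, the same text is rejected by the parser.
try:
    ET.fromstring("<condition>" + raw_text + "</condition>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

print(escaped_text)
print(parsed_text)
print(unescaped_accepted)
```

The escaped document parses and yields the original text, while the unescaped one raises a parse error because of the "<" and "&" characters.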
7. Comments in XML
Comments must not appear before the XML declaration, which has to be the first thing in
the document. The string "--" (double-hyphen) is not allowed inside a comment (as it is used to
delimit comments), and entity references are not recognized within comments.
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
8. White-Space is preserved in XML
HTML collapses multiple white-space characters into one single white-space. With XML, the
white-space in a document is not collapsed.
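A small Python sketch can make the difference visible. The element name greeting is made up for illustration, and the "collapsed" string only imitates what an HTML renderer would display.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<greeting>Hello     XML   world</greeting>")

# The XML parser hands back the text with every space intact.
preserved_text = element.text

# Imitation of HTML rendering: runs of white-space become one space.
collapsed_text = " ".join(preserved_text.split())

print(repr(preserved_text))
print(repr(collapsed_text))
```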
3.4 XML Viewing
XML files can be viewed in all major browsers.
[Note: Don't expect xml files to be displayed as HTML pages]
<?xml version="1.0"?>
<Address><Name>Harsh</Name>
<Company>Motorola </Company>
</Address>
4. XML Documents
XML documents are similar to HTML documents. They contain information and markup
tags that define the information, and are saved as plain text. The name of an XML document
usually has the .xml extension, for example "abc.xml". A data object is an XML document if it
is well-formed.
A well-formed XML document may in addition be valid if it meets certain further constraints
or rules. Well-formed XML documents contain text and XML tags which conform to the XML
syntax.
Valid XML documents must be well formed and are additionally checked against a
document type definition (DTD). A DTD is a set of rules that defines what tags may appear in an
XML document. DTDs also describe the structure of a document.
4.1 Well Formed XML
Well-formed XML documents simply mark up pages with descriptive tags. You don't need to
describe or explain what these tags mean. In other words, a well-formed XML document does
not need a DTD, but it must conform to the XML syntax rules. If all tags in a document are
correctly formed and follow the XML syntax rules, then the document is
considered well formed. Some of the rules are given below.
1. XML documents must contain at least one element.
Well formed: <title>Software</title>
Not well formed: “Software”
2. XML documents must contain a unique opening and closing tag that contains the whole
document, forming what is called a root element.
Well Formed: <title>DEL</title>
Not well formed: <title>DEL
3. Tags in XML are Case Sensitive: <Author> and <AUTHOR> are not the same. The XML
declaration must be all lowercase, while keywords in DTDs must be all
UPPERCASE, such as ELEMENT, ATTLIST, #REQUIRED, #IMPLIED, NMTOKEN, ID, etc.
However, your own elements and attributes may use any case you choose, as long as you are
consistent.
Well formed: <Author>Information</Author>
Not well formed: <Author>Information</AUTHOR>
4. Attribute values must always be quoted (as opposed to HTML).
Well formed: <Name id="100">Asini</Name>
Not well formed: <Name id=100>Asini</Name>
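The rules above can be tried out with any XML parser. Here is a minimal Python sketch that uses the standard library ElementTree parser as the well-formedness check; the helper name is_well_formed is hypothetical, and the parser checks syntax only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts `xml_text` as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<title>Software</title>",         # one root element
    "Software",                        # no element at all
    "<title>DEL",                      # missing closing tag
    "<Author>Information</AUTHOR>",    # case mismatch between tags
    '<Name id="100">Asini</Name>',     # quoted attribute value
    "<Name id=100>Asini</Name>",       # unquoted attribute value
]
for sample in samples:
    print(sample, is_well_formed(sample))
```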
4.2 Valid XML
Valid XML is a more rigid or formal form of XML. All XML documents are well-formed
documents. Some XML documents are additionally valid. Valid documents must conform not
only to the syntax, but also to the DTD.
In the case of markup languages defined by XML, the DTD provides the grammatical
structure to bring order to the elements of the language. The main difference between valid
and well formed is that valid XML requires a DTD, whereas well-formed XML does not.
4.3 XML Parser
An XML parser is a processor that reads an XML document and determines the structure and
properties of the data. If the parser goes beyond the XML rules for well-formedness and
validates the document against a DTD, the parser is said to be a "validating" parser.
A validating XML parser also checks the XML syntax and reports errors, so it can be used to
check whether a document is both well formed and valid. In a browser, an XML parser reads XML
and converts it into an XML DOM object that can be accessed with JavaScript. Most browsers
have a built-in XML parser.
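The same idea of parsing into a DOM object exists outside the browser. The sketch below uses Python's standard xml.dom.minidom module instead of a browser's JavaScript DOM, which is a substitution made for illustration; the sample document reuses the Address example from the XML Viewing section.

```python
from xml.dom.minidom import parseString

document = parseString(
    "<Address><Name>Harsh</Name><Company>Motorola</Company></Address>"
)

# The document element is the root of the DOM tree.
root = document.documentElement
root_name = root.tagName

# Elements are located by tag name; their text lives in a child text node.
name_node = document.getElementsByTagName("Name")[0]
name_text = name_node.firstChild.data

print(root_name, name_text, len(root.childNodes))
```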
4.4 Prolog
The prolog refers to the information that appears before the start tag of the document or
root element. It includes information that applies to the document as a whole, such as
character encoding, document structure, and style sheets.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
XML Declaration
The XML declaration typically appears as the first line in an XML document. The XML
declaration is not required, however, if used it must be the first line in the document and no
other content or white space can precede it.
The XML declaration in the prolog above consists of the following:
The Version Number, <?xml version="1.0"?>. This is mandatory. Although the number
might change for future versions of XML, 1.0 is the version in general use.
The Encoding Declaration, <?xml version="1.0" encoding="UTF-8"?>. This is optional. If
used, the encoding declaration must appear immediately after the version information in
the XML declaration, and must contain a value representing an existing character encoding.
An XML declaration can also contain a Standalone Declaration, for example, <?xml
version="1.0" encoding="UTF-8" standalone="yes"?>. Like the encoding declaration, the
standalone declaration is optional. If used, the standalone declaration must appear last in
the XML declaration.
Encoding Declaration
The encoding declaration identifies which encoding is used to represent the characters in
the document. Although XML parsers can determine automatically if a document uses the
UTF-8 or UTF-16 Unicode encoding, this declaration should be used in documents that
use other encodings. For example, the following is the encoding declaration for a
document that uses ISO-8859-1 (Latin 1).
Example: <?xml version="1.0" encoding="ISO-8859-1"?>
Standalone Declaration
The standalone declaration indicates whether a document relies on information from an
external source, such as an external document type definition (DTD), for its content. If the
standalone declaration has a value of "yes", the document declares that it does not rely on
such external declarations.
Example: <?xml version="1.0" standalone="yes"?>
In that case a validating parser will report an error if the document depends on
declarations in an external DTD or on external entities. Leaving out the standalone
declaration produces the same result as including a
standalone declaration of "no": the XML parser will accept external resources, if there are
any, without reporting an error.
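To see how a parser reports the three parts of the XML declaration, the following Python sketch uses the XmlDeclHandler callback of the standard xml.parsers.expat module. The helper name read_xml_declaration and the catalog root element are chosen for illustration only.

```python
from xml.parsers import expat


def read_xml_declaration(xml_bytes):
    """Return (version, encoding, standalone) as reported by the parser,
    or None if the document has no XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # standalone is 1 for "yes", 0 for "no", -1 when the clause is omitted
        found["declaration"] = (version, encoding, standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found.get("declaration")


full = read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><catalog/>'
)
minimal = read_xml_declaration(b'<?xml version="1.0"?><catalog/>')
missing = read_xml_declaration(b"<catalog/>")
print(full, minimal, missing)
```

Note that expat is a non-validating parser, so this sketch only reads the declaration; it does not enforce the standalone constraint.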
Comments
Comments begin with a <!-- and end with a -->. Comments can appear in the document
prolog, including the document type definition (DTD); after the document; or in the textual
content. Comments cannot appear within attribute values. They cannot appear inside of
tags.
5. Document Type Definition (DTD)
An XML DTD, or document type definition, defines the formal grammar of an XML-based
markup language. A DTD contains the list of elements that can occur in the markup, the list
of attributes of each element, the possible attribute values or value types, and the content
model that specifies the allowed nesting of elements.
This information can be used in several ways:
One can use a DTD to validate a document, i.e., to check whether the document follows
the formal rules defined in the DTD. In this way one can detect possible errors (like
misspelled element names, attribute names/values, wrongly nested elements etc.)
that otherwise would be difficult to notice.
One can use a DTD just to provide an accurate description of a markup language. Here
many things depend on the markup language itself, as not all XML applications can be
accurately described using an XML DTD.
One can use a DTD to define character entities, specify default attribute values and bind
elements to XML namespaces.
The main purpose of a DTD is to define the legal building blocks of an xml document. You
can store a DTD at the beginning of a document or externally in a separate file.
All XML documents (and HTML documents) are made up of the following building
blocks:
Elements
Attributes
Entities
PCDATA
CDATA
5.1 DTD Elements
Elements are the main building blocks in the document structure. The elements represent
the logical components of a document and how they are arranged into a hierarchical (tree)
structure.
Syntax: <!ELEMENT name content>
In a DTD, elements are declared with an ELEMENT declaration.
Declaring Elements
In a DTD, XML elements are declared with an element declaration with the following syntax.
<!ELEMENT element-name category>
OR
<!ELEMENT element-name (element-content)>
Empty Elements
Empty elements are declared with the category keyword EMPTY.
<!ELEMENT element-name EMPTY>
Elements with Parsed Character Data
Elements with only parsed character data are declared with #PCDATA inside parentheses.
<!ELEMENT element-name (#PCDATA)>
Example: <!ELEMENT from (#PCDATA)>
Elements with any contents
Elements declared with the category keyword ANY can contain any combination of
parsable data:
<!ELEMENT element-name ANY>
Example: <!ELEMENT note ANY>
Elements with children (Sequences)
Elements with one or more children are declared with the names of the child elements
inside parentheses.
<!ELEMENT element-name (child1)>
OR
<!ELEMENT element-name (child1, child2,...)>
Example: <!ELEMENT note (to, from, heading, body)>
When children are declared in a sequence separated by commas, the children must appear
in the same sequence in the document. In a full declaration, the children must also be
declared, and the children can also have children. The full declaration of the "note" element
is:
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Declaring Only one Occurrence of an Element
<!ELEMENT element-name (child-name)>
Example
<!ELEMENT note (message)>
The example above declares that the child element "message" must occur once, and only
once inside the "note" element.
Declaring Minimum one Occurrence of an Element
<!ELEMENT element-name (child-name+)>
Example: <!ELEMENT note (message+)>
The + sign in the example above declares that the child element "message" must occur one
or more times inside the "note" element.
Declaring Zero or More Occurrences of an Element
<!ELEMENT element-name (child-name*)>
Example: <!ELEMENT note (message*)>
The * sign in the example above declares that the child element "message" can occur zero
or more times inside the "note" element.
Declaring Zero or One Occurrences of an Element
<!ELEMENT element-name (child-name?)>
Example: <!ELEMENT note (message?)>
The ? sign in the example above declares that the child element "message" can occur zero
or one time inside the "note" element.
Declaring Either/or Content
Example: <!ELEMENT note (to, from, header, (message | body))>
The example above declares that the "note" element must contain a "to" element, a "from"
element, a "header" element, and either a "message" or a "body" element.
Declaring Mixed Content
Example: <!ELEMENT note (#PCDATA|to|from|header|message)*>
The example above declares that the "note" element can contain zero or more
occurrences of parsed character data, "to", "from", "header", or "message" elements.
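The sequence, choice and occurrence operators of a content model behave much like the operators of a regular expression. The Python sketch below exploits that similarity to check the children of an element against a content model. It is only an illustration under stated assumptions, not a DTD validator: the helper names content_model_to_regex and children_match are hypothetical, only element-only models are handled (no #PCDATA, EMPTY or ANY), and only the direct children of the root element are checked.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate an element-only DTD content model into a regular expression."""
    compact = re.sub(r"\s+", "", model)
    # Every element name becomes a group that matches "name;".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda match: "(?:" + re.escape(match.group()) + ";)",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    # The operators |, +, * and ? and the parentheses keep their meaning.
    return pattern.replace(",", "")


def children_match(model, xml_text):
    """Check the direct children of the root element against the model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + ";" for child in root)
    return re.fullmatch(content_model_to_regex(model), child_names) is not None


note_model = "(to, from, heading, body)"
in_order = "<note><to>A</to><from>B</from><heading>C</heading><body>D</body></note>"
out_of_order = "<note><from>B</from><to>A</to><heading>C</heading><body>D</body></note>"
print(children_match(note_model, in_order))
print(children_match(note_model, out_of_order))
print(children_match("(message+)", "<note/>"))
```

The first call reports a match, while the other two do not: the children are out of sequence in one case, and the + operator requires at least one "message" element in the other.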
5.2 Types of Elements
There are Three Primary Types of Elements. They are given below
Simple elements: These are elements that contain text or "parsed character data"
(represented as #PCDATA in your DTD).
Compound elements: These elements contain other elements, and sometimes PCDATA
and other elements.
Standalone elements: They do not contain any PCDATA or other elements.
5.3 Attributes
Attributes allow an author to attach extra information to the elements in a document. One
important difference from the elements is that the attributes cannot contain elements and
there is no "Sub-attribute".
In a DTD, attributes are declared with an ATTLIST declaration.
Declaring Attributes
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
DTD Example: <!ATTLIST Payment type CDATA "Check">
XML Example: <Payment type="Check" />
The attribute-type can be one of the following:
Type            Description
CDATA           The value is character data
(en1|en2|..)    The value must be one from an enumerated list
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name token
NMTOKENS        The value is a list of valid XML name tokens
ENTITY          The value is an entity
ENTITIES        The value is a list of entities
NOTATION        The value is a name of a notation
xml:            The value is a predefined xml value
The default-value can be one of the following:
Value           Explanation
value           The default value of the attribute
#REQUIRED       The attribute is required
#IMPLIED        The attribute is not required
#FIXED value    The attribute value is fixed
Default Attribute Value
DTD
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
Valid XML
<square width="100" />
In the example above, the "square" element is defined to be an empty element with a
"width" attribute of type CDATA. If no width is specified, it has a default value of 0.
#REQUIRED
Syntax: <!ATTLIST element-name attribute-name attribute-type #REQUIRED>
Example
DTD
<!ATTLIST person number CDATA #REQUIRED>
Valid XML
<person number="5677" />
Invalid XML
<person />
Use the #REQUIRED keyword if you don't have an option for a default value, but still want
to force the attribute to be present.
#IMPLIED
Syntax: <!ATTLIST element-name attribute-name attribute-type #IMPLIED>
Example
DTD
<!ATTLIST contact fax CDATA #IMPLIED>
Valid XML
<contact fax="555-667788" />
<contact />
Use the #IMPLIED keyword if you don't want to force the author to include an attribute,
and you don't have an option for a default value.
#FIXED
Syntax: <!ATTLIST element-name attribute-name attribute-type #FIXED "value">
Example
DTD
<!ATTLIST sender company CDATA #FIXED "Microsoft">
Valid XML
<sender company="Microsoft" />
Invalid XML
<sender company="Software" />
Use the #FIXED keyword when you want an attribute to have a fixed value without
allowing the author to change it. If an author includes another value, the XML parser will
return an error.
Enumerated Attribute Values
Syntax: <!ATTLIST element-name attribute-name (en1|en2|..) default-value>
Example
DTD: <!ATTLIST Payment type (Check | Cash) "Cash">
XML Example: <Payment type="Check" />
OR
<Payment type="Cash" />
Use enumerated attribute values when you want the attribute value to be one of a fixed set
of legal values.
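Default attribute values can be observed with a parser that reads the internal DTD. The Python sketch below uses the standard xml.parsers.expat module, which supplies declared defaults for attributes that are missing from the document. The helper name read_attributes is hypothetical. Because expat is a non-validating parser, it does not enforce #REQUIRED, #FIXED or enumerated values, so the sketch shows defaulting only.

```python
from xml.parsers import expat


def read_attributes(xml_text):
    """Return the attributes the parser reports for the root element."""
    reported = []

    def on_start(name, attributes):
        reported.append(dict(attributes))

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return reported[0]


# Internal DTD taken from the default-value example: width defaults to "0".
DOCTYPE = '<!DOCTYPE square [<!ELEMENT square EMPTY><!ATTLIST square width CDATA "0">]>'

defaulted = read_attributes(DOCTYPE + "<square/>")
specified = read_attributes(DOCTYPE + '<square width="100"/>')
print(defaulted, specified)
```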
5.4 Entities
An entity is a name that represents a special character, additional text or a file. There are
two kinds of entities in XML documents:
1. General Entities: Used in the content of documents. References to general entities start
with & and end with a semicolon (;), for example &lt;
2. Parameter Entities: Used in a document's DTD. References to parameter entities start
with % and end with a semicolon (;).
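A general entity declared in an internal DTD is replaced by its text when the document is parsed. The following Python sketch shows this with the standard ElementTree parser; the Film document and the entity name COM are illustrative.

```python
import xml.etree.ElementTree as ET

FILM_XML = """<?xml version="1.0"?>
<!DOCTYPE Film [
<!ENTITY COM "Comedy">
<!ELEMENT Film (Title, Genre, Year)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Genre (#PCDATA)>
<!ELEMENT Year (#PCDATA)>
]>
<Film><Title>Tootsie</Title><Genre>&COM;</Genre><Year>1982</Year></Film>"""

film = ET.fromstring(FILM_XML)

# The reference &COM; has been replaced by the entity's replacement text.
genre = film.find("Genre").text
print(genre)
```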
6. Why we Need a DTD?
XML is a language specification. Based on this specification, individuals and organizations
develop their own markup languages which they then use to communicate information.
An application that receives a document in such a language:
Needs to know how the document is structured, and
Needs to check if the content is indeed compliant with that structure.
The Document Type Definition, also known as DTD, holds information about the structure of
an XML document.
6.1 Why Use a DTD?
XML provides an application independent way of sharing data.
With a DTD, different groups of people can agree on a common DTD for
interchanging data.
Your application can use a standard DTD to verify that data that you receive from
the outside world is valid.
The DTD can be used to verify your data.
6.2 Internal DTDs
An internal DTD is inserted within the DOCTYPE declaration. A DTD inserted this way applies
only to the specific document that contains it.
Syntax: <!DOCTYPE root-element [DTD specification]>
Examples
1. <?xml version="1.0"?>
<!DOCTYPE Note [
<!ELEMENT Note (To, From, Heading, Body)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Heading (#PCDATA)>
<!ELEMENT Body (#PCDATA)>
]>
<Note><To>Tove</To><From>Jani</From><Heading>Reminder</Heading>
<Body>Don't Forget Me This Weekend</Body></Note>
2. <?xml version="1.0"?>
<!DOCTYPE message [
<!ELEMENT message (to,from,subject,text)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
<message><to>Dave</to><from>Susan</from><subject>Reminder</subject>
<text>Don't forget to buy milk on the way home.</text></message>
3. <?xml version="1.0"?>
<!DOCTYPE Tutorials [
<!ELEMENT Tutorials (Tutorial)+>
<!ELEMENT Tutorial (Name, URL)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT URL (#PCDATA)>
]>
<Tutorials><Tutorial><Name>xml Tutorial</Name>
<URL>www.Test.COM </URL></Tutorial>
<Tutorial><Name>HTML Tutorial</Name><URL>www.workhard.com</URL>
</Tutorial></Tutorials>
4. <?xml version="1.0"?>
<!DOCTYPE Address[
<!ELEMENT Address (Street, City, State, Zip)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT State (#PCDATA)>
<!ELEMENT Zip (#PCDATA)>
]>
<Address><Street>12 City Road</Street><City>Melbourne</City>
<State>Victoria</State><Zip>8001</Zip></Address>
5. <?xml version="1.0"?>
<!DOCTYPE Note[
<!ELEMENT Note (To, From, Heading, Body)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Heading (#PCDATA)>
<!ELEMENT Body (#PCDATA)>
]>
<Note><To>Yashaswi</To><From>Jan</From>
<Heading>Head Lines</Heading><Body> Software</Body></Note>
6. <?xml version="1.0"?>
<!DOCTYPE Film [
<!ENTITY COM "Comedy">
<!ENTITY SF "Science Fiction">
<!ELEMENT Film (Title, Genre, Year)+>
<!ELEMENT Title (#PCDATA)>
<!ATTLIST Title id CDATA #IMPLIED>
<!ELEMENT Genre (#PCDATA)>
<!ELEMENT Year (#PCDATA)>
]>
<Film><Title id="1">Tootsie</Title><Genre>&COM;</Genre>
<Year>1982</Year><Title id="2">Jurassic Park</Title><Genre>&SF;</Genre>
<Year>1993</Year></Film>
7. <?xml version="1.0"?>
<!DOCTYPE People_List [
<!ELEMENT People_List (Person*)>
<!ELEMENT Person (Name, Birthdate?, Gender?, Social_Security_Number?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Birthdate (#PCDATA)>
<!ELEMENT Gender (#PCDATA)>
<!ELEMENT Social_Security_Number (#PCDATA)>
]>
<People_List><Person><Name>Aditya</Name><Birthdate>27/11/2008</Birthdate>
<Gender>Male</Gender></Person></People_List>
8. <?xml version="1.0"?>
<!DOCTYPE Newspaper [
<!ELEMENT Newspaper (Article+)>
<!ELEMENT Article (Headline, Byline, Lead, Body, Notes)>
<!ELEMENT Headline (#PCDATA)>
<!ELEMENT Byline (#PCDATA)>
<!ELEMENT Lead (#PCDATA)>
<!ELEMENT Body (#PCDATA)>
<!ELEMENT Notes (#PCDATA)>
<!ATTLIST Article Author CDATA #REQUIRED>
<!ATTLIST Article Editor CDATA #IMPLIED>
<!ATTLIST Article Date CDATA #IMPLIED>
<!ATTLIST Article Edition CDATA #IMPLIED>
<!ENTITY Newspaper "Times of India">
<!ENTITY Publisher "Hasini">
<!ENTITY Copyright "Copyright 2010 SOFTWARE ">
]>
<Newspaper><Article Author="Yashaswi" Editor="Anurag" Date="20/02/2010"
Edition="First"><Headline>Temptation 2010</Headline>
<Byline>New Year</Byline><Lead>No &Publisher; Matter</Lead>
<Body>&Newspaper;</Body><Notes>All The Best The New Year&Copyright;</Notes>
</Article></Newspaper>
9. <?xml version="1.0"?>
<!DOCTYPE Parts [
<!ELEMENT Parts (Title?, Part*)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Part (Item, Manufacturer, Model, Cost)+>
<!ATTLIST Part
type (Computer|Auto|Airplane) #IMPLIED>
<!ELEMENT Item (#PCDATA)>
<!ELEMENT Manufacturer (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Cost (#PCDATA)>
]>
<Parts><Title>Main Heading</Title><Part type="Computer">
<Item></Item><Manufacturer></Manufacturer>
<Model></Model><Cost></Cost></Part></Parts>
10. <?xml version="1.0"?>
<!DOCTYPE Videos [
<!ELEMENT Videos (Music+) >
<!ELEMENT Music (Title, Artist+)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Artist (#PCDATA) >
]>
<Videos><Music><Title>Video Title1</Title>
<Artist>Artist1</Artist></Music>
<Music><Title>Video Title2 </Title><Artist>Artist2</Artist>
<Artist>Artist3</Artist></Music></Videos>
11. <?xml version="1.0"?>
<!DOCTYPE Document [
<!ELEMENT Document (Customer)*>
<!ELEMENT Customer (Name,Date,Orders)>
<!ELEMENT Name (Last_Name,First_Name)>
<!ELEMENT Last_Name (#PCDATA)>
<!ELEMENT First_Name (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Orders (Item)*>
<!ELEMENT Item (Product,Number,Price)>
<!ELEMENT Product (#PCDATA)>
<!ELEMENT Number (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
]>
<Document><Customer><Name>
<Last_Name>Kaif</Last_Name>
<First_Name>Kat</First_Name>
</Name>
<Date>20/02/2010</Date>
<Orders><Item><Product></Product>
<Number></Number><Price></Price>
</Item></Orders></Customer>
</Document>
12. <?xml version="1.0"?>
<!DOCTYPE book [
<!ELEMENT book (title, chapter+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT chapter (heading, paragraph*)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT paragraph (#PCDATA)>
<!ATTLIST chapter language CDATA #REQUIRED>
]>
<book><title/><chapter language="markup">
<heading>Introduction to xml</heading><paragraph> Extensible markup language,
used to describe the data</paragraph></chapter>
</book>
6.3 External DTD
An external DTD is one that resides in a separate document. The DTD is saved as a
separate file with the extension .dtd, and that file is then referenced from within the XML
document.
Syntax: <!DOCTYPE Root-Element SYSTEM "File-Name">
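As a rough sketch of how an application can find out which external DTD a document refers to, the Python snippet below reads the SYSTEM identifier from the DOCTYPE declaration using the standard xml.parsers.expat module. The file name gold.dtd comes from the first example in this section; the empty root element and the function name are assumptions made for this illustration. With its default settings expat only reports the reference; it does not load gold.dtd or validate against it.

```python
import xml.parsers.expat

# Hypothetical document that only references the external DTD file
DOC = """<?xml version="1.0"?>
<!DOCTYPE JewelleryShop SYSTEM "gold.dtd">
<JewelleryShop/>"""


def doctype_reference(xml_text):
    """Return the root name, SYSTEM identifier, and internal-subset flag."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["root"] = name
        found["system_id"] = system_id
        found["internal"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found


print(doctype_reference(DOC))
```

A validating parser would go one step further, open the referenced file, and check the document against the declarations in it.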
Examples
1. <!ELEMENT JewelleryShop (Gold+)>
<!ELEMENT Gold (Chain+, Bangles+, Earings+,Necklace?)>
<!ELEMENT Chain (Longchain?, Shortchain+)>
<!ELEMENT Longchain (#PCDATA)>
<!ELEMENT Shortchain (#PCDATA)>
<!ELEMENT Bangles (#PCDATA)>
<!ELEMENT Earings (#PCDATA)>
<!ELEMENT Necklace (#PCDATA)>
<?xml version="1.0"?>
<!DOCTYPE JewelleryShop SYSTEM "gold.dtd">
<JewelleryShop><Gold><Chain>
<Longchain>500grams</Longchain>
<Shortchain>200grams</Shortchain></Chain>
<Bangles>200grams of 4 bangles</Bangles>
<Earings>250 grams of 2 earings</Earings>
<Necklace/></Gold></JewelleryShop>
2. <!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list><person><name>Borne</name><birthdate>04/02/1977</birthdate>
<gender>Male</gender></person></people_list>
3. <!ELEMENT addressbook (contact)+>
<!ELEMENT contact (name, address+, city, state, zip, phone, email, web, company)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (voice, fax?)>
<!ELEMENT voice (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT web (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<?xml version="1.0"?>
<!DOCTYPE addressbook SYSTEM "AddressBook.dtd" [
<!ENTITY amp "&#38;#38;"><!ENTITY apos "&#39;">]>
<addressbook><contact><name>Frank Rizzo</name>
<address>1212 W 304th Street</address>
<city>New York</city><state>New York</state>
<zip>10011</zip><phone>
<voice>212-555-1212</voice>
<fax>212-555-1213</fax>
</phone><email>frizzo@fruity.com</email>
<web>http://www.fruity.com/rizzo</web>
<company>Frank's Ratchet Service</company></contact>
<contact><name>Sol Rosenberg</name><address>1162 E 412th Street</address>
<city>New York</city><state>New York</state><zip>10011</zip>
<phone><voice>212-555-1818</voice><fax>212-555-1819</fax>
</phone><email>srosenberg@fruity.com</email>
<web>http://www.fruity.com/rosenberg</web>
<company>Rosenberg's Shoes &amp; Glasses</company></contact>
</addressbook>
4. <!ELEMENT movies (movie)+>
<!ELEMENT movie (title, writer+, producer+, director+, actor*, comments?)>
<!ATTLIST movie
type (drama | comedy | adventure | sci-fi | mystery | horror | romance | documentary) "drama"
rating (G | PG | PG-13 | R | X) "PG"
review (1 | 2 | 3 | 4 | 5) "3"
year CDATA #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT writer (#PCDATA)>
<!ELEMENT producer (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT actor (#PCDATA)>
<!ELEMENT comments (#PCDATA)>
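The default values in the movie ATTLIST declaration can be observed with a parser. The sketch below is an illustration under stated assumptions: the DTD is shortened and placed in an internal subset so that Python's standard, non-validating xml.parsers.expat module can read it, and the movie title and function name are hypothetical. By default expat reports attributes that were filled in from the declarations as well as those written in the document.

```python
import xml.parsers.expat

# Shortened movie DTD, moved into an internal subset for this sketch
DOC = """<?xml version="1.0"?>
<!DOCTYPE movie [
<!ELEMENT movie (title)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST movie
type (drama | comedy) "drama"
rating (G | PG | PG-13 | R | X) "PG"
year CDATA #IMPLIED>
]>
<movie type="comedy"><title>Hypothetical Title</title></movie>"""


def first_element_attributes(xml_text):
    """Return the attributes reported for the first element in the document."""
    captured = []

    def on_start(name, attributes):
        if not captured:
            captured.append(dict(attributes))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return captured[0]


print(first_element_attributes(DOC))
```

Here type is taken from the document, rating is supplied from its declared default, and year is absent because #IMPLIED declares no default.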
6.4 Problems with DTD
Not itself using XML syntax.
No constraints on character data.
Too simple attribute value models.
No support for Namespaces.
Very limited support for modularity and reuse (the entity mechanism is too low-level).
No support for schema evolution, extension, or inheritance of declarations (difficult
to write, maintain, and read large DTDs, and to define families of related schemas).
Limited white-space control.
No embedded, structured self-documentation (<!-- comments --> are not enough).
Content and attribute declarations cannot depend on attributes or element context
(many XML languages use that, but their DTDs have to "allow too much").
Too simple ID attributes mechanism.
Only defaults for attributes, not for elements.
Cannot specify "any element" or "any attribute".
Defaults cannot be specified separate from the declarations.
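The point about character data can be made concrete with a small sketch. In the hypothetical document below, Zip is declared as (#PCDATA), so the text "eight thousand" conforms to the DTD even though it is not a number. Any such constraint has to be checked in application code; the function name here is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: structurally it matches its DTD, but the Zip
# content is not numeric, which a DTD has no way to forbid
DOC = """<?xml version="1.0"?>
<!DOCTYPE Address [
<!ELEMENT Address (City, Zip)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT Zip (#PCDATA)>
]>
<Address><City>Melbourne</City><Zip>eight thousand</Zip></Address>"""


def zip_is_numeric(xml_text):
    """Application-level check that the Zip element contains only digits."""
    zip_text = ET.fromstring(xml_text).findtext("Zip", default="")
    return zip_text.strip().isdigit()


print(zip_is_numeric(DOC))
```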
6.5 Design Principles
The XML Schema Language shall be
More expressive than XML DTDs.
Expressed in XML.
Self-describing.
Usable by a wide variety of applications that employ XML.
Straightforwardly usable on the Internet.
Optimized for interoperability.
Simple enough to be implemented with modest design and runtime resources.
Coordinated with relevant W3C specs.
The XML Schema Language Specification shall
Be prepared quickly.
Be precise, concise, human-readable, and illustrated with examples.