Introduction to the use of DTDs with XML documents. Elements and attributes are introduced in detail. The use of ID, IDREF, and IDREFS for uniqueness and for referring to elements is illustrated with a number of examples.
1. Introduction to DTD
Kristian Torp
Department of Computer Science
Aalborg University
people.cs.aau.dk/~torp
torp@cs.aau.dk
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 1 / 37
2. Outline
1 Introduction
2 Elements
3 Attributes
4 DTD Find Errors
5 Putting it All Together
6 Summary
3. Learning Outcomes
Learning Outcomes
Be able to read and understand a DTD
Be able to construct a DTD for a set of existing XML documents
Be able to validate an XML document against a DTD
Know the limitations of a DTD
Database Focus
All XML technologies are presented from a database perspective!
5. Example: Course Catalog XML Document
User Requirements
Make a DTD for the course catalog
Use the DTD to validate our course catalog XML document
Example (Current Courses)
<?xml version="1.0"?>
<coursecatalog>
  <course cid="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid="P2">
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
6. Example: Course Catalog DTD
Example (DTD for Course Catalog)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT coursecatalog (course)*>
<!ELEMENT course (name, semester, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
<!ATTLIST course cid ID #REQUIRED>
Informal Description
A course catalog consists of zero or more courses
A course consists of a name, a semester, and a description
A course is identified by a required ID attribute (cid)
A (course) name is a string (leaf in XML document)
A semester is a string (leaf in XML document)
A description is a string (leaf in XML document)
7. Overview
Purpose
Define the document structure
Legal elements and attributes
Serves the same purpose as a CREATE TABLE statement in SQL
Structure and type of data
Integrity constraints!
Inherited from SGML
A DTD is not itself written in XML
If a schema in XML syntax is required, use XML Schema
Still very widely used
Because it is much simpler than XML Schema
Note
Many simple errors can be found using a DTD
A necessity if receiving XML documents from external sources
9. Simplest Element
Example (Element Declaration)
<!ELEMENT name (#PCDATA)>
Example (Allowed Values)
<name>Hello Element</name>
<name/>
<name><![CDATA[ select * from emp where sal > 10]]></name>
Example (Illegal Values)
<name>< </name>
<name>&</name>
A raw < or & is not allowed in character data; write &lt; or &amp; instead
<name><it>Hello</it></name>
Unknown element <it>, must be defined in DTD
Note
Root, internal nodes, and leaves in XML tree representation
Terminals and non-terminals in grammar terminology
12. Sequences of Child Elements
Example (Element Declaration)
<!ELEMENT course (name, semester, desc)>
Example (Allowed XML Fragment, Why?)
<course>
  <name>OOP</name>
  <semester>7</semester>
  <desc>Introduction to OOP</desc>
</course>
Example (Disallowed XML Fragment, Why?)
<course>
  <semester>7</semester>
  <name>OOP</name>
</course>
Example (Is this allowed?)
<course></course>
13. Choice Among Child Elements
Example (Element Declaration)
<!ELEMENT circle (x, y, (radius | diameter))>
Example (Allowed XML Fragment)
<circle>
  <x>5</x>
  <y>9</y>
  <diameter>7</diameter>
</circle>
Example (Illegal XML Fragment)
<circle>
  <x>4</x>
  <y>8</y>
  <radius>3.5</radius>
  <diameter>7</diameter>
</circle>
14. Symbols in a DTD
Symbols
Symbol  Example
*       <!ELEMENT coursecatalog (course)*>
+       <!ELEMENT coursecatalog (course)+>
?       <!ELEMENT coursecatalog (course)?>
,       <!ELEMENT course (name, semester, desc)>
|       <!ELEMENT course (name | semester | desc)>
Note
Symbols are mostly taken from regular expressions
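The regular-expression analogy can be made concrete. The Python sketch below is an illustration only, not how a validating parser is implemented: the content models from the earlier slides are rewritten by hand as regular expressions over the sequence of child tag names. The helper name matches, the MODELS table, and the encoding of each child as its tag name followed by a space are assumptions made for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Content models rewritten by hand as regular expressions over child tag names.
# Each child element is encoded as its tag name followed by one space.
MODELS = {
    "course": r"name semester desc ",        # (name, semester, desc)
    "circle": r"x y (radius |diameter )",    # (x, y, (radius | diameter))
    "coursecatalog": r"(course )*",          # (course)*
}

def matches(fragment: str) -> bool:
    """Return True if the root's children fit the hand-written model."""
    root = ET.fromstring(fragment)
    children = "".join(child.tag + " " for child in root)
    return re.fullmatch(MODELS[root.tag], children) is not None

ok = matches("<course><name>OOP</name><semester>7</semester><desc>Intro</desc></course>")
wrong_order = matches("<course><semester>7</semester><name>OOP</name></course>")
empty = matches("<course></course>")
both = matches("<circle><x>4</x><y>8</y><radius>3.5</radius><diameter>7</diameter></circle>")
print(ok, wrong_order, empty, both)  # True False False False
```

The sketch only looks at the order and number of direct children; text content, attributes, and nested elements are ignored.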
15. Mixed Content
Example (Data Centric)
<!ELEMENT coor (x, y)>
Example (Allowed Fragment)
<coor>
  <x>5</x>
  <y>9</y>
</coor>
Example (Mixed Content)
<!ELEMENT coor (#PCDATA | x | y)*>
Example (Allowed Fragment)
<coor>
  This is the coordinate
  (<x>5</x>, <y>9</y>) where
  the treasure is hidden!
</coor>
Note
Data-centric documents are very table-like
A mixed-content document is also called a narrative document
In a mixed-content declaration, #PCDATA must come first and the group must end with *
16. Element Declarations using ANY
Example (Any)
<!ELEMENT coor ANY>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
Example (Allowed Fragments)
<coor/>
<coor>Hello World</coor>
<coor>Hello <x>1</x><x/>World<y>3</y><y>4</y></coor>
<coor>Hello <x>1</x><y>2</y>World<y>3</y><x>4</x></coor>
Example (Illegal Fragments)
<coor><z>1</z></coor>
<coor><x>1</x><y>1<y/><z>1</z></coor>
Note
ANY is handy for narrative documents, e.g., HTML
17. Element Declarations using EMPTY
Example (Empty)
<!ELEMENT coor EMPTY>
Example (Allowed?)
<coor></coor>
<coor/>
<coor>Hello</coor>
<coor><x>Hello</x></coor>
<coor> </coor>
18. Summary: Elements
Repetition
Symbol   Explanation    Example
?        zero-or-one    <!ELEMENT person (address?)>
*        zero-or-more   <!ELEMENT person (address*)>
+        one-or-more    <!ELEMENT person (address+)>
(none)   exactly once   <!ELEMENT person (address)>
Sequence or Choice
Symbol   Explanation    Example
,        Sequence       <!ELEMENT coor (x, y)>
|        Choice         <!ELEMENT coor (x | y)>
Data Type
Symbol   Explanation    Example
#PCDATA  String         <!ELEMENT name (#PCDATA)>
ANY      Whatever       <!ELEMENT coor ANY>
EMPTY    Empty          <!ELEMENT room EMPTY>
20. Attribute Declarations
Example (Circles)
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT drawing (circle)*>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle cid ID #REQUIRED
                 name CDATA #IMPLIED>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
<!ELEMENT radius (#PCDATA)>
<!ATTLIST radius unit (mm|cm|m) "m"> <!-- Enum with default -->
<!ELEMENT diameter (#PCDATA)>
<!ATTLIST diameter unit (mm|cm|m) #REQUIRED> <!-- Enum no default -->
Note
Mandatory and optional attributes
One or more attributes
Enumeration with defaults
21. Example Document
Example (Circles)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE drawing SYSTEM "circleatt.dtd">
<drawing>
  <circle cid="C1" name="forest">
    <x>8</x> <y>8</y>
    <radius>4</radius> <!-- default unit -->
  </circle>
  <circle cid="C2"> <!-- name not required -->
    <x>5</x> <y>5</y>
    <radius unit="cm">4</radius> <!-- explicit unit -->
  </circle>
</drawing>
Note
The unique ID values (C1, C2) are not integers
The second circle omits the optional name attribute
22. Uniqueness, Examples
Example (Circle/Points with IDs)
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT drawing (point | circle)*>
<!ELEMENT point (x, y)>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle did ID #REQUIRED>
<!ATTLIST point did ID #REQUIRED>
Example (Circles)
<drawing>
  <circle did="C1">
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did="P2">
    <x>5</x> <y>5</y>
  </point>
</drawing>
23. Uniqueness, Errors
Example (Find the error 1!)
<drawing>
  <circle cid="C1" name="forest">
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid="C1">
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 2!)
<drawing>
  <circle did="C11">
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did="C11">
    <x>5</x> <y>5</y>
  </point>
</drawing>
Example (Find the error 3!)
<drawing>
  <circle cid="C1" name="forest">
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid="2C">
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 4!)
<drawing>
  <circle did="C11">
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did="C1111111111111111111111
    <x>5</x> <y>5</y>
  </point>
</drawing>
24. Uniqueness
Limitations
Only attribute values can be declared unique, not element values
An ID value cannot be an integer, e.g., <circle did="1"> is not allowed (an ID must be an XML name, which cannot start with a digit)
Only unique within a single document
Uniqueness is not guaranteed across multiple documents
Uniqueness applies to a single attribute only (no composite keys)
The combination of x and y coordinates cannot be declared unique
Note
Uniqueness is quite restrictive compared to DBMS technology
XML Schema lifts most limitations on uniqueness
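The "unique within a single document" rule can be sketched in a few lines of Python. This is a minimal illustration, not a DTD validator: it does not read the DTD, so the tuple ID_ATTRIBUTES (an assumption of this sketch) lists by hand the attributes declared with type ID in the drawing examples. The function name duplicate_ids is also hypothetical. Note that all ID attributes share one value space, regardless of the element type or the attribute name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption for this sketch: these are the attributes declared with type ID
# in the DTD (cid and did in the drawing examples). The DTD itself is not read.
ID_ATTRIBUTES = ("cid", "did")

def duplicate_ids(document: str) -> list:
    """Return the ID values that occur more than once in the document."""
    root = ET.fromstring(document)
    seen = Counter(
        element.get(attr)
        for element in root.iter()
        for attr in ID_ATTRIBUTES
        if element.get(attr) is not None
    )
    return sorted(value for value, count in seen.items() if count > 1)

drawing = """
<drawing>
  <circle did="C11"><x>8</x><y>8</y><radius>4</radius></circle>
  <point did="C11"><x>5</x><y>5</y></point>
</drawing>
"""
print(duplicate_ids(drawing))  # ['C11']
```

A circle and a point may not share an ID value even though they are different element types, which is what the check reports here.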
25. Empty Elements with Attributes
Example (Empty)
<!ELEMENT coor EMPTY>
<!ATTLIST coor cid ID #REQUIRED
               x CDATA #REQUIRED
               y CDATA #REQUIRED
               z CDATA #IMPLIED>
Example (Allowed?)
<coor/>
<coor cid="c1" x="1" y="1" z="1"/>
<coor cid="c2" x="2" y="2"></coor>
<coor cid="c3" x="3" y="3"> </coor>
<coor cid="c4" z="4" y="4" x="4"/>
<coor z="5" y="5" x="5"/>
26. Is something Wrong?
Example (Case 1)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID
  x CDATA #REQUIRED>
Example (Case 2!)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID #IMPLIED
  x CDATA #REQUIRED>
Example (Case 3!)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  x CDATA #REQUIRED
  cid ID #REQUIRED>
Example (Case 4)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID "42"
  x CDATA #REQUIRED>
Example (Case 5)
<!ELEMENT coor (EMPTY)>
<!ATTLIST coor
  cid ID #REQUIRED
  x CDATA #REQUIRED>
Example (Case 6)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID #REQUIRED
  x ID #REQUIRED>
27. Summary: Attributes
General Syntax
<!ATTLIST element-name attribute-name type default>
The default part (#REQUIRED, #IMPLIED, #FIXED with a value, or a value) is mandatory
Often used types
Type         Example
CDATA        <!ATTLIST course id CDATA #IMPLIED>
ID           <!ATTLIST course id ID #REQUIRED>
Enumeration  <!ATTLIST course id (OOP | DB) #REQUIRED>
Defaults
Default      Example
#REQUIRED    <!ATTLIST course id ID #REQUIRED>
#IMPLIED     <!ATTLIST course id CDATA #IMPLIED>
#FIXED       <!ATTLIST course id CDATA #FIXED "1">
A value      <!ATTLIST course id (OOP | DB) "DB">
30. A Buggy DTD
Example (DTD With Five Errors)
<?xml version="1.0">
<!ELEMENT users user+>
<!ELEMENT user (firstname, lastname>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname>
Two-Minute Exercise
With your neighbor, identify the errors in the DTD
Example (The Corrected DTD)
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT users (user)+>
<!ELEMENT user (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
32. Uncertain About Content
Example (DTD for Courses with Flexible Description)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT desc ANY>
Example (XML Document with Flexible Description)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE courses SYSTEM "course.dtd">
<courses>
  <course>
    <name>OOP</name>
    <desc>
      <name>object-oriented</name>
      <desc>programming</desc>.
    </desc>
  </course>
</courses>
33. A University Example, Setup
Example (DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT university (courses,
                      students,
                      follows)>
<!ELEMENT courses (course)+>
<!ELEMENT course (name)>
<!ATTLIST course cid ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT students (student)+>
<!ELEMENT student (fname)>
<!ATTLIST student sid ID #REQUIRED>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT follows (takes)+>
<!ELEMENT takes EMPTY>
<!ATTLIST takes sid IDREF #REQUIRED>
<!ATTLIST takes cids IDREFS #REQUIRED>
Example (XML Fragment)
<university>
  <courses>
    <course cid="C111">
      <name>DB</name>
    </course>
    <course cid="C222">
      <name>OOP</name>
    </course>
  </courses>
  <students>
    <student sid="S11">
      <fname>Ann</fname>
    </student>
    <student sid="S22">
      <fname>Bart</fname>
    </student>
    <student sid="S33">
      <fname>Curt</fname>
    </student>
  </students>
34. A University Example, Referencing
Example (DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT university (courses,
                      students,
                      follows)>
<!ELEMENT courses (course)+>
<!ELEMENT course (name)>
<!ATTLIST course cid ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT students (student)+>
<!ELEMENT student (fname)>
<!ATTLIST student sid ID #REQUIRED>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT follows (takes)+>
<!ELEMENT takes EMPTY>
<!ATTLIST takes sid IDREF #REQUIRED>
<!ATTLIST takes cids IDREFS #REQUIRED>
Example (XML Fragment)
  <follows>
    <takes sid="S11" cids="C111 C222"/>
    <takes sid="S22" cids="C222"/>
    <takes sid="S33" cids="C111"/>
  </follows>
Note
An ID cannot start with a digit
sid refers to a single ID
cids refers to a set of IDs
Course IDs and student IDs must not overlap
The separator in cids is a space (not a comma)
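The referencing rules can be sketched as a small Python check. This is an illustration under stated assumptions, not a validating parser: the DTD is not read, so the sketch hard-codes that cid on course and sid on student are the ID attributes, and that sid and cids on takes are the IDREF and IDREFS attributes. The function name dangling_references and the course ID C999 are made up for this example.

```python
import xml.etree.ElementTree as ET

def dangling_references(document: str) -> list:
    """Return referenced values on <takes> that match no declared ID."""
    root = ET.fromstring(document)
    # Assumption: cid (on course) and sid (on student) are the ID attributes.
    ids = {e.get("cid") for e in root.iter("course")}
    ids |= {e.get("sid") for e in root.iter("student")}
    missing = []
    for takes in root.iter("takes"):
        # IDREF holds one name; IDREFS holds names separated by whitespace.
        refs = [takes.get("sid", "")] + takes.get("cids", "").split()
        missing.extend(ref for ref in refs if ref not in ids)
    return missing

university = """
<university>
  <courses>
    <course cid="C111"><name>DB</name></course>
    <course cid="C222"><name>OOP</name></course>
  </courses>
  <students>
    <student sid="S11"><fname>Ann</fname></student>
  </students>
  <follows>
    <takes sid="S11" cids="C111 C999"/>
  </follows>
</university>
"""
print(dangling_references(university))  # ['C999']
```

Like a DTD, the sketch only checks that each referenced value exists as some ID in the document; it cannot require that sid points to a student rather than a course.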
35. Quiz: IDREFS
Example (University XML)
<university>
  <courses>
    <course cid="C111">
      <name>DB</name>
    </course>
    <course cid="C222">
      <name>OOP</name>
    </course>
  </courses>
  <students>
    <student sid="S11">
      <fname>Ann</fname>
    </student>
    <student sid="S22">
      <fname>Bart</fname>
    </student>
    <student sid="S33">
      <fname>Curt</fname>
    </student>
  </students>
Example (Allowed One?)
<follows>
  <takes sid="S11" cids="C111 C222 C111"/>
</follows>
Example (Allowed Two?)
<follows>
  <takes sid="S11" cids="C333 C222 C111"/>
</follows>
Example (Allowed Three?)
<follows>
  <takes sid="S11" cids="C111"/>
  <takes sid="S11" cids="C222"/>
</follows>
Example (Allowed Four?)
<follows>
  <takes sid="S11" cids=""/>
  <takes sid="S22" cids="c111"/>
</follows>
36. Using an Internal DTD
Example (DTD for Courses with Flexible Description)
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE courses [
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT desc ANY>
]>
<courses>
  <course>
    <name>OOP</name>
    <desc>
      <name>object-oriented</name>
      <desc>programming</desc>.
    </desc>
  </course>
</courses>
Note
Benefit: All information in one file
Drawback: DTD is not reused (maintenance nightmare)
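Because an internal DTD travels with the document, even a non-validating parser gets to see the declarations. The Python sketch below uses the standard-library xml.parsers.expat module, which reports each element declaration through ElementDeclHandler but does not validate. It only checks that every element used in the body has a declaration, which is a small part of what validation covers; the variable names are assumptions of this sketch.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE courses [
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT desc ANY>
]>
<courses>
  <course>
    <name>OOP</name>
    <desc><name>object-oriented</name> <desc>programming</desc>.</desc>
  </course>
</courses>
"""

declared = []   # element names declared in the internal DTD, in order
used = set()    # element names that occur in the document body

parser = xml.parsers.expat.ParserCreate()
# expat calls this once per <!ELEMENT ...>; the content model is ignored here.
parser.ElementDeclHandler = lambda name, model: declared.append(name)
parser.StartElementHandler = lambda name, attrs: used.add(name)
parser.Parse(DOCUMENT, True)

# Elements that appear in the body without a declaration in the DTD.
undeclared = sorted(used - set(declared))
print(declared)
print(undeclared)
```

Checking content models, attribute types, and ID/IDREF constraints requires a validating parser; this sketch deliberately stops short of that.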
38. Summary: DTD
Limitations
Only very basic data types supported
Only single-column keys (for uniqueness)
Uniqueness only guaranteed within a single document
Very limited support for integrity constraints
Note
DTD is widely used
DTD is being replaced by XML Schema when documents are complex
Combining XML Namespaces with DTDs is problematic
Advice
Never build a new DTD if an existing (standard) one can be used!
39. RDBMS vs. XML
RDBMS vs. XML
       Query    Schema
SQL    DML      DDL
XML    XQuery   DTD/XML Schema
40. Summary: DTD versus XML Schema
DTD
Own format
Compact notation
Simple data types
From SGML
Supports entities
No namespace support
XML Schema
XML format
Very verbose
Advanced data types
Invented for XML
Does not support entities
Supports namespaces
Advice
Start with a DTD
Move on to XML Schema for later iterations