A comparison of a database table with an XML document, followed by an overview of basic XML concepts such as attribute, element, entity, and tag. Data-centric and document-centric XML documents are also covered.
Introduction to XML and Databases
1. Introduction to XML
Kristian Torp
Department of Computer Science
Aalborg University
people.cs.aau.dk/~torp
torp@cs.aau.dk
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to XML November 3, 2015 1 / 42
2. Outline
1 Introduction
2 Anatomy of an XML Document
Document Prolog
Elements
Attributes
Entities
Complete XML Document
3 Well-Formed and Valid XML Documents
4 Summary
3. Learning Goals
Goals
Know the basic differences between a table and an XML document
Know the different representations of an XML document
Know the basic parts of an XML document
Know the goals of designing XML
Know the difference between data-centric and document-centric XML documents
Be able to construct your own basic XML documents
5. Text Files (a Déjà Vu?)
Example (A Text File)
P4 OOP 3 Object-oriented programming
P2 DB 7 Databases including SQL
Open Questions
What do the columns mean?
When does white space matter?
What are the types of the columns?
Note
No metadata whatsoever
Need additional information to parse the text file!
Could be a human looking at the file
Lowest common denominator: a CSV file
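To make the missing metadata concrete, here is a minimal Python sketch of a parser for the text file above. The column names, the types, and the rule "only the last column may contain spaces" are all assumptions supplied by the programmer; nothing in the file itself states them.

```python
# Column names are NOT stored in the file; they are an assumption
# hard-coded by whoever writes the parser.
COLUMNS = ("id", "name", "semester", "desc")

def parse_line(line):
    """Split one line into a record, assuming exactly four columns."""
    # Assumption: the first three columns contain no spaces, the fourth may.
    fields = line.split(maxsplit=3)
    record = dict(zip(COLUMNS, fields))
    # Assumption: the third column is an integer.
    record["semester"] = int(record["semester"])
    return record

lines = [
    "P4 OOP 3 Object-oriented programming",
    "P2 DB 7 Databases including SQL",
]
records = [parse_line(line) for line in lines]
print(records)
```

If the file layout changes, for example a course name containing a space, this parser silently produces wrong records, which is the weakness the slide points at.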
7. A First Look
Example (Table Look)
Id Name Semester Desc
P4 OOP 3 Object-oriented programming
P2 DB 7 Databases including SQL
Example (XML Look)
<?xml version="1.0"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
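The correspondence between the table look and the XML look can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration: the XML declaration and the DOCTYPE line are left out so the snippet does not depend on coursecatalog.dtd, and the tuple layout (id, name, semester, desc) is an assumption chosen to mirror the table.

```python
import xml.etree.ElementTree as ET

XML = """<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>"""

root = ET.fromstring(XML)

# One tuple per <course> element: the attribute becomes the Id column,
# the child elements become the remaining columns.
rows = [
    (
        course.get("cid"),
        course.findtext("name"),
        int(course.findtext("semester")),
        course.findtext("desc"),
    )
    for course in root.findall("course")
]
print(rows)
```

Unlike the plain text file, the element and attribute names travel with the data, so the code can ask for "name" or "semester" instead of guessing column positions.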
9. A Second Look
Example (XML Look (again))
<?xml version="1.0"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
Example (Tree Look)
/coursecatalog
  course
    id=4  name:OOP  sem:3  dsc
  course
    id=2  name:DB  sem:7  dsc
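The tree look can also be produced programmatically. The sketch below walks a parsed document depth-first and emits one indented line per element; the helper name outline and the output layout are hypothetical, chosen only for illustration, and the input is a shortened version of the course catalog.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, depth-first in document order."""
    label = element.tag
    if element.attrib:
        label += " " + " ".join(f"{k}={v}" for k, v in element.attrib.items())
    text = (element.text or "").strip()
    if text:
        label += ": " + text
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(
    "<coursecatalog><course cid='P4'><name>OOP</name>"
    "<semester>3</semester></course></coursecatalog>"
)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The textual and the tree representation carry the same information; the parser simply converts the first into the second.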
10. Something Well Known?
Example (XHTML)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A Simple XHTML Document</title>
  </head>
  <body>
    <p>Hello XHTML!</p>
  </body>
</html>
[Source: examples/xhtml_simple.xhtml]
XHTML versus HTML
XHTML is a cleaned-up version of HTML
Looks a lot like HTML
XHTML has much stricter requirements than HTML
11. Data versus Document Centric
Example (Data Centric)
<rows>
  <row>
    <name>Hans</name>
    <address>Denmark</address>
  </row>
  <row>
    <name>Marge</name>
    <address>Sweden</address>
  </row>
</rows>
Example (Document Centric)
<lyric>
  Is it getting <it>better</it>?
  Or do you feel the same?
  Will it make it easier on you now?
  You got someone to <em>blame</em>
  You say
  One love
  One life
</lyric>
Data Centric
Database table like
Content in leaves
Inflexible, but simple
Document Centric
Free format (almost)
Mixed content
Flexible, but complex
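The difference between "content in leaves" and "mixed content" can be tested mechanically. The following Python sketch uses xml.etree.ElementTree, where text before the first child is stored in the parent's text and text following a child is stored in that child's tail. The function name has_mixed_content is hypothetical, and the inputs are shortened versions of the two examples.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """True if the element has child elements AND non-whitespace text of its own."""
    if len(element) == 0:
        return False  # a leaf holds only text, so it cannot be mixed
    chunks = [element.text] + [child.tail for child in element]
    return any(chunk and chunk.strip() for chunk in chunks)

data = ET.fromstring(
    "<rows><row><name>Hans</name><address>Denmark</address></row></rows>"
)
doc = ET.fromstring(
    "<lyric>Is it getting <it>better</it>? Or do you feel the same?</lyric>"
)

print(has_mixed_content(data))  # data centric: text only in leaves
print(has_mixed_content(doc))   # document centric: text interleaved with markup
```

A data-centric document never triggers the check at any level, which is why it maps so directly onto rows and columns.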
12. Goals of XML
Goals
XML shall be straightforwardly usable over the Internet
XML shall support a wide variety of applications
XML shall be compatible with SGML
SGML = Standard Generalized Markup Language
Easy to write programs which process XML documents
Keep the number of optional features to a minimum (ideally zero)
XML documents should be reasonably clear
The XML design should be prepared quickly
The design of XML shall be formal and concise
XML documents shall be easy to create
[Source: www.w3.org/TR/REC-xml/]
13. XML Family of Products
Products
Core
The basic XML recommendation
Add-ons
DTD, XML Namespace, XPath, XLink, XPointer, XQuery, etc.
Focus on layout
CSS, XSLT, and XSL-FO
XML Applications
XHTML, DocBook, SVG, XForms, etc.
XML Applications
Web Content Syndication: RSS (www.rssboard.org)
Education: SCORM for teaching material (www.scorm.com)
Document metadata: Dublin Core (www.dublincore.org)
14. Summary: Introduction
Main Points
An XML document compared to a text file
More readable (without help)
More complicated to handle (if you are familiar with content)
Higher space usage
Data and metadata embedded in the same document
Markup and content clearly separated
An XML document can be represented in two ways
Textual structure
Tree structure
The design goals of XML were set in the Internet age!
There is a very large set of XML technologies and applications
Note
XML and databases are not competing technologies
XML is not a replacement for HTML
16. Main Parts of an XML Document
Concepts
Document prolog
Elements
A root
Attributes
Entities
Example (XML Document)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE coursecatalog
  SYSTEM "coursecatalog.dtd" [
  <!ENTITY prg "programming">
  <!ENTITY sql "SQL"> ]>
<coursecatalog>
  <course id="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented &prg;</desc>
  </course>
  <course id="P2">
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including &sql;</desc>
  </course>
</coursecatalog>
Note
Elements more flexible than attributes
XML supports Unicode (UTF-8 and UTF-16) out of the box
18. Document Prolog
Example
<?xml version="1.0"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
Consists of
Version number and text encoding
Document type definition declaration
Instruction to the XML processor
Root element of the XML document
20. Elements
Example
Start tag: <state> or <course>
Start tag with attributes: <state id="1" abbr="GA">
End tag: </state>
Element with content: <state>Georgia</state>
Empty element: <state/>
Empty element with attributes: <state id="1" abbr="GA"/>
Case matters: <state>, <State>, and <STaTE> are three different tags
Consists of
Start tag
Some content called character data
End tag
21. Elements, cont.
Rules
Start tag must be before end tag
An element's start and end tags must have the same parent
Wrong: <state><city></state></city>
Right: <state><city></city></state>
Content
Simple <outer><one>stuff</one></outer>
Mixed content <outer>More <one>stuff</one></outer>
Tag versus Element
<msg>Hello World</msg>
Element: <msg>Hello World</msg>
Tag: <msg> (start tag) or </msg> (end tag)
23. Attributes
Example
<state id="1" abbr="GA">
<country id="DK" date="2006-02-01">
Consists of
Name/value pairs
Note
Attributes cannot stand alone
Only start tags (and empty-element tags) can have attributes
There can be any number of attributes
Attribute names must be unique. Wrong: <state id="GA" id="GE">
Attribute values must be in quotes. Wrong: <state id=GA>
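These attribute rules are enforced by any conforming parser. As a small sketch using Python's xml.etree.ElementTree, the snippet below reads attributes as name/value pairs and shows that a duplicated attribute name or an unquoted value is rejected. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes are exposed as a mapping of name/value pairs.
state = ET.fromstring('<state id="1" abbr="GA">Georgia</state>')
attributes = dict(state.attrib)

def parses(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

duplicate_ok = parses('<state id="GA" id="GE"/>')  # attribute name used twice
unquoted_ok = parses('<state id=GA/>')             # value without quotes

print(attributes, duplicate_ok, unquoted_ok)
```

Note that attribute values always arrive as strings; any typing, such as treating id as a number, is up to a schema or the application.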
24. Elements versus Attributes
Example (Elements versus Attributes)
<box height="20"
     width="20"
     depth="30"
     unit="cm">
  <content>Stuff</content>
</box>
<box>
  <height>
    <scalar>20</scalar>
    <unit>cm</unit>
  </height>
  <width>
    <scalar>20</scalar>
    <unit>cm</unit>
  </width>
  <depth>
    <scalar>30</scalar>
    <unit>cm</unit>
  </depth>
  <content>Stuff</content>
</box>
Note
Attributes can always be converted to elements
Elements can sometimes be converted to attributes
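The first direction, attributes to elements, is mechanical and can be sketched in a few lines of Python. The function name attributes_to_elements is hypothetical, and the conversion shown is the simplest possible one: each attribute becomes a child element with the same name. It does not reproduce the scalar/unit nesting of the example above, which requires knowledge of what the values mean.

```python
import xml.etree.ElementTree as ET

def attributes_to_elements(element):
    """Replace every attribute of `element` with a child element of the same name."""
    for position, (name, value) in enumerate(list(element.attrib.items())):
        child = ET.Element(name)
        child.text = value
        # Insert in attribute order, ahead of the existing children.
        element.insert(position, child)
    element.attrib.clear()
    return element

box = ET.fromstring('<box height="20" width="20"><content>Stuff</content></box>')
attributes_to_elements(box)
converted = ET.tostring(box, encoding="unicode")
print(converted)
```

The reverse direction only works when an element occurs at most once and holds plain text, since an attribute can neither repeat nor contain child elements.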
25. Elements versus Attributes, cont.
Example (Elements versus Attributes)
<box>
  <height>
    <scalar>20</scalar>
    <unit>cm</unit>
  </height>
  <width>
    <scalar>20</scalar>
    <unit>cm</unit>
  </width>
  <depth>
    <scalar>30</scalar>
    <unit>cm</unit>
  </depth>
  <content>Stuff</content>
</box>
<box>
  <height unit="cm">20</height>
  <width unit="cm">20</width>
  <depth unit="cm">30</depth>
  <content>Stuff</content>
</box>
Note
Attributes are good for identifiers, units, and so on
Elements are good when there is a variable number of "stuff"
27. Entities
Example
<!ENTITY company "XML Lovers Inc.">
<!ENTITY sql "SQL">
Purpose
To make XML documents easier to maintain
Recurring text
Placeholders for content (abbreviations)
Types
Parameter entities used in DTD
General entities used in the XML document itself
There are a lot of details about entities!
28. Using Entities
Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE coursecatalog
  SYSTEM "coursecatalog.dtd" [
  <!ENTITY prg "programming">
  <!ENTITY sql "SQL"> ]>
<coursecatalog>
  <course id="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented &prg;</desc>
  </course>
  <course id="P2">
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including &sql;</desc>
  </course>
</coursecatalog>
[Source: examples/coursecatalog_with_entity.xml]
Note
The entities prg and sql are declared in the DOCTYPE and referenced as &prg; and &sql;
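A parser replaces each entity reference with its declared text, much like macro expansion. The Python sketch below shows this with xml.etree.ElementTree. As a simplifying assumption, the SYSTEM reference to coursecatalog.dtd is dropped and only the internal entity declarations are kept, so the snippet needs no external file.

```python
import xml.etree.ElementTree as ET

XML = """<!DOCTYPE coursecatalog [
  <!ENTITY prg "programming">
  <!ENTITY sql "SQL">
]>
<coursecatalog>
  <course id="P4"><desc>Object-oriented &prg;</desc></course>
  <course id="P2"><desc>Databases including &sql;</desc></course>
</coursecatalog>"""

root = ET.fromstring(XML)

# The application never sees &prg; or &sql;, only the replacement text.
descriptions = [course.findtext("desc") for course in root.findall("course")]
print(descriptions)
```

Because expansion happens inside the parser, entity declarations in documents from untrusted sources deserve care; the Python documentation for the xml package carries a warning to that effect.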
29. More Entity Examples
Entity Types
Predefined character entities: amp = &, gt = >
Usage: <msg>Hello &amp; and &gt;</msg>
Numeric character references: &#230; = æ
Usage: <msg>This is a Danish letter &#230;</msg>
External entities: the definition is in another file
Internal entities
Unparsed entity: <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
Note
There are a lot of details about entities!
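Predefined entities and numeric character references can be tried out directly. In this Python sketch, xml.sax.saxutils.escape produces the predefined entities for & and >, and the parser turns them back into the original characters. The numeric reference uses 230, the decimal Unicode code point of æ (U+00E6).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Writing: special characters must be replaced by predefined entities.
escaped = escape("Hello & and >")

# Reading: the parser resolves the entities again.
msg = ET.fromstring("<msg>" + escaped + "</msg>")

# Numeric character reference: decimal Unicode code point of the character.
danish = ET.fromstring("<msg>This is a Danish letter &#230;</msg>")

print(escaped)
print(msg.text)
print(danish.text)
```

Numeric references always refer to Unicode code points, independent of the encoding the document is stored in.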
30. Various Comments on XML Documents
Comments
Encoded in UTF-8 or UTF-16 unless the XML declaration names another encoding
Whitespace is preserved (not the case in HTML)
Carriage return plus line feed is converted to a single line feed
Surprising if you are used to MS Windows
This is a comment: <!-- a comment in XML -->
Example (Comments in XML)
<?xml version="1.0"?>
<doc>
  <!-- A comment -->
  <row> </row>
  <row> <!-- Another comment --> </row>
</doc>
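The whitespace and line-ending behaviour can be observed with a short Python sketch using xml.etree.ElementTree. The input document is a made-up variant of the example above; note that this particular parser drops comments by default, which is a property of the library's default settings and not of XML itself.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc><!-- A comment --><row> </row><row>a\r\nb</row></doc>"
)
rows = doc.findall("row")

# A single space inside <row> is kept as content.
print(repr(rows[0].text))
# The Windows line ending \r\n reaches the application as \n.
print(repr(rows[1].text))
# The comment is not part of the element tree built by the default parser.
print(len(doc))
```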
32. First Design
Example (1-n Relationship)
<order-db>
  <orders>
    <order id="117">
      <customer-name>Ann</customer-name>
    </order>
    <order id="341">
      <customer-name>Jim</customer-name>
    </order>
  </orders>
  <orderlines>
    <orderline id="117" line-no="1">
      <description>pizza</description>
      <quantity>1</quantity>
      <price-each>10.50</price-each>
    </orderline>
  </orderlines>
</order-db>
Note
Too close to first normal form; does not use the tree hierarchy
33. Second Design
Example (1-n Relationship)
<order-db>
  <orders>
    <order id="O117">
      <customer-name>Ann</customer-name>
      <orderlines>
        <orderline line-no="1">
          <description>pizza</description>
          <quantity>1</quantity>
          <price-each>10.50</price-each>
        </orderline>
      </orderlines>
    </order>
    <order id="O341">
      <customer-name>Jim</customer-name>
    </order>
  </orders>
</order-db>
Note
All information related to a single order is stored together
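One practical effect of the nested design is that processing an order needs no join: the order lines are simply the descendants of the order element. The Python sketch below computes a total per order. The use of Decimal for prices and the dictionary named totals are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

XML = """<order-db>
  <orders>
    <order id="O117">
      <customer-name>Ann</customer-name>
      <orderlines>
        <orderline line-no="1">
          <description>pizza</description>
          <quantity>1</quantity>
          <price-each>10.50</price-each>
        </orderline>
      </orderlines>
    </order>
    <order id="O341">
      <customer-name>Jim</customer-name>
    </order>
  </orders>
</order-db>"""

root = ET.fromstring(XML)
totals = {}
for order in root.iter("order"):
    total = Decimal("0")
    # The lines of this order are found below the order itself.
    for line in order.findall("orderlines/orderline"):
        quantity = int(line.findtext("quantity"))
        total += quantity * Decimal(line.findtext("price-each"))
    totals[order.get("id")] = total
print(totals)
```

With the first design the same computation would have to match each orderline's id against the order ids, which is exactly the relational join the tree hierarchy makes unnecessary.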
34. Summary: Anatomy
Main Points
Elements
One is the root
Attribute
Limited set
Entities
Similar to a macro
There are many details
The prolog
Note
In doubt whether to use an element or an attribute? Pick an element
Remember good comments, for humans!
38. Non Well-Formed XML Document
Example (Missing Root)
<course id="P4">
  <name>OOP</name>
  <semester>3</semester>
  <desc>Object-oriented Prog.</desc>
</course>
<course id="P2">
  <name>DB</name>
  <semester>7</semester>
  <desc>Databases including SQL</desc>
</course>
Example (Nesting Wrong)
<person ssn="43">
  <name><first>James</first> <last>Bond</name></last>
  <job>agent</job>
</person>
Example (Missing Quotes)
<person ssn=43>
  <name> ... </name>
</person>
39. Well-Formed XML and Valid Document
Well-Formed XML Document
All XML elements must have a closing tag
Empty elements are allowed
Tags must be properly nested
Start and end tag must have the same parent
The XML document must have a single root element
Attribute values must be quoted
Valid XML Document
Is well-formed
Adheres to the rules of the specified DTD or XML Schema
Similar to a schema for a table, e.g., types and integrity constraints
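Well-formedness can be checked by simply trying to parse the document. The Python sketch below does this with xml.etree.ElementTree; the function name is_well_formed and the shortened test inputs are illustrative. It checks well-formedness only: this standard-library parser does not validate against a DTD or an XML Schema, so validity needs a separate validating tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

missing_root = "<course id='P4'/><course id='P2'/>"
wrong_nesting = "<name><first>James</first><last>Bond</name></last>"
missing_quotes = "<person ssn=43><name>Bond</name></person>"
fine = "<person ssn='43'><name>Bond</name></person>"

for sample in (missing_root, wrong_nesting, missing_quotes, fine):
    print(is_well_formed(sample))
```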
40. Well-Formed and Valid
[Figure: three nested sets. Valid XML documents lie inside well-formed XML documents, which lie inside all XML documents.]
41. Summary: Well-Formed and Valid
Main Points
Well-formed XML document
Structure must adhere to certain rules
Valid XML document
Types and constraints must match a schema (DTD or XML Schema)
Not covered in this lecture, more to come later
Note
Tools check whether documents are well-formed and valid
The well-formedness is a huge plus over “flat” files
43. Why XML?
Many Good Reasons
Open
Specifications available to all
Platform neutral
Runs on Apple, Linux, Unix, Windows, . . .
Vendor neutral
Competition among vendors
Standard
Changes done in open forums
Note
XML has support for checking structure/types/integrity constraints
DTD and XML Schema
XML has support for querying text documents
XPath and XQuery
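As a small taste of path-based querying, Python's xml.etree.ElementTree supports a limited subset of XPath (not the full language, and not XQuery). The sketch below runs three such path expressions against a shortened course catalog; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<coursecatalog>"
    "<course cid='P4'><name>OOP</name><semester>3</semester></course>"
    "<course cid='P2'><name>DB</name><semester>7</semester></course>"
    "</coursecatalog>"
)

# All course names, in document order.
names = [element.text for element in root.findall("./course/name")]

# Name of the course whose cid attribute equals 'P2'.
p2_name = root.findtext("./course[@cid='P2']/name")

# Ids of courses whose <semester> child has the text '7'.
semester_7 = [course.get("cid") for course in root.findall("./course[semester='7']")]

print(names, p2_name, semester_7)
```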
44. Data vs. Document Centric
Data Centric
Database designer
Does not use document order
Only content at leaf level
Simple
Rigid
Example: extract from an RDBMS
Document Centric
Text author
Document order matters, e.g., for chapters and figure numbers
Mixed content
Complex
Flexible
Examples: DocBook, XHTML
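A data-centric document typically arises as an extract from a relational database. The Python sketch below builds such an extract from an in-memory SQLite table. The table name course, its columns, and the rows/row element names are assumptions made for the illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A tiny in-memory table standing in for the RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (id TEXT PRIMARY KEY, name TEXT, semester INTEGER)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [("P4", "OOP", 3), ("P2", "DB", 7)],
)

# One <row> per tuple, one leaf element per column: content only in leaves.
rows_element = ET.Element("rows")
query = "SELECT id, name, semester FROM course ORDER BY id"
for cid, name, semester in conn.execute(query):
    row = ET.SubElement(rows_element, "row")
    ET.SubElement(row, "id").text = cid
    ET.SubElement(row, "name").text = name
    ET.SubElement(row, "semester").text = str(semester)
conn.close()

exported = ET.tostring(rows_element, encoding="unicode")
print(exported)
```

The ORDER BY clause is there because a table is unordered while an XML document always has a document order; the extract has to pick one.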
45. XML vs. DBMS
RDBMS                         XML
Structured data               Structured and unstructured
Unordered                     Ordered
Flat information              Hierarchical information
Native format                 Standard format
Very compact format           Very verbose format
SQL                           XPath and XQuery
Fine-grained modifications    Coarse-grained modifications
Bad data exchange             Excellent data exchange
Integrity via SQL DDL         Integrity via XML Schema (DTD)
Supports data types           Supports data types
Extreme data volumes          Large data volumes
46. Additional Information
Web Sites
W3Schools free online tutorials www.w3schools.com.
Quite good for getting an overview of the various XML technologies.
Interactive XML Tutorials www.xmlzoo.net.
Covers several parts of XML
The Annotated XML 1.0 Specification
www.xml.com/axml/testaxml.htm.
The XML 1.0 specification with a lot of comments.
W3C XML recommendations www.w3.org.
The place to go if you want all the details.
Altova’s home page (maker of XMLSpy) www.altova.com.
If you are looking for a good XML tool.
IBM developerWorks overview “New to XML”
www.ibm.com/developerworks/xml/newto/
Many links to additional information.