Pal gov.tutorial2.session2.xml dtd's
- 1. أكاديمية الحكومة الإلكترونية الفلسطينية
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session 2
XML DTD’s
Dr. Ismail M. Romi
Palestine Polytechnic University
PalGov © 2011 1
- 2. About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement 511159-TEMPUS-1-
2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine (Coordinator)
University of Trento, Italy
Palestine Polytechnic University, Palestine
Vrije Universiteit Brussel, Belgium
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Telecom and IT, Palestine
University of Namur, Belgium
Ministry of Interior, Palestine
TrueTrust, UK
Ministry of Local Government, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O.Box 14- Birzeit, Palestine
Telfax: +972 2 2982935
mjarrar@birzeit.edu
- 3. © Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work non-
commercially, as long as they credit you and license their new creations
under the identical terms.
- 4. Tutorial Map
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data models as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked data.
2a5: Demonstrate knowledge about integration & fusion of heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML & RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.
D: General and Transferable Skills
2d1: Working with team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Topic (hours)
Session 1: XML Basics and Namespaces (3 h)
Session 2: XML DTD's (3 h)
Session 3: XML Schemas (3 h)
Session 4: Lab-XML Schemas (3 h)
Session 5: RDF and RDFs (3 h)
Session 6: Lab-RDF and RDFs (3 h)
Session 7: OWL (Ontology Web Language) (3 h)
Session 8: Lab-OWL (3 h)
Session 9: Lab-RDF Stores -Challenges and Solutions (3 h)
Session 10: Lab-SPARQL (3 h)
Session 11: Lab-Oracle Semantic Technology (3 h)
Session 12_1: The problem of Data Integration (1.5 h)
Session 12_2: Architectural Solutions for the Integration Issues (1.5 h)
Session 13_1: Data Schema Integration (1 h)
Session 13_2: GAV and LAV Integration (1 h)
Session 13_3: Data Integration and Fusion using RDF (1 h)
Session 14: Lab-Data Integration and Fusion using RDF (3 h)
Session 15_1: Data Web and Linked Data (1.5 h)
Session 15_2: RDFa (1.5 h)
Session 16: Lab-RDFa (3 h)
- 5. Session ILO’s:
After completing this session students will be able to:
• Manage data represented in XML.
• Represent data using tree and graph data models.
- 6. Session 2: Document Type Definition - DTD
Session Overview:
• Create DTDs.
• Validate an XML document against a DTD.
• Use DTDs to create XML documents from multiple files.
- 7. XML Schemas
A quality control tool.
Describes the structure of an XML document.
Ensures that a document fulfills a minimum set of
requirements.
Serves as a way to formalize an application's document format as a
publishable object.
An XML schema is like a program that tells a processor how
to read the document.
- 8. A History of Schema Languages
1. Document Type Definition – DTD:
– The oldest and most widely supported schema language.
2. W3C XML Schema:
– XML Schemas are themselves XML documents.
3. RELAX NG
4. Schematron
- 9. Validation Steps
A "Valid" XML document is a "Well Formed" XML document, which also
conforms to the rules of a Document Type Definition.
1. The processor reads the rules and declarations in the schema.
2. It builds a specific type of parser (a validating parser).
3. The validating parser takes an XML instance as input.
4. It produces a validation report.
- 10. Document Type Definition - DTD
Defines the legal building blocks of an XML document.
Defines the document structure with a list of legal elements and
attributes.
DTD's are extensible - meaning they can be extended to meet the
needs of the current task.
A DTD can be specified within an XML document (internal) or in a
separate file (external).
Many free DTD's exist on the internet today and can be freely
downloaded.
DTD's declare a set of allowed elements.
- 11. Document Type Definition - DTD
DTD's define a content model for each element: This
describes what elements or data can go inside an
element, in what order, in what number, and whether they
are required or optional.
DTD's declare a set of allowed attributes for each element
with data types and default values.
DTD's provide mechanisms to manage the model,
providing links to other components.
The Document Type Declaration
Internal DTD declaration:
The DTD declared inside the XML file.
External DTD declaration:
The DTD declared in an external file.
- 12. Internal DTD Declaration
<!DOCTYPE root-element [element-declaration ]>
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
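To make the difference between "well formed" and "valid" concrete, the following sketch is a variant of the note document above. It is an illustration added here, not part of the original example: the markup is well formed, but a validating parser would be expected to reject it because the <heading> element required by the content model is missing.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well formed (tags nest and close correctly) but not valid:
     the content model requires <heading> between <from> and <body>. -->
<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>
```

A non-validating parser would accept this document without complaint, which is why validation has to be requested explicitly in most tools.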
- 13. External DTD Declarations
You can refer to an external DTD in one of the
following two ways:
– System identifiers
– Public identifiers
- 14. External DTD Declarations using System
Identifiers
<!DOCTYPE root-element SYSTEM "system identifier" [...]>
A system identifier is a file reference. The declaration consists of:
– The keyword SYSTEM
– A URI reference pointing to the document's location.
• A URI can be a file on your local hard drive, a file on your intranet or
network, or even a file available on the Internet.
Examples:
<!DOCTYPE name SYSTEM "/user/local/dtds/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "http://wiley.com/hr/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "name.dtd">
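As a minimal sketch of how a system identifier ties two files together, the listing below shows an external DTD and a document that refers to it. The file name name.dtd comes from the examples above; its contents and the sample data are assumptions made only for illustration.

```xml
<!-- File 1: name.dtd (external DTD; contents are a hypothetical example) -->
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>

<!-- File 2: name.xml (assumed to sit in the same directory as name.dtd) -->
<!DOCTYPE name SYSTEM "name.dtd">
<name>
  <first>John</first>
  <last>Doe</last>
</name>
```

Because the system identifier is a relative URI, it is resolved against the location of name.xml, so moving one file without the other would break the reference.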
- 15. External DTD Declarations using Public
Identifiers
<!DOCTYPE root-element PUBLIC "public identifier" "system identifier" [...]>
Public identifiers are used to identify an entry in a catalog.
In XML, the public identifier must be followed by a system identifier, which the
processor uses when it cannot resolve the public identifier.
A commonly used format is called Formal Public Identifiers (FPIs).
The syntax for an FPI is defined in the document ISO 9070.
FPI Syntax:
"-//Owner//Class Description//Language//Version"
Example:
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN" "name.dtd">
Recommended list of DOCTYPE at:
http://www.w3.org/QA/2002/04/valid-dtd-list.html
- 16. Sharing Vocabularies
It is often better to share vocabularies and use DTDs that are widely
accepted.
Sharing DTDs enables you to more easily integrate with other
companies and XML developers who use the shared vocabularies.
Many individuals and industries have developed DTDs.
Examples:
– Chemical Markup Language (CML) DTD
– XHTML, maintains three DTDs (Transitional, Strict, and Frameset).
You can check many places when trying to find a DTD for a specific
industry.
– http://xml.coverpages.org/.
– http://www.dublincore.org.
- 17. Anatomy of a DTD
DTDs consist of three basic parts:
1. Element declarations
2. Attribute declarations
3. Entity declarations
These declarations are placed inside the DOCTYPE
declaration, as follows:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE root-element [
declarations
declarations
]>
- 18. Element Declarations
ELEMENT declaration is used to indicate to the parser that
you are about to define an element.
The declaration can appear only within the context of the
DTD.
Syntax
<!ELEMENT element-name (content model)>
Element declarations consist of three basic parts:
– ELEMENT keyword (<!ELEMENT)
– Element name
– Element content model
- 19. Element Declarations…Cont
An element's content model defines the allowable
content within the element.
An element may contain element children, text, a
combination of children and text, or the element
may be empty.
Four kinds of content models exist:
– Element content
– Mixed content
– Empty content
– Any content
- 20. Element Content
Include the allowable elements within
parentheses.
Example:
<!ELEMENT contact (name, location, phone)>
Each element that you specify within this
element's content model must also have its own
declaration within the DTD.
- 21. Element Content…Cont
The processor needs this information so that it knows how
to handle each element when it is encountered.
Each name in the content model must appear exactly as it will in
the document.
Ways of specifying the element children:
– Sequences
– Choices
- 22. Element Content - Sequences
The child elements must appear in the document in the order
given in the content model.
If your XML document were missing one of the elements
within the sequence, or if it contained extra
elements, the parser would raise an error.
If all of the specified elements were included
but appeared in a different order, the processor
would also raise an error.
Whitespace around the commas in the content model doesn't matter.
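The sequence rules can be sketched with the contact declaration from the Element Content slide. The child declarations and the sample values are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (name, location, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<!-- Valid: each child appears exactly once, in the declared order. -->
<contact>
  <name>Jeff</name>
  <location>Hebron</location>
  <phone>000-0000</phone>
</contact>
<!-- A validating parser would be expected to reject the same element
     if phone came before location, if phone were missing,
     or if a second name element were added. -->
```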
- 23. Element Content - Choices
Sometimes you need to allow one element or
another, but not both.
The choice mechanism (the vertical bar | character) provides this.
Example:
<!ELEMENT location (address | GPS)>
This declaration allows the <location> element to
contain one <address> or one <GPS> element.
If the <location> element were empty, or if it contained
more than one of these elements, the parser would
raise an error.
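A short sketch of the choice rule, using the location declaration above. The child declarations and the coordinate text are placeholder assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE location [
  <!ELEMENT location (address | GPS)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT GPS (#PCDATA)>
]>
<!-- Valid: exactly one of the two alternatives is present. -->
<location>
  <GPS>placeholder coordinates</GPS>
</location>
<!-- Not valid under this DTD: an empty location element, or a
     location element containing both address and GPS. -->
```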
- 24. Mixed Content
The XML Recommendation specifies that any element with
text in its content is a mixed content model element.
Within mixed content models, text can appear by itself or it
can be interspersed between elements.
The simplest mixed content model—text only:
<!ELEMENT element-name (#PCDATA)>
#PCDATA keyword (Parsed Character DATA):
– indicates that the character data within the content model
should be parsed by the parser.
– Used for text or character data.
- 25. Mixed Content - Cont
Every time you declare elements within a mixed
content model, they must follow four rules:
– They must use the choice mechanism (the vertical bar |
character) to separate elements.
– The #PCDATA keyword must appear first in the list of
elements.
– There must be no inner content models.
– If there are child elements, the * cardinality indicator
must appear at the end of the model.
- 26. Mixed Content-Example
DTD:
<!ELEMENT description (#PCDATA | em | strong | br)*>
XML Document:
<description>Jeff is a developer and author for Beginning XML <em>4th
edition</em>.<br/>Jeff <strong>loves</strong> XML!</description>
Text may appear anywhere, and the em, strong, and br elements can appear
any number of times, in any order.
Note:
em: emphasis (italic), strong: bold, br: line break
- 27. Empty Content
An empty element has no content.
<!ELEMENT element-name EMPTY>
A commonly used empty element is the XHTML line break:
<br/>
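As a small sketch, an element declared EMPTY can be written either as a self-closing tag or as a start tag immediately followed by its end tag. The description element here is an assumption borrowed from the mixed content example.

```xml
<?xml version="1.0"?>
<!DOCTYPE description [
  <!ELEMENT description (#PCDATA | br)*>
  <!ELEMENT br EMPTY>
]>
<!-- Both spellings of the empty br element are valid. -->
<description>First line<br/>Second line<br></br>Third line</description>
<!-- Not valid: a br element with anything between its tags,
     including whitespace or a line break. -->
```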
- 28. Element with ANY content
<!ELEMENT element-name ANY>
Can contain any combination of parsable data (text or
elements).
ANY: a keyword indicating that any elements declared
within the DTD can be used within the content of the
element, in any order and any
number of times.
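A minimal sketch of ANY content follows. The element names notes, em, and strong are assumptions chosen for illustration; the point is that every child used inside an ANY element must still be declared somewhere in the DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE notes [
  <!ELEMENT notes ANY>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT strong (#PCDATA)>
]>
<!-- Valid: text and declared elements, in any order, any number of times. -->
<notes>Free text, <strong>then</strong> <em>any</em> declared <em>elements</em>.</notes>
<!-- Not valid: a child such as <u>, because no declaration for u exists. -->
```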
- 29. Cardinality
An element's cardinality defines how many times it will
appear within a content model.
Each element within a content model can have an
indicator following the element name that tells the parser
how many times it will appear.
- 30. Cardinality…Cont
Indicator  Description
(none)     The element must appear once and only once.
?          The element may appear either once or not at all.
+          The element may appear one or more times.
*          The element may appear zero or more times.
Example:
<!ELEMENT name (first+, middle?, last, Tel*)>
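The indicators can be sketched with a complete document. This assumes the name content model is written with Tel* inside the parentheses, since a content model must be a single parenthesized group; the child declarations and sample values are also assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!ELEMENT name (first+, middle?, last, Tel*)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT Tel (#PCDATA)>
]>
<!-- Valid: two first (+), no middle (?), one last, two Tel (*). -->
<name>
  <first>John</first>
  <first>Fitzgerald</first>
  <last>Doe</last>
  <Tel>000-0000</Tel>
  <Tel>000-0001</Tel>
</name>
<!-- Not valid: no first element at all, two middle elements,
     or a missing last element. -->
```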
- 32. Attribute Types
Type             Description
CDATA            The attribute value is character data (any text).
ID               The attribute value uniquely identifies the containing element.
IDREF            The value is the ID of another element.
IDREFS           The value is a whitespace-separated list of other IDs.
ENTITY           The value is the name of an unparsed entity.
ENTITIES         The value is a list of unparsed entity names.
NMTOKEN          The value is a valid XML name token.
NMTOKENS         The value is a list of valid XML name tokens.
Enumerated list  The value must be one of the enumerated values (val1 | val2 | …).
<!ATTLIST element-name attribute-name attribute-type "attribute-value">
- 33. CDATA
• It specifies that the attribute value is character
data (any text).
• No further constraint is placed on the value.
DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
XML example:
<square width="100"/>
- 34. ID, IDREF, and IDREFS
Attributes of type ID can be used to uniquely identify an
element within an XML document.
Once you have uniquely identified the element, you can
later use an IDREF to refer to that element.
Remember several rules when using ID attributes:
– The value of an ID attribute must be unique within the entire
XML document.
– Only one attribute of type ID may be declared per element.
– The attribute value declaration for an ID attribute must be
#IMPLIED or #REQUIRED.
The value of an IDREF attribute must match the value of some ID within the XML
document.
To refer to a list of elements:
– Use an IDREFS attribute, which stores a list of whitespace-separated IDREF values, each referring to
an ID attribute defined in the document.
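The ID rules can be sketched as follows. The element names, attribute names (person, manager, friends), and values are all hypothetical. Note that ID values must be valid XML names, so they cannot start with a digit.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact+)>
  <!ELEMENT contact (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST contact
    person  ID     #REQUIRED
    manager IDREF  #IMPLIED
    friends IDREFS #IMPLIED>
]>
<contacts>
  <contact person="p1"><name>Ali</name></contact>
  <!-- manager must match some ID value in the document. -->
  <contact person="p2" manager="p1"><name>Sara</name></contact>
  <!-- friends holds a whitespace-separated list of ID values. -->
  <contact person="p3" manager="p1" friends="p1 p2"><name>Omar</name></contact>
</contacts>
<!-- Not valid: two contacts sharing person="p1", or manager="p9"
     when no element carries the ID p9. -->
```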
- 35. ENTITY and ENTITIES
• Attributes can also include references to unparsed entities.
• An unparsed entity is an external resource
that the processor cannot parse (for example, an image).
• Instead of actually including the image inside the
document, you use special attributes to refer to the
external resource.
- 36. Enumerated Attribute Types
• Used to restrict attribute values
• An enumerated list allows you to specify a list of allowable
values.
• Each value must be a valid XML name token.
• Example:
DTD:
<!ATTLIST phone kind (Home | Work | Cell | Fax) #IMPLIED>
XML:
<phone kind="Cell"> Valid
<phone kind="cell"> Invalid (values are case-sensitive)
- 37. Attribute Value Declarations
Within each attribute declaration you must specify how
the value will appear in the document.
The XML Recommendation allows you to specify that the
attribute:
Value            Description
"value"          The attribute has a default value, given as a quoted literal.
#REQUIRED        The attribute must be included in the element.
#IMPLIED         The attribute does not have to be included.
#FIXED "value"   The attribute value is fixed.
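The four kinds of value declaration can be sketched on a single element. The phone element and its kind attribute follow the enumerated example earlier; the other attribute names and all values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE phone [
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST phone
    kind    (Home | Work | Cell | Fax) "Home"
    number  CDATA #REQUIRED
    ext     CDATA #IMPLIED
    country CDATA #FIXED "PS">
]>
<!-- Valid: number is present (required), ext is omitted (implied).
     A parser that reads the DTD reports kind="Home" and country="PS"
     even though neither attribute is written in the document. -->
<phone number="000-0000">Office line</phone>
<!-- Not valid: omitting number, or writing country="UK",
     because a fixed attribute may only carry its declared value. -->
```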
- 38. Specifying Multiple Attributes
Declaring each attribute:
<!ATTLIST contacts version CDATA #FIXED "1.0">
<!ATTLIST contacts source CDATA #IMPLIED>
Using one declaration:
<!ATTLIST contacts version CDATA #FIXED "1.0"
source CDATA #IMPLIED>
- 39. Entities
• Placeholders in XML
• Types:
– Built-in entities
– Character entities
– General entities
– Parameter entities
- 40. Built-in Entities
• &amp; The & character
• &lt; The < character
• &gt; The > character
• &apos; The ' character
• &quot; The " character
- 41. References to Built-in Entities
To use an entity, you must include an entity
reference within the document.
An entity reference refers to an entity that
represents a character, some text, or even an
external file.
A reference to a built-in entity takes the following
form:
&entity-name;
Example:
<CheckAvg> Avg &lt; &quot;85&quot; </CheckAvg>
- 42. Character Entities
• Used for characters that are difficult to type or
not found on the keyboard.
&#unicode-value;
• Example:
&#169; === the character ©
• Using hexadecimal values: you must include a lowercase x
before the value, so that the XML parser knows how it
should handle the reference.
• Example:
&#xA9; === the character ©
- 43. General Entities ( Internal Entities)
Variables used to define shortcuts to standard text
or special characters.
General entities must be declared within the DTD
before they can be used within the XML
document.
Declaration:
– <!ENTITY entity-name "value">
Example:
DTD – <!ENTITY address "Palestine, Hebron, POBox 198">
XML – <ppu-address> &address; </ppu-address>
- 44. External Entities
• Entity whose replacement text exists in another file.
• Useful for:
– Importing content that is shared by many documents.
– Importing content that is changed frequently.
– Breaking the document into multiple physical parts.
• External entities must be declared so that the
parser can find the replacement text.
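A minimal sketch of an external parsed entity follows. The entity name disclaimer, the file disclaimer.txt, and the report vocabulary are all hypothetical; the file is assumed to contain plain text only.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (title, footer)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
  <!-- The replacement text lives in a separate file. -->
  <!ENTITY disclaimer SYSTEM "disclaimer.txt">
]>
<report>
  <title>Annual report</title>
  <!-- The parser substitutes the contents of disclaimer.txt here. -->
  <footer>&disclaimer;</footer>
</report>
```

Non-validating parsers are allowed to skip external entities, and many tools disable them by default for security reasons, so whether the text is actually included depends on parser configuration.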
- 46. Unparsed Entities
• Holds content that should not be parsed
because it contains something other than
text or xml.
• Useful for:
– Importing graphics, sound files.
– Non-character data.
• Declaration:
<!ENTITY entity-name SYSTEM "physical location" NDATA file-format>
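The declaration above can be sketched together with an ENTITY-typed attribute that refers to it. The notation name jpeg, the entity name logo, the file logo.jpg, and the photo element are assumptions for illustration. The file format named after NDATA has to be declared as a notation for the document to be valid.

```xml
<?xml version="1.0"?>
<!DOCTYPE photo [
  <!-- Declares the format name used after NDATA. -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- Unparsed entity: the parser records it but does not read the file. -->
  <!ENTITY logo SYSTEM "logo.jpg" NDATA jpeg>
  <!ELEMENT photo EMPTY>
  <!ATTLIST photo source ENTITY #REQUIRED>
]>
<!-- The attribute value is the entity name, written without & and ; -->
<photo source="logo"/>
```

The parser only passes the entity's location and notation to the application, which then decides how to load or display the resource.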
- 48. DTD Limitations
• Differences between DTD syntax and XML syntax.
• Poor support for XML namespaces
• Poor data typing.
• Limited content model descriptions.
- 49. Summary
• By using DTDs, you can easily validate your XML
documents against a defined vocabulary of
elements and attributes. This reduces the amount
of code needed within your application.
• An XML parser can be used to check whether the
contents of an XML document are valid according
to the declarations within a DTD.
- 50. References
• Hunter, D., Rafter, J., Fawcett, J., Vlist, E., Ayers, D., Duckett, J., Watt,
A., McKinnon, L., (2007), "Beginning XML", 4th Ed., Wiley Publishing
Inc: Indiana, USA.
• Ray, E., (2003), "Learning XML", 2nd Ed., O'Reilly Media Inc.: USA.
• Amiano, M., D'Cruz, C., Ethier, K., Thomas, M., (2006), "XML:
Problem - Design - Solution", Wiley Publishing Inc: Indiana, USA.
• http://www.w3.org
• http://www.w3schools.com
• http://www.xml.com
• http://www.xml.org