Xml

 XML is a data description language.
 It is a flexible way to create common information
formats.
 It also makes it possible to share both the format and the data
on the World Wide Web, intranets, and elsewhere.
 Computer users can agree on a standard,
platform-independent way to describe a piece of
information using XML.
 Such a standard way of describing data
enables users to exchange the data, together with its
description, between any types of platforms.

 XML → Extensible Markup Language
 Just a text file, with tags, attributes, free text...
looks much like HTML
 Used to create special-purpose markup languages
 Facilitates sharing of structured text and
information across the Internet.
 XML structure facilitates parsing of data.
 XML tags are not predefined; you must define
your own tags.
 Two applications still have to agree on the meaning
of the descriptive tags.

[Figure: markup language family diagram showing SGML, HTML, XML, MML, and CML]

XML for Developers -
Version 1a 4

Example1.html:

<H1>Cars for Sale</H1>
<h3>Audi 80</h3>
1800 cc<BR>
Blue<BR>
Manual<BR>
1988<BR>
$1250
<P>
<H3>Toyota Corolla</h3>
1250 cc<BR>
Red<BR>
Automatic<BR>
1984<BR>
Red<BR>
$940<BR>

Example1.xml:

<for_sale>
<heading>Cars for Sale</heading>
<make>Audi 80</make>
<engine>1800 cc</engine>
<color>Blue</color>
<transmission>Manual</transmission>
<year>1988</year>
<price>$1250</price>
<make>Toyota Corolla</make>
<engine>1250</engine>
<color>Red</color>
<transmission>Automatic</transmission>
<year>1984</year>
<price>$940</price>
</for_sale>

[Figure: the HTML example rendered as formatted text (Cars for Sale, Audi 80, Toyota Corolla) next to the XML example as shown in the IE5 browser]

XML
 It is a free & extensible language.
 Tags are user-defined.
 It is about describing information.
 Extensible set of tags.
 Content oriented.
 Standard data infrastructure.
 Allows multiple output forms.

HTML
 Derived language from SGML.
 Tags are pre-defined.
 It is about displaying information.
 Fixed set of tags.
 Presentation oriented.
 No data validation capabilities.
 Single presentation.

 Many languages have been defined using XML:
◦ Wireless Markup Language (WML)
◦ Chemical Markup Language (CML)
◦ Bioinformatic Sequence Markup Language (BSML)
◦ Mathematical Markup Language (MathML)
◦ Open Office Markup Language (OOML)
 Directly usable over the Internet: XML is plain text, so it can be
processed on any platform and carried over any protocol.
 XML documents are plain text, so they are easy to write and easy
for programs to transport.
 Can be used alongside SGML and other web technologies without
any conflict.
 Clearly understandable by humans.
 Accurate validation is possible with the help of a DTD or a
schema.
 We can define the structure of the data along with its description.
 Semi-structured databases are also defined using XML.

 Web publishing : XML allows the customer to customize
web pages. With XML, you store the data once and then
render that content for different viewers.
 Web searching and automating Web tasks: XML defines
the type of information contained in a document, making
it easier to return useful results when searching the Web.
 General applications: XML provides a standard method to
use, store, transmit, and display data for all kinds of
applications and devices.
 e-business applications: XML enables electronic
data interchange (EDI) for both business-to-business
and business-to-consumer transactions.
 Metadata applications: XML makes it easier to express
metadata in a portable, reusable format.
 Pervasive computing: XML provides portable and
structured information types for display on pervasive
(wireless) computing devices such as personal digital
assistants (PDAs), cellular phones.

 DTD (Document Type Definition) and XML Schemas are
used to define legal XML tags and their attributes for
particular purposes

 CSS (Cascading Style Sheets) describe how to display
HTML or XML in a browser

 XSLT (eXtensible Stylesheet Language Transformations)
and XPath are used to translate from one form of XML
to another

 DOM (Document Object Model), SAX (Simple API for
XML), and JAXP (Java API for XML Processing) are all APIs
for XML parsing


 XML documents use a self-describing and simple
syntax.
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to> Ramesh </to>
<from> Kiran</from>
<heading> Reminder </heading>
<body> Don’t forget me this weekend </body>
</note>
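
A small sketch of how a program might read the note document above. It uses Python's standard xml.etree.ElementTree module; the variable names (NOTE_XML, fields) are only illustrative, and the apostrophe in the body text is written as a plain ASCII quote.

```python
import xml.etree.ElementTree as ET

# The note document from above, with straight quotes in the declaration.
# Bytes are used because the declaration names an encoding.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to> Ramesh </to>
<from> Kiran</from>
<heading> Reminder </heading>
<body> Don't forget me this weekend </body>
</note>"""

root = ET.fromstring(NOTE_XML)

# Map each child tag name to its text, with surrounding spaces removed.
fields = {child.tag: child.text.strip() for child in root}

print(root.tag)
for tag, text in fields.items():
    print(f"{tag}: {text}")
```

The tag names themselves tell the program what each value means, which is the sense in which the syntax is self-describing.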

 XML Syntax consists of
◦ XML Declaration
◦ XML Elements
◦ XML Attributes
 The first line of an XML document should always
consist of an XML declaration defining the version of
XML.
 An XML document may start with one or more
processing instructions (PIs) or directives:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="ss.css"?>
 Following the directives, there must be exactly one
root element containing all the rest of the XML:
<weatherReport>
...
</weatherReport>

 XML elements have relationships
 Elements can have different content types
 Element Naming Rules:
1) Names contain letters, numbers & other characters.
2) Names must not start with a number or
punctuation marks.
3) Names must not start with the letters xml.
4) Names cannot contain spaces.
 Names (as used for tags and attributes) must begin
with a letter or underscore, and can consist of:
◦ Letters, both Roman (English) and foreign
◦ Digits, both Roman and foreign
◦ . (dot)
◦ - (hyphen)
◦ _ (underscore)
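
The naming rules can be approximated in a few lines of Python. This is a simplified, ASCII-only sketch: real XML names also allow many non-ASCII letters and digits, and the function name is_plausible_xml_name is hypothetical.

```python
import re

# ASCII-only approximation of the naming rules listed above (assumption:
# non-ASCII letters and digits, which XML also allows, are not handled).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_plausible_xml_name(name: str) -> bool:
    """Return True if `name` passes the simplified naming rules."""
    if not NAME_PATTERN.fullmatch(name):
        return False   # bad first character, a space, or another symbol
    if name.lower().startswith("xml"):
        return False   # names starting with "xml" (any case) are reserved
    return True
```

For example, first_name and _id.v-2 pass, while 2nd, "first name", and XmlData are rejected.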

<person gender="male">
<firstname>Ravi</firstname>
<lastname>Kumar</lastname>
</person>

 Attributes and elements are somewhat interchangeable
 Example using just elements:
<name>
<first>David</first>
<last>Matuszek</last>
</name>
 Example using attributes:
<name first="David" last="Matuszek"></name>
 You will find that elements are easier to use in your
programs--this is a good reason to prefer them
 Attributes often contain metadata, such as unique IDs
 Generally speaking, browsers display only elements
(values enclosed by tags), not tags and attributes
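
To compare the two styles from a program's point of view, here is a short sketch using Python's xml.etree.ElementTree (the variable names are illustrative only).

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
as_attributes = ET.fromstring('<name first="David" last="Matuszek"></name>')

# Element form: each value is the text of a child element.
first_from_element = as_elements.find("first").text

# Attribute form: values live in the element's attribute mapping.
first_from_attribute = as_attributes.get("first")

print(first_from_element, first_from_attribute)
```

Both forms carry the same data; the element form leaves room for children of its own, while an attribute value is always a single string.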


 Elements can contain multiple values, but
attributes cannot
 Attributes are not expandable
 Elements can describe structure, but
attributes cannot
 Attributes are more difficult to manipulate by
program code than elements
 Attribute values are difficult to validate against
a DTD

<novel>
<foreword>
<paragraph> This is the great American novel.</paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy night.</paragraph>
<paragraph>Suddenly, a shot rang out!</paragraph>
</chapter>
</novel>
Tree structure of the novel document:

novel
    foreword
        paragraph: This is the great American novel.
    chapter (number="1")
        paragraph: It was a dark and stormy night.
        paragraph: Suddenly, a shot rang out!

 Every element must have both a start tag and an end
tag, e.g. <name> ... </name>
◦ But empty elements can be abbreviated: <break />.
◦ XML tags are case sensitive
◦ XML tags may not begin with the letters xml, in any
combination of cases
 Elements must be properly nested,
e.g. not <b><i>bold and italic</b></i>
 Every XML document must have one and only one root
element.
 The values of attributes must be enclosed in single or
double quotes, e.g. <time unit="days">
 Character data cannot contain < or &
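
A parser enforces these rules for you. The sketch below, assuming Python's standard xml.etree.ElementTree parser, wraps parsing in a hypothetical helper is_well_formed and tries it on a few of the cases listed above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = [
    "<b><i>bold and italic</i></b>",   # properly nested
    "<b><i>bold and italic</b></i>",   # improperly nested
    "<Name>x</name>",                  # case mismatch between tags
    "<time unit=days></time>",         # attribute value not quoted
    "<a/><b/>",                        # more than one root element
    "<p>a < b</p>",                    # raw < in character data
    "<p>a &lt; b</p>",                 # the same text written with an entity
]
for sample in samples:
    print(is_well_formed(sample), sample)
```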


 Start with <?xml version="1.0"?>
 XML is case sensitive
 You must have exactly one root element that
encloses all the rest of the XML
 Every element must have a closing tag
 Elements must be properly nested
 Attribute values must be enclosed in double
or single quotation marks
 There are only five pre-declared entities


 Five special characters must be written as
entities:
&amp; for & (almost always necessary)
&lt; for < (almost always necessary)
&gt; for > (not usually necessary)
&quot; for " (necessary inside double quotes)
&apos; for ' (necessary inside single quotes)
 These entities can be used even in places
where they are not absolutely required
 These are the only predefined entities in XML
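
When generating XML from a program, the five predefined entities are usually produced by a library rather than by hand. A minimal sketch with Python's xml.sax.saxutils (the sample string is made up for illustration):

```python
from xml.sax.saxutils import escape, quoteattr, unescape

raw = 'Tom & Jerry <"cat" vs \'mouse\'>'

# escape() always replaces &, < and >; the extra mapping adds the two quote entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# unescape() reverses the process when given the inverse mapping.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})

# quoteattr() escapes a value and wraps it in quotes, ready for use as an attribute.
attribute = quoteattr("days & nights")

print(escaped)
print(attribute)
```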


 The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>
◦ The XML declaration is optional in XML 1.0 and is not required by
browsers, but including it is good practice (so include it!)
◦ If present, the XML declaration must be first--not even
whitespace should precede it
◦ Note that the brackets are <? and ?>
◦ version="1.0" is required (1.0 is by far the most widely used
version; XML 1.1 also exists)
◦ encoding can be "UTF-8" or "UTF-16" (both are Unicode encodings;
plain ASCII text is also valid UTF-8), or something else, or it
can be omitted
◦ standalone="yes" states that the document does not depend on
markup declarations in an external DTD
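
The encoding attribute tells the parser how to turn the bytes of the file into characters. A short sketch, assuming Python's xml.etree.ElementTree and a made-up one-word document, shows the same text stored under two encodings:

```python
import xml.etree.ElementTree as ET

WORD = "caf\u00e9"  # "cafe" with an acute accent on the e

# The same element content serialized with two different encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><word>' + WORD + "</word>"
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><word>' + WORD + "</word>"
).encode("utf-8")

# The parser reads the declaration and decodes each byte stream accordingly.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

print(latin1_text == utf8_text)
```

The bytes on disk differ, but the parsed text is identical because each declaration matches the bytes that follow it.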


 PIs (Processing Instructions) may occur anywhere in
the XML document (but usually first)
 A PI is a command to the program processing the
XML document to handle it in a certain way
 XML documents are typically processed by more
than one program
 Programs that do not recognize a given PI should
just ignore it
 General format of a PI: <?target instructions?>
 Example: <?xml-stylesheet type="text/css"
href="mySheet.css"?>
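
A program that cares about a PI can look for it by target and ignore the rest. The sketch below uses Python's xml.dom.minidom; the weatherReport root is borrowed from the earlier slide and the list name instructions is illustrative.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="mySheet.css"?>'
    "<weatherReport/>"
)

# Collect (target, instructions) for every PI that is a child of the document.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```

Note that the XML declaration on the first line is not reported as a processing instruction.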


 <!-- This is a comment -->
 Comments can be put almost anywhere in an XML document
(but not inside a tag, and not before the XML declaration)
 Comments are useful for:
◦ Explaining the structure of an XML document
◦ Commenting out parts of the XML during development and
testing
 Comments are not elements and do not have an end tag
 The blanks after <!-- and before --> are optional
 The character sequence -- cannot occur in the comment
 The closing bracket must be -->
 Comments are not displayed by browsers, but can be
seen by anyone who looks at the source code


 By default, all text inside an XML document is
parsed.
 You can force text to be treated as unparsed
character data by enclosing it in <![CDATA[ ... ]]>
 Any characters, even & and <, can occur inside a
CDATA
 Whitespace inside a CDATA is (usually) preserved
 The only real restriction is that the character
sequence ]]> cannot occur inside a CDATA
 CDATA is useful when your text has a lot of illegal
characters (for example, if your XML document
contains some HTML text)
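
A quick sketch of a CDATA section carrying HTML-like text, parsed with Python's xml.etree.ElementTree. The snippet element and its content are invented for illustration.

```python
import xml.etree.ElementTree as ET

# HTML-like text inside a CDATA section: no entity escaping is needed.
document = "<snippet><![CDATA[<b>Fish & chips</b> cost < 5]]></snippet>"

snippet = ET.fromstring(document)
cdata_text = snippet.text

print(cdata_text)
```

The parser hands back the characters exactly as written, and the <b> inside the CDATA section is plain text, not a child element.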


<note>
<to> Ramesh </to>
<from> Kiran</from>
<body> Don’t forget me this weekend </body>
</note>

<remainder>
<to> Ramesh </to>
<from> Kiran</from>
<message> Don’t forget me this weekend </message>
</remainder>

 A DTD is used to specify the set of rules for
structuring data in an XML file.
 It is used to define the building blocks of an XML document.
 Using a DTD we can specify the various element types,
attributes, and their relationships.
 A DTD constrains the structure of XML data:
◦ What elements can occur
◦ What attributes an element can or must have
◦ What subelements can or must occur inside each element, and how
many times
 DTD syntax
◦ <!ELEMENT element (subelements-specification) >
◦ <!ATTLIST element (attributes) >

 A DTD adds syntactical requirements in
addition to the well-formed requirement
 It helps in eliminating errors when creating or
editing XML documents
 It clarifies the intended semantics
 It simplifies the processing of XML
documents


<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Raj</to>
<from>AEC</from>
<heading>Invitation</heading>
<body>Welcome to Aditya!</body>
</note>

 !DOCTYPE note defines that the root element of this
document is note
 !ELEMENT note defines that the note element contains
four child elements: "to,from,heading,body"
 !ELEMENT to defines the to element to be of type
"#PCDATA"
 !ELEMENT from defines the from element to be of type
"#PCDATA"
 !ELEMENT heading defines the heading element to be
of type "#PCDATA"
 !ELEMENT body defines the body element to be of type
"#PCDATA"

<person>
<name> K.Vijay Kumar </name>             <!-- Exactly one name -->
<greet> Happy new year </greet>          <!-- At most one greeting -->
<addr>19-12, main road </addr>           <!-- As many address lines as needed -->
<addr> Kakinada </addr>
<tel> 943786254 </tel>                   <!-- Mixed telephones and faxes -->
<fax> 227862544 </fax>
<tel> 227862551 </tel>
<email> vkumar123@gmail.com </email>     <!-- As many as needed -->
</person>


 name              to specify a name element
 greet?            to specify an optional (0 or 1) greet element
 name, greet?      to specify a name followed by an optional greet
 addr*             to specify 0 or more address lines
 tel | fax         a tel or a fax element
 (tel | fax)*      0 or more repeats of tel or fax
 email*            0 or more email elements


 So the whole structure of a person entry is
specified by

name, greet?, addr*, (tel | fax)*, email*

 This is known as a regular expression
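
Because the content model is a regular expression over child element names, it can be mimicked with an ordinary regex. The following is only an illustration of the idea, not how a DTD validator is implemented; PERSON_MODEL and matches_person_model are hypothetical names.

```python
import re

# Python regex mirroring: name, greet?, addr*, (tel | fax)*, email*
# Each child tag is written as the tag name followed by one space.
PERSON_MODEL = re.compile(r"name (greet )?(addr )*((tel|fax) )*(email )*")

def matches_person_model(child_tags: list[str]) -> bool:
    """Check a sequence of child element names against the content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return PERSON_MODEL.fullmatch(sequence) is not None

# The children of the person entry shown earlier, in document order.
print(matches_person_model(
    ["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]
))
# A greeting before the name breaks the required order.
print(matches_person_model(["greet", "name"]))
```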


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person
(name, greet?, address*, (fax | tel)*, email*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT greet (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>

 The name of the DTD is addressbook
 The syntax of a DTD is not XML syntax
 “Internal” means that the DTD and the
XML document are in the same file

 Suffixes:
?     optional              foreword?
+     one or more           chapter+
*     zero or more          appendix*
 Separators:
,     both, in order        foreword?, chapter+
|     or                    section|chapter
 Grouping:
()    grouping              (section|chapter)+

 The syntax is <!ELEMENT name category>
◦ The name is the element name used in start and end
tags
◦ The category may be EMPTY:
 In the DTD: <!ELEMENT br EMPTY>
 In the XML: <br></br> or just <br />
◦ In the XML, an empty element may not have any
content between the start tag and the end tag
◦ An empty element may (and usually does) have
attributes

 The syntax is <!ELEMENT name category>
◦ The category may be ANY
 This indicates that any content--character data,
elements, even undeclared elements--may be used
 Since the whole point of using a DTD is to define the
structure of a document, ANY should be avoided
wherever possible
◦ The category may be (#PCDATA), indicating that only
character data may be used
 In the DTD: <!ELEMENT paragraph (#PCDATA)>
 In the XML: <paragraph>A shot rang out!</paragraph>
 The parentheses are required!
 Note: In (#PCDATA), whitespace is kept exactly as entered
 Elements may not be used within parsed character data
 Entities are character data, and may be used

 A category may describe one or more children:
<!ELEMENT novel (foreword, chapter+)>
◦ Parentheses are required, even if there is only one child
◦ A space must precede the opening parenthesis
◦ Commas (,) between elements mean that all children must
appear, and must be in the order specified
◦ “|” separators means any one child may be used
◦ All child elements must themselves be declared
◦ Children may have children
◦ Parentheses can be used for grouping:
<!ELEMENT novel (foreword, (chapter+|section+))>

 #PCDATA describes elements with only
character data
 #PCDATA can be used in an “or” grouping:
◦ <!ELEMENT note (#PCDATA|message)*>
◦ This is called mixed content
◦ Certain (rather severe) restrictions apply:
 #PCDATA must be first
 The separators must be “|”
 The group must be starred (meaning zero or more)

 The format of an attribute is:
<!ATTLIST element-name
name type requirement
name type requirement>
where the name-type-requirement may be
repeated as many times as desired
◦ Note that only spaces separate the parts, so careful
counting is essential
◦ The element-name tells which element may have these
attributes
◦ The name is the name of the attribute
◦ Each attribute has a type, such as CDATA (character data)
◦ Each attribute may be required, optional, or “fixed”
◦ In the XML, attributes may occur in any order

 There are ten attribute types
 These are the most important ones:
◦ CDATA The value is character data
◦ (man|woman|child) The value is one of enumerated
values
◦ ID The value is a unique identifier
 ID values must be legal XML names and must be unique
within the document
◦ NMTOKEN The value is a name token (any mix of name characters)
 This is sometimes used to disallow whitespace in the value
 Unlike an XML name, a name token may begin with a digit

 IDREF The ID of another element
 IDREFS A list of other IDs
 NMTOKENS A list of name tokens
 ENTITY An entity
 ENTITIES A list of entities
 NOTATION A notation
 xml: A predefined XML value

 Recall that an attribute has the form
<!ATTLIST element-name name type requirement>
 The requirement is one of:
◦ A default value, enclosed in quotes
 Example: <!ATTLIST person degree CDATA "PhD"> (person is an illustrative element name)
◦ #REQUIRED
 The attribute must be present
◦ #IMPLIED
 The attribute is optional
◦ #FIXED "value"
 The attribute always has the given value
 If specified in the XML, the same value must be used

Element declaration:
<?xml version="1.0" ?>
<!ELEMENT employee (#PCDATA)>

<!ATTLIST ElementName AttributeName Type Default>

<!ATTLIST employee type (FullTime | PartTime) "FullTime">

Usage in XML file:
<?xml version="1.0" ?>
<employee type="PartTime"/>
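
The effect of a default value can be observed from a program. The sketch below assumes Python's xml.etree.ElementTree, whose underlying parser reads attribute defaults from an internal DTD subset; the declarations are placed inside a DOCTYPE so that they travel with the document. This parser is non-validating, so it would not reject a value outside the enumerated list.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the enumerated "type" attribute with a default.
DTD = """<!DOCTYPE employee [
<!ELEMENT employee (#PCDATA)>
<!ATTLIST employee type (FullTime | PartTime) "FullTime">
]>"""

# One document states the attribute, the other leaves it out.
explicit = ET.fromstring(DTD + '<employee type="PartTime"/>')
defaulted = ET.fromstring(DTD + "<employee/>")

print(explicit.get("type"))
print(defaulted.get("type"))
```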

 CDATA
◦ CDATA attributes are strings; any text is allowed.
 ID
◦ The value of an ID attribute must be a name. All of the ID values used in a
document must be unique. IDs uniquely identify individual elements in a
document. An element can have only a single ID attribute.
 IDREF or IDREFS
◦ An IDREF attribute's value must be the value of a single ID attribute on some
element in the document. The value of an IDREFS attribute may contain multiple
IDREF values separated by white space.
 ENTITY or ENTITIES
◦ An ENTITY attribute's value must be the name of a single entity. The value of an
ENTITIES attribute may contain multiple entity names separated by white space.
 NMTOKEN or NMTOKENS
◦ Name token attributes are a restricted form of string attribute: the value must be a
single word made of name characters, but there are no other restrictions on the word.
 List of Names (Enumerated)
◦ You can specify that the value of an attribute must be taken from a specific list
of names. This is frequently called an enumerated type because each of the
possible values must be explicitly enumerated in the declaration.

 #REQUIRED
◦ The attribute must have an explicitly specified value for every occurrence of the
element in the document.
 #IMPLIED
◦ The attribute value is not required and no default value is provided. If a value is not
specified, the XML processor must proceed without one.
 “value”
◦ An attribute can be given any legal value as a default. The attribute value is not
required on each element of the document, and if it is not present it will appear to be
the specified default.
 #FIXED “value”
◦ An attribute declaration may specify that an attribute has a fixed value. In this case,
the attribute is not required, but if it occurs, it must have the specified value. If it is
not present, it will appear to be the specified default.

 CDATA  ID
◦ Character data ◦ Unique ID
 NMTOKEN  IDREF
◦ Single token ◦ Match to ID
 NMTOKENS  IDREFS
◦ Multiple tokens ◦ Match to multiple ID's
 ENTITY  NOTATION
◦ Attribute is entity ref ◦ Describe non-XML data
 ENTITIES  Name group
◦ Multiple entity ref's ◦ Restricted list

 CDATA
◦ name="Tom Jones"
 NMTOKEN
◦ color="red"
 NMTOKENS
◦ values="A12 A15 A34"
 ENTITY
◦ photo="MyPic"
 ENTITIES
◦ photos="pic1 pic2"
 ID
◦ ID="P09567"
 IDREF
◦ IDREF="P09567"
 IDREFS
◦ IDREFS="A01 A02"
 NOTATION
◦ FORMAT="TeX"
 Name group
◦ coord="X"

 Can specify a default attribute value for
when it is missing from the XML document, or
state that a value must be entered
◦ #REQUIRED Must be specified
◦ #IMPLIED May be specified
◦ "default" Default value if unspecified
◦ #FIXED Only one value allowed

<!ATTLIST tag name type default>
<!ATTLIST seqlist sepchar NMTOKEN #REQUIRED
type (alpha|num) "num">

 There are exactly five predefined entities: &lt;, &gt;, &amp;,
&quot;, and &apos;
 Additional entities can be defined in the DTD:
<!ENTITY copyright "Copyright Dr. Dave">
 Entities can be defined in another document:
<!ENTITY copyright SYSTEM "MyURI">
 Example of use in the XML:
This document is &copyright; 2002.
• Entities are a way to include fixed text (sometimes called
“boilerplate”)
• Entities should not be confused with character references,
which are numerical values between &# and ;
• Example: &#233; or &#xE9; to indicate the character é
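
A sketch of entity expansion and character references as seen by a program, assuming Python's xml.etree.ElementTree (which expands entities declared in an internal DTD subset). The notice element is invented for the example.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE notice [
<!ENTITY copyright "Copyright Dr. Dave">
]>
<notice>This document is &copyright; 2002. Caf&#233; and caf&#xE9;.</notice>"""

# The parser replaces the entity and both character references before
# handing the text to the application.
text = ET.fromstring(document).text

print(text)
```

The decimal reference 233 and the hexadecimal reference E9 name the same character, so both words end in the same accented letter.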

 In XML, element names are defined by the developer. This often
results in a conflict when trying to mix XML documents from
different XML applications.

This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>

This XML carries information about a table (a piece of furniture):
<table>
<name>Wooden Table</name>
<width>80</width>
<length>120</length>
</table>

• If these two XML fragments were added together, there would be a name conflict.
• Both contain a <table> element, but the elements have different content and
meaning.
• An XML parser will not know how to handle these differences.

Name conflicts in XML can easily be avoided using a name prefix.
This XML carries information about an HTML table, and a piece of
furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>Wooden Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>

In the example above, there will be no conflict because the
two <table> elements have different names.

 When using prefixes in XML, a so-called namespace for the prefix must be
defined.
 The namespace is defined by the xmlns attribute in the start tag of an element.
 The namespace declaration has the following
syntax: xmlns:prefix="URI".

<root xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
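
A sketch of how a namespace-aware parser keeps the two table elements apart, using Python's xml.etree.ElementTree. The NS dictionary is a lookup table chosen for this example; its prefixes only need to map to the same URIs as in the document.

```python
import xml.etree.ElementTree as ET

document = """<root xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
<f:table><f:name>African Coffee Table</f:name>
<f:width>80</f:width><f:length>120</f:length></f:table>
</root>"""

# Prefix-to-URI mapping used only for the search paths below.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(document)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.find("f:table/f:name", NS).text

# Internally each tag is stored as {namespace-URI}local-name.
html_table_tag = root.find("h:table", NS).tag

print(cells, furniture_name, html_table_tag)
```

The prefix itself is just shorthand; what distinguishes the elements is the URI it stands for.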

 Recall that DTDs are used to define the tags
that can be used in an XML document
 An XML document may reference more than
one DTD
 Namespaces are a way to specify which vocabulary
(for example, which DTD) a given tag belongs to
 XML, like Java, uses qualified names
◦ This helps to avoid collisions between names
◦ Java: myObject.myVariable
◦ XML: myDTD:myTag
◦ Note that XML uses a colon (:) rather than a dot (.)


 A namespace is defined as a unique string
◦ To guarantee uniqueness, typically a URI (Uniform
Resource Identifier) is used, because the author
“owns” the domain
◦ It doesn't have to be a “real” URI; it just has to be
a unique string
◦ Example: http://www.matuszek.org/ns
 There are two ways to use namespaces:
◦ Declare a default namespace
◦ Associate a prefix with a namespace, then use the
prefix in the XML to refer to the namespace


 In any start tag you can use the reserved attribute name xmlns:
<book xmlns="http://www.matuszek.org/ns">
◦ This namespace will be used as the default for all elements
up to the corresponding end tag
◦ You can override it with a specific prefix

 You can use almost this same form to declare a prefix:
<book xmlns:dave="http://www.matuszek.org/ns">
◦ Use this prefix on every tag and attribute you want to use
from this namespace, including end tags--it is not a default
prefix
<dave:chapter dave:number="1">To Begin</dave:chapter>

 You can use the prefix in the start tag in which it is defined:
<dave:book xmlns:dave="http://www.matuszek.org/ns">


 XSL stands for EXtensible Stylesheet Language, and is a
style sheet language for XML documents.
 XSLT stands for XSL Transformations.
 XSLT is used to transform an XML document into another
XML document, or another type of document that is
recognized by a browser, like HTML and XHTML.
 Typically, XSLT does this by transforming each XML element into an
(X)HTML element.
 With XSLT you can add/remove elements and attributes
to or from the output file.
 You can also rearrange and sort elements.
 You can also perform tests and make decisions about
which elements to hide and display, and a lot more.

 DTDs are a very weak specification language
◦ You can’t put any restrictions on element contents.
◦ It’s difficult to specify:
 All the children must occur, but may be in any order.
 This element must occur a certain number of times.
◦ There are only ten data types for attribute values.
 DTDs aren’t written in XML!
◦ If you want to do any validation, you need one parser for
the XML and another for the DTD.
◦ This makes XML parsing harder than it needs to be.
◦ There is a newer and more powerful technology:
XML Schemas.
◦ However, DTDs are still very much in use.

 An XML Schema describes the structure of an XML document.
 XML Schema is an XML-based alternative to DTD.
 The XML Schema language is also referred to as XML Schema
Definition (XSD).
 Ex: note.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

<note xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
<to>Ravi</to>
<from>AEC</from>
<heading>Reminder</heading>
<body>Welcome to Aditya</body>
</note>

<note
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="note.xsd">
<to>Ravi</to>
<from>AEC</from>
<heading>Reminder</heading>
<body>Welcome to Aditya</body>
</note>

 The purpose of an XML Schema is to define the legal
building blocks of an XML document, just like a DTD.
 An XML Schema:
◦ defines elements that can appear in a document.
◦ defines attributes that can appear in a document.
◦ defines which elements are child elements.
◦ defines the order of child elements.
◦ defines the number of child elements.
◦ defines whether an element is empty or can include text.
◦ defines data types for elements and attributes.
◦ defines default and fixed values for elements and
attributes.
 XML Schemas are the Successors of DTDs
◦ XML Schemas are extensible to future additions.
◦ XML Schemas are richer and more powerful than DTDs.
◦ XML Schemas are written in XML.
◦ XML Schemas support data types.
◦ XML Schemas support namespaces.
 XML Schemas are much more powerful than DTDs.

 One of the greatest strengths of XML Schemas is the support for data types.
◦ It is easier to validate the correctness of data.
◦ It is easier to work with data from a database.
◦ It is easier to define data facets (restrictions on data) and data patterns
(data formats), and to convert data between different data types.
 Another strength of XML Schemas is that they are written in XML.
◦ You don't have to learn a new language.
◦ You can use your XML editor to edit your Schema files.
◦ You can use your XML parser to parse your Schema files.
 XML Schemas provide secure data communication.
◦ A date like "03-11-2004" will be interpreted in some countries as
3 November and in others as 11 March.
◦ However, an XML element with a data type like this:
◦ <date type="date">2004-03-11</date>
◦ ensures a mutual understanding between sender and receiver, i.e., the
XML "date" type requires the format "YYYY-MM-DD".
 XML Schemas are Extensible.
◦ Reuse your Schema in other Schemas.
◦ Create your own data types derived from the standard types.
◦ Reference multiple schemas in the same document.
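
The date ambiguity mentioned above is easy to reproduce. This sketch uses Python's datetime module purely as an illustration; it is not a schema validator.

```python
from datetime import date, datetime

ambiguous = "03-11-2004"

# The same string yields two different dates depending on the assumed convention.
day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()
month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()

# The YYYY-MM-DD form required by the schema date type has a single reading.
unambiguous = date.fromisoformat("2004-03-11")

print(day_first, month_first, unambiguous)
```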

 Defining Simple Element :
<xs:element name="xxx" type="yyy"/>
 XML Schema has a lot of built-in data types.
◦ xs:string
◦ xs:decimal
◦ xs:integer
◦ xs:boolean
◦ xs:date
◦ xs:time

Example
 Here are some XML elements:
<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>
 Here are the corresponding simple element definitions in Schema:
<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>
<xs:element name="color" type="xs:string" default="red"/>
<xs:element name="color" type="xs:string" fixed="red"/>

 Syntax:
<xs:attribute name="xxx" type="yyy"/>
 Example
 Here is an XML element with an attribute:
<lastname lang="EN">Smith</lastname>
 And here is the corresponding attribute definition:
<xs:attribute name="lang" type="xs:string"/>
<xs:attribute name="lang" type="xs:string" default="EN"/>
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
<xs:attribute name="lang" type="xs:string" use="required"/>

 Restrictions are used to define acceptable values for XML
elements or attributes.
 Restrictions on XML elements are called facets.

Restrictions on Values
 The example defines an element called "age" with a restriction.
The value of age cannot be lower than 0 or greater than 100:
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="100"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
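
What the two facets mean can be written out as a plain check. This is a rough Python equivalent for illustration only, not a schema processor; is_valid_age is a hypothetical helper name.

```python
def is_valid_age(text: str) -> bool:
    """Mimic xs:integer with minInclusive=0 and maxInclusive=100."""
    try:
        value = int(text.strip())
    except ValueError:
        return False           # not an integer at all
    return 0 <= value <= 100   # both bounds are inclusive

for candidate in ["0", "36", "100", "101", "-1", "abc"]:
    print(candidate, is_valid_age(candidate))
```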

 The example below defines an element called "car" with a
restriction.
The only acceptable values are: Audi, Golf, BMW:
<xs:element name="car">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Audi"/>
<xs:enumeration value="Golf"/>
<xs:enumeration value="BMW"/>
</xs:restriction>
</xs:simpleType>
</xs:element>

 The example below defines an element "letter" with a restriction.
 The only acceptable value is ONE of the LOWERCASE letters from a to z:
<xs:element name="letter">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[a-z]"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
 The only acceptable value is THREE of the UPPERCASE letters from A
to Z:
<xs:element name="initials">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[A-Z][A-Z][A-Z]"/>
</xs:restriction>
</xs:simpleType>
</xs:element>

 The next example defines an element called "gender" with a
restriction. The only acceptable value is male OR female:
<xs:element name="gender">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="male|female"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
 The example defines an element "mobileno" with a restriction.
 There must be exactly 10 digits:
<xs:element name="mobileno">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{10}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
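
The pattern facets above are regular expressions that must match the whole value. For these four simple patterns, Python's re module happens to use the same syntax, so they can be tried out directly; this is an illustration, not a schema validator, and the names PATTERNS and satisfies_pattern are hypothetical.

```python
import re

# Pattern strings copied from the schema examples above.
PATTERNS = {
    "letter": "[a-z]",
    "initials": "[A-Z][A-Z][A-Z]",
    "gender": "male|female",
    "mobileno": "[0-9]{10}",
}

def satisfies_pattern(element: str, value: str) -> bool:
    """A schema pattern must match the entire value, hence fullmatch."""
    return re.fullmatch(PATTERNS[element], value) is not None

print(satisfies_pattern("initials", "ABC"))
print(satisfies_pattern("gender", "females"))
print(satisfies_pattern("mobileno", "9437862540"))
```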

 The whiteSpace constraint is set to "preserve", which means that the XML
processor WILL NOT remove any white space characters:
<xs:element name="address">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:whiteSpace value="preserve"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
 The whiteSpace constraint is set to "replace", which means that the XML
processor WILL REPLACE all white space characters (line feeds, tabs, spaces, and
carriage returns) with spaces:
<xs:element name="address">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:whiteSpace value="replace"/>
</xs:restriction>
</xs:simpleType>
</xs:element>

 The value must be a minimum of five characters
and a maximum of eight characters:
<xs:element name="password">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:minLength value="5"/>
<xs:maxLength value="8"/>
</xs:restriction>
</xs:simpleType>
</xs:element>

 A complex element is an XML element that
contains other elements and/or attributes.
 There are four kinds of complex elements:
◦ empty elements
◦ elements that contain only other elements
◦ elements that contain only text
◦ elements that contain both other elements and text

 An XML parser is a software library (or a package) that
provides methods (or interfaces) for client
applications to work with XML documents
 It checks well-formedness
 It may validate the documents
 It does a lot of other detailed things so that a
client is shielded from those complexities

 DOM: Document Object Model
 SAX: Simple API for XML
 A DOM parser implements the DOM API
 A SAX parser implements the SAX API
 Most major parsers implement both
the DOM and SAX APIs

 A DOM document is an object containing
all the information of an XML document

 It is composed of a tree (DOM tree) of
nodes , and various nodes that are
somehow associated with other nodes in
the tree but are not themselves part of the
DOM tree

 There are 12 types of nodes in a DOM
Document object
Document node
Element node
Text node
Attribute node
Processing instruction node
…….

Sample XML document

<?xml-stylesheet type="text/css" href="test.css"?>

<!DOCTYPE shapes SYSTEM "shapes.dtd">
<shapes>
……
<square color="BLUE">
<length> 20 </length>
</square>
……
</shapes>

 A DOM parser creates an internal structure in
memory which is a DOM document object
 Client applications get the information of the
original XML document by invoking methods
on this Document object or on other objects it
contains
 DOM parser is tree-based (or DOM obj-based)
 Client application seems to be pulling the data
actively, from the data flow point of view
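
A minimal DOM sketch using Python's xml.dom.minidom. It works on a cut-down version of the shapes sample (an assumption: the stylesheet and DOCTYPE lines are left out, and the element is spelled square) and shows the client pulling data from the in-memory tree and then modifying it.

```python
from xml.dom import minidom

# Simplified shapes document (assumption: no DTD or stylesheet lines).
doc = minidom.parseString(
    '<shapes><square color="BLUE"><length> 20 </length></square></shapes>'
)

# Read: the client pulls data by calling methods on the in-memory tree.
square = doc.getElementsByTagName("square")[0]
color = square.getAttribute("color")
length = int(square.getElementsByTagName("length")[0].firstChild.data)

# Write: the same tree can be modified and serialized again.
square.setAttribute("color", "RED")
updated_xml = doc.documentElement.toxml()

print(color, length)
print(updated_xml)
```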

 Advantage:
(1) It is good when random access to widely
separated parts of a document is
required
(2) It supports both read and write operations

 Disadvantage:
(1) It is memory inefficient
(2) It can seem complicated, although it is not really

 A SAX parser does not first create any internal structure
 The client does not specify what methods to call
 The client just overrides the methods of the API
and places its own code inside them
 When the parser encounters a start tag, an end
tag, etc., it treats them as events

 When such an event occurs, the parser
automatically calls back a particular method
overridden by the client, and passes what it sees
as arguments to that method
 A SAX parser is event-based; it works like an
event handler in Java (e.g. MouseAdapter)
 The client application seems to be just receiving
the data passively, from the data flow point of
view
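
A minimal SAX sketch using Python's xml.sax module. The handler class ShapeHandler and the cut-down shapes document are assumptions made for the example; the overridden method names (startElement, characters, endElement) are those of the standard ContentHandler.

```python
import xml.sax

class ShapeHandler(xml.sax.ContentHandler):
    """Records the events the parser reports, in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Called for text; whitespace-only chunks are ignored here.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        # Called by the parser for every end tag.
        self.events.append(("end", name))

handler = ShapeHandler()
xml.sax.parseString(
    b'<shapes><square color="BLUE"><length> 20 </length></square></shapes>',
    handler,
)

for event in handler.events:
    print(event)
```

The client never asks for data; it only receives callbacks, and anything it wants to keep (here, the events list) it has to store itself.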

 Advantage:
(1) It is simple
(2) It is memory efficient
(3) It works well in stream application
 Disadvantage:
The data is broken into pieces and clients
never have all the information as a whole
unless they create their own data structure
