XML
1 What are the disadvantages of HTML?
• HTML lacks syntax checking
• HTML lacks structure
• HTML is not suitable for data interchange
• HTML is not context aware – HTML does not allow us to describe the information
content or the semantics of the document
• HTML is not object-oriented
• HTML is not re-usable
• HTML is not extensible
• HTML is suitable only for displaying content, not for describing what the content is about.
• HTML has only a few tags that describe the meaning of the text, such as <ADDRESS>.
• HTML is not flexible enough to mark up a wide variety of documents. HTML can describe
only <HEAD> and <BODY>. It cannot describe abstracts, chapters, parts, sections, etc.
2 What is XML?
• XML – Extensible Markup Language
o Extensible – capable of being extended. We can make our own elements/tags.
o Markup – it is a way of adding information to the text indicating the logical components
of a document
• How is it different from HTML?
o HTML was designed to display data
o XML was designed to store, describe and transport data
o XML separates data from HTML
• XML is also a markup language like HTML
• XML tags are not predefined – we must design our own tags.
• XML is portable - It is easy to produce files that capture the rules of your markup and
enable other programs to properly read or process your XML documents.
• XML does not do anything by itself, unlike HTML. XML was created to structure, store, and
transport information.
• XML is not a replacement for HTML; they do different things.
3 State the differences between HTML and XML.
No. | HTML | XML
1 | Designed to display data | Designed to store and transport data between applications and databases. Transport here means that data can be exchanged between incompatible systems over the Internet.
2 | Focus is on how data looks | Focus is on what data is
3 | It has pre-defined tags such as <B>, <LI>, etc. | No predefined tags; all tags must be defined by the user. E.g., we can create tags such as <TO>, <FROM>, <BOOKNAME>, etc.
4 | HTML is used to display information | XML is used to describe information
5 | Not every tag needs a closing tag | Every tag must have a closing tag
6 | HTML is not case sensitive | XML is case sensitive
7 | HTML is for humans | XML is for computers
Prof. Mukesh N. Tekwani
4 What are the advantages of XML? OR What are the features of XML?
• XML simplifies data sharing: Since XML data is stored in plain text format, data can be easily
shared among different hardware and software platforms.
• XML separates data from HTML : To display dynamic data in HTML, the code must be
rewritten each time the data changes. With XML, data can be stored in separate files so that
whenever the data changes it is automatically displayed correctly. We have to design the HTML
for layout only once.
• XML simplifies data transport: Data can be easily exchanged between different platforms.
• XML makes data more available
o Since XML is independent of hardware, software and application, XML can make
your data more available and useful.
o Different applications can access your data, not only in HTML pages but also from XML data sources.
• XML provides a means to package almost any type of information (binary, text, voice, video) for
delivery to a receiving end.
• Internationality: HTML relies heavily on ASCII, which makes using foreign characters very
difficult. XML uses Unicode, so many European and Asian languages are also handled easily.
5 What are the types of XML markup?
There are 5 types of XML markup:
Elements:
1. XML elements describe the meaning of the text they contain.
2. Elements occur in pairs with a start tag and end tag that enclose the text they markup.
3. Inside the start tag, a keyword indicates the meaning of the markup. The end tag contains
the same key word with a forward slash (/). Both tags start with a less than sign and end
with a greater than sign.
<LETTER>……….</LETTER>
4. Some elements do not occur in pairs. These elements are said to be empty. The tag for
such an element ends with />, e.g., <BR/>
5. Some elements take attributes that modify or expand on the meaning they impart to
content they contain. Attributes are set equal to values enclosed between quotation
marks.
Entities:
1. In HTML we use entities such as &gt; and &lt;. Entities in XML are very similar
to entities in HTML.
2. Some characters have a special meaning in XML. E.g., if you place a character like "<"
inside an XML element, it will generate an error because the parser interprets it as the
start of a new element:
<message>if salary < 1000 then</message>
3. To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
4. XML also enables us to use any Unicode character, so we can produce documents
in languages other than English.
5. XML entities can be defined in your XML file or externally, and you can incorporate the
entities in your XML file.
6. The predefined entities in XML are:
Entity    Symbol    Description
&lt;      <         Less than
&gt;      >         Greater than
&amp;     &         Ampersand
&apos;    '         Apostrophe
&quot;    "         Quotation mark
mukeshtekwani@hotmail.com
Comments: Comments are written the same way as in HTML: <!-- comment text -->.
Processing instructions:
Processing instructions (PIs) enable us to embed information to be passed to an application
right in your XML document. The syntax is <?name data?>.
The name, or PI target, should be anything that the processing application will recognize.
Target names that begin with "xml" are reserved for standardization purposes.
The data component of PI can be anything that the processing application understands.
Ignored sections:
A mathematical expression may need to use characters that are reserved in XML.
If you put them into an ignored section (a CDATA section) like this:
<![CDATA[4 <3 is false.]]>
the expression with the less than sign passes to the application.
All ignored sections start with <![CDATA[ and end with ]]>
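As a small illustration of the ignored section above, the following sketch parses it with Python's standard xml.etree.ElementTree module. The wrapping <expr> element is a made-up name for this example; it is needed only because a document must have a root element.

```python
import xml.etree.ElementTree as ET

# <expr> is a hypothetical root element added only for this example.
doc = "<expr><![CDATA[4 <3 is false.]]></expr>"
expr = ET.fromstring(doc)

# The parser passes the text through unchanged, without the CDATA markers.
print(expr.text)
```

The application receives the less than sign as ordinary text, so no entity reference is needed inside the section.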
6 Simple example of XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<class_list>
  <student>
    <name>Anamika</name>
    <grade>A+</grade>
  </student>
  <student>
    <name>Veena</name>
    <grade>B+</grade>
  </student>
</class_list>
• The first line is the XML declaration.
o It defines the XML version (1.0)
o It gives the encoding used (ISO-8859-1 = Latin-1/West European character set)
o The XML declaration has the same form as a processing instruction (PI) and it is
identified by the ? at its start and end
• The next line describes the root element of the document (like saying: "this document is
a class_list"). Every XML document must have exactly one root element. The root element
is like the parent element. All other elements must be completely enclosed within that
element. In our example, the root element is <class_list>
• In XML the non-empty element must consist of three things: a start tag, content (either
text or other elements) and an end tag. The name that you use in the element start tag
must exactly match (including case) the name you use in the end tag.
• The following lines describe the child elements of the root: each <student> element, which in turn contains the child elements <name> and <grade>
• And finally the last line defines the end of the root element: </class_list>.
• XML documents can contain empty XML elements.
Example,
<banner source="topbanner.gif"/>
<rule/>
<footer source="foot.gif"/>
With empty elements, the close delimiter /> is used, or you can use a closing
tag as follows: <empty_element></empty_element>
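The class_list document above can be read by a program in a few lines. The sketch below uses Python's standard xml.etree.ElementTree module; the variable names are chosen for this example only. It also checks that the two ways of writing an empty element give the same result.

```python
import xml.etree.ElementTree as ET

CLASS_LIST = """<?xml version="1.0" encoding="ISO-8859-1"?>
<class_list>
  <student>
    <name>Anamika</name>
    <grade>A+</grade>
  </student>
  <student>
    <name>Veena</name>
    <grade>B+</grade>
  </student>
</class_list>"""

# Parse from bytes so that the declared encoding is the one actually used.
root = ET.fromstring(CLASS_LIST.encode("iso-8859-1"))

# Collect (name, grade) for every <student> child of the root element.
students = [(s.findtext("name"), s.findtext("grade")) for s in root.findall("student")]
print(root.tag)
print(students)

# <rule/> and <rule></rule> describe the same empty element.
short_form = ET.fromstring("<rule/>")
long_form = ET.fromstring("<rule></rule>")
print(short_form.tag == long_form.tag, len(short_form), len(long_form))
```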
Attributes:
XML elements can have attributes. An attribute provides additional information about an
element. Attributes provide information that is not a part of the data. In the example below,
the file type is irrelevant to the data, but can be important to the software that wants to
manipulate the element:
<file type="gif">computer.gif</file>
7 Describe the logical structure / tree structure of XML documents.
There is a big difference between XML and HTML markup. With a few exceptions, most
HTML tags perform functions related to how the content is displayed. XML markup, on the
other hand, is meant to convey what the content means.
Each XML document must have only one root element, and all other elements must be
perfectly nested inside that element. Perfectly nested means, that if an element contains other
elements, those elements must be completely enclosed within that element.
If we sketch the structure of the elements in XML document, we obtain a tree structure.
The root element <class_list> is at the top of the tree. All elements that are inside this
element are neatly contained within each other. An XML document can contain only one
root element, and no element can be either partially or completely outside this element. An
element is a parent of the elements that it contains. The elements inside an element are called
children. Elements that share the same parent element are called siblings.
In our example <class_list> is the root and the parent of each <student> element. <student> is
the parent of <name> and <grade>, <name> is a child of <student>, and <name> and <grade> are
siblings. Each child element must be fully contained within its parent element. Sibling elements may not overlap.
The arrangement of elements in XML is called its logical structure.
Tree Structure:
• XML documents form a tree structure.
• XML documents must contain a root element. This element is "the parent" of all other
elements.
• The elements in an XML document form a document tree. The tree starts at the root and
branches to the lowest level of the tree.
• All elements can have sub elements (child elements)
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
Example of tree structure:
This tree structure is a representation of one book in the XML document given
below:
<bookstore>
  <book category = "COOKING">
    <title lang = "en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category = "CHILDREN">
    <title lang = "en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category = "WEB">
    <title lang = "en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
Each <book> element has 4 children: <title>, <author>, <year>, and <price>.
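The parent and child relationships in the bookstore document can be made visible by walking the tree. The sketch below uses Python's standard xml.etree.ElementTree module and a shortened copy of the document with two books; the outline function is a helper written for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore document, for illustration.
BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOKSTORE)
print("\n".join(outline(root)))

# The children of the first <book>, in document order.
children = [child.tag for child in root[0]]
print(children)
```

Each level of indentation in the printed outline corresponds to one level of the document tree.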
8 State the XML syntax Rules
All XML Elements Must Have a Closing Tag
In HTML, elements do not have to have a closing tag:
<p>This is a paragraph
<p>This is another paragraph
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
XML Tags are Case Sensitive
XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
"Opening and closing tags" are also called "start and end tags".
XML Elements Must be Properly Nested
In HTML, you might see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
<b><i>This text is bold and italic</i></b>
In the example above, "Properly nested" simply means that since the <i> element is opened
inside the <b> element, it must be closed inside the <b> element.
XML Documents Must Have a Root Element
XML documents must contain one element that is the parent of all other elements. This
element is called the root element.
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Attribute Values Must be Quoted
XML elements can have attributes in name/value pairs just like in HTML. In XML, the
attribute values must always be quoted.
In the two XML documents below, the first one is incorrect, the second is correct:
<note date=12/11/2007>
<to>Raja</to>
<from>Jani</from>
</note>
<note date="12/11/2007">
<to>Raja</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note element is not quoted.
Entity References
Some characters have a special meaning in XML. If you place a character like "<" inside an
XML element, it will generate an error because the parser interprets it as the start of a new
element.
This will generate an XML error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
There are 5 predefined entity references in XML:
&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character
is legal, but it is a good habit to replace it.
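When XML is generated by a program, the replacement of special characters is usually left to a library function. A minimal sketch, assuming Python's standard library: xml.sax.saxutils.escape replaces &, < and > with entity references, and the parser converts them back when the document is read.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 then"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
safe = escape(raw)
message = ET.fromstring(f"<message>{safe}</message>")
print(safe)
print(message.text)  # the parser turns the entity reference back into "<"

# Without escaping, the same text is rejected by the parser.
try:
    ET.fromstring(f"<message>{raw}</message>")
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False
print(unescaped_parses)
```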
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
White-space is Preserved in XML
HTML collapses multiple white-space characters into a single white-space:
HTML: Hello           Tove
Output: Hello Tove
With XML, the white-space in a document is not collapsed.
XML Stores New Line as LF
In Windows applications, a new line is normally stored as a pair of characters: carriage
return (CR) and line feed (LF). In Unix applications, a new line is normally stored as a LF
character. XML stores a new line as LF.
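Both points, preserved white-space and new lines stored as LF, can be observed with a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the element names <greeting> and <note> are used only for this illustration.

```python
import xml.etree.ElementTree as ET

# Eleven spaces between the two words; the parser keeps all of them.
spaced = "Hello" + " " * 11 + "Tove"
greeting = ET.fromstring(f"<greeting>{spaced}</greeting>")
print(repr(greeting.text))

# A Windows-style line break (CR + LF) reaches the application as a single LF.
windows_note = ET.fromstring("<note>line one\r\nline two</note>")
print(repr(windows_note.text))
```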
9 State the XML naming rules.
XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters
• Names cannot start with a number or punctuation character (an underscore is allowed)
• Names cannot start with the letters xml (or XML, or Xml, etc)
• Names cannot contain spaces.
• Any name can be used, no words are reserved.
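The rules above can be approximated with a short check. This is a simplified sketch, not the full definition from the XML specification: it accepts ASCII letters only, allows a leading underscore, and leaves out the colon because colons are meant for namespaces. The function name is_acceptable_name is made up for this example.

```python
import re

# Simplified, ASCII-only pattern: a letter or underscore first, then letters,
# digits, underscores, dots or hyphens. Real XML also allows non-English letters.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_name(name: str) -> bool:
    """Apply the naming rules listed above to a candidate element name."""
    if name.lower().startswith("xml"):
        return False  # names starting with xml, XML, Xml, etc. are not allowed
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["first_name", "1st_name", "xmlData", "first name", "book-title"]:
    print(candidate, is_acceptable_name(candidate))
```

Note that a name such as book-title passes the check: it is legal, even if the best practices advise against the hyphen.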
Best Naming Practices
Make names descriptive. Names with an underscore separator are nice: <first_name>,
<last_name>.
Names should be short and simple, like this: <book_title> not like this:
<the_title_of_the_book>.
Avoid "-" characters. If you name something "first-name," some software may think you
want to subtract name from first.
Avoid "." characters. If you name something "first.name," some software may think that
"name" is a property of the object "first."
Avoid ":" characters. Colons are reserved to be used for something called namespaces (more
later).
XML documents often have a corresponding database. A good practice is to use the naming
rules of your database for the elements in the XML documents.
Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your
software vendor doesn't support them.
10 XML elements are extensible. Explain this statement.
XML's flexibility comes from its capability to enable you to make up your own XML
elements. This means that you can introduce new tags into XML. XML elements can be extended
to carry more information.
Look at the following XML example:
<note>
<to>Raja</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
Let's imagine that we created an application that extracted the <to>, <from>, and <body>
elements from the XML document to produce this output:
MESSAGE
To: Raja
From: Jani
Don't forget me this weekend!
Imagine that the author of the XML document added some extra information to it:
<note>
<date>2008-01-10</date>
<to>Raja</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
This application will not crash because of the changes we made. The application should still
be able to find the <to>, <from>, and <body> elements in the XML document and produce
the same output. This is the concept of extensibility.
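The application described above can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. The render function is a hypothetical stand-in for that application: it looks up <to>, <from> and <body> by name, so the extra <date> and <heading> elements are simply ignored.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
  <to>Raja</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
  <date>2008-01-10</date>
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def render(note_xml: str) -> str:
    """Build the MESSAGE output from the <to>, <from> and <body> elements only."""
    note = ET.fromstring(note_xml)
    return "\n".join([
        "MESSAGE",
        f"To: {note.findtext('to')}",
        f"From: {note.findtext('from')}",
        note.findtext("body"),
    ])


print(render(ORIGINAL))
print(render(ORIGINAL) == render(EXTENDED))
```

Because the elements are found by name and not by position, adding new elements does not disturb the existing application.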
11 Write a note on XML attributes.
XML elements can have attributes. An attribute provides additional information about an
element. Attributes provide information that is not a part of the data. In the example below,
the file type is irrelevant to the data, but can be important to the software that wants to
manipulate the element:
<file type="gif">computer.gif</file>
XML Attributes Must be Quoted
Attribute values must always be quoted. Either single or double quotes can be used. For a
person's gender, the person element can be written like this:
<person gender = "female"> or <person gender = 'female'>
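A short sketch of how a program reads attributes, using Python's standard xml.etree.ElementTree module. The <person> elements are written here as empty elements so that each one is a complete document; that is a choice made for this example.

```python
import xml.etree.ElementTree as ET

# The attribute is read with get(); the element content is in .text.
file_elem = ET.fromstring('<file type="gif">computer.gif</file>')
print(file_elem.get("type"), file_elem.text)

# Double and single quotes around the value give the same attribute.
double_quoted = ET.fromstring('<person gender="female"/>')
single_quoted = ET.fromstring("<person gender='female'/>")
print(double_quoted.attrib == single_quoted.attrib)
```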
XML attributes should be avoided where elements will do, for the following reasons:
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
• attributes are difficult to read and maintain.
Use elements for data. Use attributes for information that is not relevant to the data.
12 What is the difference between XML elements and attributes?
XML has no rules about when to use elements and when to use attributes. Consider the
following examples:
<person gender = "female">
<firstname>Anita</firstname>
<lastname>Shah</lastname>
</person>
<person>
<gender>female</gender>
<firstname>Anita</firstname>
<lastname>Shah</lastname>
</person>
In the first example gender is an attribute. In the next example, gender is an element. Both
examples provide the same information. Generally, we avoid using attributes in XML and
instead prefer to use elements.
Another example:
Consider the following XML document :
Using date attribute:
<note date="10/01/2008">
<to>Raja</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Using date element:
<note>
<date>10/01/2008</date>
<to>Raja</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
We now expand the date element in the next code:
<note>
  <date>
    <day>10</day>
    <month>01</month>
    <year>2008</year>
  </date>
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
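The practical difference between the date forms shows up when a program needs the day, month and year. The sketch below uses Python's standard library on shortened versions of the notes; the two function names are made up for this example. With the attribute, the program must split the string and assume the order of its parts; with the expanded element, each part is named.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Shortened versions of the notes above, for illustration.
AS_ATTRIBUTE = '<note date="10/01/2008"><to>Raja</to></note>'
AS_ELEMENTS = """<note>
  <date><day>10</day><month>01</month><year>2008</year></date>
  <to>Raja</to>
</note>"""


def date_from_attribute(note):
    # Assumes day/month/year order; the attribute value itself does not say so.
    day, month, year = (int(part) for part in note.get("date").split("/"))
    return date(year, month, day)


def date_from_elements(note):
    # Each part is looked up by name, so no assumption about order is needed.
    d = note.find("date")
    return date(int(d.findtext("year")), int(d.findtext("month")), int(d.findtext("day")))


print(date_from_attribute(ET.fromstring(AS_ATTRIBUTE)))
print(date_from_elements(ET.fromstring(AS_ELEMENTS)))
```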
13 What are the specifications needed for a document to be valid and well formed XML
document?
An XML document with correct syntax is called a "well formed" XML document. A
document that is also validated against a DTD is a "valid" XML document.
Well formed document:
A "Well Formed" XML document has correct XML syntax. These syntax rules are:
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted
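Any XML parser checks these rules when it reads a document. A minimal sketch, assuming Python's standard xml.etree.ElementTree module: the helper is_well_formed (a name made up for this example) reports whether the parser accepts the text, and each test case breaks one of the rules above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


CASES = {
    "two root elements": "<a/><b/>",
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2007></note>",
    "correct": '<note date="12/11/2007"><to>Raja</to></note>',
}

results = {label: is_well_formed(xml) for label, xml in CASES.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the last case is accepted. This check covers well-formedness only; it says nothing about whether a document is valid against a DTD.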
Valid XML document:
A "Valid" XML document is a "Well Formed" XML document, which also conforms to the
rules of a Document Type Definition (DTD):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Raja</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The DOCTYPE declaration in the example above is a reference to an external DTD file.
The purpose of a DTD is to define the structure of an XML document. It defines the structure
with a list of legal elements. The declarations are shown below as an internal DTD, written
inside the DOCTYPE declaration; an external file such as Note.dtd would contain only the <!ELEMENT ...> lines:
<!DOCTYPE note
[
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
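Python's standard xml.etree.ElementTree module checks well-formedness but does not validate against a DTD. The sketch below therefore only imitates, by hand, what the declarations above require of a note; it is not a DTD validator, and the function name matches_note_model is made up for this example.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT note (to, from, heading, body)> by hand.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(text: str) -> bool:
    """Hand-written check that a document follows the note structure above."""
    try:
        note = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well formed
    if note.tag != "note":
        return False
    if [child.tag for child in note] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA): each child holds text only, no nested elements.
    return all(len(child) == 0 for child in note)


VALID_NOTE = "<note><to>Raja</to><from>Jani</from><heading>Reminder</heading><body>Hi</body></note>"
MISSING_HEADING = "<note><to>Raja</to><from>Jani</from><body>Hi</body></note>"

print(matches_note_model(VALID_NOTE))
print(matches_note_model(MISSING_HEADING))
```

The second note is well formed but does not follow the declared structure, which is exactly the difference between a well formed document and a valid one.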