   XML is a data description language.
   It is a flexible way to create common information
    formats.
   It also provides a way to share both the format and the
    data on the World Wide Web, intranets, and elsewhere.
   Computer users can agree on a standard, common way to
    describe a piece of information using a
    platform-independent format built with XML.
   Such a standard way of describing data enables users to
    exchange the data, together with its description,
    between any types of platform.
   XML → Extensible Markup Language
   Just a text file, with tags, attributes, free text...
    looks much like HTML
   Used to create special-purpose markup languages
   Facilitates sharing of structured text and
    information across the Internet.
   XML structure facilitates parsing of data.
   XML tags are not predefined; you must define
    your own tags.
   Two applications still have to agree on the meaning
    of the "descriptive tags".
[Figure: diagram relating SGML, HTML, XML, MML, and CML]

               XML for Developers -
               Version 1a        4
HTML (Example1.html):

<H1>Cars for Sale</H1>
<h3>Audi 80</h3>
1800 cc<BR>
Blue<BR>
Manual<BR>
1988<BR>
$1250
<P>
<H3>Toyota Corolla</h3>
1250 cc<BR>
Red<BR>
Automatic<BR>
1984<BR>
Red<BR>
$940<BR>

XML (Example1.xml):

<for_sale>
<heading>Cars for Sale</heading>
<make>Audi 80</make>
<engine>1800 cc</engine>
<color>Blue</color>
<transmission>Manual</transmission>
<year>1988</year>
<price>$1250</price>
<make>Toyota Corolla</make>
<engine>1250</engine>
<color>Red</color>
<transmission>Automatic</transmission>
<year>1984</year>
<price>$940</price>
</for_sale>
Example1.html as rendered by a browser:

Cars for Sale
Audi 80
1800 cc
Blue
Manual
1988
$1250

Toyota Corolla
1250 cc
Red
Automatic
1984
Red
$940

[Figure: Example1.xml as displayed in the IE5 browser]
XML                                HTML
 Free and extensible language     Language derived from SGML
 Tags are user-defined            Tags are predefined
 About describing information     About displaying information
 Extensible set of tags           Fixed set of tags
 Content oriented                 Presentation oriented
 Standard data infrastructure     No data validation capabilities
 Allows multiple output forms     Single presentation
   Many extensible languages are defined using XML.
    ◦   Wireless Markup Language (WML)
    ◦   Chemical Markup Language (CML)
    ◦   Bioinformatic Sequence Markup Language (BSML)
    ◦   Mathematical Markup Language (MathML)
    ◦   Open Office Markup Language (OOML)
   Directly usable over the Internet: XML is independent of
    platform and protocol.
   XML documents are plain text, so they are easy to write
    and to transport between programs.
   Can be used alongside SGML and other web technologies
    without conflict.
   Clearly understandable by humans.
   Accurate validation is possible with the help of a DTD or
    a schema.
   We can define the structure of the data along with its description.
   Semi-structured databases can also be defined using XML.
   Web publishing : XML allows the customer to customize
    web pages. With XML, you store the data once and then
    render that content for different viewers.
   Web searching and automating Web tasks: XML defines
    the type of information contained in a document, making
    it easier to return useful results when searching the Web.
   General applications: XML provides a standard method to
    use, store, transmit, and display data for all kinds of
    applications and devices.
   E-business applications: XML enables electronic
    data interchange (EDI) for both business-to-business
    and business-to-consumer transactions.
   Metadata applications: XML makes it easier to express
    metadata in a portable, reusable format.
   Pervasive computing: XML provides portable and
    structured information types for display on pervasive
    (wireless) computing devices such as personal digital
    assistants (PDAs), cellular phones.
   DTD (Document Type Definition) and XML Schemas are
    used to define legal XML tags and their attributes for
    particular purposes

   CSS (Cascading Style Sheets) describe how to display
    HTML or XML in a browser

   XSLT (eXtensible Stylesheet Language Transformations)
    and XPath are used to translate from one form of XML
    to another

   DOM (Document Object Model), SAX (Simple API for
    XML), and JAXP (Java API for XML Processing) are all APIs
    for XML parsing
 XML documents use a simple, self-describing
syntax.
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
   <to> Ramesh </to>
   <from> Kiran</from>
   <heading> Reminder </heading>
   <body> Don't forget me this weekend </body>
</note>
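A minimal sketch of reading the note document above from a program, assuming Python and its standard xml.etree.ElementTree module (the slides do not prescribe a parser). The helper name read_note is hypothetical.

```python
import xml.etree.ElementTree as ET

# The note document from the slide, as bytes so the encoding declaration is honoured.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
   <to> Ramesh </to>
   <from> Kiran</from>
   <heading> Reminder </heading>
   <body> Don't forget me this weekend </body>
</note>"""


def read_note(xml_bytes):
    """Return a dict mapping each child tag of <note> to its trimmed text."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: child.text.strip() for child in root}


note = read_note(NOTE_XML)
print(note["heading"], "from", note["from"], "to", note["to"])
```

Because the tags describe the data, the program can look fields up by name instead of by position.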
   XML Syntax consists of
    ◦ XML Declaration
    ◦ XML Elements
    ◦ XML Attributes
    The first line of an XML document should always
    consist of an XML declaration defining the version of
    XML.
   An XML document may start with one or more
    processing instructions (PIs) or directives:
     <?xml version="1.0"?>
     <?xml-stylesheet type="text/css" href="ss.css"?>
   Following the directives, there must be exactly one
    root element containing all the rest of the XML:
     <weatherReport>
        ...
     </weatherReport>
   XML elements have relationships
   Elements can have different content types
   Element Naming Rules:
    1) Names contain letters, numbers & other characters.
    2) Names must not start with a number or
        punctuation marks.
    3) Names must not start with the letters xml.
    4) Names cannot contain spaces.
   Names (as used for tags and attributes) must begin
    with a letter or underscore, and can consist of:
    ◦ Letters, both Roman (English) and foreign
    ◦ Digits, both Roman and foreign
      . (dot)
      - (hyphen)
      _ (underscore)
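The naming rules above can be sketched as a small check. This is a simplification that assumes ASCII letters and digits only (real XML names also allow many non-ASCII characters), and the function name is_valid_name is hypothetical.

```python
import re

# First character: letter or underscore; the rest: letters, digits, dot, hyphen, underscore.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Apply the slide's naming rules to an ASCII candidate name."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return _NAME_RE.fullmatch(name) is not None


for candidate in ["firstname", "_id", "first.name-2", "2nd", "my tag", "XmlData"]:
    print(candidate, is_valid_name(candidate))
```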
<person gender="male">
    <firstname>Ravi</firstname>
    <lastname>Kumar</lastname>
</person>
   Attributes and elements are somewhat interchangeable
   Example using just elements:
     <name>
       <first>David</first>
       <last>Matuszek</last>
     </name>
   Example using attributes:
     <name first="David" last="Matuszek"></name>
   You will find that elements are easier to use in your
    programs--this is a good reason to prefer them
   Attributes often contain metadata, such as unique IDs
   Generally speaking, browsers display only elements
    (values enclosed by tags), not tags and attributes



   Elements can contain multiple values;
    attributes cannot
   Attributes are not expandable
   Elements can describe structure; attributes
    cannot
   Attributes are more difficult to manipulate in
    program code than elements
   Attribute values are difficult to validate against
    a DTD
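To make the comparison concrete, here is a sketch of how a program reads the same name in both forms, assuming Python's standard xml.etree.ElementTree. The two helper functions are hypothetical; other parsers expose elements and attributes differently.

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
as_attributes = ET.fromstring('<name first="David" last="Matuszek"></name>')


def full_name_from_elements(name):
    # Child elements are located by tag and read through their text.
    return f"{name.findtext('first')} {name.findtext('last')}"


def full_name_from_attributes(name):
    # Attributes are a flat name-to-string mapping on the element itself.
    return f"{name.get('first')} {name.get('last')}"


print(full_name_from_elements(as_elements))
print(full_name_from_attributes(as_attributes))
```

The element form could later grow children of its own (for example a second given name), while an attribute value can only ever be one string.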
<novel>
  <foreword>
    <paragraph> This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>
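[Figure: tree view of the document above. novel has two children, foreword and chapter (number="1"); foreword contains one paragraph ("This is the great American novel.") and chapter contains two paragraphs ("It was a dark and stormy night." and "Suddenly, a shot rang out!").]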
   Every element must have both a start tag and an end
    tag, e.g. <name> ... </name>
    ◦ But empty elements can be abbreviated: <break />.
    ◦ XML tags are case sensitive
    ◦ XML tags may not begin with the letters xml, in any
      combination of cases
   Elements must be properly nested,
       e.g. not <b><i>bold and italic</b></i>
   Every XML document must have one and only one root
    element.
   The values of attributes must be enclosed in single or
    double quotes, e.g. <time unit="days">
   Character data cannot contain < or &
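These rules are what a parser enforces before it returns any data. A minimal sketch, assuming Python's standard xml.etree.ElementTree, which raises ParseError on a document that is not well formed; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<name>Ravi</name>",                   # start tag and end tag
    "<break />",                           # abbreviated empty element
    "<b><i>bold and italic</b></i>",       # improperly nested
    "<a/><b/>",                            # two root elements
    "<time unit=days/>",                   # unquoted attribute value
    "<Name>Ravi</name>",                   # tags are case sensitive
    "<a>1 < 2</a>",                        # raw < in character data
]
for sample in samples:
    print(is_well_formed(sample), sample)
```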


   Start with <?xml version="1.0"?>
   XML is case sensitive
   You must have exactly one root element that
    encloses all the rest of the XML
   Every element must have a closing tag
   Elements must be properly nested
   Attribute values must be enclosed in double
    or single quotation marks
   There are only five pre-declared entities


   Five special characters must be written as
    entities:
     &amp; for     &     (almost always necessary)
     &lt;    for     <     (almost always necessary)
     &gt;    for     >     (not usually necessary)
     &quot; for    "      (necessary inside double quotes)
     &apos; for    '     (necessary inside single quotes)
   These entities can be used even in places
    where they are not absolutely required
   These are the only predefined entities in XML
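A sketch of producing and reversing these five entities from a program, assuming Python's standard xml.sax.saxutils module. Its escape function handles &, < and > by default; the two quote characters are passed in as an extra mapping. The wrapper names escape_all and unescape_all are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings for the two quote characters, which escape() leaves alone by default.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {"&quot;": '"', "&apos;": "'"}


def escape_all(text):
    """Replace all five special characters with their predefined entities."""
    return escape(text, QUOTE_ENTITIES)


def unescape_all(text):
    """Turn the five predefined entities back into plain characters."""
    return unescape(text, QUOTE_CHARS)


raw = 'R&D: 1 < 2 > 0, "a" \'b\''
print(escape_all(raw))
print(unescape_all(escape_all(raw)) == raw)
```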


   The XML declaration looks like this:
    <?xml version="1.0" encoding="UTF-8"
    standalone="yes"?>
    ◦ The XML declaration is not required by browsers, but is
      required by most XML processors (so include it!)
    ◦ If present, the XML declaration must be first--not even
      whitespace should precede it
    ◦ Note that the brackets are <? and ?>
    ◦ version="1.0" is required (XML 1.1 also exists, but 1.0 is
      by far the most widely used)
    ◦ encoding can be "UTF-8" (an ASCII-compatible Unicode
      encoding), "UTF-16" (also Unicode), or something else, or
      it can be omitted
    ◦ standalone tells whether the document depends on
      declarations in an external DTD



   PIs (Processing Instructions) may occur anywhere in
    the XML document (but usually first)
   A PI is a command to the program processing the
    XML document to handle it in a certain way
   XML documents are typically processed by more
    than one program
   Programs that do not recognize a given PI should
    just ignore it
   General format of a PI: <?target instructions?>
   Example: <?xml-stylesheet type="text/css"
    href="mySheet.css"?>


   <!-- This is a comment in both HTML and XML -->
   Comments can be put anywhere in an XML document
   Comments are useful for:
    ◦ Explaining the structure of an XML document
    ◦ Commenting out parts of the XML during development and
      testing
   Comments are not elements and do not have an end tag
   The blanks after <!-- and before --> are optional
   The character sequence -- cannot occur in the comment
   The closing bracket must be -->
   Comments are not displayed by browsers, but can be
    seen by anyone who looks at the source code


   By default, all text inside an XML document is
    parsed.
   You can force text to be treated as unparsed
    character data by enclosing it in <![CDATA[ ... ]]>
   Any characters, even & and <, can occur inside a
    CDATA
   Whitespace inside a CDATA is (usually) preserved
   The only real restriction is that the character
    sequence ]]> cannot occur inside a CDATA
   CDATA is useful when your text has a lot of illegal
    characters (for example, if your XML document
    contains some HTML text)
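A short sketch of what a CDATA section looks like to a program, assuming Python's standard xml.etree.ElementTree; the element name example is made up for the illustration. The markup-like characters inside the section arrive as ordinary text and no child elements are created.

```python
import xml.etree.ElementTree as ET

# HTML-looking text carried inside a CDATA section.
doc = "<example><![CDATA[<p>Tom & Jerry</p>]]></example>"
root = ET.fromstring(doc)

print(root.text)   # the raw characters, not a <p> child element
print(len(root))   # number of child elements

# When written back out, this library uses entities rather than CDATA.
print(ET.tostring(root, encoding="unicode"))
```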


<note>
  <to> Ramesh </to>
  <from> Kiran</from>
  <heading> Reminder </heading>
  <body> Don’t forget me this weekend </body>
</note>

<remainder>
  <heading> Reminder </heading>
  <to> Ramesh </to>
  <from> Kiran</from>
  <message> Don’t forget me this weekend </message>
</remainder>
   A DTD specifies the set of rules for
    structuring data in an XML file.
   It is used to define the building blocks of an XML document.
   Using a DTD we can specify the various element types,
    their attributes, and their relationships.
   A DTD constrains the structure of XML data
    ◦ What elements can occur
    ◦ What attributes can/must an element have.
    ◦ What subelements can/must occur inside each element, and how
      many times.
   DTD syntax
    ◦ <!ELEMENT element (subelements-specification) >
    ◦ <!ATTLIST element (attributes) >
   A DTD adds syntactical requirements in
    addition to the well-formed requirement
   It helps in eliminating errors when creating or
    editing XML documents
   It clarifies the intended semantics
   It simplifies the processing of XML
    documents




   <?xml version="1.0"?>
    <!DOCTYPE note [
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
    ]>
    <note>
       <to>Raj</to>
       <from>AEC</from>
       <heading>Invitation</heading>
       <body>Welcome to Aditya!</body>
    </note>
   !DOCTYPE note defines that the root element of this
    document is note
   !ELEMENT note defines that the note element contains
    four child elements: "to,from,heading,body"
   !ELEMENT to defines the to element to be of type
    "#PCDATA"
   !ELEMENT from defines the from element to be of type
    "#PCDATA"
   !ELEMENT heading defines the heading element to be
    of type "#PCDATA"
   !ELEMENT body defines the body element to be of type
    "#PCDATA"
<person>
   <name> K.Vijay Kumar </name>              <!-- exactly one name -->
   <greet> Happy new year </greet>           <!-- at most one greeting -->
   <addr>19-12, main road </addr>            <!-- as many address lines as needed -->
   <addr> Kakinada </addr>
   <tel> 943786254 </tel>                    <!-- mixed telephones and faxes -->
   <fax> 227862544 </fax>
   <tel> 227862551 </tel>
   <email> vkumar123@gmail.com </email>      <!-- as many as needed -->
</person>
   name            a name element
   greet?          an optional (0 or 1) greet element
   name, greet?    a name followed by an optional greet
   addr*           0 or more address lines
   tel | fax       a tel or a fax element
   (tel | fax)*    0 or more repeats of tel or fax
   email*          0 or more email elements
   So the whole structure of a person entry is
    specified by

          name, greet?, addr*, (tel | fax)*, email*


   This is known as a regular expression
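The regular-expression reading of the content model can be tried out directly. The sketch below, which assumes Python's re and xml.etree.ElementTree modules, writes each child tag followed by a space and matches that string against a hand-translated pattern. A real validating parser does not work this way; the names PERSON_MODEL and matches_person_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# name, greet?, addr*, (tel | fax)*, email*  with one trailing space per tag
PERSON_MODEL = re.compile(r"name (greet )?(addr )*(tel |fax )*(email )*")


def matches_person_model(person):
    """Check the order and count of a person's children against the model."""
    child_tags = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None


good = ET.fromstring(
    "<person><name>K.Vijay Kumar</name><greet>Happy new year</greet>"
    "<addr>19-12, main road</addr><addr>Kakinada</addr>"
    "<tel>943786254</tel><fax>227862544</fax><tel>227862551</tel>"
    "<email>vkumar123@gmail.com</email></person>"
)
bad = ET.fromstring("<person><name>A</name><email>a@b.c</email><tel>1</tel></person>")

print(matches_person_model(good))
print(matches_person_model(bad))   # email appears before tel
```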




<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person
     (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
   The name of the DTD is addressbook
   The syntax of a DTD is not XML syntax
   "Internal" means that the DTD and the XML document are in
    the same file
   Suffixes:
     ?          optional         foreword?
     +          one or more      chapter+
     *          zero or more     appendix*
   Separators
     ,          both, in order   foreword?, chapter+
     |          or               section|chapter
   Grouping
     ()         grouping         (section|chapter)+
   The syntax is <!ELEMENT name category>
    ◦ The name is the element name used in start and end
      tags
    ◦ The category may be EMPTY:
      In the DTD: <!ELEMENT br EMPTY>
      In the XML: <br></br> or just <br />
    ◦ In the XML, an empty element may not have any
      content between the start tag and the end tag
    ◦ An empty element may (and usually does) have
      attributes
   The syntax is <!ELEMENT name category>
    ◦ The category may be ANY
       This indicates that any content--character data,
       elements, even undeclared elements--may be used
      Since the whole point of using a DTD is to define the
       structure of a document, ANY should be avoided
       wherever possible
    ◦ The category may be (#PCDATA), indicating that only
      character data may be used
        In the DTD: <!ELEMENT paragraph (#PCDATA)>
        In the XML: <paragraph>A shot rang out!</paragraph>
        The parentheses are required!
        Note: In (#PCDATA), whitespace is kept exactly as entered
        Elements may not be used within parsed character data
        Entities are character data, and may be used
   A category may describe one or more children:
        <!ELEMENT novel (foreword, chapter+)>
    ◦   Parentheses are required, even if there is only one child
    ◦   A space must precede the opening parenthesis
    ◦   Commas (,) between elements mean that all children must
        appear, and must be in the order specified
    ◦   “|” separators means any one child may be used
    ◦   All child elements must themselves be declared
    ◦   Children may have children
    ◦   Parentheses can be used for grouping:
        <!ELEMENT novel (foreword, (chapter+|section+))>
   #PCDATA describes elements with only
    character data
   #PCDATA can be used in an “or” grouping:
    ◦ <!ELEMENT note (#PCDATA|message)*>
    ◦ This is called mixed content
    ◦ Certain (rather severe) restrictions apply:
      #PCDATA must be first
      The separators must be “|”
      The group must be starred (meaning zero or more)
   The format of an attribute is:
       <!ATTLIST element-name
             name type requirement
             name type requirement>
    where the name-type-requirement may be
    repeated as many times as desired
    ◦ Note that only spaces separate the parts, so careful
      counting is essential
    ◦ The element-name tells which element may have these
      attributes
    ◦ The name is the name of the attribute
    ◦ Each attribute has a type, such as CDATA (character data)
    ◦ Each attribute may be required, optional, or "fixed"
    ◦ In the XML, attributes may occur in any order
   There are ten attribute types
   These are the most important ones:
    ◦ CDATA                The value is character data
    ◦ (man|woman|child)    The value is one of enumerated
                           values
    ◦ ID                   The value is a unique identifier
      ID values must be legal XML names and must be unique
       within the document
    ◦ NMTOKEN       The value is a name token
      This is sometimes used to disallow whitespace in the value
      Unlike an XML name, a name token may begin with a digit,
       a hyphen, or a dot
   IDREF      The ID of another element
   IDREFS     A list of other IDs
   NMTOKENS   A list of name tokens
   ENTITY     An entity
   ENTITIES   A list of entities
   NOTATION   A notation
   xml:       A predefined XML value
   Recall that an attribute has the form
    <!ATTLIST element-name name type requirement>
   The requirement is one of:
    ◦ A default value, enclosed in quotes
       Example: <!ATTLIST element-name degree CDATA "PhD">
    ◦ #REQUIRED
       The attribute must be present
    ◦ #IMPLIED
       The attribute is optional
    ◦ #FIXED "value"
       The attribute always has the given value
       If specified in the XML, the same value must be used
Element declaration:
 <?xml version="1.0" ?>
 <!ELEMENT employee (#PCDATA)>

Attribute declaration syntax:
 <!ATTLIST ElementName AttributeName Type Default>

 <!ATTLIST employee type (FullTime | PartTime) "FullTime">

  Usage in XML file:
    <?xml version="1.0" ?>
    <employee type="PartTime"/>
   CDATA
     ◦ CDATA attributes are strings; any text is allowed.
   ID
     ◦ The value of an ID attribute must be a name. All of the ID values used in a
       document must be unique. IDs uniquely identify individual elements in a
       document. An element can have only a single ID attribute.
   IDREF or IDREFS
     ◦ The value of an IDREF attribute must be the value of a single ID attribute on some
       element in the document. The value of an IDREFS attribute may contain multiple
       IDREF values separated by white space.
   ENTITY or ENTITIES
     ◦ The value of an ENTITY attribute must be the name of a single entity. The value of an
       ENTITIES attribute may contain multiple entity names separated by white space.
   NMTOKEN or NMTOKENS
     ◦ Name token attributes are a restricted form of string attribute: the value must be
       a single word, but there are no other restrictions on the word.
   Enumerated list of names
     ◦ You can specify that the value of an attribute must be taken from a specific list
       of names. This is frequently called an enumerated type because each of the
       possible values must be explicitly enumerated in the declaration.
   #REQUIRED
    ◦ The attribute must have an explicitly specified value for every occurrence of the
      element in the document.
   #IMPLIED
    ◦ The attribute value is not required and no default value is provided. If a value is not
      specified, the XML processor must proceed without one.
   "value"
    ◦ An attribute can be given any legal value as a default. The attribute value is not
      required on each element of the document, and if it is not present it will appear to be
      the specified default.
   #FIXED "value"
    ◦ An attribute declaration may specify that an attribute has a fixed value. In this case,
      the attribute is not required, but if it occurs, it must have the specified value. If it is
      not present, it will appear to be the specified default.
   CDATA        Character data
   NMTOKEN      Single token
   NMTOKENS     Multiple tokens
   ENTITY       Attribute is an entity reference
   ENTITIES     Multiple entity references
   ID           Unique ID
   IDREF        Match to an ID
   IDREFS       Match to multiple IDs
   NOTATION     Describes non-XML data
   Name group   Restricted list
   CDATA        name="Tom Jones"
   NMTOKEN      color="red"
   NMTOKENS     values="A12 A15 A34"
   ENTITY       photo="MyPic"
   ENTITIES     photos="pic1 pic2"
   ID           ID="P09567"
   IDREF        IDREF="P09567"
   IDREFS       IDREFS="A01 A02"
   NOTATION     FORMAT="TeX"
   Name group   coord="X"
   A DTD can specify a default attribute value for
    when it is missing from the XML document, or
    state that a value must be entered
    ◦   #REQUIRED   Must be specified
    ◦   #IMPLIED    May be specified
    ◦   "default"   Default value if unspecified
    ◦   #FIXED      Only one value allowed


<!ATTLIST tag     name    type        default>
<!ATTLIST seqlist sepchar NMTOKEN     #REQUIRED
                  type    (alpha|num) "num">
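The effect of the seqlist declaration can be imitated by hand. This is only an illustrative sketch in plain Python of what a validating parser would do for #REQUIRED, enumerated, and default attributes; the table ATTLIST_SEQLIST and the function apply_attlist are hypothetical names, not part of any library.

```python
# Hand-written mirror of: sepchar NMTOKEN #REQUIRED, type (alpha|num) "num"
ATTLIST_SEQLIST = {
    "sepchar": {"required": True, "default": None, "allowed": None},
    "type": {"required": False, "default": "num", "allowed": {"alpha", "num"}},
}


def apply_attlist(attrs, attlist):
    """Return attrs with defaults filled in, or raise ValueError if a rule is broken."""
    result = dict(attrs)
    for name, rule in attlist.items():
        if name not in result:
            if rule["required"]:
                raise ValueError(f"missing #REQUIRED attribute: {name}")
            if rule["default"] is not None:
                result[name] = rule["default"]  # supply the declared default
        elif rule["allowed"] is not None and result[name] not in rule["allowed"]:
            raise ValueError(f"{name}={result[name]!r} is not in the enumerated list")
    return result


print(apply_attlist({"sepchar": ","}, ATTLIST_SEQLIST))
```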
   There are exactly five predefined entities: &lt;, &gt;, &amp;,
    &quot;, and &apos;
   Additional entities can be defined in the DTD:
      <!ENTITY copyright "Copyright Dr. Dave">
   Entities can be defined in another document:
      <!ENTITY copyright SYSTEM "MyURI">
   Example of use in the XML:
      This document is &copyright; 2002.
•   Entities are a way to include fixed text (sometimes called
    "boilerplate")
•   Entities should not be confused with character references,
    which are numerical values between &# and ;
    • Example: &#233; or &#xE9; to indicate the character é
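A sketch of how an internal entity and the two forms of character reference look after parsing, assuming Python's standard xml.etree.ElementTree (which expands entities declared in the internal DTD subset). The root element name notice is made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE notice [
  <!ENTITY copyright "Copyright Dr. Dave">
]>
<notice>This document is &copyright; 2002. Caf&#233; or caf&#xE9;</notice>"""

root = ET.fromstring(DOC)

# The entity reference is replaced by its text; both character references become é.
print(root.text)
```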
   In XML, element names are defined by the developer. This often
     results in a conflict when trying to mix XML documents from
     different XML applications.

This XML carries HTML table information:
<table>
 <tr>
  <td>Apples</td>
  <td>Bananas</td>
 </tr>
</table>

This XML carries information about a table (a piece of furniture):
<table>
 <name>Wooden Table</name>
 <width>80</width>
 <length>120</length>
</table>



 • If these two XML fragments were combined, there would be a name conflict.
 • Both contain a <table> element, but the elements have different content and
 meaning.
 • An XML parser will not know how to handle these differences.
Name conflicts in XML can easily be avoided using a name prefix.
This XML carries information about an HTML table, and a piece of
  furniture:
<h:table>
 <h:tr>
  <h:td>Apples</h:td>
  <h:td>Bananas</h:td>
 </h:tr>
</h:table>
<f:table>
 <f:name>Wooden Table</f:name>
 <f:width>80</f:width>
 <f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the
two <table> elements have different names.
   When using prefixes in XML, a so-called namespace for the prefix must be
    defined.
   The namespace is defined by the xmlns attribute in the start tag of an element.
   The namespace declaration has the following
   syntax: xmlns:prefix="URI".

<root xmlns:h="http://www.w3.org/TR/html4/"
        xmlns:f="http://www.w3schools.com/furniture">
<h:table>
 <h:tr>
  <h:td>Apples</h:td>
  <h:td>Bananas</h:td>
 </h:tr>
</h:table>
<f:table>
 <f:name>African Coffee Table</f:name>
 <f:width>80</f:width>
 <f:length>120</f:length>
</f:table>
</root>
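A sketch of how the two table elements are told apart by a program, assuming Python's standard xml.etree.ElementTree, which stores each tag as {namespace-URI}local-name. The prefix map NS is an assumption of the example; it only needs to agree with the URIs, not with the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# Prefix-to-URI map used only for the searches below.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture = root.find("f:table", NS)
cells = [td.text for td in html_table.findall("h:tr/h:td", NS)]

print(html_table.tag)
print(furniture.tag)
print(cells, furniture.findtext("f:name", namespaces=NS))
```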
   Recall that DTDs are used to define the tags
    that can be used in an XML document
   An XML document may reference more than
    one DTD
   Namespaces are a way to specify which DTD
    defines a given tag
   XML, like Java, uses qualified names
    ◦   This helps to avoid collisions between names
    ◦   Java: myObject.myVariable
    ◦   XML: myDTD:myTag
    ◦   Note that XML uses a colon (:) rather than a dot (.)


   A namespace is defined as a unique string
    ◦ To guarantee uniqueness, typically a URI (Uniform
      Resource Identifier) is used, because the author
      "owns" the domain
    ◦ It doesn't have to be a “real” URI; it just has to be
      a unique string
    ◦ Example: http://www.matuszek.org/ns
      There are two ways to use namespaces:
    ◦ Declare a default namespace
    ◦ Associate a prefix with a namespace, then use the
      prefix in the XML to refer to the namespace



   In any start tag you can use the reserved attribute name xmlns:
      <book xmlns="http://www.matuszek.org/ns">
    ◦ This namespace will be used as the default for all elements
      up to the corresponding end tag
    ◦ You can override it with a specific prefix

   You can use almost this same form to declare a prefix:
      <book xmlns:dave="http://www.matuszek.org/ns">
    ◦ Use this prefix on every tag and attribute you want to use
      from this namespace, including end tags--it is not a default
      prefix
      <dave:chapter dave:number="1">To Begin</dave:chapter>

   You can use the prefix in the start tag in which it is defined:
      <dave:book xmlns:dave="http://www.matuszek.org/ns">
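The default and prefixed forms can be compared after parsing. A sketch assuming Python's standard xml.etree.ElementTree; the chapter child inside the default-namespace book is added here for illustration. Both forms yield the same element names, but an unprefixed attribute stays outside the namespace while a prefixed one is placed in it.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.matuszek.org/ns"

# Default namespace: applies to unprefixed elements, but not to attributes.
default_form = ET.fromstring(
    f'<book xmlns="{NS_URI}"><chapter number="1">To Begin</chapter></book>'
)

# Declared prefix: applies only where the prefix is written.
prefixed_form = ET.fromstring(
    f'<dave:book xmlns:dave="{NS_URI}">'
    '<dave:chapter dave:number="1">To Begin</dave:chapter>'
    "</dave:book>"
)

print(default_form.tag == prefixed_form.tag)
print(default_form[0].attrib)
print(prefixed_form[0].attrib)
```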


   XSL stands for EXtensible Stylesheet Language, and is a
    style sheet language for XML documents.
   XSLT stands for XSL Transformations.
   XSLT is used to transform XML documents into other
    formats, like XHTML.
   XSLT is used to transform an XML document into another
    XML document, or another type of document that is
    recognized by a browser, like HTML and XHTML.
   XSLT does this by transforming each XML element into an
    (X)HTML element.
   With XSLT you can add/remove elements and attributes
    to or from the output file.
   You can also rearrange and sort elements.
   You can also perform tests and make decisions about
    which elements to hide and display, and a lot more.
   DTDs are a very weak specification language
    ◦ You can’t put any restrictions on element contents.
    ◦ It’s difficult to specify:
      All the children must occur, but may be in any order.
      This element must occur a certain number of times.
    ◦ There are only ten data types for attribute values.
   DTDs aren’t written in XML!
    ◦ If you want to do any validation, you need one parser for
      the XML and another for the DTD.
    ◦ This makes XML parsing harder than it needs to be.
    ◦ There is a newer and more powerful technology:
          XML Schemas.
    ◦ However, DTDs are still very much in use.
   An XML Schema describes the structure of an XML document.
   XML Schema is an XML-based alternative to DTD.
   The XML Schema language is also referred to as XML Schema
    Definition (XSD).
   Ex: remainder.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="note">
   <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
   </xs:complexType>
 </xs:element>
</xs:schema>
<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.w3schools.com note.xsd">
   <to>Ravi</to>
   <from>AEC</from>
   <heading>Reminder</heading>
   <body>Welcome to Aditya</body>
</note>

<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="note.xsd">
   <to>Ravi</to>
   <from>AEC</from>
   <heading>Reminder</heading>
   <body>Welcome to Aditya</body>
</note>
   The purpose of an XML Schema is to define the legal
    building blocks of an XML document, just like a DTD.
   An XML Schema:
    ◦   defines elements that can appear in a document.
    ◦   defines attributes that can appear in a document.
    ◦   defines which elements are child elements.
    ◦   defines the order of child elements.
    ◦   defines the number of child elements.
    ◦   defines whether an element is empty or can include text.
    ◦   defines data types for elements and attributes.
    ◦   defines default and fixed values for elements and
        attributes.
   XML Schemas are the Successors of DTDs
    ◦   XML   Schemas are extensible to future additions.
    ◦   XML   Schemas are richer and more powerful than DTDs.
    ◦   XML   Schemas are written in XML.
    ◦   XML   Schemas support data types.
    ◦   XML   Schemas support namespaces.
   XML Schemas are much more powerful than DTDs.
   A major strength of XML Schemas is their support for data types.
    ◦ It is easier to validate the correctness of data.
    ◦ It is easier to work with data from a database.
    ◦ It is easier to define data facets (restrictions on data) and data patterns
      (data formats), and to convert data between different data types.
   Another strength of XML Schemas is that they are written in XML.
    ◦ You don't have to learn a new language.
    ◦ You can use your XML editor to edit your Schema files.
    ◦ You can use your XML parser to parse your Schema files.
   XML Schemas provide secure data communication.
    ◦ A date like "03-11-2004" will be interpreted in some countries as
      3 November and in others as 11 March.
    ◦ However, an XML element with a data type like this:
    ◦ <date type="date">2004-03-11</date>
    ◦ ensures a mutual understanding between sender and receiver, because the
      XML "date" type requires the format "YYYY-MM-DD".
   XML Schemas are Extensible.
    ◦ Reuse your Schema in other Schemas.
    ◦ Create your own data types derived from the standard types.
    ◦ Reference multiple schemas in the same document.
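The date ambiguity described above can be reproduced in a few lines. This is only an illustrative sketch using Python's standard datetime module, which is an assumption of this example and not something the slides use:

```python
from datetime import date, datetime

ambiguous = "03-11-2004"

# Day-first reading: 3 November 2004.
day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()

# Month-first reading: 11 March 2004.
month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()

# The YYYY-MM-DD form required by the schema "date" type has one reading only.
unambiguous = date.fromisoformat("2004-03-11")

print(day_first, month_first, unambiguous)
```

The same string yields two different dates depending on the reader's convention, while the YYYY-MM-DD value can only mean 11 March 2004.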
 Defining a Simple Element:
  <xs:element name="xxx" type="yyy"/>
 XML Schema has a lot of built-in data types.
   o xs:string
   o xs:decimal
   o xs:integer
   o xs:boolean
   o xs:date
   o xs:time

Example
 Here are some XML elements:
   <lastname>Refsnes</lastname>
   <age>36</age>
   <dateborn>1970-03-27</dateborn>
 Here are the corresponding simple element definitions in Schema:
   <xs:element name="lastname" type="xs:string"/>
   <xs:element name="age" type="xs:integer"/>
   <xs:element name="dateborn" type="xs:date"/>
 A simple element may also specify a default value or a fixed value:
   <xs:element name="color" type="xs:string" default="red"/>
   <xs:element name="color" type="xs:string" fixed="red"/>
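To make the effect of the declared types concrete, the sketch below converts the three example elements to native values. The CONVERTERS and DECLARED_TYPES tables and the wrapping <person> element are hypothetical helpers for this illustration; a real schema processor reads the types from the schema itself.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical mapping from schema type names to Python converters.
CONVERTERS = {
    "xs:string": str,
    "xs:integer": int,
    "xs:date": date.fromisoformat,
}

# The types declared for the example elements above.
DECLARED_TYPES = {
    "lastname": "xs:string",
    "age": "xs:integer",
    "dateborn": "xs:date",
}

# <person> is an assumed wrapper so the three elements form one document.
person = ET.fromstring(
    "<person><lastname>Refsnes</lastname><age>36</age>"
    "<dateborn>1970-03-27</dateborn></person>"
)

# Convert each child's text using the converter for its declared type.
values = {
    child.tag: CONVERTERS[DECLARED_TYPES[child.tag]](child.text)
    for child in person
}
print(values)
```

A value that does not fit its type, such as <age>thirty</age>, would make the converter raise ValueError, which loosely mirrors a validation error.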
 Syntax for defining an attribute:
   <xs:attribute name="xxx" type="yyy"/>
 Example
 Here is an XML element with an attribute:
   <lastname lang="EN">Smith</lastname>
 And here is the corresponding attribute definition, followed by variants
 with a default value, a fixed value, and a required attribute:
   <xs:attribute name="lang" type="xs:string"/>
   <xs:attribute name="lang" type="xs:string" default="EN"/>
   <xs:attribute name="lang" type="xs:string" fixed="EN"/>
   <xs:attribute name="lang" type="xs:string" use="required"/>
   Restrictions are used to define acceptable values for XML
    elements or attributes.
    Restrictions on XML elements are called facets.

Restrictions on Values
 The example defines an element called "age" with a restriction.
The value of age cannot be lower than 0 or greater than 100:
<xs:element name="age">
 <xs:simpleType>
  <xs:restriction base="xs:integer">
   <xs:minInclusive value="0"/>
   <xs:maxInclusive value="100"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>
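The check that a validator performs for this restriction can be approximated as follows. This is a hand-written sketch, not a schema processor, and the function name is_valid_age is hypothetical:

```python
import re


def is_valid_age(text):
    """Return True if text is an integer from 0 to 100 inclusive."""
    text = text.strip()
    # Accept only an optional sign followed by decimal digits.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    # minInclusive = 0 and maxInclusive = 100: both bounds are allowed.
    return 0 <= int(text) <= 100


print(is_valid_age("36"), is_valid_age("101"))
```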
 The example below defines an element called "car" with a
restriction.
The only acceptable values are: Audi, Golf, BMW:
<xs:element name="car">
 <xs:simpleType>
  <xs:restriction base="xs:string">
   <xs:enumeration value="Audi"/>
   <xs:enumeration value="Golf"/>
   <xs:enumeration value="BMW"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>
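An enumeration amounts to a membership test against a fixed set of values. In the sketch below (the names ALLOWED_CARS and is_valid_car are hypothetical), note that the comparison is case-sensitive, as it is for a restriction based on xs:string:

```python
# The three values listed in the enumeration above.
ALLOWED_CARS = {"Audi", "Golf", "BMW"}


def is_valid_car(text):
    """Return True only for an exact match with one enumerated value."""
    return text in ALLOWED_CARS


print(is_valid_car("Golf"), is_valid_car("golf"))
```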
 The example below defines an element called "letter" with a restriction.
 The only acceptable value is ONE of the LOWERCASE letters from a to z:
<xs:element name="letter">
 <xs:simpleType>
   <xs:restriction base="xs:string">
    <xs:pattern value="[a-z]"/>
   </xs:restriction>
 </xs:simpleType>
</xs:element>
 The next example defines an element called "initials" with a restriction.
 The only acceptable value is THREE of the UPPERCASE letters from A to Z:
<xs:element name="initials">
 <xs:simpleType>
   <xs:restriction base="xs:string">
    <xs:pattern value="[A-Z][A-Z][A-Z]"/>
   </xs:restriction>
 </xs:simpleType>
</xs:element>
 The next example defines an element called "gender" with a
restriction. The only acceptable value is male OR female:
<xs:element name="gender">
 <xs:simpleType>
   <xs:restriction base="xs:string">
    <xs:pattern value="male|female"/>
   </xs:restriction>
 </xs:simpleType>
</xs:element>
 The example defines an element "mobileno" with a restriction.
 There must be exactly 10 digits:
<xs:element name="mobileno">
 <xs:simpleType>
   <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{10}"/>
   </xs:restriction>
 </xs:simpleType>
</xs:element>
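A schema pattern must match the whole value, not just part of it. The sketch below tries the four patterns above with Python's re.fullmatch, which has the same whole-value behaviour. Schema regular expressions and Python regular expressions differ in general, so this equivalence is assumed only for these simple patterns; the PATTERNS table and the matches function are hypothetical.

```python
import re

# The pattern facets from the "letter", "initials", "gender" and
# "mobileno" examples above.
PATTERNS = {
    "letter": "[a-z]",
    "initials": "[A-Z][A-Z][A-Z]",
    "gender": "male|female",
    "mobileno": "[0-9]{10}",
}


def matches(element_name, text):
    """Return True if the entire text matches the element's pattern."""
    return re.fullmatch(PATTERNS[element_name], text) is not None


print(matches("initials", "AEC"), matches("mobileno", "12345"))
```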
 The whiteSpace constraint is set to "preserve", which means that the XML
processor WILL NOT remove any white space characters:
<xs:element name="address">
 <xs:simpleType>
  <xs:restriction base="xs:string">
   <xs:whiteSpace value="preserve"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>
 The whiteSpace constraint is set to "replace", which means that the XML
processor WILL REPLACE all white space characters (line feeds, tabs, spaces, and
carriage returns) with spaces:
<xs:element name="address">
 <xs:simpleType>
  <xs:restriction base="xs:string">
   <xs:whiteSpace value="replace"/>
  </xs:restriction>
 </xs:simpleType>
</xs:element>
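The difference between "preserve" and "replace" can be sketched with plain string handling. The function apply_whitespace and the sample address are hypothetical and only imitate what a schema processor does with the value:

```python
# Under "replace", tab, line feed and carriage return each become one space.
_REPLACE_TABLE = str.maketrans("\t\n\r", "   ")


def apply_whitespace(value, mode):
    """Imitate the whiteSpace facet for the two modes shown above."""
    if mode == "preserve":
        return value
    if mode == "replace":
        return value.translate(_REPLACE_TABLE)
    raise ValueError(f"unsupported whiteSpace value: {mode}")


raw_address = "19-12, main road\n\tKakinada"
preserved = apply_whitespace(raw_address, "preserve")
replaced = apply_whitespace(raw_address, "replace")
print(repr(preserved), repr(replaced))
```

With "replace" the value keeps its length, because every white space character is swapped for exactly one space and nothing is removed.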
   The value must be at least five characters
    and at most eight characters long:
   <xs:element name="password">
     <xs:simpleType>
      <xs:restriction base="xs:string">
       <xs:minLength value="5"/>
       <xs:maxLength value="8"/>
      </xs:restriction>
     </xs:simpleType>
    </xs:element>
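The length restriction corresponds to a simple bounds check on the number of characters. The function name is_valid_password is hypothetical:

```python
def is_valid_password(text):
    """Return True if text has between 5 and 8 characters inclusive."""
    # minLength = 5 and maxLength = 8: both bounds are allowed.
    return 5 <= len(text) <= 8


print(is_valid_password("secret"), is_valid_password("abc"))
```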
   A complex element is an XML element that
    contains other elements and/or attributes.
   There are four kinds of complex elements:
    ◦   empty elements
    ◦   elements that contain only other elements
    ◦   elements that contain only text
    ◦   elements that contain both other elements and text
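The four kinds can be told apart by looking at what an element instance contains. The sketch below uses Python's standard xml.etree.ElementTree; the function content_kind and the sample elements are hypothetical. It inspects instances only, whereas a schema decides the kind in the type definition.

```python
import xml.etree.ElementTree as ET


def content_kind(elem):
    """Classify an element by its child elements and its text."""
    has_children = len(elem) > 0
    # Text may sit before the first child or after any child (its tail).
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "elements and text"
    if has_children:
        return "elements only"
    if has_text:
        return "text only"
    return "empty"


samples = {
    "empty": '<product pid="1345"/>',
    "elements only": "<person><firstname>Ravi</firstname>"
                     "<lastname>Kumar</lastname></person>",
    "text only": '<price currency="USD">1250</price>',
    "elements and text": "<letter>Dear <name>Ravi</name>, welcome.</letter>",
}
kinds = {label: content_kind(ET.fromstring(xml)) for label, xml in samples.items()}
print(kinds)
```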
 An XML parser is a software library (or a package) that
  provides methods (or interfaces) for client
  applications to work with XML documents
 It checks that documents are well-formed
 It may validate the documents
 It handles many other details so that the
  client is shielded from those complexities
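A well-formedness check can be sketched with the parser bundled in Python's standard library. That parser is non-validating, so it reports only well-formedness errors. The function name is_well_formed is hypothetical:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for problems such as bad nesting or a missing end tag.
        return False
    return True


print(is_well_formed("<note><to>Ravi</to></note>"))
print(is_well_formed("<b><i>bold and italic</b></i>"))
```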
 DOM: Document Object Model
 SAX: Simple API for XML
 A DOM parser implements the DOM API
 A SAX parser implements the SAX API
 Most major parsers implement both
  the DOM and SAX APIs
   A DOM document is an object containing
    all the information of an XML document

   It is composed of a tree (the DOM tree) of
    nodes, plus various nodes (attribute nodes, for
    example) that are associated with nodes in
    the tree but are not themselves part of the
    DOM tree
   There are 12 types of nodes in a DOM
    Document object
         Document node
         Element node
         Text node
         Attribute node
         Processing instruction node
         …….
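Several of these node types can be observed with Python's standard xml.dom.minidom module, used here only as one convenient DOM implementation. The sample string is an assumption of this sketch:

```python
from xml.dom import Node, minidom

sample = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="test.css"?>'
    '<!-- a comment -->'
    '<shapes><square color="BLUE"><length> 20 </length></square></shapes>'
)

doc = minidom.parseString(sample)

# Children of the Document node: processing instruction, comment, root element.
top_level = [child.nodeType for child in doc.childNodes]

square = doc.documentElement.firstChild

# Attributes are reached through the element, not through childNodes.
attribute_node = square.getAttributeNode("color")

# The text inside <length> is a Text node.
text_node = square.firstChild.firstChild

print(top_level, attribute_node.value, repr(text_node.data))
```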
Sample XML document

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<!-- It's an xml-stylesheet processing instruction. -->
<!DOCTYPE shapes SYSTEM "shapes.dtd">
<shapes>
      ……
      <square color="BLUE">
           <length> 20 </length>
      </square>
      ……
</shapes>
 A DOM parser creates an internal structure in
  memory which is a DOM document object
 Client applications get the information of the
  original XML document by invoking methods
  on this Document object or on other objects it
  contains
 A DOM parser is tree-based (or DOM object-based)
 The client application seems to be pulling the data
  actively, from the data flow point of view
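The "pulling" style can be sketched with Python's standard xml.dom.minidom: the client parses once and then asks the Document object for what it needs. The input string is modelled on the sample shapes document and is an assumption of this sketch:

```python
from xml.dom import minidom

# The parser builds the whole document object in memory first.
doc = minidom.parseString(
    '<shapes><square color="BLUE"><length> 20 </length></square></shapes>'
)

# The client then pulls data by invoking methods on that object.
square = doc.getElementsByTagName("square")[0]
color = square.getAttribute("color")

length_node = square.getElementsByTagName("length")[0]
length = int(length_node.firstChild.data)

print(color, length)
```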
   Advantages:
       (1) It is good when random access to widely
            separated parts of a document is required
       (2) It supports both read and write operations

   Disadvantages:
       (1) It is memory inefficient, since the whole
            document is held in memory
       (2) It seems complicated, although not really
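The write support can be sketched with Python's standard xml.dom.minidom: the in-memory tree is changed and then serialized again. The <circle> and <radius> elements are hypothetical additions for this illustration:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<shapes><square color="BLUE"><length> 20 </length></square></shapes>'
)

# Write operation 1: change an existing attribute.
square = doc.getElementsByTagName("square")[0]
square.setAttribute("color", "RED")

# Write operation 2: build a new subtree and attach it to the root.
circle = doc.createElement("circle")
radius = doc.createElement("radius")
radius.appendChild(doc.createTextNode("5"))
circle.appendChild(radius)
doc.documentElement.appendChild(circle)

# Serialize the modified tree back to text.
updated_xml = doc.documentElement.toxml()
print(updated_xml)
```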
 A SAX parser does not first create any internal structure
 The client does not specify which methods to call
 The client just overrides the methods of the API
  and places its own code inside them
 When the parser encounters a start-tag, an end-tag,
  etc., it treats them as events
 When such an event occurs, the parser
  automatically calls back the corresponding method
  overridden by the client, and passes what it
  sees as arguments to that method
 A SAX parser is event-based; it works like an
  event handler in Java (e.g. MouseAdapter)
 The client application seems to just receive
  the data passively, from the data flow point of
  view
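The callback style can be sketched with Python's standard xml.sax module. The client overrides handler methods and the parser calls them as it reads the input. The class name EventLogger and the input string are assumptions of this sketch:

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each event the parser reports, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser for every start-tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        # Called by the parser for every end-tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Called for character data; ignore pure white space.
        if content.strip():
            self.events.append(("text", content.strip()))


handler = EventLogger()
xml.sax.parseString(
    b'<shapes><square color="BLUE"><length>20</length></square></shapes>',
    handler,
)
for event in handler.events:
    print(event)
```

The client never asks for an element; it only reacts to whatever the parser reports next.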
 Advantage:
     (1) It is simple
     (2) It is memory efficient
     (3) It works well in streaming applications
 Disadvantage:
  The data is broken into pieces and clients
  never have all the information as a whole
  unless they create their own data structure
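The sketch below shows a client creating its own data structure from SAX events, again with Python's standard xml.sax module. The class name ShapeCollector, the dictionary layout, and the <circle> element are hypothetical choices for this illustration:

```python
import xml.sax


class ShapeCollector(xml.sax.ContentHandler):
    """Builds a list of dicts, one per child element of the root."""

    def __init__(self):
        super().__init__()
        self.shapes = []
        self._depth = 0
        self._text = []

    def startElement(self, name, attrs):
        self._depth += 1
        if self._depth == 2:
            # A direct child of the root starts a new shape record.
            self.shapes.append({"kind": name, "color": attrs.get("color")})
        self._text = []

    def characters(self, content):
        # Character data may arrive in several pieces; keep them all.
        self._text.append(content)

    def endElement(self, name):
        if self._depth == 3:
            # A property element such as <length> has just ended.
            self.shapes[-1][name] = "".join(self._text).strip()
        self._depth -= 1


collector = ShapeCollector()
xml.sax.parseString(
    b'<shapes><square color="BLUE"><length> 20 </length></square>'
    b'<circle color="RED"><radius> 5 </radius></circle></shapes>',
    collector,
)
shapes = collector.shapes
print(shapes)
```

The handler has to track depth and buffer text itself, which is the extra work the disadvantage above refers to.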

Xml

  • 1.
  • 2. It is Data Description Language.  It is a flexible way to create common information formats.  It also provide to share both the format and the data on the World Wide Web, intranets, and elsewhere.  Computer users might agree on a standard or common way to describe a piece of information using a platform independent format with XML.  Such a standard way of describing data would enable a user, to exchange the data with description between any type of platforms.
  • 3. XML → Extensible Markup Language  Just a text file, with tags, attributes, free text... looks much like HTML  Used to create special-purpose markup languages  Facilitates sharing of structured text and information across the Internet.  XML Structure facilitates parsing of data.  XML tags are not pre-defined, you must define your own tags.  Two Apps still have to agree on what the meaning of the "descriptive tags“.
  • 4. HTML MML SGML CML XML XML for Developers - Version 1a 4
  • 5. HTML XML <H1>Cars for Sale</H1> <for_sale> <h3>Audi 80</h3> <heading>Cars for Sale</heading> 1800 cc<BR> <make>Audi 80</make> Blue<BR> <engine>1800 cc</engine> Manual<BR> <color>Blue</color> 1988<BR> <transmission>Manual</transmission> $1250 <year>1988</year> <P> <price>$1250</price> <H3>Toyota Corolla</h3> <make>Toyota Corolla</make> 1250 cc<BR> <engine>1250</engine> Red<BR> <color>Red</color> Automatic<BR> <transmission>Automatic</transmission> 1984<BR> <year>1984</year> Red<BR> <price>$940</price> $940<BR> </for_sale> Example1.html Example1.xml XML for Developers - Version 1a 5
  • 6. HTML XML in IE5 browser Cars for Sale Audi 80 1800 cc Blue Manual 1988 $1250 Toyota Corolla 1250 cc Red Automatic 1984 Red $940 6
  • 7. XML HTML  It is free & extensible  Derived language from language. SGML.  Tags are user-defined.  Tags are pre-defined  It is about describing  It is about displaying information  Extensible set of tags information.  Content orientated  Fixed set of tags  Standard Data  Presentation oriented infrastructure  No data validation  Allows multiple output capabilities forms  Single presentation
  • 8. Many number of extensible languages are defined from XML. ◦ Wireless Markup Language (WML). ◦ Chemical Markup Language (CML) ◦ Bio-informatic Sequence Markup Language (BSML). ◦ Mathematical Markup Language (MathML). ◦ Open Office Markup Language ( OOML )  Directly usable over the internet, means any platform and any protocol can understand XML.  Plain text documents and easier to write & transport Programs.  Can be used with SGML without any conflict along with other web technologies.  Clearly understandable by human.  Accurate validation can be possible with the help of DTD and schema.  We can define structure of data and along with description.  Semi structured Data Bases are also defined using XML.
  • 9. Web publishing : XML allows the customer to customize web pages. With XML, you store the data once and then render that content for different viewers.  Web searching and automating Web tasks: XML defines the type of information contained in a document, making it easier to return useful results when searching the Web.  General applications: XML provides a standard method to use, store, transmit, and display data for all kinds of applications and devices.  e-business applications: XML allows to make electronic data interchange (EDI) for both business-to-business transactions, and business-to-consumer transactions.  Metadata applications: XML makes it easier to express metadata in a portable, reusable format.  Pervasive computing: XML provides portable and structured information types for display on pervasive (wireless) computing devices such as personal digital assistants (PDAs), cellular phones.
  • 10. DTD (Document Type Definition) and XML Schemas are used to define legal XML tags and their attributes for particular purposes  CSS (Cascading Style Sheets) describe how to display HTML or XML in a browser  XSLT (eXtensible Stylesheet Language Transformations) and XPath are used to translate from one form of XML to another  DOM (Document Object Model), SAX (Simple API for XML, and JAXP (Java API for XML Processing) are all APIs for XML parsing 10
  • 11.  XML documents use a self-describing and simple syntax. <?xml version=“1.0” encoding=“ISO-8859-1”?> <note> <to> Ramesh </to> <from> Kiran</from> <heading> Reminder </heading> <body> Don’t forget me this weekend </body> </note>
  • 12. XML Syntax consists of ◦ XML Declaration ◦ XML Elements ◦ XML Attributes  The first line of an XML document should always consist of an XML declaration defining the version of XML.  An XML document may start with one or more processing instructions (PIs) or directives: <?xml version="1.0"?> <?xml-stylesheet type="text/css" href="ss.css"?>  Following the directives, there must be exactly one root element containing all the rest of the XML: <weatherReport> ... </weatherReport>
  • 13. XML elements have relationships  Elements can have different content types  Element Naming Rules: 1) Names contain letters, numbers & other characters. 2) Names must not start with a number or punctuation marks. 3) Names must not start with the letters xml. 4) Names cannot contain spaces.  Names (as used for tags and attributes) must begin with a letter or underscore, and can consist of: ◦ Letters, both Roman (English) and foreign ◦ Digits, both Roman and foreign . (dot) - (hyphen) _ (underscore)
  • 14. <person gender="male"> <firstname>Ravi</firstname> <lastname>Kumar</lastname> </person>
  • 15. Attributes and elements are somewhat interchangeable  Example using just elements: <name> <first>David</first> <last>Matuszek</last> </name>  Example using attributes: <name first="David" last="Matuszek"></name>  You will find that elements are easier to use in your programs--this is a good reason to prefer them  Attributes often contain metadata, such as unique IDs  Generally speaking, browsers display only elements (values enclosed by tags), not tags and attributes 15
  • 16. While elements can contain multiple values attributes cannot  Attributes are not expandable  Elements can describe structure but not Attributes  Attributes are more difficult to manipulate by program code than elements  Attribute values are difficult to validate against a DTD
  • 17. <novel> <foreword> <paragraph> This is the great American novel.</paragraph> </foreword> <chapter number="1"> <paragraph>It was a dark and stormy night.</paragraph> <paragraph>Suddenly, a shot rang out!</paragraph> </chapter> </novel> novel foreword chapter number="1" paragraph paragraph paragraph This is the great It was a dark Suddenly, a shot American novel. and stormy night. rang out! 17
  • 18. Every element must have both a start tag and an end tag, e.g. <name> ... </name> ◦ But empty elements can be abbreviated: <break />. ◦ XML tags are case sensitive ◦ XML tags may not begin with the letters xml, in any combination of cases  Elements must be properly nested, e.g. not <b><i>bold and italic</b></i>  Every XML document must have one and only one root element.  The values of attributes must be enclosed in single or double quotes, e.g. <time unit="days">  Character data cannot contain < or & 18
  • 19. Start with <?xml version="1.0"?>  XML is case sensitive  You must have exactly one root element that encloses all the rest of the XML  Every element must have a closing tag  Elements must be properly nested  Attribute values must be enclosed in double or single quotation marks  There are only five pre-declared entities 19
  • 20. Five special characters must be written as entities: &amp; for & (almost always necessary) &lt; for < (almost always necessary) &gt; for > (not usually necessary) &quot; for " (necessary inside double quotes) &apos; for ' (necessary inside single quotes)  These entities can be used even in places where they are not absolutely required  These are the only predefined entities in XML 20
  • 21. The XML declaration looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> ◦ The XML declaration is not required by browsers, but is required by most XML processors (so include it!) ◦ If present, the XML declaration must be first--not even whitespace should precede it ◦ Note that the brackets are <? and ?> ◦ version="1.0" is required (this is the only version so far) ◦ encoding can be "UTF-8" (ASCII) or "UTF-16" (Unicode), or something else, or it can be omitted ◦ standalone tells whether there is a separate DTD 21
  • 22. PIs (Processing Instructions) may occur anywhere in the XML document (but usually first)  A PI is a command to the program processing the XML document to handle it in a certain way  XML documents are typically processed by more than one program  Programs that do not recognize a given PI should just ignore it  General format of a PI: <?target instructions?>  Example: <?xml-stylesheet type="text/css" href="mySheet.css"?> 22
  • 23. <!-- This is a comment in both HTML and XML -->  Comments can be put anywhere in an XML document  Comments are useful for: ◦ Explaining the structure of an XML document ◦ Commenting out parts of the XML during development and testing  Comments are not elements and do not have an end tag  The blanks after <!-- and before --> are optional  The character sequence -- cannot occur in the comment  The closing bracket must be -->  Comments are not displayed by browsers, but can be seen by anyone who looks at the source code 23
  • 24. By default, all text inside an XML document is parsed.  You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]>  Any characters, even & and <, can occur inside a CDATA  Whitespace inside a CDATA is (usually) preserved  The only real restriction is that the character sequence ]]> cannot occur inside a CDATA  CDATA is useful when your text has a lot of illegal characters (for example, if your XML document contains some HTML text) 24
  • 25. <note> <to> Ramesh </to> <from> Kiran</from> <heading> Reminder </heading> <body> Don’t forget me this weekend </body> </note> <remainder> <heading> Reminder </heading> <to> Ramesh </to> <from> Kiran</from> <message> Don’t forget me this weekend </message> </ remainder >
  • 26. Basically DTD is used to specify the set of rules for structuring data in xml file.  It is used to define the building blocks of XML document.  Using DTD we can specify the various elements types, attributes and their relationship.  DTD constraints structure of XML data ◦ What elements can occur ◦ What attributes can/must an element have. ◦ What subelements can/must occur inside each element, and how many times.  DTD syntax ◦ <!ELEMENT element (subelements-specification) > ◦ <!ATTLIST element (attributes) >
  • 27. A DTD adds syntactical requirements in addition to the well-formed requirement  It helps in eliminating errors when creating or editing XML documents  It clarifies the intended semantics  It simplifies the processing of XML documents 27
  • 28. <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Raj</to> <from>AEC</from> <heading>Invitation</heading> <body>Welcome to Aditya!</body> </note>
  • 29. !DOCTYPE note defines that the root element of this document is note  !ELEMENT note defines that the note element contains four child elements: "to,from,heading,body"  !ELEMENT to defines the to element to be of type "#PCDATA"  !ELEMENT from defines the from element to be of type "#PCDATA"  !ELEMENT heading defines the heading element to be of type "#PCDATA"  !ELEMENT body defines the body element to be of type "#PCDATA"
  • 30. <person> <name> K.Vijay Kumar </name> Exactly one name <greet> Happy new year </greet> At most one greeting <addr>19-12, main road </addr> As many address <addr> Kakinada </addr> lines as needed <tel> 943786254 </tel> Mixed telephones <fax> 227862544 </fax> and faxes <tel> 227862551 </tel> <email> vkumar123@gmail.com </email> As many as needed </person> 30
  • 31. name to specify a name element  greet? to specify an optional (0 or 1) greet elements  name, greet? to specify a name followed by an optional greet  addr* to specify 0 or more address lines  tel | fax a tel or a fax element  (tel | fax)* 0 or more repeats of tel or fax  email* 0 or more email elements 31
  • 32. So the whole structure of a person entry is specified by name, greet?, addr*, (tel | fax)*, email*  This is known as a regular expression 32
  • 33. <?xml version="1.0" encoding="UTF-8"?> The name of <!DOCTYPE addressbook [ the DTD is <!ELEMENT addressbook (person*)> addressbook <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)> <!ELEMENT name (#PCDATA)> <!ELEMENT greet (#PCDATA)> The syntax <!ELEMENT address (#PCDATA)> of a DTD is <!ELEMENT tel (#PCDATA)> not XML <!ELEMENT fax (#PCDATA)> syntax <!ELEMENT email (#PCDATA)> ]> “Internal” means that the DTD and the XML Document are in the same file 33
  • 34. Suffixes: ? optional foreword? + one or more chapter+ * zero or more appendix*  Separators , both, in order foreword?, chapter+ | or section|chapter  Grouping () grouping (section|chapter)+
  • 35. The syntax is <!ELEMENT name category> ◦ The name is the element name used in start and end tags ◦ The category may be EMPTY:  In the DTD: <!ELEMENT br EMPTY>  In the XML: <br></br> or just <br /> ◦ In the XML, an empty element may not have any content between the start tag and the end tag ◦ An empty element may (and usually does) have attributes
  • 36. The syntax is <!ELEMENT name category> ◦ The category may be ANY  This indicates that any content--character data, elements, even undeclared elements--may be used  Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible ◦ The category may be (#PCDATA), indicating that only character data may be used  In the DTD: <!ELEMENT paragraph (#PCDATA)>  In the XML: <paragraph>A shot rang out!</paragraph>  The parentheses are required!  Note: In (#PCDATA), whitespace is kept exactly as entered  Elements may not be used within parsed character data  Entities are character data, and may be used
  • 37. A category may describe one or more children: <!ELEMENT novel (foreword, chapter+)> ◦ Parentheses are required, even if there is only one child ◦ A space must precede the opening parenthesis ◦ Commas (,) between elements mean that all children must appear, and must be in the order specified ◦ “|” separators means any one child may be used ◦ All child elements must themselves be declared ◦ Children may have children ◦ Parentheses can be used for grouping: <!ELEMENT novel (foreword, (chapter+|section+))>
  • 38. # #PCDATA describes elements with only character data  #PCDATA can be used in an “or” grouping: ◦ <!ELEMENT note (#PCDATA|message)*> ◦ This is called mixed content ◦ Certain (rather severe) restrictions apply:  #PCDATA must be first  The separators must be “|”  The group must be starred (meaning zero or more)
  • 39. The format of an attribute is: <!ATTLIST element-name name type requirement name type requirement> where the name-type-requirement may be repeated as many times as desired ◦ Note that only spaces separate the parts, so careful counting is essential ◦ The element-name tells which element may have these attributes ◦ The name is the name of the attribute ◦ Each element has a type, such as CDATA (character data) ◦ Each element may be required, optional, or “fixed” ◦ In the XML, attributes may occur in any order
  • 40. There are ten attribute types  These are the most important ones: ◦ CDATA The value is character data ◦ (man|woman|child) The value is one of enumerated values ◦ ID The value is a unique identifier  ID values must be legal XML names and must be unique within the document ◦ NMTOKEN The value is a legal XML name  This is sometimes used to disallow whitespace in the name  It also disallows numbers, since an XML name cannot begin with a digit
  • 41. IDREF The ID of another element  IDREFS A list of other IDs  NMTOKENS A list of valid XML names  ENTITY An entity  ENTITIES A list of entities  NOTATION A notation  xml: A predefined XML value
  • 42. Recall that an attribute has the form <!ATTLIST element-name name type requirement>  The requirement is one of: ◦ A default value, enclosed in quotes  Example: <!ATTLIST degree CDATA "PhD"> ◦ #REQUIRED  The attribute must be present ◦ #IMPLIED  The attribute is optional ◦ #FIXED "value"  The attribute always has the given value  If specified in the XML, the same value must be used
  • 43. Invoice Element Declaration: <?xml version=“1.0” ?> <!ELEMENT employee (#PCDATA)> <! ElementName AttributeName Type Default > <!ATTLIST employee type (FullTime | PartTime) “FullTime” > Usage in XML file: <?xml version=“1.0” ?> <employee type=“PartTime”/>
  • 44. CDATA ◦ CDATA attributes are strings , any text is allowed  ID ◦ The values of an ID attribute must be a name. All id the ID attributes used in a document must be unique. IDs uniquely identify individual elements in a document.Elements can only have a single ID attrinute  IDREF or IDREFS ◦ An IDREF attributes value must be the value of a single ID attribute on some element in the document. The value of an IDREFs attribute may contain multiple IDREF values seperated by white space.  ENTITY or ENTITIES ◦ An ENTITY attribute’s must be the name of a single ENTITY. The value of an ENTITIES attribute may contain multiple entity names separated by white space.  NMTOKEN or NMTOKENS ◦ Name token attributes are a restricted form of string attribute, but there are no other restrictions on the word.  List of Names Enumerated ◦ You can specify that the value of an attribute must be taken from a specific list of names. This frequently called an enumerated type because each of the possible values must be explicitely enumerated in the declaration
  • 45. #REQUIRED ◦ The attribute must have an explicitly specified value for every occurrence of the element in the document  #IMPLIED ◦ The attribute value is not required and no default value is provided. If a value is not specified the XMP processor must proceed without one.  “value” ◦ An attrubute can be given any legal value as a default. The attribute value is not required on each element of the document, and if it is not present it will appear to be the specified default  #FIXED “value” ◦ An attribute declaration may specify that an attribute has a fixed value. In this case, the attribute is not required, but if it occurrs, it must have the specified value. If it is not present, it will appear to be the specified defualt
  • 46. CDATA ◦ Character data  ID ◦ Unique ID  NMTOKEN ◦ Single token  IDREF ◦ Match to ID  NMTOKENS ◦ Multiple tokens  IDREFS ◦ Match to multiple IDs  ENTITY ◦ Attribute is entity ref  NOTATION ◦ Describe non-XML data  ENTITIES ◦ Multiple entity refs  Name group ◦ Restricted list
  • 47. CDATA ◦ name="Tom Jones"  ID ◦ ID="P09567"  NMTOKEN ◦ color="red"  IDREF ◦ IDREF="P09567"  NMTOKENS ◦ values="A12 A15 A34"  IDREFS ◦ IDREFS="A01 A02"  ENTITY ◦ photo="MyPic"  NOTATION ◦ FORMAT="TeX"  ENTITIES ◦ photos="pic1 pic2"  Name group ◦ coord="X"
  • 48. A declaration can specify a default attribute value for when it is missing from the XML document, or state that a value must be entered ◦ #REQUIRED Must be specified ◦ #IMPLIED May be specified ◦ "default" Default value if unspecified ◦ #FIXED Only one value allowed Syntax: <!ATTLIST tag name type default> Example: <!ATTLIST seqlist sepchar NMTOKEN #REQUIRED type (alpha|num) "num">
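A minimal Python sketch of how a declared default appears to an application. It assumes the standard-library `xml.etree.ElementTree` parser (built on Expat), which does not validate but does report attribute defaults declared in the internal DTD subset; the `employee` declaration is the one from slide 43, and the helper name `employee_type` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring an enumerated attribute with a default value.
DTD = """<!DOCTYPE employee [
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee type (FullTime | PartTime) "FullTime">
]>
"""


def employee_type(element_markup: str) -> str:
    """Parse one <employee> element and return the value of its type attribute."""
    root = ET.fromstring(DTD + element_markup)
    return root.get("type")


explicit = employee_type('<employee type="PartTime"/>')
defaulted = employee_type("<employee/>")  # attribute not written in the document

print(explicit)
print(defaulted)
```

The second call shows the "it will appear to be the specified default" behaviour: the application sees a `type` value even though the document never wrote one.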
  • 49. There are exactly five predefined entities: &lt;, &gt;, &amp;, &quot;, and &apos;  Additional entities can be defined in the DTD: <!ENTITY copyright "Copyright Dr. Dave">  Entities can be defined in another document: <!ENTITY copyright SYSTEM "MyURI">  Example of use in the XML: This document is &copyright; 2002. • Entities are a way to include fixed text (sometimes called "boilerplate") • Entities should not be confused with character references, which are numerical values written between &# and ; (decimal) or between &#x and ; (hexadecimal) • Example: &#233; or &#xE9; to indicate the character é
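A short Python sketch of entity and character-reference expansion, assuming the standard-library `xml.etree.ElementTree` parser, which expands internal entities declared in the document's own DTD subset. The `note` element name and the sentence around the references are made up for this example; the `copyright` entity is the one declared above.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY copyright "Copyright Dr. Dave">
]>
<note>This document is &copyright; 2002. Caf&#233; = Caf&#xE9;, 1 &lt; 2 &amp; 3 &gt; 2</note>"""

# After parsing, the application sees only the expanded text:
# the internal entity, both character references, and the predefined entities.
text = ET.fromstring(doc).text
print(text)
```

The decimal reference `&#233;` and the hexadecimal reference `&#xE9;` both produce the same character.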
  • 50. In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications. This XML carries HTML table information: <table> <tr> <td>Apples</td> <td>Bananas</td> </tr> </table> This XML carries information about a table (a piece of furniture): <table> <name>Wooden Table</name> <width>80</width> <length>120</length> </table> • If these two XML fragments were combined in one document, there would be a name conflict. • Both contain a <table> element, but the elements have different content and meaning. • An XML parser will not know how to handle these differences.
  • 51. Name conflicts in XML can easily be avoided using a name prefix. This XML carries information about an HTML table, and a piece of furniture: <h:table> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr> </h:table> <f:table> <f:name>Wooden Table</f:name> <f:width>80</f:width> <f:length>120</f:length> </f:table> In the example above, there will be no conflict because the two <table> elements have different names.
  • 52. When using prefixes in XML, a so-called namespace for the prefix must be defined.  The namespace is defined by the xmlns attribute in the start tag of an element.  The namespace declaration has the following syntax: xmlns:prefix="URI". <root xmlns:h="http://www.w3.org/TR/html4/" xmlns:f="http://www.w3schools.com/furniture"> <h:table> <h:tr> <h:td>Apples</h:td> <h:td>Bananas</h:td> </h:tr> </h:table> <f:table> <f:name>African Coffee Table</f:name> <f:width>80</f:width> <f:length>120</f:length> </f:table> </root>
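As a rough illustration of how a namespace-aware parser tells the two `table` elements apart, here is a Python sketch using the standard-library `xml.etree.ElementTree`. The document is the one above (shortened); the query prefixes `html` and `furn` are arbitrary names chosen for this script.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>"""

root = ET.fromstring(doc)

# Prefix-to-URI map used only for the queries below. The prefixes are local
# to this script and need not match the ones written in the document.
ns = {
    "html": "http://www.w3.org/TR/html4/",
    "furn": "http://www.w3schools.com/furniture",
}

cells = [td.text for td in root.findall("html:table/html:tr/html:td", ns)]
furniture_name = root.findtext("furn:table/furn:name", namespaces=ns)

# The parser stores each name as {namespace URI}local-name.
table_tags = [child.tag for child in root]

print(cells)
print(furniture_name)
print(table_tags)
```

What identifies an element is the namespace URI plus the local name; the prefix is only a shorthand inside the document.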
  • 53. Recall that DTDs are used to define the tags that can be used in an XML document  An XML document may reference more than one DTD  Namespaces are a way to specify which DTD defines a given tag  XML, like Java, uses qualified names ◦ This helps to avoid collisions between names ◦ Java: myObject.myVariable ◦ XML: myDTD:myTag ◦ Note that XML uses a colon (:) rather than a dot (.)
  • 54. A namespace is defined as a unique string ◦ To guarantee uniqueness, typically a URI (Uniform Resource Identifier) is used, because the author "owns" the domain ◦ It doesn't have to be a "real" URI; it just has to be a unique string ◦ Example: http://www.matuszek.org/ns There are two ways to use namespaces: ◦ Declare a default namespace ◦ Associate a prefix with a namespace, then use the prefix in the XML to refer to the namespace
  • 55. In any start tag you can use the reserved attribute name xmlns: <book xmlns="http://www.matuszek.org/ns"> ◦ This namespace will be used as the default for all elements up to the corresponding end tag ◦ You can override it with a specific prefix  You can use almost this same form to declare a prefix: <book xmlns:dave="http://www.matuszek.org/ns"> ◦ Use this prefix on every tag and attribute you want to use from this namespace, including end tags--it is not a default prefix <dave:chapter dave:number="1">To Begin</dave:chapter>  You can use the prefix in the start tag in which it is defined: <dave:book xmlns:dave="http://www.matuszek.org/ns">
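A small Python sketch comparing the two declaration styles, again assuming the standard-library `xml.etree.ElementTree`. The `book`/`chapter` markup follows the fragments above; wrapping them into two complete documents is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://www.matuszek.org/ns"

# Style 1: default namespace, no prefixes written on the tags.
default_doc = (
    f'<book xmlns="{NS}">'
    '<chapter number="1">To Begin</chapter>'
    "</book>"
)

# Style 2: explicit prefix on every tag and attribute taken from the namespace.
prefixed_doc = (
    f'<dave:book xmlns:dave="{NS}">'
    '<dave:chapter dave:number="1">To Begin</dave:chapter>'
    "</dave:book>"
)

default_chapter = ET.fromstring(default_doc)[0]
prefixed_chapter = ET.fromstring(prefixed_doc)[0]

print(default_chapter.tag, list(default_chapter.attrib))
print(prefixed_chapter.tag, list(prefixed_chapter.attrib))
```

Both `chapter` elements end up with the same qualified name. The attributes differ: a default namespace applies to element names only, so the unprefixed `number` attribute stays outside the namespace, while `dave:number` is inside it.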
  • 56. XSL stands for EXtensible Stylesheet Language, and is a style sheet language for XML documents.  XSLT stands for XSL Transformations.  XSLT is used to transform an XML document into another XML document, or another type of document that is recognized by a browser, like HTML and XHTML.  XSLT does this by transforming each XML element into an (X)HTML element.  With XSLT you can add elements and attributes to, or remove them from, the output file.  You can also rearrange and sort elements.  You can also perform tests and make decisions about which elements to hide and display, and a lot more.
  • 57. DTDs are a very weak specification language ◦ You can't put any restrictions on the text content of elements (for example, that it must be a number or a date). ◦ It's difficult to specify:  All the children must occur, but may be in any order.  This element must occur a certain number of times. ◦ There are only ten data types for attribute values.  DTDs aren't written in XML! ◦ If you want to do any validation, you need one parser for the XML and another for the DTD. ◦ This makes XML parsing harder than it needs to be. ◦ There is a newer and more powerful technology: XML Schemas. ◦ However, DTDs are still very much in use.
  • 58. An XML Schema describes the structure of an XML document.  XML Schema is an XML-based alternative to DTD.  The XML Schema language is also referred to as XML Schema Definition (XSD).  Ex: remainder.xsd <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
  • 59. Referencing a schema that has a target namespace: <?xml version="1.0"?> <note xmlns="http://www.w3schools.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd"> <to>Ravi</to> <from>AEC</from> <heading>Reminder</heading> <body>Welcome to Aditya</body> </note> Referencing a schema that has no namespace: <?xml version="1.0"?> <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="note.xsd"> <to>Ravi</to> <from>AEC</from> <heading>Reminder</heading> <body>Welcome to Aditya</body> </note>
  • 60. The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.  An XML Schema: ◦ defines elements that can appear in a document. ◦ defines attributes that can appear in a document. ◦ defines which elements are child elements. ◦ defines the order of child elements. ◦ defines the number of child elements. ◦ defines whether an element is empty or can include text. ◦ defines data types for elements and attributes. ◦ defines default and fixed values for elements and attributes.  XML Schemas are the Successors of DTDs ◦ XML Schemas are extensible to future additions. ◦ XML Schemas are richer and more powerful than DTDs. ◦ XML Schemas are written in XML. ◦ XML Schemas support data types. ◦ XML Schemas support namespaces.  XML Schemas are much more powerful than DTDs.
  • 61. XML Schemas support data types. ◦ It is easier to validate the correctness of data. ◦ It is easier to work with data from a database. ◦ It is easier to define data facets (restrictions on data) and data patterns (data formats), and to convert data between different data types.  XML Schemas are written in XML. ◦ You don't have to learn a new language. ◦ You can use your XML editor to edit your Schema files. ◦ You can use your XML parser to parse your Schema files.  XML Schemas provide secure data communication. ◦ A date like "03-11-2004" will be interpreted in some countries as 3 November and in others as 11 March. ◦ However, an XML element with a data type like this: ◦ <date type="date">2004-03-11</date> ◦ ensures a mutual understanding between sender and receiver, because the XML "date" type requires the format "YYYY-MM-DD".  XML Schemas are extensible. ◦ Reuse your Schema in other Schemas. ◦ Create your own data types derived from the standard types. ◦ Reference multiple schemas in the same document.
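The date ambiguity can be made concrete with a few lines of Python. This is only an analogy using the standard `datetime` module, not an XML Schema validator: it parses the ambiguous string under both regional conventions and then parses the "YYYY-MM-DD" form, which has a single reading.

```python
from datetime import date, datetime

ambiguous = "03-11-2004"

# The same text yields two different dates depending on the reader's convention.
day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()    # 3 November 2004
month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()  # 11 March 2004

# The YYYY-MM-DD layout required by the schema "date" type is unambiguous.
iso = date.fromisoformat("2004-03-11")

print(day_first, month_first, iso)
```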
  • 62.  Defining a simple element: <xs:element name="xxx" type="yyy"/>  XML Schema has a lot of built-in data types. o xs:string o xs:decimal o xs:integer o xs:boolean o xs:date o xs:time Example  Here are some XML elements: <lastname>Refsnes</lastname> <age>36</age> <dateborn>1970-03-27</dateborn>  Here are the corresponding simple element definitions in Schema: <xs:element name="lastname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/>  A simple element may also have a default value or a fixed value: <xs:element name="color" type="xs:string" default="red"/> <xs:element name="color" type="xs:string" fixed="red"/>
  • 63.  Syntax: <xs:attribute name="xxx" type="yyy"/>  Example  Here is an XML element with an attribute: <lastname lang="EN">Smith</lastname>  And here is the corresponding attribute definition: <xs:attribute name="lang" type="xs:string"/>  An attribute may also have a default value, have a fixed value, or be required: <xs:attribute name="lang" type="xs:string" default="EN"/> <xs:attribute name="lang" type="xs:string" fixed="EN"/> <xs:attribute name="lang" type="xs:string" use="required"/>
  • 64. Restrictions are used to define acceptable values for XML elements or attributes.  Restrictions on XML elements are called facets. Restrictions on Values  The example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 100: <xs:element name="age"> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="100"/> </xs:restriction> </xs:simpleType> </xs:element>
  • 65.  The example below defines an element called "car" with a restriction. The only acceptable values are: Audi, Golf, BMW: <xs:element name="car"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction> </xs:simpleType> </xs:element>
  • 66.  The example below defines an element "letter" with a restriction.  The only acceptable value is ONE of the LOWERCASE letters from a to z: <xs:element name="letter"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[a-z]"/> </xs:restriction> </xs:simpleType> </xs:element>  The only acceptable value is THREE of the UPPERCASE letters from A to Z: <xs:element name="initials"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[A-Z][A-Z][A-Z]"/> </xs:restriction> </xs:simpleType> </xs:element>
  • 67.  The next example defines an element called "gender" with a restriction. The only acceptable value is male OR female: <xs:element name="gender"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="male|female"/> </xs:restriction> </xs:simpleType> </xs:element>  The example defines an element "mobileno" with a restriction.  There must be exactly 10 digits: <xs:element name="mobileno"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[0-9]{10}"/> </xs:restriction> </xs:simpleType> </xs:element>
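To see what the four patterns accept and reject, here is a Python sketch. It is an analogy, not a schema validator: it assumes that a schema pattern must match the whole value, which is why `re.fullmatch` is used, and it relies on these particular patterns meaning the same thing in Python's regex syntax. The helper name `matches` and the sample values are made up for the example.

```python
import re

# Pattern facets copied from the element definitions above.
PATTERNS = {
    "letter": r"[a-z]",
    "initials": r"[A-Z][A-Z][A-Z]",
    "gender": r"male|female",
    "mobileno": r"[0-9]{10}",
}


def matches(element: str, value: str) -> bool:
    """Return True if the whole value matches the element's pattern."""
    return re.fullmatch(PATTERNS[element], value) is not None


for element, value in [
    ("letter", "q"), ("letter", "qq"),
    ("initials", "ABC"), ("initials", "abc"),
    ("gender", "female"), ("gender", "Female"),
    ("mobileno", "9876543210"), ("mobileno", "98765"),
]:
    print(element, repr(value), matches(element, value))
```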
  • 68.  The whiteSpace constraint is set to "preserve", which means that the XML processor WILL NOT remove any white space characters: <xs:element name="address"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:whiteSpace value="preserve"/> </xs:restriction> </xs:simpleType> </xs:element>  The whiteSpace constraint is set to "replace", which means that the XML processor WILL REPLACE all white space characters (line feeds, tabs, spaces, and carriage returns) with spaces: <xs:element name="address"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:whiteSpace value="replace"/> </xs:restriction> </xs:simpleType> </xs:element>
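The two whiteSpace settings described above can be imitated in Python as follows. This is a sketch of the described behaviour only, with a hypothetical helper `apply_white_space` and a made-up address value; it is not how any particular XML processor is implemented.

```python
def apply_white_space(value: str, mode: str) -> str:
    """Imitate the "preserve" and "replace" whiteSpace settings on a string."""
    if mode == "preserve":
        # Nothing is removed or changed.
        return value
    if mode == "replace":
        # Tabs, line feeds and carriage returns each become one space.
        return value.translate({0x09: " ", 0x0A: " ", 0x0D: " "})
    raise ValueError(f"unsupported mode in this sketch: {mode}")


raw = "12 Main St\n\tSpringfield"  # hypothetical address spanning two lines
preserved = apply_white_space(raw, "preserve")
replaced = apply_white_space(raw, "replace")

print(repr(preserved))
print(repr(replaced))
```

Note that "replace" keeps the length of the value unchanged: each white space character is swapped for a space, not dropped.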
  • 69. The value must be at least five characters and at most eight characters long: <xs:element name="password"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:minLength value="5"/> <xs:maxLength value="8"/> </xs:restriction> </xs:simpleType> </xs:element>
  • 70. A complex element is an XML element that contains other elements and/or attributes.  There are four kinds of complex elements: ◦ empty elements ◦ elements that contain only other elements ◦ elements that contain only text ◦ elements that contain both other elements and text
  • 71.  An XML parser is a software library (or a package) that provides methods (or interfaces) for client applications to work with XML documents  It checks that documents are well-formed  It may validate the documents  It does a lot of other detailed things so that a client is shielded from those complexities
  • 72.
  • 73.  DOM: Document Object Model  SAX: Simple API for XML  A DOM parser implements the DOM API  A SAX parser implements the SAX API  Most major parsers implement both the DOM and SAX APIs
  • 74. A DOM document is an object containing all the information of an XML document  It is composed of a tree (the DOM tree) of nodes, plus various nodes, such as attribute nodes, that are associated with nodes in the tree but are not themselves part of the DOM tree
  • 75. There are 12 types of nodes in a DOM Document object Document node Element node Text node Attribute node Processing instruction node …….
  • 76. Sample XML document <?xml version="1.0"?> <?xml-stylesheet type="text/css" href="test.css"?> <!-- It's an xml-stylesheet processing instruction. --> <!DOCTYPE shapes SYSTEM "shapes.dtd"> <shapes> …… <square color="BLUE"> <length> 20 </length> </square> …… </shapes>
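A Python sketch that lists the node types found in a cut-down version of the sample document, assuming the standard-library `xml.dom.minidom` parser. The DOCTYPE line and the elided parts are left out so the example is self-contained, and the helper name `describe` is made up.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="test.css"?>'
    "<!-- It's an xml-stylesheet processing instruction. -->"
    '<shapes><square color="BLUE"><length> 20 </length></square></shapes>'
)

TYPE_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def describe(node, depth=0):
    """Yield (depth, node type, node name) for a node and all its descendants."""
    yield depth, TYPE_NAMES.get(node.nodeType, "other"), node.nodeName
    for child in node.childNodes:
        yield from describe(child, depth + 1)


rows = list(describe(doc))
for depth, kind, name in rows:
    print("  " * depth + f"{kind}: {name}")

# Attribute nodes are reached through their element, not through childNodes.
square = doc.getElementsByTagName("square")[0]
color_attr = square.getAttributeNode("color")
print(color_attr.nodeType == Node.ATTRIBUTE_NODE, color_attr.value)
```

The walk over `childNodes` never visits the `color` attribute, which illustrates the earlier point that some nodes are associated with the tree without being part of it.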
  • 77.
  • 78.  A DOM parser creates an internal structure in memory, which is a DOM document object  Client applications get the information of the original XML document by invoking methods on this Document object or on other objects it contains  A DOM parser is tree-based (or DOM object-based)  The client application actively pulls the data, from the data flow point of view
  • 79. Advantage: (1) It is good when random access to widely separated parts of a document is required (2) It supports both read and write operations  Disadvantage: (1) It is memory inefficient, because the whole document is held in memory as a tree (2) Its API can seem complicated at first, although it is not difficult in practice
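The two advantages can be sketched with the standard-library `xml.dom.minidom` parser in Python. The second `square`, the `circle` element and all values besides the first square are hypothetical additions for the example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<shapes>"
    '<square color="BLUE"><length>20</length></square>'
    '<square color="RED"><length>35</length></square>'
    "</shapes>"
)

squares = doc.getElementsByTagName("square")

# Random access: jump straight to the last square, then back to the first.
last_length = squares[-1].getElementsByTagName("length")[0].firstChild.data
first_color = squares[0].getAttribute("color")

# Write access: change an attribute and append a new element in place.
squares[0].setAttribute("color", "GREEN")
circle = doc.createElement("circle")
circle.setAttribute("color", "BLUE")
doc.documentElement.appendChild(circle)

updated = doc.documentElement.toxml()
print(last_length, first_color)
print(updated)
```

Everything here works on the in-memory tree, which is also the source of the memory cost: the whole document is loaded before the first query runs.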
  • 80.  A SAX parser does not first create any internal structure  The client does not specify what methods to call  The client just overrides the methods of the API and places its own code inside them  When the parser encounters a start-tag, end-tag, etc., it treats them as events
  • 81.  When such an event occurs, the parser automatically calls back the corresponding method overridden by the client, passing what it has just seen as arguments  A SAX parser is event-based; it works like an event handler in Java (e.g. MouseAdapter)  The client application just receives the data passively, from the data flow point of view
  • 82.  Advantage: (1) It is simple (2) It is memory efficient (3) It works well in streaming applications  Disadvantage: The data is broken into pieces and clients never have all the information as a whole unless they create their own data structure
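A minimal Python sketch of the callback style, assuming the standard-library `xml.sax` module. The handler class `ShapeHandler` and the input document are made up for the example; the overridden methods `startElement`, `characters` and `endElement` belong to `xml.sax.ContentHandler`.

```python
import xml.sax


class ShapeHandler(xml.sax.ContentHandler):
    """Record events as they arrive and collect the <length> values."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.lengths = []
        self._in_length = False
        self._buffer = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "length":
            self._in_length = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is buffered until the end tag.
        if self._in_length:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "length":
            self.lengths.append(int("".join(self._buffer).strip()))
            self._in_length = False


handler = ShapeHandler()
xml.sax.parseString(
    b'<shapes><square color="BLUE"><length> 20 </length></square></shapes>',
    handler,
)

print(handler.events)
print(handler.lengths)
```

The parser drives the calls and the client only reacts, so anything the client wants to keep (here the list of lengths) has to be stored in its own data structure.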