3. XML Document Structure
• An XML document consists of a number of discrete components
• Not all the sections of an XML document may be necessary,
– But their inclusion helps to make for a well-structured XML document
• A well-structured XML document can
– Easily be transported between systems and devices
4. Major portions of an XML document
• The major portions of an XML document include the following:
– The XML declaration
– The Document Type Declaration (DOCTYPE)
– The element data
– The attribute data
– The character data or XML content
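As a rough illustration of how these portions fit together, the sketch below labels each one with a comment. The element names, the 19.99 value, and the shirt.dtd file name are illustrative assumptions, not part of any real vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>            <!-- XML declaration -->
<!DOCTYPE shirt SYSTEM "shirt.dtd">                <!-- Document Type Declaration -->
<shirt>                                            <!-- element (the root) -->
  <price currency="USD">19.99</price>              <!-- attribute: currency="USD" -->
  <description>Plain cotton T-shirt</description>  <!-- character data (content) -->
</shirt>
```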
5. XML Declaration
• The XML declaration states explicitly
– That the document is XML, and which version and encoding it uses.
• An XML document can optionally have an XML declaration
– If present, it must be the very first thing in the XML document
• The XML declaration has the syntax of a processing instruction:
<?xml ...?>
6. Components of XML Declaration
Component           Meaning
<?xml               Opens the XML declaration
version="xxx"       States the specific version of XML being used (for example, "1.0")
encoding="xxx"      Indicates the character encoding that the document uses.
                    If omitted, the processor assumes UTF-8 (or UTF-16 when a byte order mark is present)
standalone="xxx"    States whether the document relies on external markup declarations
                    ("yes" means it does not, "no" means it may)
Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
7. Document Type Declaration (DOCTYPE)
• DOCTYPE
– Gives a name to the XML content, and
– Provides a means to guarantee the document's validity,
• Either by including or specifying a link to a Document Type Definition (DTD).
• DOCTYPE is optional in XML
• Valid XML documents must declare the document type with which they comply
8. General Form of DOCTYPE
• General forms of the Document Type Declaration
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>
The first form
– Refers to a DTD that is defined only in an external file (the external subset).
The second form
– Defines the DTD only inside the document, as an internal subset between the square brackets.
The last form
– Provides a place for an internally defined DTD subset between the square brackets
while also making use of an external subset.
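To make the third form concrete, here is a minimal sketch that points at an external subset and adds one declaration of its own in the internal subset. The file name shirt.dtd, the maker entity, and the brand element are assumptions chosen for illustration; the sketch also assumes that shirt.dtd declares the shirt and brand elements.

```xml
<?xml version="1.0"?>
<!DOCTYPE shirt SYSTEM "shirt.dtd" [
  <!-- internal subset: read before the external subset in shirt.dtd -->
  <!ENTITY maker "Example Apparel Co.">
]>
<shirt>
  <brand>&maker;</brand>
</shirt>
```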
9. Example on DOCTYPE
• Example of the first form
<!DOCTYPE shirt SYSTEM "shirt.dtd">
– The root (first) element in the document will be the <shirt> element
– The DTD is saved in a file named shirt.dtd
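The source does not show what shirt.dtd contains. Purely as an assumption, a small external DTD for this example might look like the following (the size and price elements and the currency attribute are hypothetical):

```xml
<!-- Hypothetical contents of shirt.dtd -->
<!ELEMENT shirt (size, price)>
<!ELEMENT size  (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price currency CDATA #REQUIRED>
```

A document that would be valid against that sketch:

```xml
<?xml version="1.0"?>
<!DOCTYPE shirt SYSTEM "shirt.dtd">
<shirt>
  <size>M</size>
  <price currency="USD">19.99</price>
</shirt>
```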
11. Markup and Content
• XML documents are composed of markup and content.
• In general, six kinds of markup can occur in an XML document:
– elements,
– entity references,
– comments,
– processing instructions,
– marked (CDATA) sections, and
– Document Type Declarations.
12. Elements
• XML elements are
– Either a matched pair of XML tags or a single XML tag that is "self-closing."
• For example,
– A shirt element begins with <shirt> and ends with </shirt>.
• When an element does not come as a pair,
– The element name is followed by a forward slash, as in <on_sale/>.
• These "unmatched" elements are known as empty elements
• Elements can be nested within other elements to any depth
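A small sketch of nesting and of an empty element; the model and name elements are made up for illustration:

```xml
<shirt>
  <model>
    <name>Classic Tee</name>   <!-- nested two levels below the root -->
  </model>
  <on_sale/>                   <!-- empty element, equivalent to <on_sale></on_sale> -->
</shirt>
```

Every start tag must be closed in the reverse order in which it was opened, so <model> cannot close before <name> does.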
13. Attributes
• Within elements, additional information can be communicated to XML processors
– That modifies the nature of the encapsulated content.
• Attributes are name/value pairs contained within the start tag
– That can specify text strings that modify the context of the element.
• Example:
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
14. Entity References
• Some characters, such as < and &, have a special meaning in XML and cannot be typed literally in content.
• Entity references indicate to XML-processing applications
– That a special text string follows, which is to be replaced with a different literal value.
• Entity references are delimited by
– An ampersand at the beginning and
– A semicolon at the end.
• Ex: Inserting a > sign in our text
<descript>The following says 8 is greater than 5</descript>
<equation>8 &gt; 5</equation>
Major Entity References    Character
&lt;                       <
&gt;                       >
&amp;                      &
&quot;                     "
&apos;                     '
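The five predefined entity references can be seen together in one short sketch (the note and text element names are assumptions):

```xml
<note>
  <text>Fish &amp; chips costs &lt; 10 USD; she said &quot;yes&quot;, it&apos;s true that 8 &gt; 5.</text>
</note>
```

After parsing, an application reading the text element receives: Fish & chips costs < 10 USD; she said "yes", it's true that 8 > 5.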
15. Comments
• Comments can be placed anywhere in a document outside other markup, and
– They are not considered to be part of the textual content of an XML document.
• The character sequence <!-- begins a comment and --> ends the comment.
• Between these two delimiters,
– Any text at all can be written, including valid XML markup.
• The only restriction is that
– Comment delimiters cannot be used inside a comment; neither can the literal string --.
• Example:
<!-- The element below describes an elephant I once owned... -->
<animal>Elephant</animal>
16. Processing Instructions (PIs)
• PIs are not a textual part of an XML document
– But provide information to applications as to how the content should be processed.
• Unlike comments, PIs must be passed along to the application by the XML processor.
• Processing instructions have the following form:
<?instruction options?>
• The instruction name is called the PI target
– It is a special identifier that the processing application is intended to understand.
• Any information following the target is optional and is meaningful only to that application
• Example: <?send-message "process complete"?>
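One widely used processing instruction is xml-stylesheet, which associates a stylesheet with a document. In the sketch below the target and its type and href pseudo-attributes follow the W3C convention, while the file name catalog.xsl and the catalog and item elements are assumptions:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="catalog.xsl"?>
<catalog>
  <item>Shirt</item>
</catalog>
```

An application that does not recognize the xml-stylesheet target simply ignores the instruction; the document stays well formed either way.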
17. Marked CDATA Sections
• Some documents contain large runs of characters and text
– That an XML processor should not interpret as markup but pass to an application unchanged.
• These are known as character data (or CDATA) sections.
• Within an XML document, a CDATA section instructs the parser
– To ignore all markup characters except the ]]> sequence that ends the section.
• This allows a section of XML code to be "escaped"
– So that it doesn't inadvertently disrupt XML processing.
• CDATA sections follow this general form:
<![CDATA[content]]>
18. Marked CDATA Sections
• All content contained in the CDATA section is
– Passed as string literals directly to the application without interpretation
• Example:
<object_code>
<![CDATA[
function master(poltice integer) {
  if poltice<=3 then {
    Mas=poltice+IntToString(FindElement("<chicken>"));
  }
}
]]>
</object_code>
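For short strings, the same effect can be had with entity references. The sketch below, using hypothetical snippets and code elements, shows two code elements whose character data is identical after parsing:

```xml
<snippets>
  <!-- Using a CDATA section: markup characters are typed literally -->
  <code><![CDATA[if (a < b && b > 0) { print("<ok>"); }]]></code>
  <!-- Using entity references: the same text, escaped character by character -->
  <code>if (a &lt; b &amp;&amp; b &gt; 0) { print("&lt;ok&gt;"); }</code>
</snippets>
```

CDATA tends to be easier to read for long blocks; the one sequence it cannot contain is ]]>, since that ends the section.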
19. Document Type Definitions (DTD)
• Don't confuse the DOCTYPE with the DTD.
• A DOCTYPE and a DTD serve very different, although related, purposes.
– The DOCTYPE is used to identify and name the XML content
– The DTD is used to validate the markup contained within.
• A DTD describes the specific form of XML markup
– That is allowable in an XML document.
• DTDs and XML Schema are the means for defining the validity constraints on XML documents
20. XML Content
• XML content can consist of any data, including binary data (typically carried in a text encoding such as Base64),
– As long as it doesn't violate rules that would confuse the content with valid XML markup.
• XML content can contain any legal XML characters,
– Including Unicode and international characters.
• XML content can be as long as necessary
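A brief sketch of international text and binary data as content. Carrying the binary payload as Base64 text is an assumption (a common convention, not something XML itself requires), and the product, name, and thumbnail elements and the encoding attribute are hypothetical; xml:lang is the standard language attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<product>
  <name xml:lang="ja">シャツ</name>
  <name xml:lang="de">Hemd aus Baumwolle, Größe M</name>
  <!-- Binary payload carried as Base64 text; "SGVsbG8=" decodes to the ASCII bytes of "Hello" -->
  <thumbnail encoding="base64">SGVsbG8=</thumbnail>
</product>
```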
22. XML document with an internal DTD
• A DTD defines the structure and the legal elements and attributes of an XML document.
• An application can use a DTD to verify that XML data is valid.
• If the DTD is declared inside the XML file,
– It must be wrapped inside the <!DOCTYPE> declaration.
• The Document Type Declaration (DOCTYPE) gives a name to the XML content
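A minimal sketch of a document that carries its own DTD in the internal subset. The shirt vocabulary (size, price, currency) is an assumption made up for this example:

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE shirt [
  <!ELEMENT shirt (size, price)>
  <!ELEMENT size  (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ATTLIST price currency CDATA #REQUIRED>
]>
<shirt>
  <size>M</size>
  <price currency="USD">19.99</price>
</shirt>
```

The name after <!DOCTYPE must match the root element; a validating parser would reject this document if, for instance, the currency attribute were missing or size came after price.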