2. “Markup” refers to the use of tags to describe data Data describing data is metadata Tags identify where data begins and ends, and carry some information about that data Often referred to as “self-describing” Standard Generalized Markup Language (SGML) was created to offer universal standards for sharing and moving information Markup
3. Extensible Markup Language fills the gap between display of HTML and complexity of SGML XML is compatible with rules of SGML XML isn’t a language Set of standards about how to create a language to define and work with particular data XML
4. Tags are used similar to HTML A tag must always have a close <name>Randy</name> <middle /> Tags are defined as needed No set of predefined tags as in HTML Tags typically aren’t about display Display is separated from data, unlike HTML Using XML
5. XML is hierarchical Individual items in XML are elements One element can belong to another Child and parent Similar to a one-to-many relationship Structure is called a ‘tree’ An item with children is called a branch An item with no children is a leaf XML Structure
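The branch and leaf vocabulary above can be tried out with a minimal sketch in Python's standard `xml.etree.ElementTree` module. The sample document, the names in it, and the helper `classify` are assumptions made up for illustration; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <student> branch holding three leaf elements.
DOC = (
    "<students><student>"
    "<first>Randy</first><last>Lee</last><studentID>S1</studentID>"
    "</student></students>"
)

def classify(element):
    """Return 'branch' if the element has child elements, otherwise 'leaf'."""
    return "branch" if len(element) > 0 else "leaf"

root = ET.fromstring(DOC)
# iter() walks the tree in document order, visiting parents before children.
kinds = {element.tag: classify(element) for element in root.iter()}
print(kinds)
```

Each parent here can own several children while every child has exactly one parent, which is the one-to-many shape the slide describes.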
6. An element can contain data An element can contain other elements An element can contain data and other elements Definition of elements for specific data make up a vocabulary Elements
7. Complies with rules Rules allow easy transfer and reading of data independent of platform, application Parser reads XML file Parser typically runs as a service to another application A file that doesn’t comply with rules has a fatal error and the parser cannot continue By definition, a file that isn’t well formed has a fatal error Any violation of rules is an error Well-Formed XML
8. Start tag must have an end tag or be self-closing Tags cannot overlap Must have one – and only one – root element Element names obey naming conventions XML is case-sensitive Whitespace is maintained in PCDATA Well Formed
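A quick way to see these rules enforced is to hand small documents to a non-validating parser and watch which ones it rejects. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser reads the text without a fatal error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "closed": "<name>Randy</name>",          # start tag with matching end tag
    "self_closing": "<middle />",            # empty element closes itself
    "overlap": "<a><b></a></b>",             # tags overlap
    "two_roots": "<a/><b/>",                 # more than one root element
    "case_mismatch": "<Name>Randy</name>",   # XML is case-sensitive
}

results = {label: is_well_formed(text) for label, text in samples.items()}
print(results)
```

Only the first two samples are accepted; each of the others breaks one rule from the list and stops the parser.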
9. Whitespace may follow the element name inside a tag A start tag cannot have a space between < and the element name Element name must begin with a letter or underscore After the first character, digits, hyphens, periods are also acceptable Cannot use spaces in names Colon (:) is reserved for namespace use Names beginning with the letters XML (upper, lower or mixed case) are reserved Naming Elements
10. CDATA refers to character data Values that are treated as those characters PCDATA refers to parsed character data Values that are translated for a specific meaning or purpose Whitespace is treated differently than HTML Maintained Carriage return and linefeed characters are both treated as single linefeed by parser Working with Data
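The whitespace behaviour can be observed directly. This is a small sketch with Python's `xml.etree.ElementTree` (which sits on the expat parser); the sample element names and text are assumptions.

```python
import xml.etree.ElementTree as ET

# Leading, trailing, and repeated spaces in element content are kept as-is,
# unlike HTML rendering, which collapses them.
spaced = ET.fromstring("<title>  XML   Basics  </title>")

# A carriage return + linefeed pair in the source is handed to the
# application as a single linefeed.
crlf = ET.fromstring("<note>line one\r\nline two</note>")

print(repr(spaced.text))
print(repr(crlf.text))
```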
11. Provide another way to represent values Defined within the start tag of an element Work in a name/value pair Must include both a name and value for a valid statement (an empty string is a valid value) Value must be enclosed in single or double quotes Opening quote must be same as closing (can’t pair a single quote and double quote) Be consistent for ease of coding, reading, and maintenance Attributes
12. Attribute names must conform to same rules as element names Start with letter or underscore Can use digits, hyphens, periods after the first character Name of each attribute must be unique within an element Attribute Names
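The attribute rules (name/value pairs, matching quotes, unique names) can be checked with a short sketch. It uses Python's `xml.etree.ElementTree`; the `major` attribute and the helper `parses` are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Single or double quotes both work, and an empty string is a valid value.
student = ET.fromstring("<student studentID='S1' major=\"\" />")
attributes = dict(student.attrib)

# Each of these breaks one attribute rule and is rejected.
mixed_quotes = parses("<student studentID='S1\" />")
duplicate_name = parses('<student studentID="S1" studentID="S2" />')
missing_value = parses("<student studentID />")

print(attributes, mixed_quotes, duplicate_name, missing_value)
```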
13. Elements can be more complex Can include child elements if needed Attributes are about a single value Attributes can simplify logic Can avoid or reduce nesting Can simplify logic Choice of element or attribute most often simply a design choice, preference Using Attributes and Elements
14. Provide information to aid the processing of the file <?xml version="1.0"?> If an XML declaration is included it must be the first entry Cannot have any character, including whitespace, preceding it If included, the XML declaration must have at least the version Versions are 1.0 and 1.1 XML Declarations
15. Optional settings are encoding, standalone Encoding specifies which character set is being used (how characters are represented) Standalone tells the parser if document is complete by itself, or relies on another file Optional XML Declarations
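The placement and content rules for the XML declaration can be exercised with a small sketch. It feeds byte strings to Python's `xml.etree.ElementTree`; the `<note>` element and the helper `parses` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if the parser accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Version plus the two optional settings, in their required order.
FULL = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note>ok</note>'

declaration_first = parses(FULL)
# A single space before the declaration means it is no longer the first entry.
space_before = parses(b" " + FULL)
# The version is mandatory once a declaration is present.
no_version = parses(b'<?xml encoding="UTF-8"?><note>ok</note>')

print(declaration_first, space_before, no_version)
```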
16. Processing instructions are for consuming application Not used by XML parser Includes information/commands that application needs to complete some task <? Statement ?> Processing Instructions
17. Some symbols have special meaning Less than (<) Greater than (>) Ampersand (&) Cannot use these characters directly unless wrapped in a CDATA section If a single symbol is needed, can substitute &lt; for <, &gt; for >, &amp; for & Special Characters
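Both approaches, entity substitution and a CDATA section, yield the same text once parsed. The sketch below uses `xml.sax.saxutils.escape` and `xml.etree.ElementTree` from the Python standard library; the `<rule>` element and its text are made up for the example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "credits < 4 & grade > C"

# Option 1: substitute entity references for the special characters.
escaped = escape(raw)
via_entities = ET.fromstring(f"<rule>{escaped}</rule>").text

# Option 2: wrap the raw text in a CDATA section so it is not parsed as markup.
via_cdata = ET.fromstring(f"<rule><![CDATA[{raw}]]></rule>").text

print(escaped)
print(via_entities == raw, via_cdata == raw)
```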
18. DTD stands for Document Type Definition Allows an XML document to go further than meeting the requirements of being well-formed Specifies requirements to be valid A valid XML document matches definitions of allowable elements, attributes DTD Overview
19. Validation can be done in code (i.e. using javascript, VB and DOM) DTD’s allow use of a validating parser that compares the document against specifications Typically makes application changes and maintenance easier Less tied to a particular programming language/environment Validation
20. Includes name of root element Allows specification of where the DTD is located DTD can be embedded in the XML file (local) DTD can refer to external file, Uniform Resource Identifier (URI) Local takes precedence over external Document Type Declaration
21. Element Declaration has 3 parts: Declaration Element name Element content Element content can include a list of child elements or data Element Declaration
22. DTD included in XML document Definition of a student: <!DOCTYPE student [ <!ELEMENT student (first, last, studentID)> <!ELEMENT first (#PCDATA)> <!ELEMENT last (#PCDATA)> <!ELEMENT studentID (#PCDATA)> ]> Local DTD Document Type Declaration Element Declaration A student element is made up of first name, last name, and student ID elements
23. DTD exists in external file/location Must use keyword to specify type of location SYSTEM is a reference to local file system PUBLIC is reference to DTD accessed through a catalog Can use both together If can’t find catalog reference can use specified file External Definition
24. Reference in XML file: <!DOCTYPE student SYSTEM "student.dtd"> External file: <!ELEMENT student (first, last, studentID)> <!ELEMENT first (#PCDATA)> <!ELEMENT last (#PCDATA)> <!ELEMENT studentID (#PCDATA)> Sample External Definition Document Type Declaration Element Declaration
25. Element name must match name in XML document If using namespaces, prefixes must match Content Model defines what the element can store An element Mixed (i.e. data and element) Empty Any Working With Elements
26. Error raised if an element is missing Error raised if there are extra elements Error raised if elements are in a different order For a student, our content must be in first, last, studentID order If find an element “major”, error If order varies, error If missing first, last, or studentID, error Content by Sequence
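Python's standard library does not include a validating parser, so the sketch below only emulates the sequence rule by comparing child element names against the declared order. It is a hand-rolled check, not DTD validation; the function name `matches_sequence` and the sample data are assumptions.

```python
import xml.etree.ElementTree as ET

# The order declared by (first, last, studentID) in the student content model.
EXPECTED = ("first", "last", "studentID")

def matches_sequence(xml_text, expected=EXPECTED):
    """Emulate a sequence content model: same children, same order, no extras."""
    student = ET.fromstring(xml_text)
    return tuple(child.tag for child in student) == tuple(expected)

valid = "<student><first>Randy</first><last>Lee</last><studentID>S1</studentID></student>"
extra = "<student><first>Randy</first><last>Lee</last><studentID>S1</studentID><major>CS</major></student>"
reordered = "<student><last>Lee</last><first>Randy</first><studentID>S1</studentID></student>"
missing = "<student><first>Randy</first><last>Lee</last></student>"

checks = [matches_sequence(doc) for doc in (valid, extra, reordered, missing)]
print(checks)
```

A real validating parser would report which rule failed; this check only answers yes or no.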
27. Can allow content to vary between elements | (vertical bar or pipe) indicates OR If add a Grade element to a student that can be a letter or percent: <!ELEMENT grade (letter | percent)> <!ELEMENT letter (#PCDATA)> <!ELEMENT percent (#PCDATA)> Indicates that must have letter or percent element Content by Choice
28. Allows combination of elements and parsed character data Can include additional information within an element, eg. how to display Rules: Managed by using Choice (or) PCDATA must appear first in list of elements List cannot include inner content model (only simple elements) If there are child elements, include * * Indicates that may appear zero or more times Mixed Content
29. If want to include emphasis with the letter grade Data: <letter><em>4</em></letter> Declaration: <!ELEMENT letter (#PCDATA | em)*> Describes a letter element as the content (pcdata) plus emphasis element Mixed Content -2
30. An element can be empty <br /> (never has child, content) Declaration includes EMPTY: <!ELEMENT br EMPTY> Means that the element CANNOT contain content Empty Content
31. An element can contain any kind of value (or be empty) Any elements declared in the DTD can occur, any number of times Only elements that are part of the DTD can be part of the document! May be empty May contain PCDATA Least restrictive model Any Content
32. How many times can an element occur? How many times must an element occur? Cardinality
33. Elements tend to be used to describe a logical unit of information Attributes are typically used to store data about characteristics (properties) May have a Movie element with attributes for Title, Rental Price, Rental Days No specific rules about how to use elements and attributes Attributes and DTD’s
34. Attributes allow more limits on data Can have a list of acceptable values Can have a default value Some ability to specify a data type Concise, about a single name/value pair Attributes have limits Can’t store long strings of text Can’t nest values Whitespace can’t be ignored Attributes and Elements
35. Declaration: <!ATTLIST ElementName AttrName AttrType Default> Specify the Element the attribute belongs to Specify the Name of the attribute Specify the Type of data the attribute stores Specify characteristics of the values (Default or attribute value) List either the default value or other characteristic of value – required, optional Specifying Attributes
36. CDATA – unparsed character data Enumerated – series/list of string values Entity/Entities – reference entity definition(s) ID – unique identifier for the element IDREF – refer to the ID of another element IDREFS – list of ID’s of other elements separated by whitespace NMTOKEN/NMTOKENS – value(s) of attribute can be anything that follows rules for XML name Sample Attribute Data Types
37. Specifies that attribute value must be found in a particular list Each value in list must be valid XML name Limits on spaces, characters Use | (pipe) to separate members of list If specifying list letter grades for a student: <!ATTLIST student grade (A | B | C | D | F | V | W | I) #IMPLIED> Enumerated Attributes Element Attribute Enumerated List
38. An ID specifies that the element must have a unique value within the document Allows reliable way to refer to a specific element No spaces allowed in value Typically replace space with underscore Attribute list can include only one ID IDREF, IDREFS allows an element to be associated with another or multiple other elements A student element must have a student ID: <!ATTLIST student studentID ID #REQUIRED> ID, IDREF, IDREFS
39. Attributes can refer to entities “Entity” refers to substituting a reference for a text value &amp; refers to the & character Unparsed Entity is a reference that isn’t parsed Can reuse references for long values, or hard to manage characters (i.e. tab, line feed) Entity must be declared in the DTD <!ENTITY classTitle "XML"> When &classTitle; is found in the document, it is replaced with XML Entities and Attributes
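Entity substitution can be seen in a small sketch where the entity is declared in an internal DTD subset. Python's `xml.etree.ElementTree` does not validate, but its underlying parser does expand entities declared this way. The `<course>` element and its text are assumptions for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE course [
  <!ELEMENT course (#PCDATA)>
  <!ENTITY classTitle "XML">
]>
<course>Intro to &classTitle; &amp; friends</course>"""

course = ET.fromstring(DOC)
# &classTitle; is replaced by its declared value; &amp; is predefined.
print(course.text)

def parses(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Referencing an entity that was never declared is an error.
undeclared = parses("<course>&noSuchEntity;</course>")
```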
40. Can specify how the value will appear in the document Must always specify a value declaration A default value sets a value for an attribute if a value isn’t provided Include default value in quotes #FIXED sets a value that must occur; if an attribute has a different value, a validation error occurs #REQUIRED specifies that the attribute (and value) must exist #IMPLIED means the attribute is optional Attribute Value Declarations
41. Alternative to DTD’s as way to define structure Essentially defining a language Structure may be also referred to as vocabulary Ensures that data matches specifications Serves as basis for other XML-related technologies XML Schemas
42. Use XML for definition Doesn’t have separate structure like DTD’s Schema must be well-formed Support Namespace recommendations Allows same name to be used in different Schemas and properly understood Provides for built-in and user-defined data types Can be easily reused Supports concepts such as inheritance One object is based on another Working with Schemas
43. Allows more specificity than DTD’s Can specify dates, numbers, ranges Datatypes fall into two categories: Simple deals with basic values Complex describes more intricate values or structures Schema Datatypes
44. Schema file uses an .xsd extension Root element is the schema Can nest all elements within the schema Everything is hierarchical OR Can have multiple elements as child elements of the schema root Allows use of a definition any place in the document (data) file Elements which are child elements of schema are global Creating Schemas
45. Simple data type is about text, numbers, date Sometimes referred to as “primitives” Data types built in to Schema vocabulary (and related elements, attributes) are in the XML Schema namespace Need reference to namespace to have valid XML – where to find the definition Elements that are Simple Datatypes don’t have attributes Including an attribute makes an element Complex Simple Datatypes
46. The simpleType allows customization of base types Can create limits on values Specify ranges Specify lists <xsd:simpleType name="Degrees"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="AA" /> <xsd:enumeration value="AS" /> </xsd:restriction> </xsd:simpleType> Defining (Simple) Datatypes
47. Allows combination of different elements and specification of order, new data types Can create an element Course which is comprised of simple types <xsd:element name="course"> <xsd:complexType> <xsd:sequence> <xsd:element name="department" type="xsd:string"/> <xsd:element name="number" type="xsd:string"/> <xsd:element name="title" type="xsd:string"/> <xsd:element name="credits" type="xsd:integer"/> </xsd:sequence> </xsd:complexType> </xsd:element> Complex Datatypes
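Because a schema is itself well-formed XML, it can be read with any XML parser. The sketch below wraps the course definition in an `xsd:schema` root (an assumption, since the slide shows only the element) and lists the declared child elements using Python's `xml.etree.ElementTree`. It inspects the schema; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="course">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="xsd:string"/>
        <xsd:element name="number" type="xsd:string"/>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="credits" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

schema = ET.fromstring(SCHEMA)
# ElementTree spells namespaced names as {namespace-uri}local-name.
course = schema.find(f"{{{XSD_NS}}}element")
sequence = course.find(f"{{{XSD_NS}}}complexType/{{{XSD_NS}}}sequence")
children = [(el.get("name"), el.get("type")) for el in sequence]
print(course.get("name"), children)
```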
48. When using a schema, need to create a reference from data file Use either the schemaLocation or noNamespaceSchemaLocation attribute of the root element <course xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="course.xsd"> Using a Schema
49. Qualification refers to whether a value (element, attribute) must be qualified by its namespace When an element (or attribute) doesn’t have a namespace declaration it’s unqualified Determines how name is used in data (instance) document A schema has the attributes elementFormDefault and attributeFormDefault Set to qualified or unqualified By setting to qualified, must include a namespace when using attributes or elements Qualification
50. Allows elements to appear in any order or not at all Rules governing use Must be only content model declaration of a <complexType> definition For example, can’t follow with <sequence> Can only have element declarations as children The children of the <all> element may appear once – or not at all <all> Declaration
51. Can create a group of attributes similar to element groups Allows re-use of common members without multiple definitions Attribute groups cannot be recursive (refer to themselves) Attribute Groups
52. A list allows an element or attribute to store multiple values Values are separated by whitespace, so whitespace cannot be part of the content itemType attribute defines the data type Can be built-in XML or a defined simpleType data type <list> Declarations
53. <union> allows the combination of two data types for an element or attribute If have a possiblePoints element, expected value would be an integer; <union> would allow a string entry to note a “Missing” value Separate data types with whitespace <simpleType name="CreditValue"> <union memberTypes="xs:integer xs:string" /> </simpleType> <union> Declarations
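To make the effect of `<list>` and `<union>` concrete, here is a rough Python emulation of how a consuming application might interpret such values: try the integer member type first, fall back to string, and split list values on whitespace. It is an illustration of the idea, not a schema processor; both function names are hypothetical.

```python
import re

def parse_credit_value(text):
    """Emulate a union of integer and string: integer first, else string."""
    stripped = text.strip()
    # Accept only an optional sign followed by ASCII digits as an integer.
    if re.fullmatch(r"[+-]?[0-9]+", stripped):
        return int(stripped)
    return stripped

def parse_list(text, item_parser=parse_credit_value):
    """Emulate a list type: items are separated by whitespace."""
    return [item_parser(token) for token in text.split()]

print(parse_credit_value("3"), parse_credit_value("Missing"))
print(parse_list("3 Missing 4"))
```

Because whitespace separates list items, a value such as "Not Graded" would be read as two items, which is why list content cannot contain whitespace.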
54. XSL stands for Extensible Stylesheet Language Stylesheets are used to manage organization and presentation of data Implemented as an XML language Rules of XML apply Made up of XSL-FO (Formatting Objects) XSLT (Transformations) XSL
55. XSL-FO focused on presentation to screen and paper Not well-supported by browsers XSLT emphasizes re-organization of data Typically used for presentation but can also be used for conversion of data storage format XSLT is a declarative language Similar to SQL, describe results not steps XSL Implementation
56. Cascading Style Sheets used to separate presentation from data XSLT used to change – transform – data Convert an XML document to XHTML Can use both together XSLT v CSS
57. XSL requires several steps XML processor reads document Creates document tree XSL processor applies rules from stylesheet Rules applied to document tree Rules applied by using pattern matching Identify nodes to apply rules to Rules are stored as templates Using XSL
58. XSL works by using an Input Tree Input Tree comes from XML processor Process of changing input values is called Tree Transformation Result of transformation is the Result Tree Result Tree can include XML HTML (must adhere to XML rules, i.e., XHTML) Formatting Objects XSL Process
59. Extensible Stylesheet Language Transformations (XSLT) is a method of changing (transforming) XML based on rules of a stylesheet XPath allows manipulation of parts of XML document Not XML-based Provides compact references Useful in URI’s, attributes Document must exist as nodes (previously parsed) XSLT
60. Templates are definitions of rules, organization Patterns define values searching for (where to apply templates) Expressions allow use of functions using nodes as inputs When referring to document attributes preface name with “@” XSLT Constructs
61. <xsl:stylesheet> is root element Uses namespace to define elements, attributes valid in a stylesheet <xsl:template> defines the rules/ transformations to apply Match attribute specifies pattern to apply rules to Functions similar to criteria <xsl:apply-templates> applies the rules defined for a particular element Select attribute specifies elements to apply to XSLT Elements
62. <xsl:value-of> returns the value of a specified node, function Select attribute specifies value source <xsl:copy> copies a node to the result tree without any child nodes or attributes <xsl:copy-of> copies a node and child/attribute nodes <xsl:output> controls the result tree method=“xml|html|text” XSLT Elements – 2
63. <xsl:if> provides a boolean test to determine processing <xsl:choose> offers an IF ... THEN ... ELSE construct <xsl:for-each> allows each node in a group to be processed <xsl:sort> specifies order for a group of nodes XSLT Elements – 3
64. Match can use node name current position (represented by “.”) relative position (for example, parent = “..”) Specifies where the transformation to be applied Match
65. XPath provides a logical model for working with XML document Nodes are used to represent serialized XML (in memory) Not all parts of XML document are represented (XML declaration, DOCTYPE) XPath used in combination with other tools (such as XSLT) XPATH Introduction
66. Legal XPath code is called an expression An XPath expression that returns a node-set is a location path Expressions can be absolute or relative Absolute path includes a full definition of how to find node Relative path is based on current context (location) XPath Expressions
67. Root node represents document Can have only one child node (document element) Element node represents elements QName (qualified name) includes namespace prefix and element name Attribute node represent attributes Have name and value Are not represented as child nodes Text node represents text value of an element Does not have a name Namespace node gives access to the namespace URI and prefix Comment node Processing Instruction Node Node Types
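Several of these node kinds can be observed with Python's `xml.dom.minidom`. Note that this is the DOM model rather than the XPath data model, so the correspondence is approximate; the sample document and its values are assumptions.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

DOC = '<movies><!-- rentals --><?app refresh?><movie title="Up">Pixar</movie></movies>'

doc = parseString(DOC)
root = doc.documentElement  # the single document element

# Children of <movies>: a comment, a processing instruction, an element.
root_children = [node.nodeType for node in root.childNodes]

movie = root.getElementsByTagName("movie")[0]
# The only child of <movie> is a text node; the title attribute is reached
# through the attribute map, not through the child list.
movie_children = [node.nodeType for node in movie.childNodes]
title = movie.getAttribute("title")

print(root_children, movie_children, title)
```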
68. Boolean Written as true() and false() String Number – floating point values Node-set – unordered set Follows document order XPath 1.0 Types
69. Element node references can be spelled out or abbreviated /child::movies/child::movie/child::price OR /movies/movie/price child::nodename can also be written nodename Attribute node references attribute::attributename OR @attributename XPath Abbreviations
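Python's `xml.etree.ElementTree` supports a subset of the abbreviated XPath syntax, which is enough for a short sketch. Paths are evaluated relative to the element they are called on, so `/movies/movie/price` becomes `./movie/price` from the root; the long `child::` form and `@name` as a selection step are not supported there. The movie data is made up for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<movies>
  <movie title="Alpha" rating="PG"><price>2.50</price></movie>
  <movie title="Beta" rating="R"><price>3.00</price></movie>
</movies>"""

movies = ET.fromstring(DOC)  # context node: the <movies> document element

# Abbreviated child steps: movie children, then their price children.
prices = [price.text for price in movies.findall("./movie/price")]

# ElementTree reads attributes from the element instead of using @title.
titles = [movie.get("title") for movie in movies.findall("movie")]

print(prices, titles)
```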
71. Default axis Selects nodes that are immediate children of context (current) node Can use * to refer to all child element nodes Child Axis
72. Can use node() to return all child nodes including comments, processing instructions, and text nodes Can return just text nodes using text() Text nodes are unnamed Child Axis References
73. Used to select attributes belonging to a particular element node To return all attributes attribute::* @* To return particular attribute attribute::attributename @attributename Attribute Axis
74. Used to filter node sets Predicate similar to query criteria Can use specific values or location references Predicates
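Predicates can be tried with the XPath subset in Python's `xml.etree.ElementTree`, which supports attribute tests, child-text tests, and positions. The movie data below is a hypothetical sample.

```python
import xml.etree.ElementTree as ET

DOC = """<movies>
  <movie title="Alpha" rating="PG"><price>2.50</price></movie>
  <movie title="Beta" rating="R"><price>3.00</price></movie>
  <movie title="Gamma" rating="PG"><price>3.00</price></movie>
</movies>"""

movies = ET.fromstring(DOC)

# Filter by a specific attribute value.
pg_titles = [m.get("title") for m in movies.findall("movie[@rating='PG']")]

# Filter by the text of a child element.
three_dollar = [m.get("title") for m in movies.findall("movie[price='3.00']")]

# Filter by location: positions are 1-based.
first_title = movies.find("movie[1]").get("title")

print(pg_titles, three_dollar, first_title)
```

As with query criteria, the predicate narrows the node-set selected by the step it is attached to.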