Gegevensbanken (Databases)
                      "Last Lecture"
                         Prof. Erik Duval
                           2009 - 2010





Sunday 30 May 2010
http://www.slideshare.net/erik.duval




               •     NoSQL (with thanks to Steven Noels)
               •     XML (with thanks to Prof. Olivié!)
               •     about the exam...








                         http://en.wikipedia.org/wiki/Extensible_Markup_Language




                         http://www.itjobboard.be/ICT-banen/xml/Belgie/alle/0/relevantie/nl/
http://www.khbo.be/12385
http://www.w3.org/XML
http://www.w3c.it/talks/2005/openCulture/slide7-0.html




                         http://en.wikipedia.org/wiki/List_of_XML_markup_languages
XML is not ...
   •      Extension of HTML
        •      XHTML is XML-compliant, and extensible

   •      Just for Web pages
        •      Useful when data are stored or exchanged

   •      Concerned with semantics
        •      XML does not define semantics, just syntax

   •      Innovative new technology
        •      Standard, building on existing technology

   •      Only hype
        •      Though it is that too
XML is ...
   •      Endorsed by W3C and major companies
   •      Extensible
        •      No tag name limitations
        •      No language limitations
   •      Human-readable (or at least software developer-readable)
        •      Can be processed with basic text tools
   •      Open standard
        •      no vendor lock-in (in theory...)
   •      Easy to implement
        •      powerful, cheap (free), off-the-shelf XML tools
               •     1969: GML, the precursor of SGML (Standard Generalized Markup Language)
                     •   Meta-language: describe other languages
                     •   Powerful, but rather complicated
                     •   1986: ISO standard

               •     1992: HTML (HyperText Markup Language)
                     •   Based on SGML
                     •   Simple, but limited

               •     1996: Start design of XML
                     •   By World Wide Web Consortium (W3C)

               •     1998: Publication of XML 1.0
Design Goals
               •     Easy to use over the Internet
                     •   Power of SGML
                     •   Simplicity of HTML
               •     Human-legible
               •     Easy to create
               •     Compactness is not an issue
               •     “The ASCII of the Web”
XML Basics
           <Person>
                <Name>
                     <First>Thomas</First>
                     <Last>Atkinson</Last>
                </Name>
                <Age>30</Age>
           </Person>



               •     Self-defined, meaningful tags
               •     Separate data and its representation
•      Language for defining syntax
   •      Records and fields have explicit boundaries
        •      parse-able without knowing structure (self-descriptive)
   •      Unicode support (UTF-8, UTF-16, ...)
   •      Web-aware
        •      DTD, ENTITY and Schema can be loaded through URL
   •      Strictly parsed: no ambiguity (case sensitive!)
   •      Extensible: namespaces

<?xml version="1.0" encoding="UTF-8"?>
    <!-- XML declaration: XML follows -->
  <!DOCTYPE addressbook SYSTEM
      "http://www/~koenh/ddml/addressbook.dtd">
        <!-- Document Type Declaration... -->
        <!-- ExternalDTDPointer -->
  <addressbook> <!-- root element -->
    <person first-name="John" family-name="Doe"
      employee-number="1234">
      <contact-info>
        <email address="Jdoe@home.com"/>
      </contact-info>
      <address street="Celestijnenlaan"
        number="200A"/>
    </person></addressbook>
•      Cf. HTML markup tags

          <H1 align="center"> a Heading </H1>

          •      element: the whole construct, from opening tag to closing tag
          •      opening tag: <H1 align="center">
          •      attribute: align="center"
          •      content: a Heading
          •      closing tag: </H1>

    •      Major differences:
         •      Case sensitive
         •      Proper nesting: No <A> … <B> … </A> … </B>
         •      Unicode instead of ASCII

Vocabularies

   •      Agreed-upon XML tag sets for specific domain
   •      Examples
         •      Chemical Markup Language (CML)
         •      Business: ebXML, RosettaNet, BizTalk
         •      Mathematics: MathML
         •      Multimedia: Synchronized Multimedia Integration Language (SMIL)
         •      Etc.
•      well-formed: follows XML syntax

        •      Proper tag and attribute names

        •      Tags properly closed

        •      Attributes and text between tags do not contain
               '<' (escape with &lt;)

   •      valid: well-formed and conforming to a vocabulary (DTD)

        •      All elements and their attributes declared in DTD

        •      Attribute values follow DTD type declarations
              •      CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated

        •      Nesting and sequencing of elements follows DTD
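
The difference between "well-formed" and "not well-formed" can be tried out with a standard JAXP parser. The following is only a minimal sketch: the class name WellFormedCheck and the method isWellFormed are made up for this illustration, and the parser is non-validating, so only the syntax rules are checked, not a DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the lecture material.
class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (no DTD validation). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors,
            // which avoids the default error message on stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // a syntax rule was violated
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));      // properly nested
        System.out.println(isWellFormed("<A><B></A></B>"));   // improper nesting
        System.out.println(isWellFormed("<a></A>"));          // case mismatch
        System.out.println(isWellFormed("<a>1 < 2</a>"));     // unescaped '<'
    }
}
```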
Elements
    •      XML’s container for
          •      Attributes
          •      Character data
          •      Other elements (“child” elements)

    •      Delimited by opening and closing tags
          •      Non-empty element: <name>..</name>
          •      Empty element: <name/>


    •      Form a simple hierarchic tree
          •      Root = “document element”
Attributes and Strings
     •      Attributes
           •         Name-value pairs: name="value"
           •         Only strings as value!
     •      Strings
           •         Enclosed by '...' or "..."
                     → a quote character inside the string is replaced with &apos; or &quot;
     •      Character data
           •         Any text that is not markup
           •         ‘&’, ‘<’ and ‘>’ are markup
                      → replace with &amp; &lt; and &gt;
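
The replacement rules above can be sketched as a small escaping routine. This is an illustration only: XmlEscape and escape are hypothetical names, and real code would normally leave escaping to an XML library.

```java
// Hypothetical helper class, not part of the lecture material.
class XmlEscape {

    /** Replaces the five characters that have a predefined XML entity. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds an element whose attribute value contains markup characters.
        System.out.println("<note text=\"" + escape("a < b & \"c\"") + "\"/>");
    }
}
```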
Document structure

   •      Prolog (optional)
        •      <?xml version="1.0" encoding="UTF-8"?>
              •      version="number" (compulsory)
              •      encoding="character encoding" (optional)

   •      Document type declaration
        •      <!DOCTYPE document_element ... >

   •      Body
        •      The document element
Another example
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
       <Account accountNr="123-456789-01" use="personal">
               <Owners> <Person ID="1258-a8d72-98">
                         <Name>John Smith</Name></Person>
                        <Person ID="5842-df5ef-e9">
                         <Name>Claudia Scott</Name></Person>
               </Owners>
               <CreditCards><CreditCard number="12345"/></CreditCards>
               <Balance Currency="EUR">50000</Balance>
       </Account>
        ...
</BankAccounts>

Document Type Definition
<!-- EMPTY: no content, used for attributes only -->
<!ELEMENT address EMPTY>
<!-- CDATA: character data, any string -->
<!-- #REQUIRED: a value for that attribute must be present -->
<!-- NMTOKEN: name token, made of letters, digits, ., -, _ and : only -->
<!ATTLIST address city   CDATA   #REQUIRED
                  state  NMTOKEN #REQUIRED
                  number CDATA   #REQUIRED
                  street CDATA   #REQUIRED>

<!-- +: 1 or more -->
<!ELEMENT addressbook (person+)>

<!-- |: choice; *: 0 or more -->
<!ELEMENT contact-info
  (home-phone|mobile-phone|email)*>

Document Type Definition
<!ELEMENT email EMPTY>
<!ATTLIST email address CDATA #REQUIRED>

<!ELEMENT home-phone EMPTY>
<!ATTLIST home-phone number CDATA #REQUIRED>

<!ELEMENT job-info EMPTY>
<!-- 'no' and 'FullTime' are default values -->
<!ATTLIST job-info is-manager      (yes|no)            'no'
                   emp-type        (FullTime|PartTime) 'FullTime'
                   job-description CDATA               #REQUIRED>

<!-- #PCDATA: Parsed Character Data, cannot contain subelements -->
<!ELEMENT misc-info (#PCDATA)>

<!ELEMENT mobile-phone EMPTY>
<!ATTLIST mobile-phone number CDATA #REQUIRED>

Document Type Definition

<!ELEMENT manager EMPTY>
<!-- IDREF: reference to the employee-number of a person -->
<!ATTLIST manager empnumber IDREF #REQUIRED>

<!-- ,: sequence; ?: zero or one -->
<!ELEMENT person (contact-info,address,
   job-info?,manager?,misc-info?)>
<!-- #IMPLIED: can, but need not, be provided -->
<!-- ID: can be referred to by manager.empnumber -->
<!ATTLIST person first-name      CDATA #REQUIRED
                 middle-initial  CDATA #IMPLIED
                 employee-number ID    #REQUIRED
                 family-name     CDATA #REQUIRED>
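
How a validating parser uses such declarations can be sketched in Java. The sketch below makes several assumptions: the DTD is a strongly reduced variant of the address book DTD (only addressbook, person and email), it is placed in an internal subset so that nothing is loaded from a URL, and the names DtdValidationSketch and validate are invented for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the lecture material.
class DtdValidationSketch {

    // Reduced DTD (an assumption for this sketch), given as internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE addressbook [\n"
        + "  <!ELEMENT addressbook (person+)>\n"
        + "  <!ELEMENT person (email*)>\n"
        + "  <!ATTLIST person first-name      CDATA #REQUIRED\n"
        + "                   employee-number ID    #REQUIRED>\n"
        + "  <!ELEMENT email EMPTY>\n"
        + "  <!ATTLIST email address CDATA #REQUIRED>\n"
        + "]>\n";

    /** Parses DOCTYPE + body with validation on; returns the validity errors. */
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);            // check against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());     // validity violation
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException e) {
            errors.add("not well-formed: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String ok = "<addressbook><person first-name=\"John\" employee-number=\"e1234\">"
                  + "<email address=\"Jdoe@home.com\"/></person></addressbook>";
        String missingAttribute =
                    "<addressbook><person employee-number=\"e1234\"/></addressbook>";
        System.out.println(validate(ok));
        System.out.println(validate(missingAttribute));
    }
}
```

Note that the sketch uses "e1234" as employee number: a value of type ID has to be an XML name, so it cannot start with a digit.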


namespaces: problem
<widget type="gadget">
     <head size="medium"/>
     <big><subwidget ref="gizmo"/></big>
     <info>
          <head><title>Gadget</title></head>
          <body><h1>Gadget</h1>
               A gadget contains a big gizmo
          </body>
     </info>
</widget>

Name collision! (the widget's <head> versus the HTML <head>)

namespaces: approach


   •      A collection of names, identified by a URI
          reference, which are used in XML documents as
          element types and attribute names
        •      xmlns:prefix="URI"
   •      URI used only as identifier
        •      does not need to point to anything

   •      a declaration applies to the element and everything nested in it
        •      a default namespace (xmlns="URI") does not apply to unprefixed attributes
namespaces: example
 <widget xmlns="http://www.widget.org"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      type="gadget">
      <head size="medium"/>
      <big><subwidget ref="gizmo"/></big>
      <info><xhtml:head><xhtml:title>Gadget
                       </xhtml:title></xhtml:head>
                     <xhtml:body><xhtml:h1>Gadget
                       </xhtml:h1>A gadget contains...
                     </xhtml:body></info>
   </widget>
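
A namespace-aware parser keeps the two kinds of head apart. The sketch below assumes a shortened version of the widget document and uses http://www.w3.org/1999/xhtml as the XHTML namespace URI; NamespaceSketch and countHeads are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the lecture material.
class NamespaceSketch {

    // Shortened widget document (an assumption for this sketch).
    static final String XML =
          "<widget xmlns=\"http://www.widget.org\""
        + "        xmlns:xhtml=\"http://www.w3.org/1999/xhtml\" type=\"gadget\">"
        + "<head size=\"medium\"/>"
        + "<info><xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head></info>"
        + "</widget>";

    /** Counts the elements with local name "head" in the given namespace. */
    static int countHeads(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagNameNS(namespaceUri, "head").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countHeads("http://www.widget.org"));
        System.out.println(countHeads("http://www.w3.org/1999/xhtml"));
    }
}
```

The prefix itself plays no role in the lookup: elements are matched on the pair (namespace URI, local name).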

Another example

<Address>
  <Street>Celestijnenlaan</Street>
  <Nr>200A</Nr>
  <City>Heverlee-Leuven</City>
  <Country>Belgium</Country>
</Address>

<Server>
  <Name>www</Name>
  <Address>
    134.58.43.1
  </Address>
</Server>

Which <Address> is meant?

Another example (2)
<Address
  xmlns="www.all.edu/departments">
  <Street>Celestijnenlaan</Street>
  <Nr>200A</Nr>
  <City>Heverlee-Leuven</City>
  <Country>Belgium</Country>
</Address>

<Server
  xmlns="www.dns.net/servers">
  <Name>www</Name>
  <Address>
    134.58.43.1
  </Address>
</Server>

<Department xmlns:edu="www.all.edu/departments"
            xmlns:dns="www.dns.net/servers">
  <edu:Address>
    <edu:Street>Celestijnenlaan</edu:Street>
    ...
  </edu:Address>
  <dns:Name>www</dns:Name>
  <dns:Address>134.58.43.1</dns:Address>
</Department>

Accessing XML documents

   •      Manual text file manipulation
         •      Cumbersome & Error-prone

   •      Parser
         •      Simplifies document manipulation
              •      Ensures proper grammar, well-formedness

              •      Abstracts content from grammar

         •      Accessed through standard API
           • Document Object Model (DOM)
           • Simple API for XML (SAX)
•      DOM parser
        •      create DOM object tree
   •      SAX parser
        •      generates events when elements encountered
        •      one-pass translation
        •      no need to keep whole document tree in memory
   •      Both can be validating or non-validating
   •      Many available
          (most freeware, open source)
        •      IBM XML4J, Apache Xerces, Sun parser, Microsoft,
               DataChannel, Oracle, ...
DOM approach




                               http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
DOM Node Tree
<?xml version="1.0"?>
<!-- An example XML document -->
<BankAccounts>
 <Account accountNr="123-456789-01">
    <Owner ID="1258-a8d72-98">
       John Smith
    </Owner>
    <Balance Currency="EUR">
     50000
    </Balance>
 </Account>
 <Account ...>
 ...
</BankAccounts>

Doc
  Com   An example XML document
  El    BankAccounts
    El    Account
      Att   accountNr = "123-456789-01"
      El    Owner = "John Smith"
        Att   ID = "1258-a8d72-98"
      El    Balance = "50000"
        Att   Currency = "EUR"
    El    Account
      ...

parsing: DOM
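 // needs: import org.w3c.dom.Node; import org.w3c.dom.NodeList;
 // Visits a node and, recursively, all of its descendants.
 public void print(Node node) {
    ...
    NodeList nlist = node.getChildNodes();
    if (nlist != null) {
           int l = nlist.getLength();
           for (int i = 0; i < l; i++) {
                 print(nlist.item(i));   // recurse into each child
                 ...
           }
           ...
    }
    ...
 }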
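
A complete variant of such a recursive traversal could look as follows. It is a sketch under assumptions: the class DomTreePrinter, its methods and the output labels (El, Att, Com, Txt, modelled on the node tree slide) are chosen for this illustration, and whitespace-only text nodes are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the lecture material.
class DomTreePrinter {

    /** Parses the text and returns one line per DOM node, indented by depth. */
    static List<String> outline(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                           .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc, 0, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        return sb.toString();
    }

    private static void walk(Node node, int depth, List<String> out) {
        String pad = indent(depth);
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.add(pad + "El " + node.getNodeName());
                NamedNodeMap atts = node.getAttributes();
                for (int i = 0; i < atts.getLength(); i++) {
                    Node a = atts.item(i);
                    out.add(pad + "  Att " + a.getNodeName() + "=" + a.getNodeValue());
                }
                break;
            case Node.COMMENT_NODE:
                out.add(pad + "Com " + node.getNodeValue().trim());
                break;
            case Node.TEXT_NODE:
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) out.add(pad + "Txt " + text);
                break;
            default:
                break;   // the document node itself produces no line
        }
        // Children of the document node stay at depth 0.
        int childDepth = node.getNodeType() == Node.DOCUMENT_NODE ? depth : depth + 1;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), childDepth, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<!-- An example XML document --><BankAccounts>"
                   + "<Account accountNr=\"123-456789-01\">"
                   + "<Balance Currency=\"EUR\">50000</Balance></Account></BankAccounts>";
        for (String line : outline(xml)) System.out.println(line);
    }
}
```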
DOM Benefits & Drawbacks

    •      Benefits
         •      W3C Recommendation
         •      Language- and platform-independent
         •      Random access
         •      Intuitive
    •      Drawback
         •      Entire object tree in memory
Simple API for XML (SAX)

    •      Not an official standard
         •      Ad-hoc product by XML developers
         •      Primarily Java API
    •      Event-based mechanism
         •      Don’t call the parser, the parser calls you
         •      No object model in memory
         •      Programmer must keep state information

SAX approach
http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP

SAX parsing model
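   •      Application creates the handler: new ContentHandler()
   •      Application creates the parser: new Parser()
   •      Application → Parser: setContentHandler()
   •      Application → Parser: parse()
   •      Parser → ContentHandler, while the document is being read:
        •      startDocument()
        •      startElement()
        •      characters()
        •      endElement()
        •      endDocument()
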
parsing: SAX
// PHP, using its event-based XML parser functions
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser,
     "startQuestion","endQuestion");
     ...
     xml_parse($xml_parser,$data,feof($fp));
     ...
function startQuestion($parser,$name,$attrs)
  {
        // names arrive in upper case: case folding is on by default
        ...if ($name == "QUESTION")
          ...new Question($attrs["QTEXT"]);
          ...

•      Start and end of document
      – startDocument()
      – endDocument()

    •      Start and end of element
      – startElement(namespace, name, qname, attlist)
      – endElement(namespace, name, qname)

    •      Character data
      – characters(char[] ch, int start, int length)

    •      Processing Instruction
      – processingInstruction(target, data)

    •      No event for comments!
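
The order in which these callbacks arrive can be made visible with a handler that only records them. This is a sketch: SaxEventLog and events are names invented here, and whitespace-only character data is left out of the log.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the lecture material.
class SaxEventLog {

    /** Parses the text with SAX and returns the events in arrival order. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument()   { log.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("startElement " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("characters " + text);
            }
        };
        try {
            // The parser calls the handler; the application only starts the parse.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String e : events("<a><!-- note --><b>hi</b></a>")) System.out.println(e);
    }
}
```

The comment in the sample input leaves no trace in the log, in line with the last bullet above.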
Another SAX example
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BankAccounts ...>
<!-- This is an example XML document -->
<BankAccounts>
     <Account accountNr="123-456789-01" use="personal">
            <Owners>
                 <Person ID="1258-a8d72-98"><Name>John Smith</Name></Person>
                 <Person ID="5842-df5ef-e9"><Name>Claudia Scott</Name></Person>
            </Owners>
            <CreditCards><CreditCard number="12345"/></CreditCards>
            <Balance Currency="EUR">50000</Balance>
     </Account>
      ...
</BankAccounts>

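import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AvgBalanceCalculator extends DefaultHandler
  {private double total = 0.0;
   private int count = 0;
   private boolean isBalance = false;

     // Called for every opening tag. qname is always filled in; the local
     // name is empty unless the parser is namespace-aware.
     @Override
     public void startElement(String uri, String name, String qname, Attributes atts)
      {if (qname.equals("Balance")) {
            isBalance = true;
            count++; }}

     // Called for character data; assumes the balance arrives in one chunk.
     @Override
     public void characters(char[] ch, int start, int len) throws SAXException
      {if (isBalance) {
            String help = new String(ch, start, len);
            double balance = Double.parseDouble(help.trim());
            total = total + balance;
            isBalance = false; }}

     // Called once, after the whole document has been read.
     @Override
     public void endDocument()
      {if (count != 0)
           System.out.println("Average balance is " + (total/count));
       }
   }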
SAX Benefits & Drawbacks
   •      Benefits
         •      Suitable when
              •      parsing large documents

              •      constructing proprietary object structures

              •      only small subset of information is needed

         •      Simple and fast

   •      Drawbacks
         •      Read-only
         •      No random access
         •      Complex searches messy to program
Limitations of DTDs

   •      no typing of text elements and attributes

         •      all values are strings; no integers, reals, etc.

   •      an unordered set of subelements is hard to define

         •      order is usually irrelevant in databases

   •      IDs and IDREFs are not typed

         •      the DNO attribute of an EMPLOYEE can contain a reference to another
                EMPLOYEE, which is meaningless
                e.g. <EMPLOYEE SSN="_888665555" SEX="M" DNO="_888665555">

         •      the DNO attribute should be constrained so that it can only refer to a
                DEPARTMENT

XML Schema
    •      typing of values
         •      e.g. integer, string, etc.
         •      also constraints on min/max values
    •      user-defined types
    •      is specified in XML syntax
         •      a more standardized representation
    •      is integrated with namespaces
    •      and other features
         •      list types, uniqueness constraints on keys,
                foreign key constraints, inheritance, …

XSDL


               •     XML Schema Definition Language
               •     documents with suffix .xsd

XML Schema: example
       XML schema

       <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
       ....
       <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
         <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="HOURS" type="xsd:float"/>
            </xsd:sequence>
            <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
         </xsd:complexType>
       </xsd:element>
       ....
       </xsd:schema>


       XML instance

                     <PWORKER SSN="_123456789">
                       <HOURS>7.5</HOURS>
                     </PWORKER>
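
Checking an instance against a schema can be sketched with the javax.xml.validation API. The schema in the sketch is a simplified, stand-alone variant and rests on assumptions: PWORKER is declared as a global element without minOccurs/maxOccurs, and SSN is typed xsd:string instead of xsd:IDREF, because an IDREF would need a matching ID elsewhere in the same document. The names SchemaValidationSketch and isValid are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the lecture material.
class SchemaValidationSketch {

    // Simplified stand-alone schema (an assumption for this sketch).
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + " <xsd:element name=\"PWORKER\">"
        + "  <xsd:complexType>"
        + "   <xsd:sequence>"
        + "    <xsd:element name=\"HOURS\" type=\"xsd:float\"/>"
        + "   </xsd:sequence>"
        + "   <xsd:attribute name=\"SSN\" type=\"xsd:string\" use=\"required\"/>"
        + "  </xsd:complexType>"
        + " </xsd:element>"
        + "</xsd:schema>";

    /** Returns true if the instance document is valid against XSD. */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is broken", e);
        }
        try {
            Validator validator = schema.newValidator();
            // Without a custom error handler, validity errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<PWORKER SSN=\"_123456789\"><HOURS>7.5</HOURS></PWORKER>"));
        System.out.println(isValid("<PWORKER SSN=\"_123456789\"><HOURS>seven</HOURS></PWORKER>"));
    }
}
```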

XML: simple types
–        built-in simple types
        •      string, integer, decimal, float, boolean, date, time, …
        •      <xsd:element name="gebdat" type="xsd:date" />
–        user-defined simple types
        •      defined with the simpleType element
        •      the restriction element gives the base type on which it is built
        •      <xsd:simpleType name="salaryRange">
                 <xsd:restriction base="xsd:integer">
                   <xsd:minInclusive value="25000" />
                   <xsd:maxInclusive value="100000" />
                 </xsd:restriction>
               </xsd:simpleType>

XML: simple types
      <xsd:simpleType name="studentClassificatie">
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="bachelorstudent" />
              <xsd:enumeration value="masterstudent" />
              <xsd:enumeration value="doctorstudent" />
            </xsd:restriction>
      </xsd:simpleType>

      <xsd:simpleType name="deptType">
        <xsd:restriction base="xsd:string">
          <xsd:length value="3" />
        </xsd:restriction>
      </xsd:simpleType>
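
The effect of such restrictions can be tried out by validating single values. In the sketch below the three simple types are taken from the slides, while the wrapper elements salary, student and dept, the class SimpleTypeSketch and the method accepts are assumptions added only to have something to validate.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the lecture material.
class SimpleTypeSketch {

    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + " <xsd:simpleType name=\"salaryRange\">"
        + "  <xsd:restriction base=\"xsd:integer\">"
        + "   <xsd:minInclusive value=\"25000\"/>"
        + "   <xsd:maxInclusive value=\"100000\"/>"
        + "  </xsd:restriction>"
        + " </xsd:simpleType>"
        + " <xsd:simpleType name=\"studentClassificatie\">"
        + "  <xsd:restriction base=\"xsd:string\">"
        + "   <xsd:enumeration value=\"bachelorstudent\"/>"
        + "   <xsd:enumeration value=\"masterstudent\"/>"
        + "   <xsd:enumeration value=\"doctorstudent\"/>"
        + "  </xsd:restriction>"
        + " </xsd:simpleType>"
        + " <xsd:simpleType name=\"deptType\">"
        + "  <xsd:restriction base=\"xsd:string\">"
        + "   <xsd:length value=\"3\"/>"
        + "  </xsd:restriction>"
        + " </xsd:simpleType>"
        // Wrapper elements: assumptions of this sketch, not from the slides.
        + " <xsd:element name=\"salary\"  type=\"salaryRange\"/>"
        + " <xsd:element name=\"student\" type=\"studentClassificatie\"/>"
        + " <xsd:element name=\"dept\"    type=\"deptType\"/>"
        + "</xsd:schema>";

    /** Validates <element>value</element>; value must not contain markup. */
    static boolean accepts(String element, String value) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is broken", e);
        }
        String xml = "<" + element + ">" + value + "</" + element + ">";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // the value violates a facet of its type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("salary", "50000"));
        System.out.println(accepts("salary", "200000"));
        System.out.println(accepts("student", "masterstudent"));
        System.out.println(accepts("dept", "CS"));
    }
}
```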

XPath (example)
            /COMPANY/EMPLOYEE

            ROOT
              COMPANY
                EMPLOYEE
                  SSN
                    _123456789
                EMPLOYEE
                  SSN
                    _333445555
                EMPLOYEE
                  SSN
                    _999887777

XPath
            /COMPANY/EMPLOYEE

            selects the EMPLOYEE nodes with SSN _123456789, _333445555 and _999887777:

          <EMPLOYEE SSN="_123456789" SEX="M"
             SUPERSSN="_333445555" DNO="_5">
                <FNAME>John</FNAME>
                <MINIT>B</MINIT>
                ....
          </EMPLOYEE>
          <EMPLOYEE SSN="_333445555" SEX="M"
             SUPERSSN="_888665555" DNO="_5">
                <FNAME>Franklin</FNAME>
                <MINIT>T</MINIT>
                <LNAME>Wong</LNAME>
                <BDATE>08-DEC-45</BDATE>
          </EMPLOYEE>
          <EMPLOYEE SSN="_999887777" SEX="F"
             SUPERSSN="_987654321" DNO="_4">
                <FNAME>Alicia</FNAME>
                .....
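
Evaluating such a path expression from Java can be sketched with the javax.xml.xpath API. The document in the sketch is a shortened COMPANY document built from the employees on the slide; XPathSketch and ssns are names chosen for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the lecture material.
class XPathSketch {

    // Shortened COMPANY document (an assumption for this sketch).
    static final String XML =
          "<COMPANY>"
        + "<EMPLOYEE SSN=\"_123456789\" SEX=\"M\"><FNAME>John</FNAME></EMPLOYEE>"
        + "<EMPLOYEE SSN=\"_333445555\" SEX=\"M\"><FNAME>Franklin</FNAME></EMPLOYEE>"
        + "<EMPLOYEE SSN=\"_999887777\" SEX=\"F\"><FNAME>Alicia</FNAME></EMPLOYEE>"
        + "</COMPANY>";

    /** Evaluates an expression that selects EMPLOYEE elements; returns their SSNs. */
    static List<String> ssns(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                               .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("SSN"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ssns("/COMPANY/EMPLOYEE"));
        System.out.println(ssns("/COMPANY/EMPLOYEE[@SEX='F']"));
    }
}
```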


XML family of technologies

   •      XLink: hypertext

   •      XSL: Extensible Stylesheet Language

        •      XSLT: Transformations

        •      Formatting Objects

   •      XML Schema: additional constraints on attribute types

   •      and more...

XML applications
   •      RDF: Resource Description Framework

         •      see below

   •      XHTML: eXtensible HTML, and HTML5
         •      XML compliant HTML

   •      MathML

   •      SMIL: Synchronized Multimedia Integration Language

   •      Many others

         •      Chemical Markup Language, Vector Graphics Markup Language, Open Software
                Description Format, weather observation, astronomical data, financial data,
                electronic components, workflow, business cards, real estate, newspaper,
                classifieds, javadoc, human resource, advertising, architecture, …
XML Working Groups
               •     XML Coordination
               •     XML Core
               •     XSL (XSLT, XSL/FO) -> W3C architecture
               •     Efficient XML Interchange
               •     XML Processing Model
               •     XML Query (XQuery, XPath)
               •     XML Schema
               •     Service Modeling Language (SML)
More XPath Features
  •    Operator “|” computes the union of two node sets

      •    E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]

          •    gives employees with either 0 or 1 dependents

  •    “//” can be used to skip multiple levels of nodes

      •    E.g. /COMPANY//FNAME

          •    finds any FNAME element anywhere under the /COMPANY element, regardless of the
               element in which it is contained.

  •    A step in the path can go to parents, siblings, ancestors and descendants
       of the nodes generated by the previous step, not just to the children

      •    “//”, described above, is a short form for specifying “all descendants”

      •    “..” specifies the parent.

          •    e.g. /COMPANY//FNAME/../BDATE
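
The path expressions above can be tried out with a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree. That module supports only a subset of XPath: "//" and ".." work, but count(), not() and the "|" operator do not, so the union query is emulated with an ordinary filter. The sample document and all values in it except Franklin's are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modelled on the COMPANY example.
COMPANY_XML = """
<COMPANY>
  <EMPLOYEE SSN="_333445555">
    <FNAME>Franklin</FNAME>
    <BDATE>08-DEC-45</BDATE>
    <DEPENDENT>Joy</DEPENDENT>
  </EMPLOYEE>
  <EMPLOYEE SSN="_999887777">
    <FNAME>Alicia</FNAME>
    <BDATE>19-JUL-58</BDATE>
  </EMPLOYEE>
  <EMPLOYEE SSN="_123456789">
    <FNAME>John</FNAME>
    <BDATE>09-JAN-55</BDATE>
    <DEPENDENT>Alice</DEPENDENT>
    <DEPENDENT>Michael</DEPENDENT>
  </EMPLOYEE>
</COMPANY>
"""

root = ET.fromstring(COMPANY_XML)  # root is the COMPANY element

# /COMPANY//FNAME : every FNAME anywhere below COMPANY
first_names = [e.text for e in root.findall(".//FNAME")]

# /COMPANY//FNAME/../BDATE : step up to the parent of FNAME, then down to BDATE
birth_dates = [e.text for e in root.findall(".//FNAME/../BDATE")]

# //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]
# ElementTree has no count() or "|", so the union is written as a filter.
few_dependents = [
    e.get("SSN")
    for e in root.iter("EMPLOYEE")
    if len(e.findall("DEPENDENT")) <= 1
]

print(first_names)
print(birth_dates)
print(few_dependents)
```

A full XPath engine would accept the original expressions directly; the filter is only a workaround for the limited subset used here.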
XQuery
   •      allows more general queries to be formulated than XPath
   •      general form: FLWOR expression
                     for         < for-variable > in < in-expression >
                     let         < let-variable > := < let-expression >
                     [ where     < filter-expression > ]
                     [ order by  < order-specification > ]
                     return      < expression >
   •      note: the for and let clauses can occur alone or together
   •      Q1: first name and last name of all employees who earn more
          than 70000
   •      for $x in doc("www.company.com/info.xml")
          //employee[employeeSalary > 70000]/employeeName
          return <res> {$x/firstName, $x/lastName} </res>
   •      alternative:
          for $x in doc("www.company.com/info.xml")
          /company/employee
          where $x/employeeSalary > 70000
          return <res> {$x/employeeName/firstName,
                          $x/employeeName/lastName} </res>
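
For readers without an XQuery processor at hand, the for / where / return pattern of Q1 can be mimicked in Python. This is a minimal sketch, assuming the standard-library module xml.etree.ElementTree and a made-up document that follows the element names used in the query (company, employee, employeeName, employeeSalary); it is an analogy, not XQuery itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.company.com/info.xml
INFO_XML = """
<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>James</firstName><lastName>Borg</lastName></employeeName>
    <employeeSalary>85000</employeeSalary>
  </employee>
</company>
"""

company = ET.fromstring(INFO_XML)

# for $x in /company/employee        -> the "for" part of the comprehension
# where $x/employeeSalary > 70000    -> the "if" part
# return <res> ... </res>            -> the tuple built for each match
high_earners = [
    (x.findtext("employeeName/firstName"), x.findtext("employeeName/lastName"))
    for x in company.findall("employee")
    if float(x.findtext("employeeSalary")) > 70000
]

print(high_earners)
```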

   •      Q3: first name and last name of all employees who work more
          than 20 hours on project number 5, together with that number of hours
   •      for $x in doc("www.company.com/info.xml")
          /company/project[projectNumber = 5]/projectWorker,
          $y in doc("www.company.com/info.xml")/company/
          employee
          where $x/hours > 20.0 and $y/ssn = $x/ssn
          return <res> {$y/employeeName/firstName,
          $y/employeeName/lastName,
          $x/hours} </res>
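
Q3 is a join: two for variables range over different parts of the document and the where clause matches them on ssn. A minimal Python sketch of the same idea follows, assuming xml.etree.ElementTree and a made-up document in which both employee and projectWorker carry an ssn child element (an assumption taken from the query, not from a real schema).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.company.com/info.xml
INFO_XML = """
<company>
  <employee>
    <ssn>123</ssn>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
  </employee>
  <employee>
    <ssn>456</ssn>
    <employeeName><firstName>Joyce</firstName><lastName>English</lastName></employeeName>
  </employee>
  <project>
    <projectNumber>5</projectNumber>
    <projectWorker><ssn>123</ssn><hours>32.5</hours></projectWorker>
    <projectWorker><ssn>456</ssn><hours>10.0</hours></projectWorker>
  </project>
  <project>
    <projectNumber>7</projectNumber>
    <projectWorker><ssn>456</ssn><hours>40.0</hours></projectWorker>
  </project>
</company>
"""

company = ET.fromstring(INFO_XML)

# $x ranges over the workers of project 5, $y over all employees.
workers_on_5 = [
    w
    for p in company.findall("project")
    if p.findtext("projectNumber") == "5"
    for w in p.findall("projectWorker")
]

# The nested loops play the role of the two "for" variables;
# the condition is the "where" clause of Q3.
results = [
    (
        y.findtext("employeeName/firstName"),
        y.findtext("employeeName/lastName"),
        float(x.findtext("hours")),
    )
    for x in workers_on_5
    for y in company.findall("employee")
    if float(x.findtext("hours")) > 20.0 and y.findtext("ssn") == x.findtext("ssn")
]

print(results)
```

The nested loop compares every worker with every employee, which is fine for a sketch; a query processor is free to evaluate the join more cleverly.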


The End...


                     Thank you!
                       Questions...?


NoSQL

               •     non-relational
               •     distributed
               •     open source
               •     horizontally scalable
               •     “web scale”
               •     schema free
               •     easy replication
               •     simple API
               •     BASE (not ACID)


Systems
               •     Core: Hadoop, HBase, Cassandra, Hypertable, ...
               •     Docs: CouchDB, MongoDB, Riak, Terrastore, ...
               •     Key-Value, tuple: Amazon SimpleDB, Azure, ...
               •     Graph: Neo4J, Bigdata, InfoGrid, HyperGraph, ...
               •     Object: Versant, Perst, ZODB, ...
               •     Grid: GigaSpaces, Hazelcast, ...
               •     XML: Tamino, eXist, Mark Logic, Xindice, ...
               •     ...
               http://nosql-databases.org/
NoSQL

               •     Google BigTable
               •     Amazon Dynamo
               •     Open source: HBase
                     •   Cassandra: last.fm, Facebook



NoSQL: why

               •     big data sets:
                     •   Digg green badge: 3 TB
                     •   Facebook inbox: 50 TB
                     •   eBay overall data: 2 PB



http://about.digg.com/blog/looking-future-cassandra
14 seconds



http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation

No attempt at ACID
               •     Atomicity
               •     Consistency
               •     Isolation
               •     Durability


               •     trade ACID off in favor of high availability

query


               •     associative array, key-value pair
               •     XQuery
               •     SPARQL
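
The first item, access by key as in an associative array, can be pictured with a plain Python dict. This is only a toy sketch: the put and get helpers and the key format are made up for illustration and do not correspond to the API of any particular NoSQL system.

```python
# Toy in-memory key-value store; a real system would spread keys over many nodes.
store = {}


def put(key, value):
    """Store a value under a key, overwriting any previous value."""
    store[key] = value


def get(key, default=None):
    """Look a value up by its key; there is no query language, only the key."""
    return store.get(key, default)


put("employee:_999887777", {"fname": "Alicia", "dno": "_4"})
put("employee:_333445555", {"fname": "Franklin", "lname": "Wong"})

print(get("employee:_999887777"))
print(get("employee:unknown"))
```

Note that the two stored values have different fields, which is what “schema free” amounts to in this model: the store never inspects the value.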




Questions...?

Gegevensbanken laatste les: XML...

  • 1. Gegevensbanken “Laatse Les” Prof. Erik Duval 2009 - 2010 1 Sunday 30 May 2010
  • 3. NoSQL (Met dank aan Steven Noels) • XML (met dank aan prof. Olivié!) • over het examen... 3 Sunday 30 May 2010
  • 4. Sunday 30 May 2010 4 http://en.wikipedia.org/wiki/Extensible_Markup_Language
  • 5. Sunday 30 May 2010 5 http://www.itjobboard.be/ICT-banen/xml/Belgie/alle/0/relevantie/nl/
  • 6. 6 http://www.khbo.be/12385 Sunday 30 May 2010
  • 7. 7 http://www.w3.org/XML Sunday 30 May 2010
  • 8. 8 http://www.w3c.it/talks/2005/openCulture/slide7-0.html Sunday 30 May 2010
  • 9. Sunday 30 May 2010 9 http://en.wikipedia.org/wiki/List_of_XML_markup_languages
  • 10. XML is not ... • Extension of HTML • XHTML is XML-compliant, and extensible • Just for Web pages • Useful when data are stored or exchanged • Concerned with semantics • XML does not define semantics, just syntax • Innovative new technology • Standard, building on existing technology • Only a hype • Though also Sunday 30 May 2010 10
  • 11. XML is ... • Endorsed by W3C and major companies • Extensible • No tag name limitations • No language limitations • Human software developer-readable • Can be processed with basic text tools • Open standard • no vendor lock-in (in theory...) • Easy to implement • powerful, cheap (free), off-the-shelf XML tools Sunday 30 May 2010 11
  • 12. 1969: SGML (Standard Generalized Markup Language) • Meta-language: describe other languages • Powerful, but rather complicated • 1986: ISO standard • 1992: HTML (HyperText Markup Language) • Based on SGML • Simple, but limited • 1996: Start design of XML • By World Wide Web Consortium (W3C) • 1998: Publication of XML 1.0 12 Sunday 30 May 2010
  • 13. Design Goals • Easy to use over the Internet • Power of SGML • Simplicity of HTML • Human-legible • Easy to create • Compactness is not an issue • “The ASCII of the Web” 13 Sunday 30 May 2010
  • 14. XML Basics <Person> <Name> <First>Thomas</First> <Last>Atkinson</Last> </Name> <Age>30</Age> </Person> • Self-defined, meaningful tags • Separate data and its representation 14 Sunday 30 May 2010
  • 15. Language for defining syntax • Records and fields have explicit boundaries • parse-able without knowing structure (self-descriptive) • Unicode support (UTF-8, UTF-16, ...) • Web-aware • DTD, ENTITY and Schema can be loaded through URL • Strictly parsed: no ambiguity (case sensitive!) • Extensible: namespaces 15 Sunday 30 May 2010
  • 16. <?xml version="1.0” encoding=“UTF-8”?> <!-- processing instruction: XML follows --> <!DOCTYPE addressbook SYSTEM "http://www/~koenh/ddml/addressbook.dtd”> <!-- Document Type Declaration... --> <!-- ExternalDTDPointer --> <addressbook> <!--root element --> <person first-name="John" family-name="Doe” employee-number="1234"> <contact-info> <email address="Jdoe@home.com"/> </contact-info> <address street="Celestijnenlaan” number="200A"/> </person></addressbook> 16 Sunday 30 May 2010
  • 17. <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 18. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 19. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 20. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 21. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 22. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 23. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element 17 Sunday 30 May 2010
  • 24. Cfr. HTML markup tags <H1 align=”center” > a Heading </H1> attribute opening closing content tag tag element • Major differences: • Case sensitive • Proper nesting: No <A> … <B> … </A> … </B> • Unicode instead of ASCII 17 Sunday 30 May 2010
  • 25. Vocabularies • Agreed-upon XML tag sets for specific domain • Examples • Chemical Markup Language (CML) • Business: ebXML, RosettaNet, BizTalk • Mathematics: MathML • Multimedia: Synchronized Multimedia Integration Language (SMIL) • Etc. 18 Sunday 30 May 2010
  • 26. well-formed: follows XML syntax • Proper tag and attribute names • Tags properly closed • Attributes and text between tags do not contain ‘<‘ (escape with &lt;) • valid: well-formed and vocabulary • All elements and their attributes declared in DTD • Attribute values follow DTD type declarations • CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated • Nesting and sequencing of elements follows DTD 19 Sunday 30 May 2010
  • 27. Elements • XML’s container for • Attributes • Character data • Other elements (“child” elements) • Delimited by opening and closing tags • Non-empty element: <name>..</name> • Empty element: <name/> • Form a simple hierarchic tree • Root = “document element” 20 Sunday 30 May 2010
  • 28. Attributes and Strings • Attributes • Name-value pairs: name=value • Only strings as value! • Strings • Enclosed by ‘...’ or “...” → replace with &apos; or &quot; • Character data • Any text that is not markup • ‘&’, ‘<’ and ‘>’ are markup → replace with &amp; &lt; and &gt; 21 Sunday 30 May 2010
  • 29. Document structure • Prolog (optional) • <?xml version="1.0” encoding=“UTF-8”?> • (compulsory) version="number" • encoding="character encoding" (optional) • Document type declaration • <!DOCTYPE document_element ... > • Body – The document element 22 Sunday 30 May 2010
  • 30. Another example <?xml version="1.0" standalone="no"?> <!DOCTYPE BankAccounts ...> <!-- This is an example XML document --> <BankAccounts> <Account accountNr="123-456789-01" use="personal"> <Owners> <Person ID="1258-a8d72-98"> <Name>John Smith</Name></Person> <Person ID="5842-df5ef-e9"> <Name>Claudia Scott</Name></Person> </Owners> <CreditCards><CreditCard number="12345"/></CreditCards> <Balance Currency="EUR">50000</Balance> </Account> ... </BankAccounts> 23 Sunday 30 May 2010
  • 31. Document Type Definition <!ELEMENT address EMPTY> <!-- no content, used for attributes only --> <!ATTLIST address city CDATA #REQUIRED <!-- character data: any string --> <!-- value for that attribute must be present --> state NMTOKEN #REQUIRED <!-- name token: letters, numbers, ., -, _ and : only --> number CDATA #REQUIRED street CDATA #REQUIRED> <!ELEMENT addressbook (person+)> <!-- 1 or more --> <!ELEMENT contact-info (home-phone|mobile-phone|email)*> <!-- choice --> <!-- o or more --> 24 Sunday 30 May 2010
  • 32. Document Type Definition <!ELEMENT email EMPTY> <!ATTLIST email address CDATA #REQUIRED> <!ELEMENT home-phone EMPTY> <!ATTLIST home-phone number CDATA #REQUIRED> <!ELEMENT job-info EMPTY> <!ATTLIST job-info is-manager (yes|no) 'no’ <!-- default --> emp-type (FullTime|PartTime) 'FullTime’ job-description CDATA #REQUIRED> <!ELEMENT misc-info (#PCDATA)> <!-- Parsed Character Data: cannot contain subelements --> <!ELEMENT mobile-phone EMPTY> <!ATTLIST mobile-phone 25number CDATA #REQUIRED> Sunday 30 May 2010
  • 33. Document Type Definition <!ELEMENT manager EMPTY> <!ATTLIST manager empnumber IDREF #REQUIRED> <!-- reference to empnumber of person --> <!ELEMENT person (contact-info,address, job-info?,manager?,misc-info?)> <!-- sequence --> <!-- zero or one --> <!ATTLIST person first-name CDATA #REQUIRED middle-initial CDATA #IMPLIED <!-- can, but need not be provided --> employee-number ID #REQUIRED <!-- can be referred to by manager.empnumber --> family-name CDATA #REQUIRED> 26 Sunday 30 May 2010
  • 34. namespaces: problem <widget type="gadget"> <head size="medium"/> <big><subwidget ref="gizmo"/></big> <info> <head><title>Gadget</title></head> <body><h1>Gadget</h1> A gadget contains a big gizmo </body> Name collision! </info> </widget> 27 Sunday 30 May 2010
  • 35. namespaces: approach • A collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names •xmlns:prefix="URI" • URI used only as identifier • does not need to point to anything • applies to all nested elements and attributes 28 Sunday 30 May 2010
  • 36. namespaces: example <widget xmlns="http://www.widget.org" xmlns:xhtml="http://www.w3.org/TR/xhtml1" type="gadget"> <head size="medium"/> <big><subwidget ref="gizmo"/></big> <info><xhtml:head><xhtml:title>Gadget </xhtml:title></xhtml:head> <xhtml:body><xhtml:h1>Gadget </xhtml:h1>A gadget contains... </xhtml:body></info> </widget> 29 Sunday 30 May 2010
  • 37. Another example <Address> <Server> <Street>Celestijnenlaan</Street> <Name>www</Name> <Nr>200A</Nr> <Address> 134.58.43.1 <City>Heverlee-Leuven</City> </Address> <Country>Belgium</Country> </Server> </Address> ? 30 Sunday 30 May 2010
  • 38. Another example (2) <Address <Server xmlns="www.all.edu/departments"> xmlns="www.dns.net/servers"> <Street>Celestijnenlaan</Street> <Name>www</Name> <Nr>200A</Nr> <Address> <City>Heverlee-Leuven</City> 134.58.43.1 </Address> <Country>Belgium</Country> </Server> </Address> <Department xmlns:edu="www.all.edu/departments" xmlns:dns="www.dns.net/servers"> <edu:Address> <Street>Celestijnenlaan</Street> ... </edu:Address> <dns:Name>www</dns:Name> <dns:Address>134.58.43.1</dns:Address> </Department> 31 Sunday 30 May 2010
  • 39. Accessing XML documents • Manual text file manipulation • Cumbersome & Error-prone • Parser • Simplifies document manipulation • Ensures proper grammar, well-formedness • Abstracts content from grammar • Accessed through standard API • Document Object Model (DOM) • Simple API for XML (SAX) 32 Sunday 30 May 2010
  • 40. DOM parser • create DOM object tree • SAX parser • generates events when elements encountered • one-pass translation • no need to keep whole document tree in memory • Both can be validating or non-validating • Many available (most freeware, open source) • ibm xml4j, apache xerces, sun parser, microsoft, datachannel, oracle, ... 33 Sunday 30 May 2010
  • 41. DOM approach http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
  • 42. DOM Node Tree
      XML document:
        <?xml version="1.0"?>
        <!-- An example XML document -->
        <BankAccounts>
          <Account accountNr="123-456789-01">
            <Owner ID="1258-a8d72-98">John Smith</Owner>
            <Balance Currency="EUR">50000</Balance>
          </Account>
          <Account ...> ...
        </BankAccounts>
      Node tree:
        Doc
          Com  An example XML document
          El   BankAccounts
            El   Account
              Att  accountNr = "123-456789-01"
              El   Owner = "John Smith"
                Att  ID = "1258-a8d72-98"
              El   Balance = "50000"
                Att  Currency = "EUR"
            El   Account
              ...
  • 43. parsing: DOM
      public void print(Node node) {
        ...
        NodeList nlist = node.getChildNodes();
        if (nlist != null) {
          int l = nlist.getLength();
          for (int i = 0; i < l; i++) {
            print(nlist.item(i));   // recurse into each child node
            ...
          }
          ...
        }
        ...
      }
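The recursive traversal on the slide leaves out what happens at each node. A minimal runnable sketch of the same idea follows, assuming we only want an indented outline of elements and their attributes; the class name `DomOutline` and the output format are illustrative choices, not the slide's.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomOutline {
    // Appends one line per element (name plus attributes), indented by tree depth.
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(node.getNodeName());
            NamedNodeMap atts = node.getAttributes();
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.item(i).getNodeName())
                   .append('=').append(atts.item(i).getNodeValue());
            }
            out.append('\n');
        }
        // Same recursion as on the slide: visit every child of the current node.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth + 1, out);
        }
    }

    // Parses the whole document into memory, then walks the tree from the root element.
    static String outline(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            outline(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
              "<BankAccounts><Account accountNr='123-456789-01'>"
            + "<Owner ID='1258-a8d72-98'>John Smith</Owner>"
            + "<Balance Currency='EUR'>50000</Balance>"
            + "</Account></BankAccounts>"));
    }
}
```

Because the complete tree exists before the walk starts, the code could just as well visit nodes in any other order, which is the random access that DOM offers and SAX does not.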
  • 44. DOM Benefits & Drawbacks • Benefits • W3C Recommendation • Language- and platform-independent • Random access • Intuitive • Drawback • Entire object tree in memory
  • 45. Simple API for XML (SAX) • Not an official standard • Ad-hoc product by XML developers • Primarily Java API • Event-based mechanism • Don’t call the parser, the parser calls you • No object model in memory • Programmer must keep state information
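  • 47. SAX parsing model (sequence diagram) • Application creates the handler: new ContentHandler() • Application creates the parser: new Parser() • Application calls on the Parser: setContentHandler(), parse() • Parser calls back on the ContentHandler: startDocument(), startElement(), characters(), endElement(), endDocument()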
  • 48. parsing: SAX
      $xml_parser = xml_parser_create();
      xml_set_element_handler($xml_parser, "startQuestion", "endQuestion");
      ...
      xml_parse($xml_parser, $data, feof($fp))
      ...
      function startQuestion($parser, $name, $attrs) {
        ...
        if ($name == "QUESTION")
          ... new Question($attrs["QTEXT"]);
        ...
  • 49. Start and end of document – startDocument() – endDocument() • Start and end of element – startElement(namespace, name, qname, attlist) – endElement(namespace, name, qname) • Character data – characters(char[] ch, int start, int length) • Processing Instruction – processingInstruction(target, data) • No event for comments!
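The callbacks listed above can be observed by recording each one as the parser fires it. This is a minimal sketch using the SAX classes from the Java standard library; the class name `SaxTrace` and the textual event labels are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text run in several chunks; whitespace-only chunks are skipped here.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("characters " + text);
    }

    // "Don't call the parser, the parser calls you": we hand over the handler and wait.
    static List<String> trace(String xml) {
        try {
            SaxTrace handler = new SaxTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<!-- ignored --><Balance Currency='EUR'>50000</Balance>")) {
            System.out.println(event);
        }
    }
}
```

The comment in the sample input produces no entry in the trace, since this handler interface has no callback for comments.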
  • 50. Another SAX example
      <?xml version="1.0" standalone="no"?>
      <!DOCTYPE BankAccounts ...>
      <!-- This is an example XML document -->
      <BankAccounts>
        <Account accountNr="123-456789-01" use="personal">
          <Owners>
            <Person ID="1258-a8d72-98"><Name>John Smith</Name></Person>
            <Person ID="5842-df5ef-e9"><Name>Claudia Scott</Name></Person>
          </Owners>
          <CreditCards><CreditCard number="12345"/></CreditCards>
          <Balance Currency="EUR">50000</Balance>
        </Account>
        ...
      </BankAccounts>
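  • 51.
      import org.xml.sax.Attributes;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;

      public class AvgBalanceCalculator extends DefaultHandler {
        private double total = 0.0;
        private int count = 0;
        private boolean isBalance = false;

        // Called for every start tag: flag and count Balance elements.
        // "name" is the local name, so the parser must be namespace-aware.
        public void startElement(String uri, String name, String qname, Attributes atts) {
          if (name.equals("Balance")) { isBalance = true; count++; }
        }

        // Called for character data: add the text of a Balance element to the total.
        public void characters(char[] ch, int start, int len) throws SAXException {
          if (isBalance) {
            String help = new String(ch, start, len);
            double balance = Double.parseDouble(help.trim());
            total = total + balance;
            isBalance = false;
          }
        }

        // Called once at the end: report the average.
        public void endDocument() {
          if (count != 0) System.out.println("Average balance is " + (total / count));
        }
      }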
  • 52. SAX Benefits & Drawbacks • Benefits • Suitable when • parsing large documents • constructing proprietary object structures • only small subset of information is needed • Simple and fast • Drawbacks • Read-only • No random access • Complex searches messy to program
  • 53. Limitations of DTDs • no typing of text elements and attributes • all values are strings; no integers, reals, etc. • an unordered set of subelements is hard to define • order is usually irrelevant in databases • IDs and IDREFs are not typed • the DNO attribute of an EMPLOYEE can hold a reference to another EMPLOYEE, which is meaningless, e.g. <EMPLOYEE SSN="_888665555" SEX="M" DNO="_888665555"> • the DNO attribute should be constrained so that it can only refer to a DEPARTMENT
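The untyped IDREF problem can be reproduced with a validating parser. The sketch below is a minimal illustration using the JAXP DOM API; the small DTD, the DEPARTMENT attribute name DNUMBER and the class name `DtdIdrefDemo` are assumptions made up for this example, and only the EMPLOYEE attributes SSN and DNO come from the slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdIdrefDemo {
    // Hypothetical DTD: DNO is declared as IDREF, which cannot say "ID of a DEPARTMENT".
    static final String DTD =
          "<!DOCTYPE COMPANY [\n"
        + "<!ELEMENT COMPANY (DEPARTMENT | EMPLOYEE)*>\n"
        + "<!ELEMENT DEPARTMENT EMPTY>\n"
        + "<!ATTLIST DEPARTMENT DNUMBER ID #REQUIRED>\n"
        + "<!ELEMENT EMPLOYEE EMPTY>\n"
        + "<!ATTLIST EMPLOYEE SSN ID #REQUIRED DNO IDREF #REQUIRED>\n"
        + "]>\n";

    static final String SENSIBLE =    // DNO refers to a department
        "<COMPANY><DEPARTMENT DNUMBER='_5'/><EMPLOYEE SSN='_888665555' DNO='_5'/></COMPANY>";
    static final String MEANINGLESS = // DNO refers to the employee itself
        "<COMPANY><DEPARTMENT DNUMBER='_5'/><EMPLOYEE SSN='_888665555' DNO='_888665555'/></COMPANY>";
    static final String DANGLING =    // DNO refers to an ID that does not exist
        "<COMPANY><DEPARTMENT DNUMBER='_5'/><EMPLOYEE SSN='_888665555' DNO='_9'/></COMPANY>";

    // Validates the body against the DTD and returns the validity errors that were reported.
    static List<String> validationErrors(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validationErrors(SENSIBLE).size());
        System.out.println(validationErrors(MEANINGLESS).size());
        System.out.println(validationErrors(DANGLING).size());
    }
}
```

The DTD rejects only the dangling reference. The meaningless document is valid, because IDREF merely requires that some element in the document carries the referenced ID.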
  • 54. XML Schema • typing of values • e.g. integer, string, etc. • also constraints on min/max values • user-defined types • is specified in XML syntax • a more standardized representation • is integrated with namespaces • and further features • list types, uniqueness constraints on keys, foreign-key constraints, inheritance, …
  • 55. XSDL • XML Schema Definition Language • documents with suffix .xsd
  • 56. XML Schema: example
      XML schema
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
          ....
          <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="HOURS" type="xsd:float"/>
              </xsd:sequence>
              <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
            </xsd:complexType>
          </xsd:element>
          ....
        </xsd:schema>
      XML instance
        <PWORKER SSN="_123456789">
          <HOURS>7.5</HOURS>
        </PWORKER>
  • 57. XML: simple types
      – built-in simple types
          • string, integer, decimal, float, boolean, date, time, …
          • <xsd:element name="gebdat" type="xsd:date" />
      – user-defined simple types
          • defined with the simpleType element
          • the restriction element names the base type from which the new type is derived
          • <xsd:simpleType name="salaryRange">
              <xsd:restriction base="xsd:integer">
                <xsd:minInclusive value="25000" />
                <xsd:maxInclusive value="100000" />
              </xsd:restriction>
            </xsd:simpleType>
  • 58. XML: simple types
      <xsd:simpleType name="studentClassificatie">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="bachelorstudent" />
          <xsd:enumeration value="masterstudent" />
          <xsd:enumeration value="doctorstudent" />
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:simpleType name="deptType">
        <xsd:restriction base="xsd:string">
          <xsd:length value="3" />
        </xsd:restriction>
      </xsd:simpleType>
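A user-defined simple type such as salaryRange can be checked with the javax.xml.validation API from the Java standard library. This is a minimal sketch; the element name `salary` and the class name `SalaryRangeCheck` are hypothetical, and only the salaryRange type itself is taken from the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SalaryRangeCheck {
    // salaryRange as on the slide, plus a hypothetical element that uses it.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='salaryRange'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:minInclusive value='25000'/>"
        + "<xsd:maxInclusive value='100000'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='salary' type='salaryRange'/>"
        + "</xsd:schema>";

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is not valid", e);
        }
    }

    // True when the instance document satisfies the schema.
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // without a custom error handler, a violation surfaces as an exception
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<salary>50000</salary>")); // inside the range
        System.out.println(isValid("<salary>10</salary>"));    // below minInclusive
        System.out.println(isValid("<salary>abc</salary>"));   // not an integer
    }
}
```

Both the type check (integer) and the range check come from the schema alone, which is what a DTD cannot express.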
  • 63. XPath (example) • Path: /COMPANY/EMPLOYEE • Tree: ROOT → COMPANY → three EMPLOYEE nodes, with SSN _123456789, _333445555 and _999887777
  • 64–67. (animation steps on the same tree) The path /COMPANY/EMPLOYEE is evaluated one step at a time: / selects ROOT • /COMPANY selects the COMPANY child of ROOT • /COMPANY/EMPLOYEE selects the EMPLOYEE children of COMPANY (SSN _123456789, _333445555 and _999887777)
  • 68. XPath • /COMPANY/EMPLOYEE selects the EMPLOYEE elements with SSN _123456789, _333445555 and _999887777:
      <EMPLOYEE SSN="_123456789" SEX="M" SUPERSSN="_333445555" DNO="_5">
        <FNAME>John</FNAME>
        <MINIT>B</MINIT>
        ....
      </EMPLOYEE>
      <EMPLOYEE SSN="_333445555" SEX="M" SUPERSSN="_888665555" DNO="_5">
        <FNAME>Franklin</FNAME>
        <MINIT>T</MINIT>
        <LNAME>Wong</LNAME>
        <BDATE>08-DEC-45</BDATE>
      </EMPLOYEE>
      <EMPLOYEE SSN="_999887777" SEX="F" SUPERSSN="_987654321" DNO="_4">
        <FNAME>Alicia</FNAME>
        .....
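The path expression can be tried out with the javax.xml.xpath API from the Java standard library. A minimal sketch follows, with the elided parts of the slide's document simply left out; the class name `XPathDemo` and the helper `select` are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // The three employees from the slide, without the parts that the slide elides.
    static final String XML =
          "<COMPANY>"
        + "<EMPLOYEE SSN='_123456789' SEX='M' SUPERSSN='_333445555' DNO='_5'>"
        + "<FNAME>John</FNAME><MINIT>B</MINIT></EMPLOYEE>"
        + "<EMPLOYEE SSN='_333445555' SEX='M' SUPERSSN='_888665555' DNO='_5'>"
        + "<FNAME>Franklin</FNAME><MINIT>T</MINIT><LNAME>Wong</LNAME>"
        + "<BDATE>08-DEC-45</BDATE></EMPLOYEE>"
        + "<EMPLOYEE SSN='_999887777' SEX='F' SUPERSSN='_987654321' DNO='_4'>"
        + "<FNAME>Alicia</FNAME></EMPLOYEE>"
        + "</COMPANY>";

    // Evaluates the expression against the sample document and returns the text of each selected node.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)),
                              XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/COMPANY/EMPLOYEE").size());
        System.out.println(select("/COMPANY/EMPLOYEE/@SSN"));
        System.out.println(select("/COMPANY/EMPLOYEE/FNAME"));
    }
}
```

Each step of the path narrows the node set produced by the previous step, so appending `/@SSN` or `/FNAME` continues from the three EMPLOYEE elements.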
  • 69. XML family of technologies • XLink: hypertext • XSL: Extensible Stylesheet Language • XSLT: transformations • Formatting Objects • XML Schema: additional constraints on attribute types • and more...
  • 70. XML applications • RDF: Resource Description Framework • infra • XHTML: eXtensible HTML, and HTML5 • XML-compliant HTML • MathML • SMIL: synchronized multimedia presentations • Many others • Chemical Markup Language, Vector Graphics Markup Language, Open Software Description Format, weather observation, astronomical data, financial data, electronic components, workflow, business cards, real estate, newspaper, classifieds, javadoc, human resources, advertising, architecture, …
  • 71. XML Working Groups • XML Coordination • XML Core • XSL (XSLT, XSL/FO) -> W3C architecture • Efficient XML Interchange • XML Processing Model • XML Query (XQuery, XPath) • XML Schema • Service Modeling Language (SML)
  • 72. More XPath Features • Operator “|” used to implement union • E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)] • gives employees with either 0 or 1 dependents • “//” can be used to skip multiple levels of nodes • E.g. /COMPANY//FNAME • finds any FNAME element anywhere under the /COMPANY element, regardless of the element in which it is contained • A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children • “//”, described above, is a short form for specifying “all descendants” • “..” specifies the parent • e.g. /COMPANY//FNAME/../BDATE
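The three features above (union, "//" and "..") can be checked against a small document with the standard javax.xml.xpath API. This is a minimal sketch: the sample data, including the number of DEPENDENT elements per employee and two of the birth dates, are made up for illustration, and the class name `XPathAxesDemo` is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAxesDemo {
    // Hypothetical data: John has 2 dependents, Franklin 1, Alicia none.
    static final String XML =
          "<COMPANY>"
        + "<EMPLOYEE><FNAME>John</FNAME><BDATE>09-JAN-55</BDATE>"
        + "<DEPENDENT/><DEPENDENT/></EMPLOYEE>"
        + "<EMPLOYEE><FNAME>Franklin</FNAME><BDATE>08-DEC-45</BDATE>"
        + "<DEPENDENT/></EMPLOYEE>"
        + "<EMPLOYEE><FNAME>Alicia</FNAME><BDATE>19-JUL-58</BDATE></EMPLOYEE>"
        + "</COMPANY>";

    // Returns the text of every node the expression selects, in document order.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(XML)),
                              XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Union: employees with exactly one dependent or with none, reduced to their first names.
        System.out.println(select(
            "(//EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)])/FNAME"));
        // "//" skips the EMPLOYEE level between COMPANY and FNAME.
        System.out.println(select("/COMPANY//FNAME"));
        // ".." steps from each FNAME back to its parent, then down to BDATE.
        System.out.println(select("/COMPANY//FNAME/../BDATE"));
    }
}
```

The parentheses around the union are needed so that the trailing `/FNAME` step applies to both operands rather than only to the second one.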
  • 73. XQuery • allows more general queries to be formulated than XPath • general form: the FLWOR expression
      FOR <for-variable> IN <in-expression>
      LET <let-variable> := <let-expression>
      [ WHERE <filter-expression> ]
      [ ORDER BY <order-specification> ]
      RETURN <expression>
    • note: FOR and LET can each occur alone or together; actual XQuery writes these keywords in lowercase
  • 74. Q1: first name and last name of all employees who earn more than 70000
      for $x in doc("www.company.com/info.xml")//employee[employeeSalary > 70000]/employeeName
      return <res>{ $x/firstName, $x/lastName }</res>
    • alternative:
      for $x in doc("www.company.com/info.xml")/company/employee
      where $x/employeeSalary > 70000
      return <res>{ $x/employeeName/firstName, $x/employeeName/lastName }</res>
  • 75. Q3: first name and last name of all employees who work more than 20 hours on project number 5, together with that number of hours
      for $x in doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
          $y in doc("www.company.com/info.xml")/company/employee
      where $x/hours > 20.0 and $y/ssn = $x/ssn
      return <res>{ $y/employeeName/firstName, $y/employeeName/lastName, $x/hours }</res>
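The Java standard library has no XQuery engine, so the sketch below does not run XQuery. It only mimics the FOR / WHERE / RETURN structure of Q1 with XPath and an ordinary loop, to show what each clause contributes. The sample document, the names in it and the class name `FlworSketch` are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlworSketch {
    // Hypothetical stand-in for www.company.com/info.xml.
    static final String XML =
          "<company>"
        + employee("John", "Smith", 30000)
        + employee("Franklin", "Wong", 80000)
        + employee("Alicia", "Zelaya", 75000)
        + "</company>";

    static String employee(String first, String last, int salary) {
        return "<employee><employeeName><firstName>" + first + "</firstName>"
             + "<lastName>" + last + "</lastName></employeeName>"
             + "<employeeSalary>" + salary + "</employeeSalary></employee>";
    }

    // Emulates Q1: names of all employees earning more than the threshold.
    static List<String> earningMoreThan(double threshold) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> result = new ArrayList<>();
            // FOR $x IN /company/employee
            NodeList employees = (NodeList) xpath.evaluate(
                    "/company/employee", doc, XPathConstants.NODESET);
            for (int i = 0; i < employees.getLength(); i++) {
                Object x = employees.item(i);
                // WHERE $x/employeeSalary > threshold
                double salary = (Double) xpath.evaluate(
                        "employeeSalary", x, XPathConstants.NUMBER);
                if (salary > threshold) {
                    // RETURN first name and last name
                    result.add(xpath.evaluate("employeeName/firstName", x) + " "
                             + xpath.evaluate("employeeName/lastName", x));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(earningMoreThan(70000));
    }
}
```

A join such as Q3 would need a second, nested loop over the other node set in this style, which is the kind of bookkeeping that a FLWOR expression with two FOR variables expresses declaratively.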
  • 76. The End... Thank you! Questions...?
  • 78. NoSQL • non-relational • distributed • open source • horizontally scalable • “web scale” • schema free • easy replication • simple API • BASE (not ACID)
  • 79. Systems • Core: Hadoop, HBase, Cassandra, Hypertable, ... • Docs: CouchDB, MongoDB, Riak, Terrastore, ... • Key-Value, tuple: Amazon SimpleDB, Azure, ... • Graph: Neo4J, Bigdata, InfoGrid, HyperGraph, ... • Object: Versant, Perst, ZODB, ... • Grid: GigaSpaces, Hazelcast, ... • XML: Tamino, eXist, Mark Logic, Xindice, ... • ... • Source: http://nosql-databases.org/
  • 80. nosql • Google BigTable • Amazon Dynamo • Open source: HBase • Cassandra: last.fm, Facebook
  • 81. nosql: why • big data sets: • Digg green badge: 3 TB • Facebook inbox: 50 TB • eBay overall data: 2 PB
  • 86. (figure) Source: http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation
  • 87. no attempt at ACID • Atomicity • Consistency • Isolation • Durability • trade ACID off in favor of high availability
  • 88. query • associative array, key-value pair • XQuery • SPARQL
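The "associative array, key-value pair" access model, combined with the horizontal scalability mentioned earlier, can be pictured as a map whose keys are spread over several shards. The sketch below is a toy in-memory illustration only. It does not model any of the systems named in the slides, and the class name `ShardedKeyValueStore` and the hash-modulo placement rule are assumptions chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardedKeyValueStore {
    // Each map stands in for one node of a cluster.
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedKeyValueStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Picks the shard from the key alone, so every client finds the same shard for the same key.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    // The whole API: put and get by key, with no schema and no joins.
    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    public static void main(String[] args) {
        ShardedKeyValueStore store = new ShardedKeyValueStore(4);
        store.put("user:42", "Erik");
        System.out.println(store.get("user:42"));
        System.out.println(store.shardFor("user:42"));
    }
}
```

Plain hash-modulo placement moves most keys when the number of shards changes, which is one reason real systems tend to use more elaborate partitioning schemes.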
  • 89. Questions...?