7. XML is hard, right?
Some hard things:
• Roundtripping data
• Manipulating XML via DOM API
• Preserving element sibling order, comments, XML entities, etc.
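The roundtripping problem is easy to see in practice. With default settings, Python's standard ElementTree parser drops comments, so parsing and re-serializing silently changes the document. Python is used here only as an illustration; this is not anything XML::Pastor does. The `SOURCE` document and both function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical config document: the comment carries information for humans.
SOURCE = "<config><!-- tuned by ops --><timeout>30</timeout></config>"


def naive_roundtrip(text: str) -> str:
    """Parse and re-serialize with ElementTree's default parser settings."""
    root = ET.fromstring(text)
    return ET.tostring(root, encoding="unicode")


def comment_preserving_roundtrip(text: str) -> str:
    """Same cycle, but ask the tree builder to keep comments (Python 3.8+)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(text, parser=parser)
    return ET.tostring(root, encoding="unicode")


print(naive_roundtrip(SOURCE))               # the comment is gone
print(comment_preserving_roundtrip(SOURCE))  # the comment survives
```

Keeping comments takes an explicit opt-in, and sibling order, entities and whitespace each raise similar questions for any tool that claims lossless roundtripping.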
9. XML::Pastor
• I didn’t write it
• Written by Ayhan Ulusoy
• Available on CPAN
• Abstracts away some of the pain of XML
10. What does it do?
• Generates Perl code from W3C XML Schema (XSD)
• Roundtrip and validate XML to/from Perl without loss of schema information
• Lets you program without caring about XML structure
11. Parsing with Pastor
• Parse the entire XML document into an XML::LibXML DOM object
• Convert the XML DOM tree into native Perl objects
• Throw away the DOM; it is no longer needed
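The three parsing steps can be sketched in Python. Here `xml.dom.minidom` stands in for XML::LibXML, and plain dicts and lists stand in for the generated Perl classes. That substitution is an assumption for illustration only: Pastor builds instances of its generated classes instead. `to_native` and `parse` are hypothetical helpers.

```python
from xml.dom import minidom

# Hypothetical instance document.
XML = """<country code="fr">
  <name>France</name>
  <city><name>Paris</name></city>
  <city><name>Lyon</name></city>
</country>"""


def to_native(element):
    """Convert a DOM element into plain dicts, lists and strings."""
    children = [n for n in element.childNodes if n.nodeType == n.ELEMENT_NODE]
    if not children:
        # Leaf element: keep its text (attributes on leaves are ignored here).
        return "".join(n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE)
    native = {"@" + name: value for name, value in element.attributes.items()}
    for child in children:
        # Always collect children in lists, so one <city> and many look alike.
        native.setdefault(child.tagName, []).append(to_native(child))
    return native


def parse(text):
    dom = minidom.parseString(text)            # 1. whole document into a DOM
    try:
        return to_native(dom.documentElement)  # 2. DOM -> native data
    finally:
        dom.unlink()                           # 3. the DOM is no longer needed


data = parse(XML)
print(data["city"][1]["name"][0])  # Lyon
```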
12. Reasons not to use XML::Pastor
• When you have no XML Schema
• Although several tools can infer XML schemata from documents
• It’s a code generator
• No stream parsing
13. XML::Pastor
Code Generation
• Write out static code to tree of .pm files
• Write out static code to single .pm file
• Create code in a scalar in memory
• Create code and eval() it for use
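Two of these modes can be sketched in Python for illustration. In the first, the generated source is kept in a string and run with `exec()`, which is roughly what Perl's string `eval` does in the last bullet. In the second, the source is written to a module file and imported later. `generate_class_source` and the `City` class are hypothetical; Pastor's real generator emits Perl packages.

```python
import importlib.util
import pathlib
import tempfile


def generate_class_source(class_name, fields):
    """Emit Python source for a plain class with one attribute per field."""
    params = "".join(f", {name}=None" for name in fields)
    lines = [f"class {class_name}:", f"    def __init__(self{params}):"]
    lines += [f"        self.{name} = {name}" for name in fields] or ["        pass"]
    return "\n".join(lines) + "\n"


source = generate_class_source("City", ["name", "population"])

# Mode A: keep the code in memory and exec() it for immediate use.
namespace = {}
exec(source, namespace)
City = namespace["City"]
print(City(name="Lyon").name)  # Lyon

# Mode B: write the code out to a module file and load it from disk.
path = pathlib.Path(tempfile.mkdtemp()) / "city_generated.py"
path.write_text(source)
spec = importlib.util.spec_from_file_location("city_generated", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.City(name="Paris").name)  # Paris
```

Writing files suits code you want to version and review. Evaluating in memory avoids a build step but regenerates the classes on every run.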
15. How Pastor works
Code generation
• Parse schemata into schema model
• Perl data structures containing all the global elements, types, attributes, ...
• “Resolve” the model: determine class names, resolve references, etc.
• Create boilerplate code, write out / eval
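The first two steps, building a schema model and then resolving it, might look like this in a much-simplified Python sketch. It only understands global `complexType`s with nested `element`s and ignores target namespaces and prefixes. The `MyApp::Data::` class prefix and the model layout are assumptions, not Pastor's internal format.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema with one complex type and one global element.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="city">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="population" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="city" type="city"/>
</xs:schema>"""


def parse_schema(text):
    """Step 1: collect global complexTypes and elements into a plain model."""
    root = ET.fromstring(text)
    model = {"types": {}, "elements": {}}
    for ctype in root.findall(XS + "complexType"):
        children = [(el.get("name"), el.get("type")) for el in ctype.iter(XS + "element")]
        model["types"][ctype.get("name")] = {"children": children}
    for el in root.findall(XS + "element"):
        model["elements"][el.get("name")] = {"type": el.get("type")}
    return model


def resolve(model, class_prefix):
    """Step 2: assign class names and resolve element -> type references."""
    for name, info in model["types"].items():
        info["class"] = class_prefix + name
    for info in model["elements"].values():
        target = model["types"].get(info["type"])
        # Built-in (xs:*) or unknown types stay unresolved in this sketch.
        info["class"] = target["class"] if target else None
    return model


model = resolve(parse_schema(XSD), "MyApp::Data::")
print(model["elements"]["city"]["class"])  # MyApp::Data::city
```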
17. How Pastor works
Generated classes
• Each generated class (i.e. type) has classdata “XmlSchemaType” containing its schema model
• If the class isa SimpleType, it may contain restriction facets
• If the class isa ComplexType, it will contain info about its child elements and attributes
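The following is a rough Python analogue of these generated classes. A class attribute plays the role of Perl classdata and holds that type's slice of the schema model. The `XML_SCHEMA_TYPE` name, the facet subset (`pattern`, `maxLength`, `enumeration`) and the `CountryCode`/`City` types are illustrative assumptions.

```python
import re


class SimpleType:
    """Base for generated simple types: wraps a value and checks facets."""
    XML_SCHEMA_TYPE = {}  # per-class schema model, like Perl classdata

    def __init__(self, value):
        self.value = value

    def is_valid(self):
        facets = self.XML_SCHEMA_TYPE.get("facets", {})
        v = self.value
        if "maxLength" in facets and len(v) > facets["maxLength"]:
            return False
        # XSD patterns are implicitly anchored, hence fullmatch.
        if "pattern" in facets and not re.fullmatch(facets["pattern"], v):
            return False
        if "enumeration" in facets and v not in facets["enumeration"]:
            return False
        return True


class ComplexType:
    """Base for generated complex types."""
    XML_SCHEMA_TYPE = {}


# What a generator might emit for a restricted xs:string type.
class CountryCode(SimpleType):
    XML_SCHEMA_TYPE = {
        "name": "countryCode",
        "base": "xs:string",
        "facets": {"pattern": "[A-Z]{2}", "maxLength": 2},
    }


# ...and for a complex type: child elements and attributes, in order.
class City(ComplexType):
    XML_SCHEMA_TYPE = {
        "name": "city",
        "elements": [("name", "xs:string"), ("population", "xs:int")],
        "attributes": [("code", "countryCode")],
    }


print(CountryCode("FR").is_valid(), CountryCode("fra").is_valid())  # True False
```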
18. How Pastor works
In use
• If the classes were generated offline, “use” them; if they were generated at runtime (eval), they are already loaded
• These classes have methods to create objects and to retrieve/save them from/to XML
• Manipulate/query data through an OO API to the complexType fields
• Validate modified objects against the schema
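The usage cycle is to load from XML, work through an OO API, validate, and save. It could look like the sketch below. `City` is a hand-written Python stand-in for a generated class, and its method names are hypothetical rather than Pastor's. `is_valid` applies a made-up rule instead of real schema validation.

```python
import xml.etree.ElementTree as ET


class City:
    """Hand-written stand-in for a class a schema compiler might generate."""

    def __init__(self, name=None, population=None):
        self.name = name
        self.population = population

    @classmethod
    def from_xml(cls, text):
        """Create an object from an XML string."""
        root = ET.fromstring(text)
        return cls(name=root.findtext("name"),
                   population=int(root.findtext("population")))

    def to_xml(self):
        """Save the object back to an XML string."""
        root = ET.Element("city")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "population").text = str(self.population)
        return ET.tostring(root, encoding="unicode")

    def is_valid(self):
        """Made-up rule standing in for schema validation."""
        return bool(self.name) and isinstance(self.population, int) and self.population >= 0


city = City.from_xml("<city><name>Exampleville</name><population>1000</population></city>")
city.population += 1  # work with typed fields, not DOM nodes
if city.is_valid():
    print(city.to_xml())  # <city><name>Exampleville</name><population>1001</population></city>
```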
38. XML::Pastor Scope
• Good for “data XML”
• Unsuitable for “mixed markup”
• e.g. XHTML
• Unsuitable for “huge” documents
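Mixed markup fits poorly because character data is interleaved with child elements. A binding with one field per child element has nowhere to keep the surrounding text or its order. The small Python illustration below uses ElementTree, which stores that text in `.text` and `.tail`; the `naive` dict is a hypothetical field-per-child binding.

```python
import xml.etree.ElementTree as ET

# Mixed content, as in XHTML: prose with inline elements.
para = ET.fromstring("<p>Read <b>this</b> before <i>that</i>.</p>")

# A field-per-child binding keeps only the element contents...
naive = {child.tag: child.text for child in para}
print(naive)  # {'b': 'this', 'i': 'that'}

# ...while the actual prose depends on text between and after the children.
print("".join(para.itertext()))  # Read this before that.
```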
39. XML::Pastor Supported
XML Schema Features
• Simple and Complex Types
• Global Elements
• Groups, Attributes, AttributeGroups
• Derive simpleTypes by extension
• Derive complexTypes by restriction
• W3C built-in Types, Unions, Lists
• (Most) Restriction Facets for Simple types
• External Schema import, include, redefine
40. XML::Pastor
known limitations
• Mixed elements unsupported
• Substitution groups unsupported
• ‘any’ and ‘anyAttribute’ elements unsupported
• Encodings (only UTF-8 officially supported)
• Default values for attributes - help needed
41. XML Data Binding
• Binding XML documents to objects specifically designed for the data in those documents
• Lets data-centric applications, for example, manipulate data more naturally than through the DOM API
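To illustrate the difference, this Python sketch reads the same document in two ways. First it uses generic tree navigation, with ElementTree standing in for a DOM. Then it uses a binding in which a dataclass shaped like the data drives the conversion. `Book`, `bind` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

# Hypothetical document.
XML = "<book><title>Example Book</title><year>2008</year><price>19.50</price></book>"
root = ET.fromstring(XML)

# Generic tree access: every value is a string and the caller converts it.
year_text = root.find("year").text
next_year = int(year_text) + 1


@dataclass
class Book:
    """A class shaped like the data; its annotations drive the conversion."""
    title: str
    year: int
    price: float


def bind(cls, element):
    """Fill a dataclass from same-named child elements (all assumed present)."""
    return cls(**{f.name: f.type(element.findtext(f.name)) for f in fields(cls)})


book = bind(Book, root)
print(book.year + 1, book.price * 2)  # typed values, no manual conversion
```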
47. XML::Twig
• Manipulates XML directly
• Code using it is closely coupled to the document structure
• Optimised for processing huge documents as trees
• No schemata, no validation
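Processing huge documents as trees works one chunk at a time, and the idea can be sketched with Python's `ElementTree.iterparse`. Each complete `<order>` subtree is handled as soon as it finishes and then emptied, so most of its memory can be freed. This is an analogue of the approach, not XML::Twig's API, and the `orders` document is invented.

```python
import io
import xml.etree.ElementTree as ET


def total_amount(stream):
    """Handle one finished <order> subtree at a time, then empty it."""
    total = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "order":
            total += int(elem.findtext("amount"))
            elem.clear()  # drop the subtree's children and text
    return total


# Invented input: 100 orders with amounts 1..100.
data = b"<orders>" + b"".join(
    b"<order><amount>%d</amount></order>" % i for i in range(1, 101)
) + b"</orders>"
print(total_amount(io.BytesIO(data)))  # 5050
```

The emptied `<order>` elements stay attached to the root here. Detaching them from their parent as well would save a little more memory.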
48. XML::Compile
• Originally designed to deal with SOAP envelopes and WSDL documents
• Different approach but similar goals to Pastor: processes XML into Perl data structures based on an XSD
• More like XML::Simple with schema support
49. XML::Compile pt. 2
• Schema support incomplete
• Shaky support for imports, includes
• Include restriction on targetNamespace
• I haven’t used it yet, but it looks good
50. XML::Simple
• Working roundtrip binding for simple cases
• e.g. XMLout(XMLin($file)) works
• Simple API
• Produces a single deep data structure
• Gotchas with element multiplicity
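The multiplicity gotcha arises when a converter turns a lone child into a scalar but repeated children into a list. The same element then takes two different shapes depending on the data. The naive Python converter below only mimics that behaviour; it is not XML::Simple's algorithm and it ignores attributes. The `force_array` parameter imitates the idea behind XML::Simple's `ForceArray` option.

```python
import xml.etree.ElementTree as ET


def simple_in(element, force_array=()):
    """Naive XML -> dict conversion in the spirit of XML::Simple."""
    children = list(element)
    if not children:
        return element.text
    result = {}
    for child in children:
        value = simple_in(child, force_array)
        if child.tag in result:
            # Second occurrence: promote the scalar to a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = existing = [existing]
            existing.append(value)
        elif child.tag in force_array:
            result[child.tag] = [value]  # always a list for these tags
        else:
            result[child.tag] = value
    return result


one = simple_in(ET.fromstring("<cities><city>Paris</city></cities>"))
two = simple_in(ET.fromstring("<cities><city>Paris</city><city>Lyon</city></cities>"))
print(one)  # {'city': 'Paris'}
print(two)  # {'city': ['Paris', 'Lyon']}
```

Code that loops over `two["city"]` would iterate over the characters of `"Paris"` when given `one`. Forcing a list for known repeatable elements avoids that.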
51. XML::Simple pt. 2
• No schemata, no validation
• Can be teamed with a SAX parser
• More suitable for configuration files?
52. XML::Smart
• Similar implementation to XML::Pastor
• Uses tie() and lots of crac^H^H^H^Hmagic
• Gathers structure information from the XML instance rather than from a schema
• No code generation!
53. XML::Smart pt. 2
• No schemata, so no schema validation
• Based on Object::MultiType: objects overloaded as HASH, ARRAY, SCALAR, CODE & GLOB
• Like Pastor, overloads array/hashref access to the data, which promotes decoupling
• Reasonable docs and a growing community
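The "structure from the instance, no code generation" approach can be imitated in Python with `__getattr__` and `__getitem__`. These play roughly the role that tie() and overloading play for XML::Smart. `Node` is a hypothetical wrapper, not XML::Smart's API, and it assumes every path it is asked for exists.

```python
import xml.etree.ElementTree as ET


class Node:
    """Wrap same-named sibling elements; children are looked up on access."""

    def __init__(self, elements):
        self._elements = elements

    def __getattr__(self, tag):
        # node.city -> every <city> child of the first wrapped element
        return Node(self._elements[0].findall(tag))

    def __getitem__(self, key):
        if isinstance(key, str):  # node["code"] -> attribute value
            return self._elements[0].get(key)
        return Node([self._elements[key]])  # node[1] -> second sibling

    def __str__(self):
        return self._elements[0].text or ""


root = Node([ET.fromstring(
    '<country code="fr"><city><name>Paris</name></city>'
    '<city><name>Lyon</name></city></country>')])
print(root["code"])       # fr
print(root.city[1].name)  # Lyon
```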
57. XML Schema Inference
• Create an XML schema from an XML document instance
• Every document has an (implicit) schema
• Tools like Relaxer and Trang, as well as System.Xml.Serializer in the .NET Framework, can all infer XML schemata from document instances
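Here is a toy version of schema inference, written in Python. It walks one instance and records each element's children, whether any child repeats under a single parent, and whether leaf text looks like an integer. Real tools such as Trang do much more, handling optional elements, attributes and namespaces. The output format here is invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer(root):
    """Record, per element name, child occurrence counts and leaf text types."""
    model = defaultdict(lambda: {"children": {}, "types": set()})

    def visit(elem):
        info = model[elem.tag]  # elements are merged by name (a simplification)
        counts = defaultdict(int)
        for child in elem:
            counts[child.tag] += 1
            visit(child)
        for tag, n in counts.items():
            # Keep the largest number of occurrences seen under one parent.
            info["children"][tag] = max(info["children"].get(tag, 0), n)
        text = (elem.text or "").strip()
        if not len(elem) and text:
            info["types"].add("xs:int" if text.lstrip("-").isdigit() else "xs:string")

    visit(root)
    return model


def describe(model):
    lines = []
    for tag, info in model.items():
        if info["children"]:
            kids = ", ".join(f"{c} (maxOccurs={'unbounded' if n > 1 else 1})"
                             for c, n in info["children"].items())
            lines.append(f"{tag}: sequence of {kids}")
        else:
            lines.append(f"{tag}: {'xs:int' if info['types'] == {'xs:int'} else 'xs:string'}")
    return lines


DOC = ("<library><name>Main</name>"
       "<book><title>A</title><year>2001</year></book>"
       "<book><title>B</title><year>2005</year></book></library>")
for line in describe(infer(ET.fromstring(DOC))):
    print(line)
```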