How to build Perl classes with roundtrip data binding to XML, painlessly, using W3C XML Schema and XML::Pastor
Slides from a previous revision of this talk are online at:
http://www.slideshare.net/joelbernstein/painless-oo-xml-with-xmlpastorq-presentation/
I will be presenting an expanded, more practical, 2009 version of this talk. Now with more code and less theory!
- XML is hard, right? Some of the things that make it hard.
- XML data binding
- Comparisons of modules
- XML::Twig
- XML::Smart
- XML::Simple
- XML::Pastor
- Pastor howto
- XML schema inference
- Trang, Relaxer
- Relaxer howto
- The future?
For more information on XML::Pastor see:
http://search.cpan.org/~aulusoy/XML-Pastor/
Relaxer download:
http://www.relaxer.jp/download/relaxer-1.0.zip
Relaxer book (Japanese...):
http://www.amazon.co.jp/exec/obidos/ASIN/4894715279/
Trang:
http://www.thaiopensource.com/download/trang-20030619.zip
8. XML Data Binding
• Binding XML documents to objects
specifically designed for the data in
those documents.
• I often have to do this.
9. XML is hard, right?
Some hard things:
• Roundtripping data
• Manipulating XML via DOM API
• Preserving element sibling order,
comments, XML entities etc.
23. XML::Pastor
• Available on CPAN
• Abstracts away some of the pain of XML
• Ayhan Ulusoy is the author
• I am just a user
24. What does it do?
• Generates Perl code from W3C XML
Schema (XSD)
• Roundtrip and validate XML to/from Perl
without loss of schema information
• Lets you program without caring about
XML structure
25. pastorize
• Automates codegen process
• Conceptually similar to DBIx::Class::Schema::Loader
• TMTOWTDI - offline or runtime
• Works on multiple XSDs (caveat, collisions)
26. pastorize in use
pastorize --mode offline --style multiple \
    --destination /tmp/lib/perl \
    --class_prefix MyApp::Data \
    /some/path/to/schema.xsd
44. Parsing with Pastor
• Parse the entire XML document into an
XML::LibXML DOM object
• Convert XML DOM tree into native Perl
objects
• Throw away DOM, no longer needed
45. Reasons to not use
XML::Pastor
• When you have no XML Schema
• Although several tools can infer XML
schemata from documents
• It’s a code-generator
• No stream parsing
46. XML::Pastor Scope
• Good for “data XML”
• Unsuitable for “mixed markup”
• e.g. XHTML
• Unsuitable for “huge” documents
47. XML::Pastor
known limitations
• Mixed elements unsupported
• Substitution groups unsupported
• ‘any’ and ‘anyAttribute’ elements
unsupported
• Encodings (only UTF-8 officially supported)
• Default values for attributes - help needed
49. XML::Twig
• Manipulates XML directly
• Calling code is closely coupled to
document structure
• Optimised for processing huge documents
as trees
• No schemata, no validation
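A minimal sketch of the XML::Twig style, to show what "coupled to document structure" means in practice. The document and element names here are made up for illustration. The handler is keyed on the element name and reaches into the tree by hand, and purge() is what keeps memory flat on big inputs.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input document, invented for this example.
my $xml = <<'XML';
<countries>
  <country code="fr"><name>France</name></country>
  <country code="de"><name>Germany</name></country>
</countries>
XML

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <country> element has been fully parsed.
        country => sub {
            my ($t, $elt) = @_;
            # The code has to know that 'code' is an attribute and
            # 'name' is a child element: that is the coupling.
            push @seen, $elt->att('code') . '=' . $elt->first_child_text('name');
            $t->purge;    # discard what has been parsed so far
        },
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
```

If the document layout changes (say, name becomes an attribute), the handler has to change with it.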
50. XML::Compile
• Original design rationale is to deal with
SOAP envelopes and WSDL documents
• Different approach but similar goals to
Pastor - processes XML based on XSD into
Perl data structures
• More like XML::Simple with Schema
support
51. XML::Compile pt. 2
• Schema support incomplete
• Shaky support for imports, includes
• Include restriction on targetNamespace
• I haven’t used it yet but it looks good
52. XML::Simple
• Working roundtrip binding for simple cases
• e.g. XMLout(XMLin($file))
works
• Simple API
• Produces single deep data structure
• Gotchas with element multiplicity
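The multiplicity gotcha is easiest to see with two small documents. This is a sketch with made-up data: by default the same element comes back as a plain string or as an array reference depending on how many times it occurs, and the ForceArray option is the usual way to pin it down.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical documents, invented for this example.
my $one_city   = '<country><name>Monaco</name><city>Monaco</city></country>';
my $two_cities = '<country><name>France</name><city>Paris</city><city>Lyon</city></country>';

# KeyAttr => [] switches off array folding so only multiplicity is in play.
my $single = XMLin($one_city,   KeyAttr => []);
my $multi  = XMLin($two_cities, KeyAttr => []);

# Same element, two different shapes.
print "one city:   ", (ref($single->{city}) || 'plain string'), "\n";
print "two cities: ", (ref($multi->{city})  || 'plain string'), "\n";

# ForceArray makes <city> an array reference however often it occurs.
my $forced = XMLin($one_city, KeyAttr => [], ForceArray => ['city']);
print "forced:     ", (ref($forced->{city}) || 'plain string'), "\n";
```

Without ForceArray, calling code has to check ref() on every repeatable element before using it.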
53. XML::Simple pt. 2
• No schemata, no validation
• Can be teamed with a SAX parser
• More suitable for configuration files?
54. XML::Smart
• Similar implementation to XML::Pastor
• Uses tie() and lots of crac^H^H^H^Hmagic
• Gathers structure information from XML
instance, rather than schema
• No code generation!
55. XML::Smart pt. 2
• No schemata, so no schema validation
• Based on Object::MultiType - overloaded
objects as HASH, ARRAY, SCALAR, CODE
& GLOB
• Like Pastor, overloads array/hashref access
to the data - promotes decoupling
• Reasonable docs, a growing community
59. XML::Pastor Supported
XML Schema Features
• Simple and Complex Types
• Global Elements
• Groups, Attributes, AttributeGroups
• Derive simpleTypes by restriction
• Derive complexTypes by extension
• W3C built-in Types, Unions, Lists
• (Most) Restriction Facets for Simple types
• External Schema import, include, redefine
60. XML Schema Inference
• Create an XML schema from an XML
document instance
• Every document has an (implicit) schema
• Tools like Relaxer and Trang, as well as
System.Xml.Serializer in the .NET Framework,
can all infer XML Schemata from document
instances
65. XML::Pastor
Code Generation
• Write out static code to tree of .pm files
• Write out static code to single .pm file
• Create code in a scalar in memory
• Create code and eval() it for use
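A rough sketch of two of these modes driven from Perl rather than from pastorize, using the generate() method as described in the XML::Pastor documentation. The country schema, the temporary directory and the MyApp::Data:: prefix are assumptions made up for the example, and it has not been checked against every Pastor release.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use XML::Pastor;

# Hypothetical schema, written to a temporary directory for the example.
my $dir = tempdir(CLEANUP => 1);
my $xsd = "$dir/country.xsd";
open my $out, '>', $xsd or die "Cannot write $xsd: $!";
print {$out} <<'XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:c="http://www.example.com/country"
           targetNamespace="http://www.example.com/country"
           elementFormDefault="qualified">
  <xs:element name="country" type="c:Country"/>
  <xs:complexType name="Country">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="city" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string"/>
  </xs:complexType>
</xs:schema>
XSD
close $out or die "Cannot close $xsd: $!";

my $pastor = XML::Pastor->new();

# 'return' mode: the generated code comes back in a scalar,
# nothing is written to disk and nothing is evaluated.
my $code = $pastor->generate(
    mode         => 'return',
    schema       => $xsd,
    class_prefix => 'MyApp::Data::',
);
print length($code), " characters of generated Perl\n";

# 'offline' mode, 'multiple' style: one .pm file per class under destination.
my $lib = "$dir/lib";
make_path($lib);
$pastor->generate(
    mode         => 'offline',
    style        => 'multiple',
    schema       => $xsd,
    class_prefix => 'MyApp::Data::',
    destination  => "$lib/",
);
```

Passing style => 'single' in offline mode should give the single-file variant, and mode => 'eval' loads the classes straight into the running program.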
66. How Pastor works
Code generation
• Parse schemata into schema model
• Perl data structures containing all the
global elements, types, attributes, ...
• “Resolve” Model - determine class names,
resolve references, etc
• Create boilerplate code, write out / eval
67. How Pastor works
Generated classes
• Each generated class (i.e. type) has classdata
“XmlSchemaType” containing schema
model
• If the class isa SimpleType it may contain
restriction facets
• If the class isa ComplexType it will contain
info about child elements and attributes
68. How Pastor works
In use
• If the classes were generated offline, “use”
them; if generated at runtime, they are already loaded
• These classes have methods to create,
retrieve, save object to/from XML
• Manipulate/query data using OO API to
complexType fields
• Validate modified objects against schema
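The steps above can be sketched end to end with runtime ('eval' mode) generation. The method names from_xml_string, xml_validate and to_xml_string are taken from the XML::Pastor documentation. The schema, the sample document and the MyApp::Data:: prefix are invented for illustration, so treat this as a sketch rather than a tested recipe.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Pastor;

# Hypothetical schema, written to a temporary directory for the example.
my $dir = tempdir(CLEANUP => 1);
my $xsd = "$dir/country.xsd";
open my $out, '>', $xsd or die "Cannot write $xsd: $!";
print {$out} <<'XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:c="http://www.example.com/country"
           targetNamespace="http://www.example.com/country"
           elementFormDefault="qualified">
  <xs:element name="country" type="c:Country"/>
  <xs:complexType name="Country">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="city" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string"/>
  </xs:complexType>
</xs:schema>
XSD
close $out or die "Cannot close $xsd: $!";

# Generate the classes in memory and evaluate them: nothing to "use".
XML::Pastor->new()->generate(
    mode         => 'eval',
    schema       => $xsd,
    class_prefix => 'MyApp::Data::',
);

# Hypothetical instance document matching the schema above.
my $xml = <<'XML';
<country xmlns="http://www.example.com/country" code="fr">
  <name>France</name>
  <city>Paris</city>
  <city>Lyon</city>
</country>
XML

# Retrieve: the global element 'country' becomes MyApp::Data::country.
my $country = MyApp::Data::country->from_xml_string($xml);

# Query through accessors instead of walking a DOM.
print "name:   ", $country->name, "\n";
print "cities: ", scalar @{ $country->city }, "\n";

# Modify, validate against the schema model, and serialise again.
$country->code('FR');
$country->xml_validate();    # dies if the object no longer fits the schema
my $round = $country->to_xml_string();
print $round, "\n";
```

Note that the calling code never mentions child-versus-attribute or element order; that knowledge lives in the generated classes.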
69. Thanks for coming
See you next year
http://search.cpan.org/dist/XML-Pastor/