4. An Aside: AP’s Ingestion Pipeline. ATOM + XHTML → XSLT transform → APPL + NITF. One way we ingest content: we transform ATOM and XHTML into our internal XML format (APPL) and into NITF. This is greatly simplified, obviously.
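A minimal sketch of such a transform. It assumes a single-entry Atom document whose body is `type="xhtml"` content, and it emits only a handful of NITF elements (`hl1`, `body.content`, `p`) with the NITF namespace and metadata omitted. It is not AP's actual stylesheet, and APPL is internal, so it is not shown.

```xml
<!-- Sketch only: not AP's stylesheet; NITF namespace and metadata omitted -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="atom xhtml">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumes the input document's root element is a single atom:entry -->
  <xsl:template match="/atom:entry">
    <nitf>
      <head>
        <title><xsl:value-of select="atom:title"/></title>
      </head>
      <body>
        <body.head>
          <hedline><hl1><xsl:value-of select="atom:title"/></hl1></hedline>
        </body.head>
        <body.content>
          <!-- type="xhtml" content wraps its markup in a single xhtml:div -->
          <xsl:apply-templates select="atom:content[@type = 'xhtml']/xhtml:div/xhtml:p"/>
        </body.content>
      </body>
    </nitf>
  </xsl:template>

  <!-- Copy each XHTML paragraph's text into an NITF p (inline markup is flattened) -->
  <xsl:template match="xhtml:p">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```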
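5. Converting from HTML to XML: <p>The budget was just £100.</p> <p>How could it be done for so little money? <p>Luckily open source tools were available.</p> These are not new problems.</p> The solutions were even standardized.<p/> In HTML a <p> cannot contain another <p>, so a browser closes the first paragraph early and the later </p> is left unmatched; the trailing <p/> opens a paragraph rather than forming an empty one.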
6. Some rules in the spec are hard to enforce: “HeadLine - this element must contain the same value as the entry’s <title> element”; “summary is required for non-text content items, such as news photos and video. This element is optional for text story content items.” A document's XML structure can comply with the XSD but still fail in downstream systems.
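Both rules can be written as Schematron assertions (slides 10–12). A sketch under hypothetical assumptions: `HeadLine` is an element somewhere below the entry, in a namespace bound here to the placeholder URI `urn:example:apcm`, and the media type is carried in a `MediaType` attribute as on slide 10. The real AP namespace and element placement may differ.

```xml
<!-- Sketch only: the apcm namespace URI and element placement are hypothetical -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>
  <sch:ns prefix="apcm" uri="urn:example:apcm"/>
  <sch:pattern>
    <sch:rule context="atom:entry">
      <!-- HeadLine must repeat the entry's title (compared after whitespace normalization) -->
      <sch:assert test="normalize-space(.//apcm:HeadLine) = normalize-space(atom:title)">
        HeadLine must contain the same value as the entry's title.
      </sch:assert>
      <!-- summary is optional for text stories but required for everything else -->
      <sch:assert test="atom:summary or .//@MediaType = 'Text'">
        summary is required for non-text content items such as photos and video.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```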
8. Validate and Fix Prior to Ingestion: original ATOM + XHTML → Tidy fixes sloppy HTML → custom XSLT tidies up the XML → W3C XML Schema validates structure and syntax → Schematron schema validates business rules → valid ATOM + XHTML, ready for ingestion.
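A custom XSLT clean-up step is commonly an identity transform plus a few overriding templates. A minimal sketch; the single clean-up rule here (removing empty XHTML paragraphs) is a hypothetical example, not AP's actual rule set.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Identity template: copy every node and attribute unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical clean-up rule: drop XHTML paragraphs with no child elements
       and only whitespace text, such as those left behind by a stray <p/> -->
  <xsl:template match="xhtml:p[not(*) and not(normalize-space())]"/>
</xsl:stylesheet>
```

Each additional fix becomes one more template; a pattern with a predicate has a higher default priority than the identity template, so it wins for the nodes it matches.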
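10. Schematron: a fact checker for XML documents. It enforces business rules that can’t be expressed in a W3C XML Schema (XSD), such as MediaType="Video" combined with Format="ANPA1312", a combination that is valid against the XSD but makes no sense, since ANPA 1312 is a text wire format. Previously, we had to inspect new feeds manually to catch errors, and the risk is that a feed is approved but errors appear later (not to mention that manually checking XML is tedious).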
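The MediaType/Format example is a co-occurrence constraint (the value of one attribute restricts another), which W3C XML Schema 1.0 cannot express. A sketch in Schematron, assuming, as the slide suggests, that both are attributes on the same element:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <sch:pattern>
    <!-- Context: any element that declares itself a video -->
    <sch:rule context="*[@MediaType = 'Video']">
      <!-- Co-occurrence check: a video item must not claim a text wire format -->
      <sch:assert test="not(@Format = 'ANPA1312')">
        A video item cannot use Format="ANPA1312", which is a text format.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```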
11. Schematron: a small, powerful, lightweight fact checker for XML documents. You specify constraints as XPath rules and write the error messages yourself. The Schematron schema is compiled once into an XSLT stylesheet, so validation is simply an XSLT transform. It validates the presence or absence of specific content and the relationships between elements and attributes, and it produces validation reports.
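A small but complete schema showing the kinds of checks listed above, written against plain Atom (RFC 4287) rather than AP's feed rules. `sch:assert` fails when its test is false, while `sch:report` fires when its test is true, which suits advisory checks. In the ISO Schematron reference implementation, for example, a schema like this is compiled through a chain of skeleton stylesheets (such as `iso_svrl_for_xslt1.xsl`) into an XSLT that emits an SVRL report; other toolchains may differ.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>
  <sch:pattern>
    <sch:rule context="atom:entry">
      <!-- Presence: RFC 4287 requires exactly one atom:id per entry -->
      <sch:assert test="count(atom:id) = 1">
        An entry must contain exactly one atom:id.
      </sch:assert>
      <!-- Absence: content that points elsewhere via src must itself be empty -->
      <sch:assert test="not(atom:content[@src][* or normalize-space()])">
        atom:content with a src attribute must be empty.
      </sch:assert>
    </sch:rule>
    <sch:rule context="atom:link[@rel = 'enclosure']">
      <!-- Element/attribute relationship: report (rather than fail) a missing length -->
      <sch:report test="not(@length)">
        An enclosure link should have a length attribute.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```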
12. Anatomy of a Schematron Rule: the rule’s context is established with an XPath expression, and an XSLT-style test establishes the constraint for each assert. <sch:rule context="atom:feed/atom:link"> <sch:assert test="starts-with(@href, 'http://')"> The feed/link/@href must contain an http URL </sch:assert> </sch:rule> You write the error message that is reported if the assert fails.
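On its own, the rule needs a surrounding schema that binds the `atom` prefix with `sch:ns`. A sketch that does so, and that uses `sch:value-of` so the message names the offending URL:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <!-- Bind the atom: prefix used in the rule's XPath expressions -->
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>
  <sch:pattern>
    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        <!-- value-of inserts the actual href into the error message -->
        The feed link <sch:value-of select="@href"/> must contain an http URL.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

As written, the test also rejects `https://` links; whether that is intended depends on the feed specification.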
13. DSDL – Pipeline Validation: XSD or RELAX NG (grammar), Schematron (rules), NVDL (namespace dispatch), DTLL (datatypes), CREPDL (character repertoire), DSRL (Document Schema Renaming Language). Still under development.
14. XProc: declaratively specify a pipeline (using XML, naturally). Similar in concept to Yahoo! Pipes or BizTalk, but XML-specific and a W3C standard.
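A minimal XProc 1.0 sketch of the validate-and-fix steps from slide 8, assuming Tidy has already run as a separate step. The file names `tidy-up.xsl`, `atom.xsd`, and `business-rules.sch` are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Custom XSLT tidies up the (already Tidy-cleaned) document -->
  <p:xslt>
    <p:input port="stylesheet"><p:document href="tidy-up.xsl"/></p:input>
    <p:input port="parameters"><p:empty/></p:input>
  </p:xslt>

  <!-- W3C XML Schema validates structure and syntax -->
  <p:validate-with-xml-schema>
    <p:input port="schema"><p:document href="atom.xsd"/></p:input>
  </p:validate-with-xml-schema>

  <!-- Schematron validates business rules; its result feeds the pipeline output -->
  <p:validate-with-schematron>
    <p:input port="schema"><p:document href="business-rules.sch"/></p:input>
    <p:input port="parameters"><p:empty/></p:input>
  </p:validate-with-schematron>
</p:declare-step>
```

Both validation steps default to `assert-valid="true"`, so an invalid document stops the pipeline instead of flowing on to ingestion.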