2. Pre-reqs!
Hopefully everyone already knows about
these…!
• Wiki: http://www.xcri.org/wiki/
(then click 1.2 specification)
• KB: http://www.xcri.co.uk/
(specifically the Course Data Programme
area)
• Forum: http://www.xcri.org/forum/
3. What does it (validation) aim to do?
Feed quality and consistency!
It is important to note that you will need exposure to both the XCRI-CAP 1.2
standard (via the wiki) and the Data Definitions and Vocabulary documents
(via the website) to create a feed which validates!
4. How does it work?
XML Validation (XML structural issues):
• Nesting
• Namespace declarations
• Character literals
• …
XML Schema Validation (XML document structure):
• Correct element capitalisation
• Correct element namespaces
• Simple element content validation
• …
Rule-based Validation (everything else):
• URL/Email/Telephone/etc. validation
• Element lengths
• Data Definitions Document rules
• …
5. What feedback do I get?
• A summary of issues broken down by severity
• Issues grouped by severity then issue
• “Helpful” text
• Drill-down into line and character information (if available)
Severities: Exceptions, Warnings, Recommendations
6. Hints and Tips
• Start with small XML snippets (e.g. a single
course) then work upwards to a full
catalog
• If you have problems then use the XCRI
forum: http://www.xcri.org/forum
• Some issues require manual checking
• Remember that validation is iterative
• Bear in mind that element order is
important (for validation, NOT the spec)
7. How do I use it?
By far the easiest way is to use the online
version at http://validator.xcri.co.uk. This
way you get any bug fixes or rule base
alterations immediately…
...however the project is open-source and
you can download it all from
http://code.google.com/p/xcricap-validator/
(.NET 4.0, written in C#).
Hi, my name’s Craig Hawker and I’m here to give you an overview of the XCRI-CAP 1.2 Validator. You’ll all be glad to know that this is going to be very brief. I’m purposefully not going to dig deep into specifics but I’m sure people will have questions (some might already), so I’m around all day. I’m also doing a “validator surgery” outside after this until lunch. If anyone’s signed up, that is!
I’m a software developer and I’ve been involved with XCRI-CAP for a couple of years now. I initially got involved by creating a feed for a local college, then started getting involved in providing tooling around XCRI-CAP 1.1 for the community. This resulted in the .NET XCRI-CAP Generator Library and the online XCRI-CAP 1.1 validator. Since then I’ve also been involved with XCRI-CAP 1.2 and am the primary developer of the XCRI-CAP 1.2 Validator, which is what we’re going to have a brief look at today.
We’re going to, very quickly, go over:
• Pre-reqs
• What the validator aims to do and why
• How the validator works, at a very high level
• What feedback you get out of the validator
• Some hints and tips for using the validator
• Where the validator is and how you can get at it
• An example of an issue identified by the validator and the process involved in resolving the issue
And then that’s it. If anyone has any questions then I’m here all day and/or you can contact me using the details at the end of the presentation.
Okay, before we get started it’s best that I highlight a few pre-requisites. Hopefully everyone already knows about these but I’ll include them just in case.
The wiki is where the XCRI-CAP 1.2 specification lives. It contains information on the elements, their namespaces, and information about how and where they should be used. It also contains links across to the sample schema files which, whilst not being part of the specification, are something you should be aware of and probably reference.
The knowledgebase contains a huge amount of supplementary information around XCRI but the main bit I would say to focus on is the Course Data Programme area. This area contains the data definition documents which further refine elements that are mandatory and expected as part of this process.
Finally: if you have any questions then the forum’s a great resource. There’s also the mailing list but that’s closed-access and, unless someone happens to be signed up when you send that email, there’s a chance it’ll not help other institutions. If it’s a question that you think would benefit others, I would recommend trying to use the forum.
When the XCRI-CAP 1.2 standard was completed, JISC recognised that it was important to ensure that the quality and consistency of the feeds produced under their guidance was high. This covers both technical validity and the content that the feeds contain.
In combination with the Data Definitions and Vocabulary documents (available via the website), the validator aims to help drive the quality and consistency of feeds produced by the JISC-funded Course Data Programme. It’s important to note that you will need access to both the XCRI-CAP 1.2 specification (the wiki) and the Data Definitions and Vocabulary documents (on the website) in order to produce a valid feed.
The validator works in three stages. Each performs separate checks, and each builds upon the one before.
Firstly, the system checks that the file is structurally-sound XML. This identifies things such as incorrect tag nesting and undeclared namespace prefixes. Any issues that are found are run through a “translator” to get a helpful error message rather than an obscure one from the XML subsystem.
Secondly, the system checks that the file is valid according to the XML Schemas. XML Schemas declare what the structure of a document is: which elements from which namespaces are allowed, how many of them, and so on. XML Schema documents for XCRI-CAP 1.2 have been made available through the XCRI.co.uk website and these schemas will be used by default. You CAN override those schemas by referencing others within your feed using schemaLocation (for expansion, for example) and the validator will respect them if you do, but this isn’t recommended as it may lead to inconsistencies in the feeds.
Then, thirdly, the system checks that the file is valid according to a rule-base which has been developed. This additional level of validation is required firstly because XML Schema is limited in some of the things it can express, but secondly because the XML Schema files only contain the XCRI-CAP 1.2 specification, NOT the additional rules from within the Data Definitions Document. In my experience so far, this is where some people have come unstuck. If you don’t have the data definitions and vocabulary framework documents then get them and read through. They’re available in a couple of different formats depending upon what’s easiest for your team to use.
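To make the staging concrete, here is a minimal Python sketch of the first and third stages. It is not the validator's own code (which is C#): the function names, the single URL rule and the message wording are all invented for illustration, and the mlo namespace URI is an assumption that should be checked against the wiki. The schema stage is left out because Python's standard library has no XML Schema validator.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace for mlo:url; confirm against the XCRI-CAP 1.2 wiki.
MLO_NS = "http://purl.org/net/mlo"
ABSOLUTE_URL = re.compile(r"^https?://\S+$")


def check_well_formed(xml_text):
    """Stage 1: return (root, issues); root is None when the XML cannot be parsed."""
    try:
        return ET.fromstring(xml_text), []
    except ET.ParseError as err:
        line, column = err.position
        message = f"Exception: XML is not well formed at line {line}, column {column}"
        return None, [message]


def check_rules(root):
    """Stage 3: one hypothetical rule, requiring mlo:url to hold an absolute http(s) URL."""
    issues = []
    for element in root.iter(f"{{{MLO_NS}}}url"):
        text = (element.text or "").strip()
        if not ABSOLUTE_URL.match(text):
            issues.append(f"Warning: '{text}' is not an absolute http(s) URL")
    return issues


def validate(xml_text):
    """Run the stages in order; later stages need the output of earlier ones."""
    root, issues = check_well_formed(xml_text)
    if root is None:
        return issues  # nothing else can run on unparseable input
    # Stage 2 (XML Schema validation) is deliberately omitted from this sketch.
    return issues + check_rules(root)
```

The point of the sketch is the ordering: a file that fails the structural check never reaches the rule checks, which is one reason a feed usually needs several passes through the validator.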
So, once you’ve validated a feed, what do you end up with? We’ll go and do a couple of examples in a minute (assuming the connectivity works) but basically:
You get a summary of issues, broken down by severity. There are three severities of issues: Exceptions, Warnings and Recommendations:
• Exceptions consist of structural issues (incorrect elements, casing issues, namespaces, some formatting…)
• Warnings consist of less important issues, but issues that should still be corrected (elements that shouldn’t be used, potential truncation of elements, some formatting)
• Recommendations consist of the issues that you should strongly consider resolving, or that require manual checking
You can click on each severity and get more information on the issues.
Issues are then shown, one for each rule or exception that was encountered. So if you have 100 elements that contain a date formatted the wrong way, you only get a single line shown. As most feeds will be generated by some code, this is likely to need only a single fix.
For every issue or exception that’s encountered, the text is parsed and, where possible, converted into something more usable. This will often include links out to the wiki or references back into the Data Definitions Document for further information, as well as guidance on fixing the issue.
For every issue or exception that’s encountered, you can choose to click and drill down into more information, including the line and character information if that’s exposed by the XML subsystem during validation.
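The grouping described above (severity first, then one line per rule however many elements trip it) can be sketched in a few lines of Python. This only illustrates the idea: the tuple layout, the function names and the output wording are hypothetical and are not taken from the validator.

```python
from collections import Counter, defaultdict

SEVERITY_ORDER = ["Exception", "Warning", "Recommendation"]


def summarise(issues):
    """Group (severity, rule, line) tuples into {severity: Counter({rule: count})}."""
    grouped = defaultdict(Counter)
    for severity, rule, _line in issues:
        grouped[severity][rule] += 1
    # Report severities in a fixed order, skipping any that did not occur.
    return {s: grouped[s] for s in SEVERITY_ORDER if s in grouped}


def render(summary):
    """Produce one heading per severity and one line per rule."""
    lines = []
    for severity, rules in summary.items():
        total = sum(rules.values())
        lines.append(f"{severity}s: {total} instance(s) across {len(rules)} rule(s)")
        for rule, count in rules.most_common():
            lines.append(f"  {rule} ({count} occurrence(s))")
    return lines
```

Fed 100 badly formatted dates, a report built this way would show a single rule line with a count of 100 rather than 100 separate lines.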
“Start with small XML snippets”
For example: start with a root <catalog /> element and a single provider within it. This will allow you to test most of your namespace prefixes, as well as identifying formatting for elements like telephone numbers, without having to deal with lots of issues caused by courses. Once you’ve got that validating (aside from the exception about there being no courses…!), add a single course. Then a single presentation. And so on. Once you have a validating feed, use that as a template for the XML your Course Management System is to produce.
“Use the forum”
Despite the best efforts of everyone involved, there’s always the possibility that you find an issue with the validator, don’t understand the exception text that’s being raised, or have an issue with one of the rules being run. If that’s the case then raise the issue on the XCRI forum. It’s not the most heavily utilised but you’ll find a lot of technical people check it fairly frequently.
“Manual checking”
There are a couple of rules within the rulebase that we can’t currently check programmatically, so you may need to check them manually. These include things like “only use the contributor element when other refinements are not available”, or “producers should use URLs for identifiers that also resolve to human-readable content”. You need to check these and disregard the issue as appropriate.
“Validation is iterative”
That’s a very important point. The validator itself uses XPath to identify elements to check the rules against. You can’t run a file through the validator, fix the issues, and assume it’s then fine. You need to check it again. A good example is a feed that contains a <presentation /> element within the wrong namespace. The validator will highlight the namespace issue but, because the namespace is incorrect, it won’t then run the <presentation /> validation rules on its contents.
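The namespace example can be reproduced with a small Python sketch. It is an illustration only, not the validator's code: the XCRI-CAP catalog namespace URI below is an assumption to confirm against the wiki, the "wrong" namespace is made up, and the cut-down feed omits elements a real feed would need.

```python
import xml.etree.ElementTree as ET

# Assumed XCRI-CAP 1.2 catalog namespace; confirm against the wiki.
XCRI_NS = "http://xcri.org/profiles/1.2/catalog"
NS = {"xcri": XCRI_NS}

# A cut-down feed whose presentation element sits in a made-up, incorrect namespace.
WRONG = f"""<catalog xmlns="{XCRI_NS}" xmlns:x="http://example.org/wrong">
  <provider><course><x:presentation/></course></provider>
</catalog>"""

# The same feed after the namespace problem has been corrected.
FIXED = WRONG.replace("<x:presentation/>", "<presentation/>")


def count_presentations(xml_text):
    """Count the presentation elements a namespace-qualified path can see."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//xcri:presentation", NS))
```

With WRONG the qualified path matches nothing, so any rule attached to presentations has nothing to run against; with FIXED it matches one element. That is why a second validation pass can raise issues that the first pass never mentioned.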
I think that’s the waffle out the way. This is the important part: how do you get at it?
The easiest way is to use the online version at http://validator.xcri.co.uk. All you need to do is go there with a web browser and either point it at a publicly available URL for your feed or, if it’s in development, paste in your XML file, and it’ll validate. The advantage of this is that if there are any bug fixes or changes to the rulebase, or anything like that, you’ll get them automatically.
That said, the project is open-source and can be downloaded from the Google Code repository you can see up there. It’s a Mercurial repository so you’ll need a Mercurial client. It’s a .NET 4.0 application written in C# so you’ll probably need Visual Studio 2010 (or 2012 if you’re really up to date) to use it. If anyone wants to make changes and push them back then catch me later and I can add you as a contributor.
However, the only real reason I think that people will want to do that is if they want to include the validation code as part of a CI process or something similar; MOST people should just use the online version.