Converting data means migrating information from an unsuitable source or format
to a suitable one, and it is often not an exact science. Data scoring is a way
to measure the accuracy of your conversion. Discover a simple scoring technique in
XQuery that you can apply to the result of a small text-to-XML conversion.
Scoring converted data is all about analyzing the quality of the conversion. Quality
can mean different things: converting data from a database poses different problems
than converting data from documents that contain more natural language.
The technique that this tip presents makes no assumptions about where the data came
from: you can apply it to any XML code of interest. To see the technique in practice,
you will convert plain text. The input is not comma-separated files, but plain text
from news items grabbed from the Internet.
Frequently used acronyms
• HTML: Hypertext Markup Language
• W3C: World Wide Web Consortium
• URL: Uniform Resource Locator
• XML: Extensible Markup Language
• XSLT: Extensible Stylesheet Language Transformations