Lecture 3 - Slicing and Dicing Data Categories -The Art of Taxonomy
1. XML FOR DUMMIES
http://it-slideshares.blogspot.com
Book authors: Lucinda Dykes and Ed Tittel
Part 1 : XML Basics
Lecture 3: Slicing and Dicing Data Categories:
The Art of Taxonomy
Contents
1 Taking Stock of Your Data
2 Breaking Down Data in Different Ways
3 Developing Your Taxonomy
4 Testing Your Taxonomy
5 Looking Ahead to Validation
1. Taking Stock of Your Data
Looking at business practices and partners.
♦ Taking a close look at the flow of information in your business will help you
identify the components of your content.
♦ Each different process is a specialized use of information.
♦ Take some time to talk to those people who create or frequently process
the data. Find out:
○ What users do with individual pieces of information.
○ What data users think is impossible to live without.
○ What data is unnecessary or optional.
Gathering some content.
♦ The more complete your collection of samples is, the better chance you
have of creating markup that fits all your content. Here are some ideas:
○ Get data from multiple sources.
○ Get a lot of data.
○ Get a lot of data from multiple sources.
Taking Stock of Your Data (cont.)
Checking whether a DTD or schema already exists.
♦ When you create a document according to a DTD or schema, you use a predefined
structure that specifies how the components of markup should be used to describe a
particular kind of content.
♦ Predefined DTDs and schemas usually come from a couple of different sources:
○ Industry groups or organizations that want a common format for
standard data, such as OFX or CML.
○ Application builders who created their systems to run with content
described by a particular set of markup, for example CFML or ASP.
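As a rough illustration of creating a document according to a predefined DTD, the sketch below ties a document to an external DTD through a DOCTYPE declaration. The file name books.dtd and the element names are hypothetical; they stand in for whatever structure an industry group or application builder actually supplies.

```xml
<?xml version="1.0"?>
<!-- books.dtd is a hypothetical file name for a predefined DTD -->
<!DOCTYPE books SYSTEM "books.dtd">
<books>
  <book>
    <title>The Da Vinci Code</title>
  </book>
</books>
```

A validating parser would read the referenced DTD and check that the elements in the document are used the way the DTD allows.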
Searching for a schema repository.
♦ You can search for a schema or DTD, or add one of your own to a repository,
online at sites such as www.Biztalk.org or www.schema.net.
♦ One schema repository is still hosted by OASIS at www.xml.org/xml/registry.jsp.
OASIS also provides a very comprehensive list of proposed XML applications and industry initiatives at
www.oasis-open.org.xml.html#applications.
♦ The whole point of using XML is to make your content as accessible to a system as
possible.
♦ Content analysis with XML in mind is much easier when you have a handle on the ins and
outs of XML Schemas and DTDs and how to put them together.
2. Breaking Down Data in Different Ways.
When we developed our hypothetical book-selling business, we went through the same
data-analysis process we’re sharing with you. After we gathered our documents and
familiarized ourselves with them, we took a good hard look at what we learned
about our content. Here’s what we came up with:
- Books can be categorized in a number of different ways, including: author,
title, publication date, publisher, edition, language, number of pages, size,
type (fiction, nonfiction), special features, format, price, ISBN.
- The customer information we collect includes: first name, last name, address,
city, state, zip code, email address, phone number.
- The sales information we gather in addition to customer information includes:
date, item number, price, total cost.
Winnowing out the wheat from the chaff.
♦ When we analyzed our content, we made some judgments about what
information we needed to collect.
♦ We chose to exclude information that was not useful from our taxonomy strategy.
♦ During your content-analysis process, knowing the purpose of your markup
can help you keep your goals in sight.
Breaking Down Data in Different Ways (cont.)
Types of data that can be stored in XML.
♦ XML content can be divided into two main groups: data-intensive
and document or text-intensive.
♦ On the data end of the spectrum, you find collections of data like
those that reside in a database. Each collection consists of a more
or less arbitrary number of record structures, in which each record contains:
○ A unique identifier or key.
○ A common collection of named, organized values.
♦ XML can capture and represent data that describes other
collections of data.
♦ XML can handle many kinds of data and can accommodate binary
information. It can also supply data to other computer applications
outside XML’s control.
♦ An XML document can reference anything that a computer can
represent.
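To make the two ends of the spectrum concrete, here is a minimal sketch. The customer record reuses the bookstore's customer categories; the wrapper element examples and the review, para, and emphasis elements (and the review text itself) are hypothetical and appear only for illustration.

```xml
<?xml version="1.0"?>
<examples>
  <!-- Data-intensive: a record with a unique key and named values -->
  <customer>
    <custNumber>5594</custNumber>
    <lastName>Blow</lastName>
    <firstName>Joe</firstName>
    <zip>97424</zip>
  </customer>
  <!-- Text-intensive: running prose with inline markup mixed into the text -->
  <review>
    <para>This thriller is a <emphasis>page-turner</emphasis> from start to finish.</para>
  </review>
</examples>
```

In the record, custNumber plays the role of the unique identifier, and every customer carries the same set of named values. In the review, the markup follows the flow of the text instead of a fixed record layout.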
3. Developing Your Taxonomy
After you look at your content, you can start breaking it down into
categories and subcategories.
Here’s how we broke it down for our hypothetical book business:
● Book
○ Item Number
○ Title
○ Author
○ Publisher
○ Price
○ Content Type
○ Format
○ ISBN
● Sales
○ Item Number
○ Price
○ Shipping
○ Total Cost
○ Date
○ Source
● Customer
○ Customer Number
○ First Name
○ Last Name
○ Address
○ City
○ State
○ Zip Code
○ E-mail Address
○ Phone Number
♦ As you can see, Item Number appears as a subcategory in both the Book and the
Sales categories.
♦ The Item Number is unique to each copy of a book, which makes it easy to keep track
of sales and inventory.
4. Testing Your Taxonomy
Using trial and error for the best fit.
♦ As you work with your markup, experiment with
combinations of elements and attributes until you get the best results.
For example: We used two nested elements to specify the content type
for a book.
<book>
<contentType>Fiction</contentType>
</book>
♦ The markup could use as many ‘contentType’ elements as needed
within the ‘book’ element, but we decided to go with ‘contentType’ as
an attribute of the ‘book’ element instead, as shown here.
<book contentType="Fiction"/>
♦ We decided on this route because we thought that we’d
want to predefine the category names and require that valid
documents choose one of the names from the list in our
DTD or schema.
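One way to predefine the category names is an enumerated attribute in a DTD. The sketch below is an assumption about how such a list might look, not the authors' actual DTD: it declares the DTD inline, treats book as an empty element for brevity, and allows only the two type names mentioned earlier (Fiction, Nonfiction).

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Simplified for illustration: book carries no child elements here -->
  <!ELEMENT book EMPTY>
  <!-- contentType must be present and must be one of the listed names -->
  <!ATTLIST book contentType (Fiction | Nonfiction) #REQUIRED>
]>
<book contentType="Fiction"/>
```

With this declaration, a validating parser would reject a document that omits contentType or uses a value outside the list, which is harder to enforce when the category is free text inside an element.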
Testing Your Taxonomy (cont.)
Testing your content analysis.
♦ The best way to test your final markup is to apply it to as many
content samples as you can lay your hands on.
- The listing below shows the final draft of our bookstore markup.
<?xml version="1.0" standalone="yes"?>
<books>
  <!-- One book, with its descriptive and sales details -->
  <book contentType="Fiction" format="Hardback">
    <bookInfo>
      <title>The Da Vinci Code</title>
      <author>Brown, Dan</author>
      <publisher>Doubleday</publisher>
      <isbn>0385504209</isbn>
    </bookInfo>
    <salesInfo>
      <price priceType="Retail">$24.95</price>
      <itemNumber>0385504209-1</itemNumber>
      <date>January 12, 2005</date>
      <source sourceType="Retail" />
      <shipping>$5.00</shipping>
      <cost>$29.95</cost>
    </salesInfo>
  </book>
  <totalCost>$29.95</totalCost>
  <!-- The customer who placed the order -->
  <customer custType="newRetail">
    <custNumber>5594</custNumber>
    <lastName>Blow</lastName>
    <firstName>Joe</firstName>
    <address>52 Joetta Lane</address>
    <city>Cottage Grove</city>
    <state>OR</state>
    <zip>97424</zip>
    <phone>767-3333</phone>
    <email>jblow@pacinfo.com</email>
  </customer>
</books>
♦ The first line in our code, <?xml version="1.0" standalone="yes"?>, is an XML
declaration. You’ll learn all about XML declarations and all the other details of
XML syntax in Chapter 5.
5. Looking Ahead to Validation
You get to make up as many rules as you want or need to
make the markup do what you want it to.
The rules that you create with XML can dictate which elements
make up an XML document.
Creating XML document descriptions enables you to state the
rules that a whole class of documents must follow.
The two main forms of XML document descriptions in use today
are DTDs and XML schemas.
DTDs work well for validating XML with text-intensive content,
while XML schemas work well for validating XML with data-
intensive content.
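As a hedged illustration of why XML schemas suit data-intensive content, the sketch below declares a simplified salesInfo element with typed values. It is an assumption, not the bookstore's real schema: it leaves out the price and source elements, and it assumes amounts are stored as plain decimals (29.95, not $29.95) and dates in ISO form (2005-01-12, not January 12, 2005).

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Simplified, hypothetical version of the bookstore's salesInfo element -->
  <xs:element name="salesInfo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="itemNumber" type="xs:string"/>
        <!-- xs:date expects values such as 2005-01-12 -->
        <xs:element name="date" type="xs:date"/>
        <!-- xs:decimal expects values such as 5.00, without a currency sign -->
        <xs:element name="shipping" type="xs:decimal"/>
        <xs:element name="cost" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A DTD can say that date and cost must appear and in what order, but it treats their content as plain text. The datatypes above are the kind of rule that only a schema can state.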
Thank you
The end chapter 3