XML and content strategy
Why and how to “future-proof” your content

Publishers and other information providers increasingly use multiple media to display their content for various applications. Books become e-books, online journal articles are published online first and in print later, and figures are aggregated into image databases. Users request chunks of content, or the publisher assembles pieces of content from multiple publications into a new publication. Information users want what they want, when they want, in the form they want. As publishers work to respond to the changing needs of their constituencies, the challenge is: how can publishers “future-proof” their content?

Even today, content takes many forms and has many uses. Publishers find that they need to adapt their content in various ways (figure 1).

Sampling
   • Web-ready HTML on proprietary platforms
   • HTML for web previewing
   • PDF for printing/viewing/downloading
   • Distribution by third-party aggregators (Ovid, EBSCO)
   • Abstract & indexing services (Scopus)
   • Mobile devices (iPad, smartphones)
   • Archival solutions (Portico)

Figure 1: Sampling of data output

If today’s situation is not complicated enough, the future is likely to be even more complex. How can information providers respond to the changing needs of customers and new technologies with greater facility in terms of time, cost, and effort?

By far, the most practical, most versatile tool for manipulating content for both current formats and those not invented yet is XML. XML (eXtensible Markup Language) is an open standard; its power derives from the fact that XML has been adopted by entire industries, many government agencies, and platform developers. When new standards emerge, such as EPUB for e-book readers, the standards are derived from generic XML, allowing even files created a few years hence to flow readily into the new standard.

Some organizations think of XML as a different set of tags. While XML tags are different from those used in other systems like SGML or HTML, XML is actually a different way of thinking about content. Karen Colson, director of publishing and communications at the Association for Research in Vision and Ophthalmology (ARVO), explains it simply:

XML describes content, not appearance.

An XML tag (actually, a pair of tags: one at the beginning and one at the end of an element) might indicate that a section of copy is a first-level heading inside a book chapter. The actual appearance of the heading, however, is determined by a different style sheet for each application. The typeface and size that appear in the book might be completely different if the book is available on an e-reader, and they might be different still if the book is included on the electronic platform of a third-party aggregator.
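
To make the separation of content from appearance concrete, here is a minimal Python sketch. The element names (`chapter`, `title`, `h1`, `p`) and the two "style sheets" are assumptions invented for illustration; real workflows would more likely use XSLT or CSS, but the principle is the same: one tagged source, one rendering rule set per output.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter markup: the tags say what each piece of content is,
# not how it should look.
CHAPTER_XML = """
<chapter>
  <title>Vision Research</title>
  <h1>Methods</h1>
  <p>Participants were recruited from three clinics.</p>
</chapter>
"""

# Two hypothetical "style sheets", one per output medium. Each maps an
# element name to a rendering rule for that medium.
PRINT_STYLE = {
    "title": lambda text: text.upper(),
    "h1": lambda text: f"== {text} ==",
    "p": lambda text: text,
}
EREADER_STYLE = {
    "title": lambda text: f"<h1 class='chapter-title'>{text}</h1>",
    "h1": lambda text: f"<h2>{text}</h2>",
    "p": lambda text: f"<p>{text}</p>",
}


def render(xml_text, style):
    """Render every child of the chapter using the given style mapping."""
    root = ET.fromstring(xml_text)
    return [style[child.tag](child.text) for child in root]


print_lines = render(CHAPTER_XML, PRINT_STYLE)
ereader_lines = render(CHAPTER_XML, EREADER_STYLE)
print(print_lines)
print(ereader_lines)
```

The source string is never edited to change the look of the heading; only the style mapping differs between outputs.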



SPi Global
2807 North Parham Road, Suite 350, Richmond, VA 23294
T 1 804 262 4219
www.spi-global.com
The tag for a first-level heading also can function as metadata. For instance, a book’s table of contents might be constructed by copying chapter titles and first-level headings. Or, perhaps an aggregator’s general search function could look primarily at first-level headings. In either case, a pair of tags that starts out regulating appearance can have multiple programmatic applications as well.

Organizations that want to get the most out of XML apply it consistently and as early as possible in the content development process. When this happens, editing changes are captured within a single, authoritative XML file, all XML files are built according to the same rules, and the final XML file is the source for all types of output. Creating this capability requires thoughtful planning and technically astute implementation.

Planning for end-to-end XML Workflow
The most reliable and powerful way to apply XML to documents is to do so at the very beginning of the production cycle. In organizations where content is created by employees, the content creator may enter tags, often using shortcuts or templates. For most publishers, however, tags are applied by skilled markup operators based on the list of tags available to them (more on this below). Most markup operators work for compositors, so their function sometimes overlaps with typesetting. But markup is a distinct function in the production process. Once the tags are applied, production can proceed (figure 2).

XML markup → Copyediting → Typesetting → Page layout → Proofreading → Content Repository → Multiple outputs

Figure 2: Production process using XML

When an error occurs, the correction is made in the native XML file so that the error can be corrected in every product that flows from the content. Making corrections in the native XML file represents the industry’s best practice, but practical challenges exist even with this approach.

Julia Sawabini, director of e-commerce at Elsevier, explains that to build the web page for a particular product, Elsevier pulls content from a database containing fields variously populated by editorial, production, and marketing people. The information is organized via style sheets but no content is created at this point. “If there’s something wrong on the website, it’s wrong someplace along the way. I can’t change it.”

Once a correction is made, the change may not appear immediately, as the website is updated in batches at specified intervals. The incorrect product information will appear on the site until the update takes place. Also, the incorrect material will remain on the servers of distributors, e-bookstores, and other outlets for the information unless corrected files are sent and uploaded.

An analogous challenge occurs in publishing printed materials. Sometimes a production person spots an error while processing a PDF for the printer. The temptation, and often the reality, is that the production person corrects the PDF and sends it on to the printer, breathing a sigh of relief. Unless the production manager remembers to go back to make the same correction, the error still exists in the XML file.

Implicit in this discussion is the notion that XML workflow includes an element that is rarely critical in a single-medium product: what Phil Schafer, director of production at Elsevier, describes as “a central content repository with full functionality.” It is not enough to save all content to a particular server. Ideally, the content will flow into a database-like structure that enables the owner or other authorized users to find specific content and manipulate it for specific publishing applications.
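
The single-source principle described in this section (correct the native XML once, then regenerate every product from it) can be sketched as follows. This is only an illustration under stated assumptions: the `book`, `chapter`, `title`, and `h1` element names and the two output functions are hypothetical, not any publisher's actual pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical authoritative source file, containing one typo in a heading.
SOURCE_XML = """
<book>
  <chapter>
    <title>Getting Started</title>
    <h1>Instalation</h1>
    <h1>Configuration</h1>
  </chapter>
</book>
"""


def table_of_contents(root):
    """Build a TOC by copying chapter titles and first-level headings."""
    toc = []
    for chapter in root.iter("chapter"):
        toc.append(chapter.findtext("title"))
        toc.extend("  " + heading.text for heading in chapter.findall("h1"))
    return toc


def web_html(root):
    """Produce a simple HTML rendering from the same source tree."""
    parts = []
    for chapter in root.iter("chapter"):
        parts.append(f"<h1>{chapter.findtext('title')}</h1>")
        parts.extend(f"<h2>{h.text}</h2>" for h in chapter.findall("h1"))
    return "".join(parts)


def build_outputs(root):
    """Regenerate every output from the one source tree."""
    return {"toc": table_of_contents(root), "web": web_html(root)}


root = ET.fromstring(SOURCE_XML)
before = build_outputs(root)

# Correct the typo once, in the source XML only.
for heading in root.iter("h1"):
    if heading.text == "Instalation":
        heading.text = "Installation"

after = build_outputs(root)
print(after["toc"])
print(after["web"])
```

Because both outputs are derived from the same tree, the correction reaches the table of contents and the web page together. Patching only a downstream file, as in the PDF scenario above, would leave the source and every other output wrong.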




Content repositories can be critical in highly regulated areas such as medicine. Larry McGrew, head of content and editorial operations at Aetna, relies on multiple content management systems with carefully approved material to populate Aetna’s sites that are central to their members’ experience. McGrew admits that this has been “extremely challenging” to implement.

Data in the content management systems are heavily tagged with metadata so users can get optimal search results despite the multiple original sources of the material.

The DTD
The Document Type Definition (DTD), the very rough equivalent of type specifications for print products, defines which elements a document may contain. Those elements in turn drive both how content will look in print, on the web, on e-book readers, etc. (through the style sheets applied to each element) and, to some extent, what the content means. DTDs need to code both data and metadata.

To explain how a DTD functions, look at the different tagging possibilities for how genus and species might be handled depending on the media and application. For instance, we assume that readers of this white paper belong to the species Homo sapiens. It is probably sufficient therefore to surround Homo sapiens with XML tags that mean “put these words in italics no matter what other appearance specifications you have.” But in a zoology book, you might want to put each genus/species into the index. In that case, you could put tags around the phrase Homo sapiens that indicate “these words are genus and species; put them in italics, and remember to make an index entry for this term.” In an anthropology book, you might want to distinguish between Homo sapiens and other species such as Homo erectus, and treat both species as index sub-entries under the genus Homo. In that case, you’d put a pair of tags around Homo indicating “this is a genus,” and a pair of tags around either sapiens or erectus indicating “this is a species.” Instructions for constructing the index would complete the picture.

The previous paragraph took 186 words to discuss how to treat genus and species in a DTD. Multiply this by the many editorial, functional, design, and marketing considerations in any one publication, and then multiply it again by the range of publications you hope to represent with a single DTD. The considerations become massive, and the temptation might be to skimp on the detail of the DTD (for instance, coding for genus and species together, rather than separately). This might be a false economy, though. Nina Chang, senior publisher for e-journals at Lippincott Williams & Wilkins, points out:

Richly tagged data allow for more precise searching.

In STM and scholarly publishing, searchers want to retrieve the information that really matters, so the detail of the DTD is important to the perception of quality. It’s helpful to refine the DTD as much as possible before implementation.

As Schafer points out, “If we choose to introduce a new element, we have to take it to a supplier support data team to ensure that it’s implemented across all of our journals.” And Chang of LWW points out that changing the DTD has implications for archival data as well. For instance, do you go back and insert new tags to keep up with the functionality of new material? This requires a business decision: What are the changes worth to the users, compared with the inevitable costs?
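
The genus and species discussion can be illustrated with a short Python sketch. The `taxon`, `genus`, and `species` element names are hypothetical, chosen only for this example. It contrasts what fine-grained tagging makes possible (species as index sub-entries under the genus) with what coarser tagging would offer (each name only as one undivided phrase).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical anthropology text with genus and species tagged separately.
TEXT_XML = """
<p>Fossils of <taxon><genus>Homo</genus> <species>erectus</species></taxon>
predate those of <taxon><genus>Homo</genus> <species>sapiens</species></taxon>.</p>
"""


def build_index(root):
    """Group species as sub-entries under their genus."""
    index = defaultdict(set)
    for taxon in root.iter("taxon"):
        index[taxon.findtext("genus")].add(taxon.findtext("species"))
    return {genus: sorted(species) for genus, species in index.items()}


def coarse_terms(root):
    """What coarser tagging would allow: each name as one flat index term."""
    return sorted("".join(taxon.itertext()) for taxon in root.iter("taxon"))


root = ET.fromstring(TEXT_XML)
index = build_index(root)
flat = coarse_terms(root)
print(index)
print(flat)
```

With separate tags, a search or index can treat "Homo" as a heading with two sub-entries; with a single combined tag, the program sees only two unrelated phrases. This is the trade-off behind the "false economy" warning above.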




At large publishing organizations, developing a sufficiently powerful and flexible DTD is a challenge. As we discussed earlier, it is not enough to catalog all of the type specifications that might be needed. A team building the DTD also needs to consider whether to define specific kinds of information and to what degree of detail, and they also need to define the metadata required for their own use and for the use of current and future third parties.

One approach is to start with a DTD that is already in the public domain. For instance, Colson of ARVO has twice used the DTD developed by the National Library of Medicine as the basis for an organizational DTD:

[The DTD from the National Library of Medicine] is comprehensive: it works for books, Annual Meeting abstracts, and all of our other publications.

Colson even used this DTD when she worked at American Geophysical Union (AGU), even though AGU content had little if any relationship to medicine, because the structure worked effectively for other types of scholarly content.

Another approach is to contract with a trusted vendor. Vendors that have developed and worked with DTDs in the past have a pragmatic knowledge of what works well for their customers, and they also have staff with backgrounds to steer skillfully through the complexities. Outside vendors can do their future-oriented work, freeing up in-house staff to manage day-to-day operations. And a good outside vendor can also help train staff to understand the new DTD and/or a new, XML-oriented workflow.

A large proportion of scholarly journals, with their tightly structured, relatively brief units of copy, have migrated with reasonable success to XML. Books have been harder because they are more varied, and authors often don’t have the kind of career-oriented pressures that impel them to comply with constraints that authors of journal articles will accept. Still, over time elementary-high school and higher education publishers have begun to implement DTDs, which in turn offer them flexibility. Not only can they put content on multiple platforms to meet student and school district needs, but they can also customize the content of publications. This may be one reason why most educational publishers seem fairly confident of their ability to meet the idiosyncratic social science requirements of the single largest school district (i.e., the Texas School Board) while continuing to publish their books for the rest of the country.

Custom publishers are another category that has found XML to be an invaluable asset to their business, as seen in the Case Study.

Case Study
Triangle Publishing Services, Inc., prepares publications for technology companies like Microsoft, Cisco, and Hewlett-Packard. In some cases, Triangle has prepared all the content in a book so that it can be repurposed.

For example, a book with chapters on applications in a dozen different industries can be disaggregated into a dozen different white papers for distribution online. Or, by searching on XML tags, the book’s case studies can be extracted and used in other settings.

Larry Marion, CEO and Editorial Director at Triangle, says this about taking advantage of the power of XML:

Think about how you want to repurpose content; be as creative and granular as possible. Extra work at the beginning can save you pain down the road.

ONIX: A specialized DTD for book metadata
For people in the publishing industry, ONIX (ONline Information eXchange) is perhaps the most familiar example of a DTD for metadata.

ONIX is used extensively in the book trade as a standardized means of communicating information about books, from author and title to weight per copy, minimum order quantity, subject classification, and so forth. These data then populate everything from the publisher’s own website (for instance, the one maintained by Elsevier’s Sawabini) to industry giants such as Amazon and Barnes & Noble.
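
To illustrate the idea behind an ONIX-style metadata record, the sketch below feeds one book record to two differently formatted retailer displays. The element names are deliberately simplified and hypothetical; they do not follow the real ONIX schema, and the two display functions are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified book record. Element names are illustrative only
# and are NOT taken from the actual ONIX specification.
RECORD_XML = """
<product>
  <title>Field Guide to Primates</title>
  <author>A. Example</author>
  <weight unit="g">540</weight>
  <min-order-quantity>10</min-order-quantity>
  <subject>Zoology</subject>
</product>
"""


def retailer_a_display(product):
    """One storefront leads with the title."""
    return f"{product.findtext('title')} by {product.findtext('author')}"


def retailer_b_display(product):
    """Another storefront leads with the author and shows the subject."""
    title = product.findtext("title").upper()
    return f"{product.findtext('author')}: {title} [{product.findtext('subject')}]"


def shipping_note(product):
    """Trade data such as weight travels in the same record."""
    weight = product.find("weight")
    return f"{weight.text} {weight.get('unit')} per copy"


product = ET.fromstring(RECORD_XML)
display_a = retailer_a_display(product)
display_b = retailer_b_display(product)
note = shipping_note(product)
print(display_a)
print(display_b)
print(note)
```

Both displays look different, yet each is built from exactly the same fields in the same record.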




In fact, if you need to understand how XML refers to types of content and not their appearance, take a look at the display of any particular title on Amazon, and then on Barnes & Noble. Author, title, publisher’s description, and the like look entirely different, yet they contain precisely the same information.

Other industries and disciplines have their own specialized metadata sets, as well.
Implementation
In some parallel universe, management might be able to send out a memo one Friday afternoon announcing a new production workflow that starts the following Monday morning. In this world, however, it isn’t that simple. Employees may need to perform different tasks, or they may perform the same tasks in a different sequence. Managers need to assess performance using different metrics. Suppliers need to accept input that looks different and generate different kinds of output, with possible changes in schedules, prices, and quality management. For a publisher, all of this needs to take place while products already in the pipeline move through the previous workflow, or some hybrid.

XML on the fly
Sometimes, an information provider will need to produce XML hastily. For instance, a content provider may be switching publishers, digitizing back-file content, or starting work with a new third-party aggregator.

In these situations, publishers need to convert existing data. With typesetting files in hand, a conversion vendor can read the typesetting codes (for instance, “Heading 1”) and change them to XML tags, for the most part programmatically. The programmatic approach, however, can miss or misinterpret improvised or last-minute changes. For instance, if someone sees at the last minute that a “1” head really should have been a “2” head, that person might not change the typesetting code but might simply alter the type characteristics to look like a “2” head. The XML coding will continue to treat the heading as a “1” head, with potential implications for the quality of applications such as Web display, search, and the like. Similarly, links to tables and illustrations might or might not be captured.

Another challenge is that conversions may not capture important metadata (“this is a chapter, not a scholarly paper”) because the metadata simply don’t exist in the original material. Either the original publisher provides the metadata retrospectively, or the new party provides the metadata using their best, potentially fallible judgment.

Building capacity for end-to-end XML requires an organization to commit staff resources, time on the calendar, and financial resources. Realistically, not every publisher can muster all three kinds of resources conveniently.

Data conversions are typically done by production vendors, with their in-depth knowledge of publishing workflows and outputs.

Another approach is to leave file conversions to the aggregator, e-book platform, etc. that wants to use the data. These companies typically do a good job of ensuring that the XML they generate is effective for their application, but if another vendor approaches the publisher, the process needs to be repeated at the cost of more money and more time.

Time for XML?
For the foreseeable future, information is going to flow into and through multiple platforms, from books, magazines, and newspapers to websites, e-book readers, mobile devices, and inventions that are only sketches on a white board right now. Authorities agree that XML provides the most effective way to cope with the multiple and shifting demands. Colson of ARVO says it well:

Don’t be afraid of XML. Using XML will give you more versatility than any scheme I’m aware of.
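
As a worked illustration of the conversion pitfall described under "XML on the fly", the following Python sketch converts typesetting codes to XML tags programmatically and flags headings whose visual size disagrees with their code. The style codes, point sizes, and tag names are all assumptions made up for this example; real typesetting files and vendor tools differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical typeset lines: (style code, visual point size, text).
TYPESET_LINES = [
    ("Heading 1", 18, "Results"),
    ("Body", 10, "Acuity improved in both groups."),
    # Restyled by hand at the last minute to look like a "2" head,
    # but the underlying code was never changed.
    ("Heading 1", 14, "Limitations"),
]

# Assumed mapping from typesetting codes to XML tags and expected sizes.
CODE_TO_TAG = {"Heading 1": "h1", "Heading 2": "h2", "Body": "p"}
EXPECTED_SIZE = {"h1": 18, "h2": 14, "p": 10}


def convert(lines):
    """Convert coded lines to XML, collecting warnings for manual review."""
    section = ET.Element("section")
    warnings = []
    for code, size, text in lines:
        tag = CODE_TO_TAG[code]
        ET.SubElement(section, tag).text = text
        if size != EXPECTED_SIZE[tag]:
            warnings.append(f"{text!r}: coded {code!r} but set at {size}pt")
    return section, warnings


section, warnings = convert(TYPESET_LINES)
xml_out = ET.tostring(section, encoding="unicode")
print(xml_out)
print(warnings)
```

The converter trusts the code, so "Limitations" is still emitted as a first-level heading. A consistency check like the one above cannot decide which level is right, but it can route suspicious lines to a person for review.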




The Contributors
Special thanks to the following individual contributors:

•	   Nina Chang, Senior Publisher, Online Journals, Lippincott Williams & Wilkins
•	   Karen Colson, Director, Publishing and Communications, Association for Research in Vision and Ophthalmology
•	   Mark Gaertner, Senior Web Producer, Team Lead, BMStudio at Bristol-Myers Squibb
•	   Larry Marion, CEO/Editor-in-Chief, Triangle Publishing Services
•	   Larry McGrew, Head, Content/Editorial Operations, Aetna
•	   Julia Sawabini, Web Marketing Director, Elsevier
•	   Phil Schafer, Director, Journal Production, Elsevier

The Authors
•	   Rich Lampert
     The Lampert Consultancy
     www.lampert-consultancy.net
     Rich Lampert is owner of The Lampert Consultancy, LLC, established in 2004 to provide strategic, editorial, and marketing services to publishers in STM, professional, and scholarly publishing. Rich is also Principal, Publishing Services Division, at Doody Enterprises, Inc., which focuses on not-for-profit publishers.

•	   Cara Kaufman
     Kaufman-Wills Group
     www.kaufmanwills.com
     Cara Kaufman is co-founder of Kaufman-Wills Group, LLC, which was created in 2000 to offer STM and other scholarly publishers a full range of professional publishing services in the areas of strategic planning, business development, electronic publishing strategy, RFP and self-publishing projects, editorial services, and marketing and market research.

SPi sought the help of Kaufman-Wills Group in developing this white paper.





XML and Content Strategy

  • 1. XML and content strategy
Why and how to “future-proof” your content

Publishers and other information providers increasingly use multiple media to display their content for various applications. Books become e-books, online journal articles are published online first and in print later, and figures are aggregated into image databases. Users request chunks of content, or the publisher assembles pieces of content from multiple publications into a new publication. Information users want what they want, when they want, in the form they want. As publishers work to respond to the changing needs of their constituencies, the challenge is: how can publishers “future-proof” their content?

Even today, content takes many forms and has many uses. Publishers find that they need to adapt their content in various ways (figure 1).

Figure 1: Sampling of data output
    • Web-ready HTML on proprietary platforms
    • HTML for web previewing
    • PDF for printing/viewing/downloading
    • Distribution by third-party aggregators (Ovid, EBSCO)
    • Abstract & indexing services (Scopus)
    • Mobile devices (iPad, smartphones)
    • Archival solutions (Portico)

If today’s situation is not complicated enough, the future is likely to be even more complex. How can information providers respond to the changing needs of customers and new technologies with greater facility in terms of time, cost, and effort?

By far, the most practical, most versatile tool for manipulating content for both current formats and those not invented yet is XML. XML (eXtensible Markup Language) is an open standard; its power derives from the fact that XML has been adopted by entire industries, many government agencies, and platform developers. When new standards emerge, such as EPUB for e-book readers, the standards are derived from generic XML, allowing even files created a few years earlier to flow readily into the new standard.

Some organizations think of XML as a different set of tags. While XML tags are different from those used in other systems like SGML or HTML, XML is actually a different way of thinking about content. Karen Colson, director of publishing and communications at the Association for Research in Vision and Ophthalmology (ARVO), explains it simply:

    “XML describes content, not appearance.”

An XML tag (actually a pair of tags, one at the beginning and one at the end of an element) might indicate that a section of copy is a first-level heading inside a book chapter. The actual appearance of the heading, however, is determined by a different style sheet for each application. The typeface and size that appear in the book might be completely different if the book is available on an e-reader, and they might be different still if the book is included on the electronic platform of a third-party aggregator.

SPi Global, 2807 North Parham Road, Suite 350, Richmond, VA 23294. T 1 804 262 4219. www.spi-global.com
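
The idea that one tagged source can be styled differently for each output can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tag names (chapter, title, h1, p), the sample text, and the two dictionaries standing in for style sheets are invented, and a real workflow would more likely use XSLT or CSS than string templates.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup; the tag names are invented for this sketch.
SOURCE = (
    "<chapter>"
    "<title>The Eye</title>"
    "<h1>Anatomy of the cornea</h1>"
    "<p>The cornea focuses incoming light.</p>"
    "</chapter>"
)

# Two stand-in "style sheets": each maps a content tag to an output template.
WEB_STYLE = {"title": "<h1>{}</h1>", "h1": "<h2>{}</h2>", "p": "<p>{}</p>"}
EREADER_STYLE = {"title": "# {}", "h1": "## {}", "p": "{}"}


def render(xml_text, style):
    """Render each child element of the chapter with the given style mapping."""
    chapter = ET.fromstring(xml_text)
    return [style[element.tag].format(element.text) for element in chapter]


# The same source produces two different presentations.
for line in render(SOURCE, WEB_STYLE):
    print(line)
for line in render(SOURCE, EREADER_STYLE):
    print(line)
```

The source string is never edited between the two calls; only the style mapping changes, which is the separation of content from appearance that Colson describes.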
  • 2. The tag for a first-level heading also can function as metadata. For instance, a book’s table of contents might be constructed by copying chapter titles and first-level headings. Or, perhaps an aggregator’s general search function could look primarily at first-level headings. In either case, a pair of tags that starts out regulating appearance can have multiple programmatic applications as well.

Organizations that want to get the most out of XML apply it consistently and as early as possible in the content development process. When this happens, editing changes are captured within a single, authoritative XML file, all XML files are built according to the same rules, and the final XML file is the source for all types of output. Creating this capability requires thoughtful planning and technically astute implementation.

Planning for end-to-end XML Workflow

The most reliable and powerful way to apply XML to documents is to do so at the very beginning of the production cycle. In organizations where content is created by employees, the content creator may enter tags, often using shortcuts or templates. For most publishers, however, tags are applied by skilled markup operators based on the list of tags available to them (more on this below). Most markup operators work for compositors, so their function sometimes overlaps with typesetting. But markup is a distinct function in the production process. Once the tags are applied, production can proceed (figure 2).

Figure 2: Production process using XML
    XML markup → Copyediting → Typesetting → Page layout → Proofreading → Content repository → Multiple outputs

When an error occurs, the correction is made in the native XML file so that the error can be corrected in every product that flows from the content. Making corrections in the native XML file represents the industry’s best practice, but practical challenges exist even with this approach.

Julia Sawabini, director of e-commerce at Elsevier, explains that to build the web page for a particular product, Elsevier pulls content from a database containing fields variously populated by editorial, production, and marketing people. The information is organized via style sheets but no content is created at this point. “If there’s something wrong on the website, it’s wrong someplace along the way. I can’t change it.”

Once a correction is made, the change may not appear immediately, as the website is updated in batches at specified intervals. The incorrect product information will appear on the site until the update takes place. Also, the incorrect material will remain on the servers of distributors, e-bookstores, and other outlets for the information unless corrected files are sent and uploaded.

An analogous challenge occurs in publishing printed materials. Sometimes a production person spots an error while processing a PDF for the printer. The temptation, and often the reality, is that the production person corrects the PDF and sends it on to the printer, breathing a sigh of relief. Unless the production manager remembers to go back to make the same correction, the error still exists in the XML file.

Implicit in this discussion is the notion that XML workflow includes an element that is rarely critical in a single-medium product: what Phil Schafer, director of production at Elsevier, describes as “a central content repository with full functionality.” It is not enough to save all content to a particular server. Ideally, the content will flow into a database-like structure that enables the owner or other authorized users to find specific content and manipulate it for specific publishing applications.
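
The point made at the start of this page, that heading tags can double as metadata, can be sketched by building a table of contents from chapter titles and first-level headings. The markup vocabulary (book, chapter, title, h1, p) and the sample content are hypothetical; this is a sketch under those assumptions, not a description of any publisher's actual pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup; tag names and content are invented for illustration.
BOOK = """<book>
  <chapter>
    <title>Cells</title>
    <h1>Membranes</h1>
    <p>Body text that the table of contents ignores.</p>
    <h1>Organelles</h1>
  </chapter>
  <chapter>
    <title>Tissues</title>
    <h1>Epithelium</h1>
  </chapter>
</book>"""


def table_of_contents(xml_text):
    """Collect chapter titles and first-level headings, in document order."""
    book = ET.fromstring(xml_text)
    entries = []
    for number, chapter in enumerate(book.findall("chapter"), start=1):
        entries.append(f"{number}. {chapter.findtext('title')}")
        for heading in chapter.findall("h1"):
            entries.append(f"    {heading.text}")
    return entries


print("\n".join(table_of_contents(BOOK)))
```

Because the table of contents is derived from the tagged source, a heading corrected in the XML file is corrected in the contents list the next time the list is generated.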
  • 3. Content repositories can be critical in highly regulated areas such as medicine. Larry McGrew, head of content and editorial operations at Aetna, relies on multiple content management systems with carefully approved material to populate Aetna’s sites that are central to their members’ experience. McGrew admits that this has been “extremely challenging” to implement.

Data in the content management systems are heavily tagged with metadata so users can get optimal search results despite the multiple original sources of the material.

The DTD

The Document Type Definition (DTD), the very rough equivalent of type specifications for print products, defines which elements a document may contain and, to some extent, what each element means. How an element will look in print, on the web, on e-book readers, etc. is then keyed to those elements by style sheets. DTDs need to code both data and metadata.

To explain how a DTD functions, look at the different tagging possibilities for how genus and species might be handled depending on the media and application. For instance, we assume that readers of this white paper belong to the species Homo sapiens. It is probably sufficient therefore to surround Homo sapiens with XML tags that mean “put these words in italics no matter what other appearance specifications you have.” But in a zoology book, you might want to put each genus/species into the index. In that case, you could put tags around the phrase Homo sapiens that indicate “these words are genus and species – put them in italics, and remember to make an index entry for this term.” In an anthropology book, you might want to distinguish between Homo sapiens and other species such as Homo erectus, and treat both species as index sub-entries under the genus Homo. In that case, you’d put a pair of tags around Homo indicating “this is a genus”, and a tag around either sapiens or erectus indicating “this is a species.” Instructions for constructing the index would complete the picture.

The previous paragraph took 186 words to discuss how to treat genus and species in a DTD. Multiply this by the many editorial, functional, design, and marketing considerations in any one publication, and then multiply it again by the range of publications you hope to represent with a single DTD. The considerations become massive, and the temptation might be to skimp on the detail of the DTD (for instance, coding for genus and species together, rather than separately). This might be a false economy, though. Nina Chang, senior publisher for e-journals at Lippincott Williams & Wilkins, points out:

    “Richly tagged data allow for more precise searching.”

In STM and scholarly publishing, searchers want to retrieve the information that really matters, so the detail of the DTD is important to the perception of quality. It’s helpful to refine the DTD as much as possible before implementation.

As Schafer points out, “If we choose to introduce a new element, we have to take it to a supplier support data team to ensure that it’s implemented across all of our journals.” And Chang of LWW points out that changing the DTD has implications for archival data as well. For instance, do you go back and insert new tags to keep up with the functionality of new material? This requires a business decision: What are the changes worth to the users, compared with the inevitable costs?
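
The anthropology-book case above, where genus and species are tagged separately so that species become index sub-entries under the genus, can be sketched as follows. The element names (taxon, genus, species) and the sample sentences are assumptions made for this illustration and do not come from any particular DTD.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: genus and species carry separate tags inside <taxon>.
TEXT = """<book>
  <p>Fossils of <taxon><genus>Homo</genus> <species>erectus</species></taxon>
  predate <taxon><genus>Homo</genus> <species>sapiens</species></taxon>.</p>
  <p>The chimpanzee, <taxon><genus>Pan</genus> <species>troglodytes</species></taxon>,
  is a close relative of <taxon><genus>Homo</genus> <species>sapiens</species></taxon>.</p>
</book>"""


def build_index(xml_text):
    """Group species as sorted sub-entries under each genus."""
    book = ET.fromstring(xml_text)
    index = defaultdict(set)
    for taxon in book.iter("taxon"):
        index[taxon.findtext("genus")].add(taxon.findtext("species"))
    return {genus: sorted(species) for genus, species in index.items()}


for genus, species_list in sorted(build_index(TEXT).items()):
    print(genus)
    for species in species_list:
        print("    " + species)
```

If the phrase were tagged only as a single unit, the sub-entries could be recovered only by splitting the text on whitespace, which is the kind of guesswork that the more detailed tagging avoids.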
  • 4. At large publishing organizations, developing a sufficiently powerful and flexible DTD is a challenge. As we discussed earlier, it is not enough to catalog all of the type specifications that might be needed. A team building the DTD also needs to consider whether to define specific kinds of information and to what degree of detail, and they also need to define the metadata required for their own use and for the use of current and future third parties.

One approach is to start with a DTD that is already in the public domain. For instance, Colson of ARVO has twice used the DTD developed by the National Library of Medicine as the basis for an organizational DTD:

    “[The DTD from the National Library of Medicine] is comprehensive: it works for books, Annual Meeting abstracts, and all of our other publications.”

Colson even used this DTD when she worked at American Geophysical Union (AGU), even though AGU content had little if any relationship to medicine, because the structure worked effectively for other types of scholarly content.

Another approach is to contract with a trusted vendor. Vendors that have developed and worked with DTDs in the past have a pragmatic knowledge of what works well for their customers, and they also have staff with backgrounds to steer skillfully through the complexities. Outside vendors can do this future-oriented work, freeing up in-house staff to manage day-to-day operations. And a good outside vendor can also help train staff to understand the new DTD and/or a new, XML-oriented workflow.

A large proportion of scholarly journals, with their tightly structured, relatively brief units of copy, have migrated with reasonable success to XML. Books have been harder because they are more varied, and authors often don’t have the kind of career-oriented pressures that impel them to comply with constraints that authors of journal articles will accept. Still, over time elementary-high school and higher education publishers have begun to implement DTDs, which in turn offer them flexibility. Not only can they put content on multiple platforms to meet student and school district needs, but they can also customize the content of publications. This may be one reason why most educational publishers seem fairly confident of their ability to meet the idiosyncratic social science requirements of the single largest school district (i.e., the Texas School Board) while continuing to publish their books for the rest of the country.

Custom publishers are another category that has found XML to be an invaluable asset to their business, as seen in the Case Study.

ONIX: A specialized DTD for book metadata

For people in the publishing industry, ONIX (ONline Information eXchange) is perhaps the most familiar example of a DTD for metadata.

ONIX is used extensively in the book trade as a standardized means of communicating information about books, from author and title to weight per copy, minimum order quantity, subject classification, and so forth. These data then populate everything from the publisher’s own website (for instance, the one maintained by Elsevier’s Sawabini) to industry giants such as Amazon and Barnes & Noble.

Case Study

Triangle Publishing Services, Inc., prepares publications for technology companies like Microsoft, Cisco, and Hewlett-Packard. In some cases, Triangle has prepared all the content in a book so that it can be repurposed.

For example, a book with chapters on applications in a dozen different industries can be disaggregated into a dozen different white papers for distribution online. Or, by searching on XML tags, the book’s case studies can be extracted and used in other settings.

Larry Marion, CEO and Editorial Director at Triangle, says this about taking advantage of the power of XML:

    “Think about how you want to repurpose content; be as creative and granular as possible. Extra work at the beginning can save you pain down the road.”
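
The two kinds of repurposing mentioned in the case study, splitting a book into per-industry white papers and pulling out case studies by searching on tags, might look roughly like the sketch below. The markup (a chapter element with an industry attribute, a case-study element) and the sample content are hypothetical; nothing here reflects Triangle's actual files or tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup; element and attribute names are invented.
BOOK = """<book>
  <chapter industry="banking">
    <title>Banking</title>
    <p>Overview of applications in banking.</p>
    <case-study><title>A regional bank</title><p>Details.</p></case-study>
  </chapter>
  <chapter industry="retail">
    <title>Retail</title>
    <p>Overview of applications in retail.</p>
    <case-study><title>A grocery chain</title><p>Details.</p></case-study>
  </chapter>
</book>"""


def split_into_white_papers(xml_text):
    """Return one standalone XML fragment per chapter, keyed by industry."""
    book = ET.fromstring(xml_text)
    return {
        chapter.get("industry"): ET.tostring(chapter, encoding="unicode").strip()
        for chapter in book.findall("chapter")
    }


def extract_case_studies(xml_text):
    """Return every case study in the book, wherever it is nested."""
    book = ET.fromstring(xml_text)
    return [
        ET.tostring(case, encoding="unicode").strip()
        for case in book.iter("case-study")
    ]


print(sorted(split_into_white_papers(BOOK)))
for fragment in extract_case_studies(BOOK):
    print(ET.fromstring(fragment).findtext("title"))
```

Both functions depend on the case studies and industries having been tagged when the book was produced, which is the “extra work at the beginning” that Marion refers to.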
  • 5. ONIX also shows how XML refers to types of content and not their appearance: take a look at the display of any particular title on Amazon, and then on Barnes & Noble. Author, title, publisher’s description, and the like look entirely different, yet they contain precisely the same information.

Other industries and disciplines have their own specialized metadata sets, as well.

Implementation

In some parallel universe, management might be able to send out a memo one Friday afternoon announcing a new production workflow that starts the following Monday morning. In this world, however, it isn’t that simple. Employees may need to perform different tasks, or they may perform the same tasks in a different sequence. Managers need to assess performance using different metrics. Suppliers need to accept input that looks different and generate different kinds of output, with possible changes in schedules, prices, and quality management. For a publisher, all of this needs to take place while products already in the pipeline move through the previous workflow, or some hybrid.

XML on the fly

Sometimes, an information provider will need to produce XML hastily. For instance, a content provider may be switching publishers, may wish to digitize back-file content, or may be starting work with a new third-party aggregator.

In these situations, publishers need to convert existing data. With typesetting files in hand, a conversion vendor can read the typesetting codes (for instance, “Heading 1”) and change them to XML tags, for the most part programmatically. The programmatic approach, however, can miss or misinterpret improvised or last-minute changes. For instance, if someone sees at the last minute that a “1” head really should have been a “2” head, that person might not change the typesetting code but might simply alter the type characteristics to look like a “2” head. The XML coding will continue to treat the heading as a “1” head, with potential implications for the quality of applications such as web display, search, and the like. Similarly, links to tables and illustrations might or might not be captured.

Another challenge is that conversions may not capture important metadata (“this is a chapter, not a scholarly paper”) because the metadata simply don’t exist in the original material. Either the original publisher provides the metadata retrospectively, or the new party provides the metadata using their best, potentially fallible judgment.

Building capacity for end-to-end XML requires an organization to commit staff resources, time on the calendar, and financial resources. Realistically, not every publisher can muster all three kinds of resources conveniently.

Data conversions are typically done by production vendors, with their in-depth knowledge of publishing workflows and outputs.

Another approach is to leave file conversions to the aggregator, e-book platform, etc. that wants to use the data. These companies typically do a good job of ensuring that the XML they generate is effective for their application, but if another vendor approaches the publisher, the process needs to be repeated at the cost of more money and more time.

Time for XML?

For the foreseeable future, information is going to flow into and through multiple platforms, from books, magazines, and newspapers to websites, e-book readers, mobile devices, and inventions that are only sketches on a white board right now. Authorities agree that XML provides the most effective way to cope with the multiple and shifting demands. Colson of ARVO says it well:

    “Don’t be afraid of XML. Using XML will give you more versatility than any scheme I’m aware of.”
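
The programmatic conversion described under “XML on the fly”, and the way it can miss a heading that was restyled by hand, can be sketched in a few lines. The style codes, point sizes, tag names, and the size-mismatch check are all assumptions made for illustration; real typesetting files and conversion tools are considerably more involved, and the check shown is only one conceivable safeguard, not a method described by the contributors.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from typesetting style codes to XML tag names.
CODE_TO_TAG = {"Heading 1": "h1", "Heading 2": "h2", "Body": "p"}

# Hypothetical point size that each style code normally produces.
EXPECTED_SIZE = {"Heading 1": 18, "Heading 2": 14, "Body": 10}

# Each typeset paragraph: (style code, point size actually applied, text).
TYPESET = [
    ("Heading 1", 18, "Methods"),
    ("Body", 10, "We enrolled 40 patients."),
    # Last-minute change: restyled by hand to look like a "2" head,
    # but the underlying style code was never updated.
    ("Heading 1", 14, "Statistical analysis"),
]


def convert(paragraphs):
    """Convert purely by style code, as a programmatic conversion would."""
    section = ET.Element("section")
    for code, _size, text in paragraphs:
        ET.SubElement(section, CODE_TO_TAG[code]).text = text
    return ET.tostring(section, encoding="unicode")


def flag_manual_overrides(paragraphs):
    """List paragraphs whose applied size differs from the size their code implies."""
    return [text for code, size, text in paragraphs if EXPECTED_SIZE[code] != size]


print(convert(TYPESET))
print(flag_manual_overrides(TYPESET))
```

The converted output still tags “Statistical analysis” as a first-level heading, which is the failure described above; the flagged list simply hands that paragraph to a person for review.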
  • 6. The Contributors

Special thanks to the following individual contributors:
    • Nina Chang, Senior Publisher, Online Journals, Lippincott Williams & Wilkins
    • Karen Colson, Director, Publishing and Communications, Association for Research in Vision and Ophthalmology
    • Mark Gaertner, Senior Web Producer, Team Lead, BMStudio at Bristol-Myers Squibb
    • Larry Marion, CEO/Editor-in-Chief, Triangle Publishing Services
    • Larry McGrew, Head, Content/Editorial Operations, Aetna
    • Julia Sawabini, Web Marketing Director, Elsevier
    • Phil Schafer, Director, Journal Production, Elsevier

The Authors

    • Rich Lampert, The Lampert Consultancy, www.lampert-consultancy.net
Rich Lampert is owner of The Lampert Consultancy, LLC, established in 2004 to provide strategic, editorial, and marketing services to publishers in STM, professional, and scholarly publishing. Rich is also Principal, Publishing Services Division, at Doody Enterprises, Inc., which focuses on not-for-profit publishers.

    • Cara Kaufman, Kaufman-Wills Group, www.kaufmanwills.com
Cara Kaufman is co-founder of Kaufman-Wills Group, LLC, which was created in 2000 to offer STM and other scholarly publishers a full range of professional publishing services in the areas of strategic planning, business development, electronic publishing strategy, RFP and self-publishing projects, editorial services, and marketing and market research.

SPi sought the help of Kaufman-Wills Group in developing this white paper.