Abstract:
Converting existing content into DITA can help you reap the benefits of your new DITA documentation system and reach your target return on investment (ROI) more quickly. A quality conversion can also provide your authors with a solid foundation and benchmark from which to create new content, based on familiar, pre-existing material. In this webinar, Yehudit Lindblom and Joe Gelb from Suite Solutions will discuss the conversion process and steps to prepare specifications for your legacy conversion:
• Develop mapping rules between the current document elements and new DITA tagging structures, including specifications for migrating conditional tagging (profiling) and variables.
• Develop pre- and post-conversion checklists.
• Use the conversion process to help understand the evolution between your current documentation and the new DITA XML content model.
2. Who are we?
Yehudit Lindblom
• Project Manager and cat herder
Joe Gelb
• Founder and President of Suite Solutions
Suite Solutions
Our Vision: Enable you to engage your customers by providing quick access to
relevant information: DITA provides the foundation
• Help companies get it right the first time
• XML-based Authoring/Publishing Solutions
• Enterprise Intelligent Dynamic Content: SuiteShare Social KB
• Consultancy, Systems Integration, Application Development
• Cross-Industry Expertise
• High Tech, Aerospace & Defense, Discrete Manufacturing
• Healthcare, Government
3. Main Topics
Goals of this webinar
Key components of a DITA solution
Why do a conversion?
Review the process
Defining your requirements
4. Goals of this Webinar
Primary Goal: Empower (not overwhelm) you
• Understand the process, details and dependencies involved
• Understand the possibilities
• Build a solid plan based on experience
• Schedule accordingly
• Understand the skills required and the help you should seek
• Manage expectations
5. Key Components of a DITA Solution
1. Staff
2. Content
3. Translation
4. Publishing
5. Content and configuration management
Your mission is to develop or acquire each of these
6. Why should you do a professional, automated conversion?
1. Quickly attain a large sampling of valid DITA to configure and test
your toolset: authoring, publishing, CMS
2. Reap the benefits of your DITA investment and hit your target return
on investment (ROI) more quickly
3. Provide authors with a solid foundation and benchmark from which
to create new content, based on familiar pre-existing material
4. Converting content manually is grueling and time-consuming
• May burn out your authors
• Adds even more pressure on production schedules
• Your authors may be inexperienced with DITA, and their lack of
expertise can affect large amounts of content and introduce
inconsistency right from the beginning
7. Why should you do a professional, automated conversion?
“Having done this from the other side, it was very helpful for us to see our
content, which we were familiar with, already set up in DITA format and
publishing.”
- Quote from Yours truly…
• Burning out the authors: Because it’s so mind-numbing, it winds up
taking much longer than it should and affects the scheduling.
• Politics: show good progress in a “reasonable” timeframe.
• “What? We just spent $@@@,### on this system and you can’t even
show me a PDF with our current stuff?”
8. What formats can you convert from?
In order of increasing difficulty…
1. DocBook
2. FrameMaker (structured)
3. FrameMaker (unstructured)
4. Word, RTF (remember WinHelp?)
5. HTML
• Web content
• HTML Help
• WebHelp (e.g. RoboHelp)
6. Adobe InDesign
7. PDF
9. High-Level Conversion Process
1. Prepare
• Requirements: based on content audit and information architecture
• Pre-conversion checklist – preparing content for conversion
• Post-conversion checklist – “clean-up”
• Representative content sample: first large batch for conversion
2. Configure conversion tools
3. Convert first batch
• Run through pre-conversion checklist
• Analyze and generate conversion topic list
• Review topic list
• Convert
• Review, feedback, tweak tools, re-convert
• Run through post-conversion checklist
4. Convert additional content according to a schedule
10. Conversion Process
• Compile the most representative sample you can. Otherwise, you risk
missing elements that need to be added later.
For example:
  • Include content from all the various templates in use
  • Include all possible test elements you will need to convert
• Involve your style sheet developer: consider order of elements,
outputclasses
• Involve your CMS vendor: understand how the tool deals with
metadata, conditionalization, variables, etc.
• Schedule:
  • It can take 1-2 months to get first set of content properly converted
  • Subsequent conversion batches can go very quickly
  • Schedule migrations as needed, allowing enough time to freeze
  changes to the content and go through the conversion process
11. Conversion Requirements
Output structure and file conventions
• Which DTDs to reference
• Which encoding: UTF-8?
• Folder structure of the output
• File naming convention, for example:
  • Use lower case for filenames
  • Replace all spaces and non-alphanumeric characters with
  underscores
  • Indicate file-type/topic-type with the first letter in the filename
  • Use .dita or .ditamap extensions, or .xml
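The example naming rules above can be applied mechanically. The following is a minimal Python sketch, not the conversion tool's actual logic. It assumes the filename is derived from the topic title, and the one-letter type prefixes (c, t, r, m) are a hypothetical convention; the slide only says that the first letter indicates the type.

```python
import re

# Hypothetical prefixes; the slide does not prescribe specific letters.
TYPE_PREFIX = {"concept": "c", "task": "t", "reference": "r", "map": "m"}

def dita_filename(title: str, topic_type: str) -> str:
    """Build a filename from a topic title using the example conventions."""
    # Lower-case the title, then replace each space or other
    # non-alphanumeric character (including non-ASCII letters) with "_".
    stem = re.sub(r"[^a-z0-9]", "_", title.lower())
    ext = ".ditamap" if topic_type == "map" else ".dita"
    return f"{TYPE_PREFIX[topic_type]}_{stem}{ext}"

print(dita_filename("Installing the Server (v2)", "task"))
# prints: t_installing_the_server__v2_.dita
```

Each character is replaced one for one, so runs of punctuation produce repeated underscores. Collapsing them would be a reasonable variation if your convention prefers it.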
12. Conversion Requirements
Mapping content elements to DITA tagging
• Map paragraph and character styles to tagging structures
Examples of mapping rules:
• Short description: first paragraph that is not a list item
• Tasks
• Tagging of context, pre-requisite, step info, etc.
• Only one task to be included per topic. If multiple tasks are
encountered within the same topic, wrap the second task/procedure
in <postreq><required-cleanup>
• If there is a <para> with “Example” text as an inline heading, use the
<example> element
• Do not insert empty <info> or other empty tags in the content
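The last rule above (no empty tags) is easy to check or enforce after conversion. This is a minimal sketch using Python's standard xml.etree.ElementTree. The function name and the choice to prune only <info> by default are assumptions for illustration, not part of any particular conversion tool.

```python
import xml.etree.ElementTree as ET

def prune_empty(root: ET.Element, tags=("info",)) -> int:
    """Remove elements named in `tags` that have no text, children, or attributes."""
    removed = 0
    # Snapshot the tree first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = (
                child.tag in tags
                and len(child) == 0
                and not child.attrib
                and not (child.text or "").strip()
            )
            if is_empty:
                # Note: any text directly following the removed element
                # (its tail) is dropped along with it.
                parent.remove(child)
                removed += 1
    return removed

step = ET.fromstring("<step><cmd>Click OK.</cmd><info>  </info></step>")
print(prune_empty(step), ET.tostring(step, encoding="unicode"))
# prints: 1 <step><cmd>Click OK.</cmd></step>
```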
13. Conversion Requirements
Graphics
• Create linked files from embedded graphics
  • Determine naming convention
• Graphic formats
  • JPG, GIF, PNG: generally fine as-is
  • BMP – generally converted to PNG
  • EMF – often converted to SVG
  • EPS – you will need special processing in the style sheets to
  handle it:
    • PDF: Antenna House + Ghostscript
    • HTML: auto conversion to PNG
  • TIFF – OK for PDF, but requires handling for HTML formats, since
  not all browsers support TIFF viewing out of the box
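The format guidance above amounts to a lookup by file extension and output target. Here is a minimal Python sketch of that decision table. The function name, the action strings, and the "review manually" fallback are illustrative assumptions; the actual image conversion is out of scope.

```python
from pathlib import Path

# Formats whose handling does not depend on the output target.
ACTIONS = {
    ".jpg": "keep", ".jpeg": "keep", ".gif": "keep", ".png": "keep",
    ".bmp": "convert to PNG",
    ".emf": "convert to SVG",
}

def plan_graphic(filename: str, output: str) -> str:
    """Return the suggested handling for one graphic; `output` is 'pdf' or 'html'."""
    ext = Path(filename).suffix.lower()
    if ext == ".eps":
        # EPS needs special style sheet processing for PDF; HTML gets PNG.
        return "convert to PNG" if output == "html" else "keep (special style sheet processing)"
    if ext in (".tif", ".tiff"):
        # TIFF is fine in PDF, but browsers may not display it.
        return "keep" if output == "pdf" else "needs handling for browsers"
    # Anything not covered on the slide is flagged for a person to decide.
    return ACTIONS.get(ext, "review manually")

print(plan_graphic("logo.BMP", "pdf"))
# prints: convert to PNG
```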
14. Conversion Requirements
Graphics
• Types that need special processing
  • Callouts
    • Convert to SVG?
    • Use numbered callouts with a list of labels underneath the graphic
  • Hotspots
  • Visio
15. Conversion Requirements
Publications / Maps
• Maps versus bookmaps
• Modularizing maps
  • Creation of submaps for chapters, parts, appendices, etc.
  • Use of maprefs versus topicrefs versus conrefs to link submaps
• Creation of container topics for chapters, parts, etc.
• Insertion of front-matter, back-matter, titles, metadata
• List of tables, figures
• Glossary
• Index
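To make the idea of modularized maps concrete, the sketch below generates a bookmap whose chapters point to submaps. It is one possible layout and not the deliverable format of any specific tool. The function name and file names are hypothetical, and a real bookmap file would also need an XML declaration and DOCTYPE, which this sketch omits.

```python
import xml.etree.ElementTree as ET

def build_bookmap(title: str, chapter_maps: list[str]) -> str:
    """Return bookmap XML with one <chapter> per submap file."""
    bookmap = ET.Element("bookmap")
    booktitle = ET.SubElement(bookmap, "booktitle")
    ET.SubElement(booktitle, "mainbooktitle").text = title
    for href in chapter_maps:
        # format="ditamap" marks the target as a submap rather than a topic.
        ET.SubElement(bookmap, "chapter", href=href, format="ditamap")
    return ET.tostring(bookmap, encoding="unicode")

# Hypothetical submap file names.
print(build_bookmap("User Guide", ["m_install.ditamap", "m_configure.ditamap"]))
```

Front matter, back matter, and metadata would be added to the same bookmap, either by the conversion or during post-conversion cleanup.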
17. Conversion Requirements
Other Requirements
Variables
• Auto-generation of conrefs
• Special tagging for CMS, e.g. @varref for SDL Live Content
Index entries
• Whether to leave inline or move to prolog/metadata/keywords
Context sensitivity
• Retain markers or other tagging used to produce context-sensitive help
• Which element to code them in: <data>, <resourceid>?
Cross-references and relationship tables
• Copy all or some xrefs automatically to relationship tables
18. Conversion Requirements
Mapping
Example of style mapping:

Source style    DITA element    Notes
PARAGRAPHS
p               <p>
Para            <p>
Body            <p>
BodyFirst       <p>             this tag has special formatting, may need to add outputclass
Para1Indent     <p>             2nd level paragraph
Indented        <p>             2nd level paragraph
Indented2       <p>             3rd level paragraph
Para2Indent     <p>             3rd level paragraph
TITLES
Preface         <title>
heading 1       <title>
Head1           <title>
heading 2       <title>
Head2           <title>
heading 3       <title>
• heading level 2 always starts a new topic
• heading level 3 starts a new topic for tasks only
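A style mapping like this one is essentially a lookup table plus a topic-splitting rule. The Python sketch below encodes both. The style names come from the slide, but the heading level assigned to each title style (in particular treating Preface as level 1) is an assumption, as are the function and variable names.

```python
# Source style -> DITA element, as listed in the example mapping.
STYLE_TO_ELEMENT = {
    "p": "p", "Para": "p", "Body": "p", "BodyFirst": "p",
    "Para1Indent": "p", "Indented": "p", "Indented2": "p", "Para2Indent": "p",
    "Preface": "title", "heading 1": "title", "Head1": "title",
    "heading 2": "title", "Head2": "title", "heading 3": "title",
}

# Assumed heading level for each title style (Preface as level 1 is a guess).
HEADING_LEVEL = {
    "Preface": 1, "heading 1": 1, "Head1": 1,
    "heading 2": 2, "Head2": 2,
    "heading 3": 3,
}

def starts_new_topic(style: str, topic_type: str) -> bool:
    """Apply the splitting rule: level 3 headings split only for tasks."""
    level = HEADING_LEVEL.get(style)
    if level is None:
        return False  # not a heading style
    if level == 3:
        return topic_type == "task"
    return True  # levels 1 and 2 always start a new topic

print(starts_new_topic("heading 3", "concept"), starts_new_topic("heading 3", "task"))
# prints: False True
```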
19. Pre-conversion Checklist
Preparing the content for conversion
• Some changes are best done before the conversion, some can be done
later
• Check style usage
  • Example: a heading styled as “Body” will not convert to a title or
  trigger a split into a separate topic
• Apply character styles to get semantic domain tagging
  • uicontrol, wintitle, filenames
  • Example rule: all bold text should become a uicontrol
• Other examples:
  • Verify there is only one set of steps in each task
  • Remove stem sentences that precede steps; in DITA the stem
  sentences may be awkward as part of the <context> element
  • If a task contains an ordered list that is not a set of steps, assign
  it a different style than the style used for steps
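One way to support the style check is to list every style used in the source that has no mapping rule, so it can be fixed or mapped before conversion. The sketch below assumes the paragraph style names have already been extracted from the source document (how to do that depends on the authoring tool and is not shown). The function name and sample data are hypothetical.

```python
from collections import Counter

def unmapped_styles(paragraph_styles, mapped):
    """Count the styles used in the source that have no mapping rule."""
    counts = Counter(paragraph_styles)
    return {style: n for style, n in counts.items() if style not in mapped}

# Hypothetical style names pulled from a source document.
styles_in_doc = ["Head1", "Body", "Body", "body text", "Steps", "body text"]
print(unmapped_styles(styles_in_doc, {"Head1", "Body", "Steps"}))
# prints: {'body text': 2}
```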
20. Pre-conversion Checklist
Reality Check:
• The content will need work to get into the proper structure
• Time spent preparing for the conversion will have a huge impact on the
rest of your process and on the time needed to achieve production-ready results
• The conversion process essentially infers structure based on styles –
paragraph and character
• Be rigorous with the styles used in the input documents
• It is irritating grunt work to check and re-apply styles, but it will
save huge effort later
21. Conversion Topic List
During the first step, a topic list is generated for the conversion batch
• Original File Name – source file for the topic
• Heading Level – level 1 is a chapter level topic
• Topic filename – auto-generated, may be changed
• Topic title – the title taken directly from the heading
• Standalone (Y/N) – by default, each heading is converted into its own
topic. You may choose to combine concept or reference topics
• Topic type – verify the topic type and change if necessary
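The topic list is effectively a table with one row per heading. As an illustration only, the sketch below models a row with the fields listed above and writes the list as CSV for review. The field names and the CSV format are assumptions, since the slides do not say how the list is delivered.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class TopicRow:
    original_file: str   # source file for the topic
    heading_level: int   # 1 is a chapter-level topic
    topic_filename: str  # auto-generated, may be changed
    topic_title: str     # taken directly from the heading
    standalone: str      # "Y" or "N"
    topic_type: str      # suggested type, to be verified

def to_csv(rows) -> str:
    """Serialize topic rows to CSV text with a header line."""
    buf = io.StringIO()
    names = [f.name for f in fields(TopicRow)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()

# Hypothetical sample row.
print(to_csv([TopicRow("guide.fm", 2, "t_install.dita", "Install", "Y", "task")]))
```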
22. Conversion Topic List
• A topic-type must be provided for each topic
• The conversion tool suggests a topic-type based on heuristics
• The list should be reviewed and modified where necessary
• The conversion is executed based on the conversion topic list
• You can use the topic list to insert index entries, metadata, navtitles,
etc.
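Because every topic needs a topic type before the conversion runs, a quick check of the reviewed list can catch gaps early. This is a minimal sketch that assumes each row is a dictionary with topic_filename and topic_type keys, and that the allowed types are the base DITA types shown; both are assumptions for illustration.

```python
# Assumed set of allowed topic types.
VALID_TYPES = {"concept", "task", "reference", "topic"}

def missing_topic_types(rows):
    """Return the filenames of rows whose topic type is blank or unrecognized."""
    return [
        row["topic_filename"]
        for row in rows
        if row.get("topic_type", "").strip().lower() not in VALID_TYPES
    ]

# Hypothetical reviewed topic list.
rows = [
    {"topic_filename": "c_overview.dita", "topic_type": "concept"},
    {"topic_filename": "t_install.dita", "topic_type": ""},
]
print(missing_topic_types(rows))
# prints: ['t_install.dita']
```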
23. Conversion Output
Deliverables:
• For each document, you get back:
• Valid bookmap file with all submaps
• Valid DITA XML files
• Original source images referenced in the XML files
24. Post-conversion Checklist
Examples:
• Edit top-level maps or bookmaps to include front matter, back matter,
and metadata as needed.
• Review content tagged with <required-cleanup>
• Fix any cross references that the conversion was unable to resolve
• Import content to your CMS, publish, and review the converted content
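Reviewing content tagged with <required-cleanup> is easier with a list of where it occurs. The sketch below uses Python's standard xml.etree.ElementTree to collect the text of each <required-cleanup> element in one topic. The function name and sample topic are hypothetical, and the sample omits the DOCTYPE that a real DITA file would carry.

```python
import xml.etree.ElementTree as ET

def cleanup_items(xml_text: str):
    """Return the text content of every <required-cleanup> element."""
    root = ET.fromstring(xml_text)
    return [
        "".join(el.itertext()).strip()
        for el in root.iter("required-cleanup")
    ]

# Hypothetical converted task with a second procedure flagged for cleanup.
sample = (
    "<task id='t_install'><title>Install</title><taskbody>"
    "<postreq><required-cleanup><p>Second procedure</p></required-cleanup></postreq>"
    "</taskbody></task>"
)
print(cleanup_items(sample))
# prints: ['Second procedure']
```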
25. Conversion Schedule
• Develop a plan and schedule for migration
  • Freeze content
  • Apply pre-conversion checklist
  • Send batch for conversion
  • Review conversion topic list
  • Convert
  • Apply post-conversion checklist
  • Import to CMS and publish
26. Keep in Touch! Let us know how we can help you.
For additional information, contact:
Yehudit Lindblom
Joe Gelb
solutions@suite-sol.com
U.S. Office
(609) 360-0650
EMEA Office
+972-2-993-8054
www.suite-sol.com
Follow us on LinkedIn
http://www.linkedin.com/company/527916