Converting Unstructured Docs to XML/DITA/ePub

Mark Gross          Linda Morone
Background of Data Conversion Laboratory


 30 years of experience providing electronic document conversion
 services that meet the needs of technology, today & in the future

       • More than 1 billion pages converted to date
       • US-based project management team
       • Global capabilities
       • Transform legacy & future documents
       • From any format to any format
       • Specialize in complex projects
       • Identify redundant data for content reuse
       • Employ a proven automated process
       • Quality Assurance service is standard in all projects
       • Additional services include consulting, composition, transcription &
         translation


                               (Confidential)   2
Serving All Industries

              • Publishers
              • Government
              • Defense
              • Life sciences
              • Automotive
              • Aerospace
              • Heavy and Industrial Equipment
              • Financial Services
              • Manufacturing
              • Computing
              • Utilities
              • Semiconductors
              • Telecommunications


Serving a Broad Client Base




Converting Legacy Data … Is it Worth the Expense?




             • Comply with regulations
             • Match Industry standards
             • Meet customer expectations & needs
             • Support internal departments
             • Expand into new markets
             • Multi-purpose content




Legacy Conversion: Fact or Fiction




         Client’s Perception                    Reality
     •    Painful Process                       •   Expertise & Planning
     •    Complex                               •   QC & Automation
     •    Expensive                             •   Guaranteed Results
     •    Drain on Resources                    •   Low Costs




So … Which Format Do You Choose?



ePUB and Rendering-Focused DTDs
• Designed for e-readers & mobile devices
• Freely available
• Open standard
• Adaptable to
    – Books
    – Documents
    – Manuals
    – User guides
• Support for print publishing requirements is limited

NLM and Publishing DTDs
• Support traditional publishing
• Flexible open standard
• Freely available
• Human-readable format

DITA and Module-Based DTDs
• Designed for multi-purposing and content reuse
• Topic based & modular
• Supports
    – Multiple variants
    – Multiple languages
    – Context independent content


The Story with ePub and Rendering-Focused DTDs




       •   ePub is an emerging standard used for most eReaders
       •   Mobi is also a large player, proprietary to Amazon Kindle
       •   ePub is an evolving standard
       •   ePub is supported differently by different eReaders
       •   There are no “Silver Bullets”
       •   eBooks are publications and need care in their production
       •   Not just novels; a recent DCL survey shows 75% of respondents will
           be using eBooks for complex materials




Things to Keep in Mind When Converting




                •   Smaller screen size
                •   Large tables may not fit
                •   Not all Character Sets supported by all devices
                •   MathML not currently supported
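Because character-set support differs from device to device, a useful pre-flight check is to list every character in the converted text that falls outside a repertoire you know your target readers render. The Python sketch below is illustrative only: it assumes the "safe" repertoire is simply Latin-1 (code points up to U+00FF), which is a placeholder you would replace with what your target devices actually support, and the function name `unsupported_chars` is hypothetical.

```python
import unicodedata
from collections import Counter


def unsupported_chars(text: str, max_codepoint: int = 0xFF) -> dict[str, int]:
    """Count characters whose code point exceeds max_codepoint.

    The default of 0xFF (Latin-1) is only a placeholder for whatever
    repertoire the target e-readers are known to render.
    """
    counts = Counter(ch for ch in text if ord(ch) > max_codepoint)
    # Label each character with its Unicode name so a reviewer can decide
    # whether to substitute it, keep it, or fall back to an image.
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}": n
        for ch, n in counts.items()
    }


if __name__ == "__main__":
    sample = "Résumé: α ≤ β, see § 2 \u2192 next"
    for label, n in unsupported_chars(sample).items():
        print(label, n)
```

Characters such as "é" and "§" pass because they are inside Latin-1; Greek letters, math operators and arrows are reported for review.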




OCR/Text Extraction




          Pitfalls of Text Extraction

          •   Special Characters
          •   Emphasis
          •   Ligatures
          •   Hyphens – Soft and Hard
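Two of these pitfalls, ligatures and soft hyphens, can be reduced mechanically. A minimal Python sketch, not a complete cleanup pipeline: Unicode NFKC normalization maps compatibility ligatures such as "ﬁ" (U+FB01) back to plain letters, and soft hyphens (U+00AD) are removed because they only mark optional break points. It assumes hard hyphens at line ends have already been told apart from soft ones upstream, and emphasis cannot be recovered from plain text at all. Note that NFKC also folds other compatibility characters (superscript digits, full-width forms), so check whether that is acceptable for your content.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def clean_extracted_text(text: str) -> str:
    """Normalize ligatures and drop soft hyphens in OCR/extracted text."""
    # NFKC maps compatibility ligatures (e.g. U+FB01 'ﬁ', U+FB03 'ﬃ') to
    # their component letters; it also folds superscripts, full-width
    # forms, etc., which may not always be wanted.
    text = unicodedata.normalize("NFKC", text)
    # Soft hyphens only mark optional break points; remove them.
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    raw = "e\ufb03cient \ufb01le con\u00adversion"
    print(clean_extracted_text(raw))  # efficient file conversion
```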




Handling of Objects Mid-Paragraph




                                                 Converting exactly per source
                                                    may lead to problems …




Math as Images – Changing Font Size Doesn’t Change Images




Unicode Symbols Will Adjust with the Font Size Change




Large Tables


 [Screenshots: table as text (searchable but cut off); table as image]




When Layout Matters


        [Examples: testing materials; poetry]




When Layout Matters (cont’d)


           [Examples: letter; recipe]




Some Notes on the Kindle




           •   Designed for reading long documents
           •   Designed for simplicity
           •   Has some features that others don’t
           •   But also missing some features that others have
           •   Therefore, need to design the conversion differently




Glossary Definitions


         [Screenshots: iPad; Kindle]




Use of CSS “Float” Style


       [Screenshots: iPad; Kindle]




Use of Borders


         [Screenshots: iPad; Kindle]




Color/Spanning/Large Tables

          [Screenshots: iPad; Kindle]




The Story with NLM and Publishing DTDs




        •   Well-documented public domain standard.
        •   Well-tested on a wide variety of materials; designed for
            complex publishing.
        •   Originally designed with NIH support for Scientific, Technical,
            and Medical (STM) publications.
        •   Extended to be robust for many more uses; widely used in
            non-STM areas.
        •   DocBook and PRISM are other standard DTDs, each with its
            own strengths – all designed for “print” publications.




Choosing the Content to Convert




         Which content will be auto-generated?


                    •   TOC
                    •   Index
                    •   Labels
                    •   Titles
                    •   Lists of Tables, Figures, etc.
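If the TOC, index and lists of tables or figures will be regenerated from the tagged content, the hard-coded versions in the source usually should not be converted. One heuristic, sketched in Python under the assumption that printed TOC entries end in a dot leader followed by a page number, is to flag such lines for removal. The pattern and the function name `split_toc_lines` are illustrative, not a fixed rule.

```python
import re

# Illustrative pattern: entry text, a dot leader of two or more periods,
# then a page number (arabic, or lowercase roman for front matter).
TOC_LINE = re.compile(
    r"^\s*(?P<title>\S.*?)\s*\.{2,}\s*(?P<page>\d+|[ivxlc]+)\s*$"
)


def split_toc_lines(lines):
    """Return (toc_entries, body_lines).

    toc_entries is a list of (title, page) tuples for lines that look like
    hard-coded TOC entries; everything else is passed through as body text.
    """
    toc_entries, body_lines = [], []
    for line in lines:
        m = TOC_LINE.match(line)
        if m:
            toc_entries.append((m.group("title"), m.group("page")))
        else:
            body_lines.append(line)
    return toc_entries, body_lines


if __name__ == "__main__":
    source = [
        "Chapter 1  Introduction ........ 2",
        "List of Figures ..... vii",
        "The system has 3 parts.",
    ]
    toc, body = split_toc_lines(source)
    print(toc)   # [('Chapter 1  Introduction', '2'), ('List of Figures', 'vii')]
    print(body)  # ['The system has 3 parts.']
```

The extracted (title, page) pairs can still be useful for checking that the regenerated TOC matches the original.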




Capturing Items as Multiple Formats




                      Math as images and MathML
                      Tables as images and XHTML



<disp-formula id="FD1">
  <mml:math id="M1" display='block'>
    <mml:semantics>
      <mml:mrow>
        <mml:mi>L</mml:mi>
        <mml:mo>&#x0003D;</mml:mo>
        <mml:mo>&#x02211;</mml:mo>
        <mml:mrow>
          <mml:msub>
            <mml:mrow><mml:mi>l</mml:mi></mml:mrow>
            <mml:mi>i</mml:mi>
          </mml:msub>
          <mml:mo>&#x0002F;</mml:mo>
          <mml:mi>N</mml:mi>
        </mml:mrow>
        <mml:mo>&#x0002E;</mml:mo>
      </mml:mrow>
    </mml:semantics>
  </mml:math>
</disp-formula>
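One way to carry both renditions of an equation in NLM/JATS markup is the `<alternatives>` element (introduced with the NLM 3.0 tag sets and carried into NISO JATS; check the version you target), which lets a `<disp-formula>` hold a `<graphic>` and an `<mml:math>` side by side so each output path can use the one it supports. The Python sketch below builds that structure with the standard library. The image file name `FD1.png` and the helper `formula_with_alternatives` are hypothetical, and the MathML is a condensed version of the formula above.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
XLINK = "http://www.w3.org/1999/xlink"
# Serialize with the conventional mml: and xlink: prefixes.
ET.register_namespace("mml", MML)
ET.register_namespace("xlink", XLINK)

# Condensed MathML for L = sum l_i / N (namespace declared so it parses alone).
MATHML_SRC = (
    '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block">'
    "<mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2211;</mml:mo>"
    "<mml:msub><mml:mi>l</mml:mi><mml:mi>i</mml:mi></mml:msub>"
    "<mml:mo>/</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math>"
)


def formula_with_alternatives(formula_id: str, image_href: str,
                              mathml: ET.Element) -> ET.Element:
    """Wrap an image rendition and a MathML rendition in <alternatives>."""
    disp = ET.Element("disp-formula", {"id": formula_id})
    alts = ET.SubElement(disp, "alternatives")
    # Image rendition: for outputs (e.g. e-readers) without MathML support.
    ET.SubElement(alts, "graphic", {f"{{{XLINK}}}href": image_href})
    # MathML rendition: searchable, scalable and reusable.
    alts.append(mathml)
    return disp


if __name__ == "__main__":
    formula = formula_with_alternatives("FD1", "FD1.png", ET.fromstring(MATHML_SRC))
    print(ET.tostring(formula, encoding="unicode"))
```

The same pattern applies to tables captured both as images and as XHTML: put each rendition inside `<alternatives>` within the table wrapper.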




Determining Data Elements




   Appearance Based:
      •   Alignment
      •   Placement
      •   Point size
      •   Font

   Content Based:
      •   <email> - @
      •   <uri> - www
      •   <degrees> - PhD, MD, BA
      •   <fig> - Figure, Illustration, Chart, Scheme




Granularity of Tagging: Front Matter




Granularity of Tagging: Back Matter




            • Are the references Harvard or Numeric?
            • Is the author name last/first or first/last?
            • What is the placement of the year within the citation?
            • Is a comma or period used after the author names?
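Two of these questions, Harvard versus numeric and author-name order, can be estimated automatically by sampling the reference list before the conversion specification is written. A rough Python heuristic, assuming one reference per string: the list counts as numeric if most entries start with a label such as "[1]" or "1.", as Harvard (author-date) if most contain a year in parentheses, and as last/first if most start with "Surname, I.". The patterns, the majority threshold and the function name `classify_references` are assumptions; real reference lists still need manual review.

```python
import re

NUMERIC_START = re.compile(r"^\s*(?:\[\d+\]|\d+\.|\(\d+\))\s+")
PAREN_YEAR = re.compile(r"\((?:19|20)\d{2}[a-z]?\)")
# "Surname, I." at the start of the entry.
LAST_FIRST = re.compile(r"^[A-Z][\w'-]+,\s+[A-Z]\.")


def _majority(flags: list[bool]) -> bool:
    return sum(flags) > len(flags) / 2


def classify_references(refs: list[str]) -> dict:
    """Guess citation style and author-name order for a reference list."""
    if not refs:
        raise ValueError("need at least one reference")
    numeric = _majority([bool(NUMERIC_START.match(r)) for r in refs])
    # Drop any leading label so the author names are at the start.
    bodies = [NUMERIC_START.sub("", r, count=1) for r in refs]
    if numeric:
        style = "numeric"
    elif _majority([bool(PAREN_YEAR.search(b)) for b in bodies]):
        style = "harvard"
    else:
        style = "unknown"
    last_first = _majority([bool(LAST_FIRST.match(b)) for b in bodies])
    return {"style": style, "last_first": last_first}
```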




The Story with DITA and Module-Based DTDs



             • Allows for modularization of your content with Topics,
               and easy re-use in multiple outputs
             • Pre-packaged & ready to use XML (almost)
             • Ready-to-go for techdocs (mostly)
             • Infrastructure included - taxonomy (DTD and
               schema); printing stylesheets; lots of tools
             • Printable with standard tools
             • Extensible with specializations
             • Further specializations for publishing, testing, and
               other specialized areas
             • Content-based
             • What do you do when things don’t fit?



What Makes DITA Conversions Difficult


           “Getting there using DITA is like building with prefabricated modular
           components that can be quickly assembled into a suitable structure.”
                            - Doug Henschen, intelligententerprise.com




     • DITA is a conceptual departure from linear information – and is difficult
       for many to get used to

     • Turns the traditional book into a collection of Topics

     • Topics can be thought of as interchangeable parts
         – to be reassembled in multiple ways
         – to be repurposed for multiple outputs
         – to be reused across multiple products

     • …but your documents weren’t likely to have been designed to do this.




Structuring a Book into Topics in DITA

                   Book 3
                  Reference 4
                                                                                             Book A
   Book 1          Concept 1
                                                                                            Reference 1
  Concept 4       Reference 1
                                                   DITA Content                             Reference 2
    Task 3         Concept 3
                                                Management System
                   Concept 5                                                                Reference 3
  Reference 1
                    Task 2                                                                    Task 1
  Concept 2
                                            Concept 1         Reference 1                     Task 2
  Reference 5
                                            Concept 2         Reference 2
    Task 1
                 Book 4                     Concept 3         Reference 3
                                                                                             Book B
                Concept 2                   Concept 4         Reference 4
                                                                                             Concept 1
                  Task 1                    Concept 5         Reference 5
  Book 2                                                                                    Reference 1
                Concept 3
                                                        Task 1                                Task 1
  Concept 1       Task 2
                                                        Task 2                               Concept 2
 Reference 2    Reference 3
                                                                                            Reference 2
 Concept 5                                              Task 3
                                                                                              Task 3
   Task 3
   Task 2
   Task 1
                              “Getting there using DITA is like building with prefabricated modular
 Reference 5
                              components that can be quickly assembled into a suitable structure.”
  Concept 2
                                               – Doug Henschen, intelligententerprise.com

Further Complications in DITA Conversions


              •   There are the usual conversion issues
                   – Accuracy of the transferred text
                   – Tables
                   – Math
                   – Special Characters

              •   There are also the structuring issues
                   – Identifying topics
                   – Identifying reusable content

              •   And the people issues
                   – Deciding what needs re-authoring
                   – Getting used to a new “document” paradigm
                   – Getting rugged individualists to collaborate more




Overview of Typical DITA Technical Conversion Issues

             •   Architectural constraints of DITA – the square pegs
                  – Multiple steps within a single task topic
                  – Task/procedure authored as a table in the source
                  – Presence of untitled tasks/topics in the source
                  – References to page numbers (irrelevant cross-references)
                  – Having more than two levels of steps

             •   How your rendering system will handle XML
                  – Figures
                  – Steps

             •   Other conversion considerations:
                  – Hierarchy in Map Files
                  – Metadata in Map Files and Topics
                  – Index Terms
                  – Conditional Text
                  – Glossary Terms
                  – Content Terms

Square Peg 1 - Task / Procedure Authored As a Table

     Issue:
         Tasks are authored as tables rather than numbered lists. If there is no
         clear, consistent pattern, automated conversion keeps the tables as
         tables, and the steps are not tagged as steps.



      1   Overview               In general, backup and recovery refers to the
                                 various strategies and procedures involved in
                                 protecting a system against data loss.
      2   Backup strategy and    A backup is a copy of key files. Files included
          frequency              in the backup are:
                                 • A logical backup of the database
                                       1. Key system files
                                            • Network files
                                            • Timezone
                                       2. Configuration files …
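When the table does follow a consistent pattern, for instance a first column that is a running step number, the rows can be mapped to DITA `<steps>` automatically, and anything else can be left as a table for manual review. A minimal Python sketch of that decision; the row format (a list of (number, text) pairs) and the function name `rows_to_steps` are assumptions for illustration, and a real conversion would also split longer cells into `<cmd>` plus `<info>`.

```python
import xml.etree.ElementTree as ET


def rows_to_steps(rows):
    """Convert (number, text) rows to a DITA <steps> element.

    Returns None when the first column is not exactly 1, 2, 3, ... so the
    caller can keep the source table as a table instead.
    """
    numbers = [num.strip() for num, _ in rows]
    if not rows or numbers != [str(i) for i in range(1, len(rows) + 1)]:
        return None
    steps = ET.Element("steps")
    for _, text in rows:
        step = ET.SubElement(steps, "step")
        # <cmd> is the required child of <step> in a DITA task.
        ET.SubElement(step, "cmd").text = text.strip()
    return steps


if __name__ == "__main__":
    rows = [("1", "Loosen the screws."), ("2", "Disengage the ejectors.")]
    print(ET.tostring(rows_to_steps(rows), encoding="unicode"))
    # <steps><step><cmd>Loosen the screws.</cmd></step><step><cmd>Disengage the ejectors.</cmd></step></steps>
```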




Square Peg 2 - Multiple Steps In A Single Task

   Issue:
        Only one set of steps is allowed in a single task topic. When a task has two
        sets of steps within a topic, such as for two different scenarios, only one of
        the scenarios can be tagged as <steps> as per the DTD.

                   Example:

                        Replacing an XYZ Module
                        Use this procedure to replace an XYZ module
                        Remove XYZ Module
                              1. Loosen the screws.
                              2. Disengage the ejectors
                              3. Pull the module straight out
                        Insert Replacement XYZ Module
                              1. Align the module.
                              2. Insert the module, pressing in firmly
                              3. Engage the ejectors
                              4. Securely tighten the screws
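A common workaround (an editorial choice, not something the DTD dictates) is to split such a procedure into one task topic per scenario, each with its own single `<steps>` block, and reassemble them in the map. The Python sketch below generates two minimal DITA task topics from the example above; the topic ids and the helpers `make_task` and `split_procedure` are hypothetical, and a real conversion would also carry over the DOCTYPE, short description and metadata.

```python
import xml.etree.ElementTree as ET


def make_task(topic_id, title, steps):
    """Build a minimal DITA <task> containing a single <steps> block."""
    task = ET.Element("task", {"id": topic_id})
    ET.SubElement(task, "title").text = title
    body = ET.SubElement(task, "taskbody")
    steps_el = ET.SubElement(body, "steps")
    for text in steps:
        step = ET.SubElement(steps_el, "step")
        ET.SubElement(step, "cmd").text = text
    return task


def split_procedure(sections):
    """Return one task topic per (id, title, steps) scenario."""
    return [make_task(tid, title, steps) for tid, title, steps in sections]


if __name__ == "__main__":
    topics = split_procedure([
        ("remove_xyz", "Remove XYZ Module",
         ["Loosen the screws.", "Disengage the ejectors.",
          "Pull the module straight out."]),
        ("insert_xyz", "Insert Replacement XYZ Module",
         ["Align the module.", "Insert the module, pressing in firmly.",
          "Engage the ejectors.", "Securely tighten the screws."]),
    ])
    for topic in topics:
        print(ET.tostring(topic, encoding="unicode"))
```

In the map, the two new topics could then be nested as child `<topicref>` entries under a parent topic titled "Replacing an XYZ Module".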

Square Peg 3 - Irrelevant Cross-References


      Issue:
          Conversion to DITA may make some source cross-references irrelevant.
          For example, if all empty chapter headings are dropped, a reference
          to a chapter is no longer valid. In these cases, a <required-cleanup>
          tag is inserted to flag these occurrences for clean-up.

          See Chapter 1, Introduction on page 2


     Would be tagged as:

          See <required-cleanup><xref href="chap1">Chapter 1,
          Introduction</xref></required-cleanup>


     NOTE: Hard-keyed page numbers are typically dropped from the
     cross-reference string since they are no longer relevant in DITA.
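The transformation above can be approximated with a regular expression. A minimal Python sketch, assuming cross-references follow the pattern "Chapter N, Title on page P", that chapter N maps to a topic id of the form `chapN`, and that the input text is already XML-safe (all hypothetical; a real conversion would look up the target in a table built while splitting the book into topics). The function name `flag_chapter_xrefs` is also hypothetical.

```python
import re

# "Chapter <n>, <title> on page <p>"; the page number is discarded.
XREF = re.compile(r"Chapter (?P<num>\d+), (?P<title>[^\n]+?) on page \d+")


def flag_chapter_xrefs(text: str) -> str:
    """Turn chapter/page references into flagged DITA xrefs."""
    def repl(m):
        num = m.group("num")
        label = f"Chapter {num}, {m.group('title')}"
        # Hypothetical id scheme: chapter N becomes topic "chapN".
        return (f'<required-cleanup><xref href="chap{num}">{label}</xref>'
                f"</required-cleanup>")
    return XREF.sub(repl, text)


if __name__ == "__main__":
    print(flag_chapter_xrefs("See Chapter 1, Introduction on page 2"))
    # See <required-cleanup><xref href="chap1">Chapter 1, Introduction</xref></required-cleanup>
```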




So … Maybe You Shouldn’t Bother Converting Your Content?



         •   It seems like such a pain to go through all the old luggage
             in the attic.

         •   There is always a need for some rewriting – few writers
             have the clairvoyance to author content with the intent that
             it be converted in the future – might as well rewrite it all.

         •   My writers aren’t very busy right now anyway.

         •   It’s more fun and seems like less trouble to author anew.




In Reality … Converting Your Content is Worth the Bother


    • Throwing it out and starting over is an expensive option
        – In DITA, rewriting at $25/page vs. converting at $3-$4/page
        – The hidden costs of redoing index entries, links and other features you’ve
          built in
        – The hidden cost of reviewing, reproofing, and recertifying it all

    • It’s usually easier to use what you have as a base, and convert over
          – Needs planning
          – Needs time

    • Planning for a good conversion experience
        – Which content will you need?
        – Which content is worth converting?
        – Which content is suitable for re-use in multiple places?
        – What tools are available?
        – How to specify the conversion to get it right?
        – When do you start all this planning?
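To make the per-page rates concrete, here is a worked comparison for a hypothetical 1,000-page document set, using only the direct rates quoted above. The hidden costs of redoing index entries and links and of reviewing and recertifying are excluded; they would widen the gap.

```latex
\begin{aligned}
C_{\text{rewrite}} &= 1000 \times \$25 = \$25{,}000\\
C_{\text{convert}} &= 1000 \times (\$3 \text{ to } \$4) = \$3{,}000 \text{ to } \$4{,}000\\
C_{\text{rewrite}} - C_{\text{convert}} &= \$21{,}000 \text{ to } \$22{,}000
\end{aligned}
```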


Conversion Scope Options




   Option 1: Convert nothing
   • No conversion costs
   • Delayed ROI

   Option 2: Convert everything
   • High conversion costs
   • Reduced ROI

   Option 3: Convert ‘frequently used’ documents
   • Some conversion costs
   • Maximized ROI

   [Chart: cost vs. time for Options 1, 2 and 3]




What to Convert, and in What Order


       •   Categorizing
            – Active documents in good shape
            – Active documents that need a lot of work
            – Somewhat inactive documents that will likely be retired
            – Archival materials

       •   Prioritizing
            – Documents that are most used
            – Documents that are customer favorites
            – Documents with longest product life
            – Start with most recent documents and go back

       •   Identifying the process
            – Can be converted as is
            – Can be converted with some work
            – Needs to be rewritten
            – Don’t convert – just keep archival copies
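The categorizing and prioritizing criteria above can be turned into a simple ranking that decides conversion order. A Python sketch under stated assumptions: the `Doc` fields are hypothetical stand-ins for whatever usage and product-life data you actually have, the ranking is a plain lexicographic ordering (most used, then customer favorites, then longest product life, then most recent), and inactive documents are routed to an archive list instead of being ranked.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # All fields are hypothetical inputs for illustration.
    name: str
    active: bool
    monthly_views: int          # proxy for "most used"
    customer_favorite: bool
    product_years_left: float   # proxy for "longest product life"
    year: int                   # publication year, for "most recent first"


def conversion_order(docs):
    """Return (queue, archive): active docs ranked for conversion, plus the rest."""
    queue = [d for d in docs if d.active]
    archive = [d for d in docs if not d.active]
    # Lexicographic ranking; True sorts above False with reverse=True.
    queue.sort(
        key=lambda d: (d.monthly_views, d.customer_favorite,
                       d.product_years_left, d.year),
        reverse=True,
    )
    return queue, archive


if __name__ == "__main__":
    docs = [
        Doc("Install Guide", True, 10, False, 2, 2008),
        Doc("User Manual", True, 50, False, 1, 2005),
        Doc("Old Release Notes", False, 99, True, 5, 1999),
        Doc("Quick Start", True, 10, True, 1, 2006),
    ]
    queue, archive = conversion_order(docs)
    print([d.name for d in queue])    # ['User Manual', 'Quick Start', 'Install Guide']
    print([d.name for d in archive])  # ['Old Release Notes']
```

A weighted score would work equally well; the lexicographic key simply makes the priority order explicit.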


Closing Thoughts


     • Know the scope of what you want to accomplish
         – Are you trying to get eBooks quickly, or are you changing your
           publishing process?
         – Are you moving everything, or will a phased approach work?
         – Will your content work naturally with the selected DTD?

     • Start the conversion process early
         – Shifts the critical path; speeds the process; reduces cleanup
         – Organizing early lets more of the work be done by the content
            owners
         – Eases the training and change acceptance burdens
         – Setting up collaborative teams sets the tone and allows one to
            “divide and conquer”

     • Converting legacy data is not trivial
         – …but faster, safer and less expensive than rewriting
         – Each DTD has special considerations to be taken into account
         – Much can be automated, but it needs planning



Questions... & Answers

   Data Conversion Laboratory
   61-18 190th St., 2nd Floor
   Fresh Meadows, NY 11365
   Telephone: (718) 357-8700
   Fax: (718) 357-8776
   Web: http://www.dclab.com

   Mark Gross, President
   mgross@dclab.com
   718-307-5711

   Linda Morone, Sr. VP of Sales & Marketing
   lmorone@dclab.com
   718-307-5728



Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
The information supernova
The information supernovaThe information supernova
The information supernova
 
Data-Driven User Experience
Data-Driven User ExperienceData-Driven User Experience
Data-Driven User Experience
 

Kürzlich hochgeladen

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Kürzlich hochgeladen (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Converting Unstructured Docs to XML/DITA/ePub

  • 1. Converting Unstructured Docs to XML/DITA/ePub (Mark Gross, Linda Morone)
  • 2. Background of Data Conversion Laboratory: 30 years of experience providing electronic document conversion services that meet the needs of technology, today and in the future • More than 1 billion pages converted to date • US-based project management team • Global capabilities • Transform legacy & future documents • From any format to any format • Specialize in complex projects • Identify redundant data for content reuse • Employ a proven automated process • Quality assurance service is standard in all projects • Additional services include consulting, composition, transcription & translation
  • 3. Serving All Industries • Publishers • Government • Defense • Life sciences • Automotive • Aerospace • Heavy and Industrial Equipment • Financial Services • Manufacturing • Computing • Utilities • Semiconductors • Telecommunications
  • 4. Serving a Broad Client Base
  • 5. Converting Legacy Data … Is it Worth the Expense? • Comply with regulations • Match industry standards • Meet customer expectations & needs • Support internal departments • Expand into new markets • Multi-purpose content
  • 6. Legacy Conversion: Fact or Fiction. Client’s perception: • Painful process • Complex • Expensive • Drain on resources. Reality: • Expertise & planning • QC & automation • Guaranteed results • Low costs
  • 7. So … Which Format Do You Choose? ePUB and Rendering-Focused DTDs: • Designed for e-readers & mobile devices • Flexible open standard • Freely available • Human-readable format • Support for print publishing is limited. NLM and Publishing DTDs: • Support traditional publishing • Freely available • Open standard • Adaptable to – Books – Documents – Manuals – User guides. DITA and Module-Based DTDs: • Designed for multi-purposing and content reuse • Topic based & modular • Supports requirements for – Multiple variants – Multiple languages – Context-independent content
  • 8. The Story with ePub and Rendering-Focused DTDs • ePub is an emerging standard used by most eReaders • Mobi is also a large player, proprietary to Amazon Kindle • ePub is an evolving standard • ePub is supported differently by different eReaders • There are no “silver bullets” • eBooks are publications and need care in their production • Not just novels: a recent DCL survey shows 75% of respondents will be using eBooks for complex materials
  • 9. Things to Keep in Mind When Converting • Smaller screen size • Large tables may not fit • Not all character sets are supported by all devices • MathML not currently supported
  • 10. OCR/Text Extraction: Pitfalls of Text Extraction • Special characters • Emphasis • Ligatures • Hyphens – soft and hard
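Three of the pitfalls on slide 10 (ligatures, soft hyphens, hard hyphens at line ends) can be partly repaired after extraction. The sketch below assumes the extracted text arrives as a plain Python string with line breaks preserved; the ligature table, the rejoining heuristic, and the function name are illustrative assumptions, not DCL's process.

```python
import re

# Typographic ligatures that PDF/OCR extraction often emits as single code points.
# A targeted map is used instead of unicodedata NFKC normalization, because NFKC
# would also flatten characters such as superscript digits (e.g. "m²" -> "m2").
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

SOFT_HYPHEN = "\u00ad"


def clean_extracted_text(text: str) -> str:
    """Repair common text-extraction artifacts (a sketch, not exhaustive)."""
    # 1. Expand ligatures back into their component letters.
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    # 2. Soft hyphens only mark optional break points, so remove them together
    #    with a line break that immediately follows one.
    text = re.sub(SOFT_HYPHEN + r"\n?", "", text)
    # 3. A hard hyphen at the end of a line followed by a lowercase letter is
    #    treated as line-break hyphenation and rejoined. Heuristic: genuine
    #    compounds split at a line end (e.g. "well-\nknown") are joined too.
    text = re.sub(r"(\w)-\n([a-z])", r"\1\2", text)
    # 4. Remaining single line breaks inside a paragraph become spaces;
    #    blank lines (paragraph breaks) are kept.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text
```

For example, `clean_extracted_text("e\ufb03cient con\u00ad\nversion of legacy docu-\nments")` gives `"efficient conversion of legacy documents"`. Special characters and emphasis are not handled here; those usually need font and layout information the plain string no longer carries.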
  • 11. Handling of Objects Mid-Paragraph: Converting exactly per source may lead to problems …
  • 12. Math as Images – Changing Font Size Doesn’t Change Images
  • 13. Unicode Symbols Will Adjust with the Font Size Change
  • 14. Large Tables: Table as text (searchable but cut off) vs. table as image
  • 15. When Layout Matters: Testing materials, Poetry
  • 16. When Layout Matters (cont’d): Letter, Recipe
  • 17. Some Notes on the Kindle • Designed for reading long documents • Designed for simplicity • Has some features that others don’t • But also missing some features that others have • Therefore, need to design the conversion differently
  • 18. Glossary Definitions (iPad and Kindle screenshots)
  • 19. Use of CSS “Float” Style (iPad and Kindle screenshots)
  • 20. Use of Borders (iPad and Kindle screenshots)
  • 21. Color/Spanning/Large Tables (iPad and Kindle screenshots)
  • 22. The Story with NLM and Publishing DTDs • Well-documented public domain standard • Well-tested on a wide variety of materials; designed for complex publishing • Originally designed with NIH support for Scientific, Technical, and Medical (STM) publications • Extended to be robust for many more uses; widely used in non-STM areas • DocBook and PRISM are other standard DTDs, each with its own strengths; all are designed for “print” publications
  • 23. Choosing the Content to Convert: Which content will be auto-generated? • TOC • Index • Labels • Titles • Lists of tables, figures, etc.
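If the TOC on slide 23 is auto-generated from the converted structure, the source TOC does not need to be converted at all. A minimal sketch, assuming NLM/JATS-style nested `<sec>` elements with `<title>` children; the sample fragment and the `build_toc` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny NLM/JATS-style fragment (hypothetical sample content).
ARTICLE = """
<body>
  <sec id="s1"><title>Introduction</title>
    <sec id="s1-1"><title>Scope</title></sec>
  </sec>
  <sec id="s2"><title>Methods</title></sec>
</body>
"""


def build_toc(parent: ET.Element, depth: int = 0) -> list[str]:
    """Generate indented TOC entries from nested <sec>/<title> elements."""
    entries = []
    for sec in parent.findall("sec"):  # direct child sections only
        title = sec.findtext("title", default="(untitled)")
        entries.append("  " * depth + title)
        entries.extend(build_toc(sec, depth + 1))  # recurse into subsections
    return entries


print("\n".join(build_toc(ET.fromstring(ARTICLE.strip()))))
```

The same walk could produce lists of tables or figures by collecting `<table-wrap>` or `<fig>` labels instead of section titles.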
  • 24. Capturing Items as Multiple Formats • Math as images and MathML • Tables as images and XHTML. MathML for L = ∑ lᵢ / N:
      <disp-formula id="FD1">
        <mml:math id="M1" display='block'>
          <mml:semantics>
            <mml:mrow>
              <mml:mi>L</mml:mi>
              <mml:mo>&#x0003D;</mml:mo>
              <mml:mo>&#x02211;</mml:mo>
              <mml:mrow>
                <mml:msub>
                  <mml:mrow><mml:mi>l</mml:mi></mml:mrow>
                  <mml:mi>i</mml:mi>
                </mml:msub>
                <mml:mo>&#x0002F;</mml:mo>
                <mml:mi>N</mml:mi>
              </mml:mrow>
              <mml:mo>&#x0002E;</mml:mo>
            </mml:mrow>
          </mml:semantics>
        </mml:math>
      </disp-formula>
  • 25. Determining Data Elements. Appearance based: • Alignment • Placement • Point size • Font. Content based: • <email> - @ • <uri> - www • <degrees> - PhD, MD, BA • <fig> - Figure, Illustration, Chart, Scheme
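The content-based cues on slide 25 can be expressed as pattern rules that propose an element for a text span. The sketch below is illustrative only: the regular expressions and the `guess_element` name are assumptions, and a production converter would combine them with the appearance-based cues.

```python
import re
from typing import Optional

# (element, pattern) pairs built from the slide's content-based cues.
# The regular expressions are illustrative assumptions, not DCL's actual rules.
CONTENT_RULES = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("uri", re.compile(r"^(https?://)?www\.\S+$", re.IGNORECASE)),
    ("degrees", re.compile(r"^(PhD|MD|BA)(,\s*(PhD|MD|BA))*$")),
    ("fig", re.compile(r"^(Figure|Illustration|Chart|Scheme)\s+\d+", re.IGNORECASE)),
]


def guess_element(span: str) -> Optional[str]:
    """Return the first element name whose cue matches the span, or None."""
    span = span.strip()
    for element, pattern in CONTENT_RULES:
        if pattern.search(span):
            return element
    return None
```

For instance, `guess_element("mgross@dclab.com")` proposes `email` and `guess_element("Figure 3. Backup flow")` proposes `fig`. Degree variants with periods (such as "Ph.D.") would need extra patterns.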
  • 26. Granularity of Tagging: Front Matter
  • 27. Granularity of Tagging: Back Matter • Are the references Harvard or Numeric? • Is the author name last/first or first/last? • What is the placement of the year within the citation? • Is a comma or period used after the author names?
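Two of the back-matter questions on slide 27 (Harvard or numeric, and where the year sits) can be answered heuristically per reference entry before tagging rules are written. This is a rough sketch under stated assumptions: a leading `[n]` or `n.` label is taken to mean numeric style, anything else is assumed Harvard, and the year patterns and `classify_reference` name are hypothetical.

```python
import re

# Heuristic patterns; both are assumptions for illustration only.
NUMERIC_LABEL = re.compile(r"^\s*(\[\d+\]|\d+\.)\s")    # "[3] " or "3. " at the start
YEAR = re.compile(r"\(?(1[89]\d{2}|20\d{2})[a-z]?\)?")  # 1800-2099, optional "a"/"b", optional parens


def classify_reference(entry: str) -> dict:
    """Report citation style and year placement for one reference entry."""
    numeric = bool(NUMERIC_LABEL.match(entry))
    year = YEAR.search(entry)
    # Year near the start (Harvard: "Author (2011) ...") or toward the end.
    if year is None:
        year_position = "none"
    elif year.start() < len(entry) / 2:
        year_position = "early"
    else:
        year_position = "late"
    return {"style": "numeric" if numeric else "harvard", "year_position": year_position}
```

Running this over a sample of each source's reference lists gives a quick view of how consistent the citation styles are, which drives how granular the back-matter tagging can be.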
  • 28. The Story with DITA and Module-Based DTDs • Allows for modularization of your content with topics, and easy reuse in multiple outputs • Pre-packaged & ready-to-use XML (almost) • Ready-to-go for techdocs (mostly) • Infrastructure included: taxonomy (DTD and schema), printing stylesheets, lots of tools • Printable with standard tools • Extensible with specializations • Further specializations for publishing, testing, and other specialized areas • Content-based • What do you do when things don’t fit?
  • 29. What Makes DITA Conversions Difficult “Getting there using DITA is like building with prefabricated modular components that can be quickly assembled into a suitable structure.” - Doug Henschen, intelligententerprise.com • DITA is a conceptual departure from linear information, and many find it difficult to get used to • Turns the traditional book into a collection of topics • Topics can be thought of as interchangeable parts – to be reassembled in multiple ways – to be repurposed for multiple outputs – to be reused across multiple products • …but your documents probably weren’t designed to do this.
  • 30. Structuring a Book into Topics in DITA [Diagram: Books A and B broken into Concept, Task, and Reference topics held in a DITA Content Management System, from which Books 1 to 4 are assembled.] “Getting there using DITA is like building with prefabricated modular components that can be quickly assembled into a suitable structure.” – Doug Henschen, intelligententerprise.com
  • 31. Further Complications in DITA Conversions • There’s the usual conversion issues – Accuracy of the transferred text – Tables – Math – Special Characters • There’s also the structuring issues – Identifying topics – Identifying reusable content • And the people issues – Deciding what needs re-authoring – Getting used to a new “document” paradigm – Getting rugged individualists to collaborate more
  • 32. Overview of Typical DITA Technical Conversion Issues • Architectural constraints of DITA (the square pegs) – Multiple sets of steps within a single task topic – Task/procedure authored as a table in the source – Untitled tasks/topics in the source – References to page numbers (irrelevant cross-references) – More than two levels of steps • How your rendering system will handle the XML – Figures – Steps • Other conversion considerations – Hierarchy in map files – Metadata in map files and topics – Index terms – Conditional text – Glossary terms – Content terms
  • 33. Square Peg 1 - Task/Procedure Authored as a Table. Issue: Tasks are authored as tables rather than numbered lists. If there’s no clear, consistent pattern, automated conversion keeps the tables as tables, and the steps are not tagged as steps. Example source table: Row 1, “Overview”: In general, backup and recovery refers to the various strategies and procedures involved in protecting a system against data loss. Row 2, “Backup strategy and frequency”: A backup is a copy of key files. Files included in the backup are: • A logical backup of the database 1. Key system files • Network files • Timezone 2. Configuration files …
  • 34. Square Peg 2 - Multiple Steps in a Single Task. Issue: Only one set of steps is allowed in a single task topic. When a task has two sets of steps within a topic, such as for two different scenarios, only one of the scenarios can be tagged as <steps>, per the DTD. Example: Replacing an XYZ Module. Use this procedure to replace an XYZ module. Remove XYZ Module: 1. Loosen the screws. 2. Disengage the ejectors. 3. Pull the module straight out. Insert Replacement XYZ Module: 1. Align the module. 2. Insert the module, pressing in firmly. 3. Engage the ejectors. 4. Securely tighten the screws.
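One way around the single-`<steps>` constraint on slide 34 is to split the procedure into one task topic per scenario. The sketch below does that with Python's `xml.etree.ElementTree`, using the standard DITA `task`/`taskbody`/`steps`/`step`/`cmd` elements; the `build_task` helper and the id scheme (lowercased, hyphenated title) are hypothetical conventions, and a real conversion would also emit a DOCTYPE and a map that orders the two topics.

```python
import xml.etree.ElementTree as ET


def build_task(topic_id: str, title: str, steps: list[str]) -> ET.Element:
    """Build a minimal DITA <task> topic containing exactly one <steps> element."""
    task = ET.Element("task", id=topic_id)
    ET.SubElement(task, "title").text = title
    taskbody = ET.SubElement(task, "taskbody")
    steps_el = ET.SubElement(taskbody, "steps")
    for text in steps:
        step = ET.SubElement(steps_el, "step")
        ET.SubElement(step, "cmd").text = text
    return task


# The source procedure from the slide, grouped by scenario heading.
source_procedure = {
    "Remove XYZ Module": [
        "Loosen the screws.",
        "Disengage the ejectors.",
        "Pull the module straight out.",
    ],
    "Insert Replacement XYZ Module": [
        "Align the module.",
        "Insert the module, pressing in firmly.",
        "Engage the ejectors.",
        "Securely tighten the screws.",
    ],
}

# One task topic per scenario, so each topic has a single <steps> block.
tasks = [
    build_task(title.lower().replace(" ", "-"), title, steps)
    for title, steps in source_procedure.items()
]
for task in tasks:
    print(ET.tostring(task, encoding="unicode"))
```

The introductory sentence (“Use this procedure to replace an XYZ module”) would then belong in a parent topic or in the map’s topic reference rather than in either task.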
  • 35. Square Peg 3 - Irrelevant Cross-References. Issue: Conversion to DITA may make some source cross-references irrelevant. For example, if all empty chapter headings are dropped, a reference to a chapter is no longer valid. In these cases, a <required-cleanup> tag is inserted to flag these occurrences for cleanup. “See Chapter 1, Introduction on page 2” would be tagged as: See <required-cleanup><xref href="chap1">Chapter 1, Introduction</xref></required-cleanup> NOTE: Hard-keyed page numbers are typically dropped from the cross-reference string, since they are no longer relevant in DITA.
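The tagging on slide 35 is mechanical once a cross-reference pattern is recognized. A minimal sketch, assuming plain-text input and chapter references phrased exactly like the slide’s example; the regular expression, the `chap<N>` href scheme, and `flag_chapter_xrefs` are assumptions for illustration.

```python
import re
from xml.sax.saxutils import escape

# Matches source cross-references such as "Chapter 1, Introduction on page 2".
# The pattern and the "chap<N>" href scheme are illustrative assumptions.
CHAPTER_XREF = re.compile(r"Chapter (\d+), ([^.;]+?) on page \d+")


def flag_chapter_xrefs(text: str) -> str:
    """Wrap chapter references in <required-cleanup> and drop hard-keyed page numbers.

    Only the link text is XML-escaped; surrounding text is left untouched.
    """
    def to_dita(match: re.Match) -> str:
        number, title = match.group(1), match.group(2)
        label = escape(f"Chapter {number}, {title}")
        return f'<required-cleanup><xref href="chap{number}">{label}</xref></required-cleanup>'

    return CHAPTER_XREF.sub(to_dita, text)
```

Applied to the slide’s example, `flag_chapter_xrefs("See Chapter 1, Introduction on page 2")` yields the tagged string shown above, with “on page 2” removed.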
  • 36. So … Maybe You Shouldn’t Bother Converting Your Content? • It seems like such a pain to go through all the old luggage in the attic. • There is always a need for some rewriting; few writers have the clairvoyance to author content with the intent that it be converted in the future, so you might as well rewrite it all. • My writers aren’t very busy right now anyway. • It’s more fun and seems like less trouble to author anew.
  • 37. In Reality … Converting Your Content is Worth the Bother • Throwing it out and starting over is an expensive option – In DITA, rewriting at $25/page vs. converting at $3-$4/page – The hidden costs of redoing index entries, links and other features you’ve built in – The hidden cost of reviewing, reproofing, and recertifying it all • It’s usually easier to use what you have as a base, and convert over – Needs planning – Needs time • Planning for a good conversion experience – Which content will you need? – Which content is worth converting? – Which content is suitable for re-use in multiple places? – What tools are available? – How to specify the conversion to get it right? – When do you start all this planning?
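The per-page rates on slide 37 make the direct-cost gap easy to estimate for a given library. A small sketch using the slide’s figures ($25/page to rewrite, $3 to $4/page to convert); the page count is a hypothetical input, and the hidden costs the slide lists (index entries, links, re-proofing, recertification) are deliberately left out, so the real gap would be larger.

```python
def compare_costs(pages: int, rewrite_rate: float = 25.0,
                  convert_rate_range: tuple[float, float] = (3.0, 4.0)) -> dict:
    """Direct per-page costs only; the hidden rewriting costs are excluded."""
    low, high = convert_rate_range
    rewrite = pages * rewrite_rate
    return {
        "rewrite": rewrite,
        "convert_low": pages * low,
        "convert_high": pages * high,
        # Savings from converting instead of rewriting, at the worst-case conversion rate.
        "min_savings": rewrite - pages * high,
    }


print(compare_costs(10_000))
# {'rewrite': 250000.0, 'convert_low': 30000.0, 'convert_high': 40000.0, 'min_savings': 210000.0}
```

At these rates, converting costs roughly 12% to 16% of rewriting for any page count, before counting the rework rewriting would also require.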
  • 38. Conversion Scope Options. Option 1: Convert nothing • No conversion costs • Delayed ROI. Option 2: Convert everything • High conversion costs • Reduced ROI. Option 3: Convert ‘frequently used’ documents • Some conversion costs • Maximized ROI. [Chart: cost over time for Options 1, 2, and 3.]
  • 39. What to Convert, and in What Order • Categorizing – Active documents in good shape – Active documents that need a lot of work – Somewhat inactive documents that will likely be retired – Archival materials • Prioritizing – Documents that are most used – Documents that are customer favorites – Documents with the longest product life – Start with the most recent documents and work backward • Identifying the process – Can be converted as is – Can be converted with some work – Needs to be rewritten – Don’t convert; just keep archival copies
  • 40. Closing Thoughts • Know the scope of what you want to accomplish – Are you trying to get eBooks quickly, or are you changing your publishing process? – Are you moving everything, or will a phased approach work? – Will your content work naturally with the selected DTD? • Start the conversion process early – Shifts the critical path; speeds the process; reduces cleanup – Organizing early lets more of the work be done by the content owners – Eases the training and change-acceptance burdens – Setting up collaborative teams sets the tone and allows one to “divide and conquer” • Converting legacy data is not trivial – …but it is faster, safer, and less expensive than rewriting – Each DTD has special considerations to be taken into account – Much can be automated, but it needs planning
  • 41. Questions... & Answers. Data Conversion Laboratory, 61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365 • Telephone: (718) 357-8700 • Fax: (718) 357-8776 • Web: http://www.dclab.com • Mark Gross, President: mgross@dclab.com, 718-307-5711 • Linda Morone, Sr. VP of Sales & Marketing: lmorone@dclab.com, 718-307-5728