SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Archival Storage
Strategies & Technologies
   AIIM Presentation
    January 25, 2006

                 Porter-Roth Associates 1
Bud Porter-Roth
Porter-Roth Associates
     415-381-6217
  budpr@erms.com
 http://www.erms.com

                Porter-Roth Associates 2
Agenda

  Introduction
  The Preservation Problem
  Recommendations




                             Porter-Roth Associates 3
Introduction




                           Disaster
                           Recovery
 Basic Need




   Compliance
                Porter-Roth Associates 4
Introduction
          Paper     e-Mail Servers                     Business
          Files
                                                       Systems


                      Electronic         Photographs
        Microfilm     Document
                      Repositories
    Flash Drives
                                                         PDAs
                           Imaging
                           Repositories
  Local Drives


                                     Video Libraries
     File Systems      Web
                       Servers

                                 Porter-Roth Associates 5
The Preservation Problem

 The problem is actually two separate, sort
 of unrelated issues:
   Hardware and software to store and read
   documents
     Hardware, OS, applications
   The format that the documents are in
     Word, PDF, XML



                                  Porter-Roth Associates 6
The Preservation Problem

 The “Problem” in brief
   Software formats change and become non-supported
   Software formats fall out of favor over time and disappear
   Hardware drives change and become non-supported
   Storage media changes overtime and becomes obsolete
      Floppy disks
      Optical disks (WORM, CD, DVD)
      Tape (many flavors of)
      Portable storage media like the “Memory Stick” in use today
 With all of the above issues, for digital documents, it
 means that there is a strong chance that you will be forced
 to convert something to something else over time – as a,
 in the foreseeable future, continuing process.

                                               Porter-Roth Associates 7
The Preservation Problem

 TIFF (Tagged Image File Format) usually with ITG Group 4
 compression
 JPEG (Joint Photographic Experts Group)
 GIF (Graphic Interchange Format)
 PNG (Portable Network Graphics)
 Native file formats (Word, Excel, etc) also known as “Born
 Digital” documents
 PDF, PDF/A, PDF/X
 Many other proprietary electronic formats
 Paper
 Film


                                     Porter-Roth Associates 8
The Preservation Problem

 What is the best option for preserving electronic documents over
 archival time spans? (Disregarding the hardware storage issues)
    TIFF? A “digital picture” of your page
        Widely adopted standard for document imaging
        Not human readable without the software
        No access to underlying text without OCR
    XML? A format description of the page – a style sheet
        Good for describing logical structure, but not appearance
        Many incompatible domain-specific schemas
    Native Format (e.g., MS Word)?
        Several ubiquitous, but closed proprietary formats
        Can you spell WordPerfect?
    PDF? PDF/A?
    Microsoft Metro renamed XPS?


                                                       Porter-Roth Associates 9
Desirable Properties of a
Format

 Device independence
   Can be reliably and consistently rendered without
   regard to the hardware/software platform
 Self-contained
   Contains all resources necessary for rendering
 Self-documenting
   Contains its own description
 Transparency
   Amenable to direct analysis with basic tools


                                     Porter-Roth Associates 10
Adobe PDF and PDF/A

 PDF is a ubiquitous open format for electronic
 documents
   Proprietary, but with publicly available specification
   Companies, other than Adobe, make PDF products
 Many statutory, regulatory, and institutional
 policies mandate the retention of PDF-based
 documents over multiple generations of
 technology
 The feature-rich nature of PDF can complicate
 preservation efforts
                                      Porter-Roth Associates 11
PDF/A

 PDF/A is intended to address three primary
 issues:
   Define a file format that preserves the static visual
   appearance of electronic documents over time
   Provide a framework for recording metadata about
   electronic documents
   Provide a framework for defining the logical structure
   and semantic properties of electronic documents


                                      Porter-Roth Associates 12
PDF/A

 PDF/A constraints include:
   Audio and video content are forbidden
   Javascript and executable file launches are prohibited
   All fonts must be embedded and also must be legally
   embeddable for unlimited, universal rendering
   Colorspaces specified in a device-independent manner
   Encryption is disallowed
   Use of standards-based metadata is mandated



                                    Porter-Roth Associates 13
PDF/A

 However…
  PDF/A alone does not guarantee preservation
  PDF/A alone does not guarantee exact replication of
  source material
  The intent of PDF/A is not to claim that PDF-based
  solutions are the best way to preserve electronic
  documents
  But once you have decided to use a PDF-based
  approach, PDF/A defines an archival profile of PDF that
  is more amenable to long-term preservation

                                   Porter-Roth Associates 14
PDF/A ….Nevertheless

 PDF/A may not be the last preservation
 format you will use or need
 However, proper application of PDF/A
 should result in reliable, predictable, and
 unambiguous access to the full information
 content of electronic documents


                            Porter-Roth Associates 15
Microsoft XPS

 XPS is an abbreviation for the XML Paper Specification
 The XML Paper Specification describes the XPS Document format. A
 document in XPS Document format (XPS Document) is a paginated
 representation of electronic paper described in an XML-based format.
 The XPS Document format is an open, cross-platform document format
 that allows customers to effortlessly create, share, print, and archive
 paginated documents.
 XPS Documents use a file container that conforms to the Open
 Packaging Conventions. The new file formats in the next version of the
 Microsoft Office System, codenamed Office quot;12,quot; also use the Open
 Packaging Conventions for organizing data into files, allowing
 businesses to be able to manage Office quot;12quot; documents and XPS
 Documents in the same manner.
 The XPS Document format is both a fixed-layout document
 interchange format, a native Windows Vista spool file format, and a
 PDL (Page Description Language, used by printing devices).
 http://www.microsoft.com/whdc/xps/default.mspx
                                             Porter-Roth Associates 16
Recommendations

 This is still a wild frontier, with no certain outcome or
 single standard “The good thing about standards is that
 there are so many of them….”
 When in doubt about long-term storage of vital
 documents, paper or film is still a good answer
 Beware of new technologies, even ones that are
 “standards”
 TIFF, JPEG, PDF, PDF/A are recommended.
 The weight of in-place document formats will mean that
 change will be very slow and may stop change unless a
 dramatic “out of the blue” technology appears

                                      Porter-Roth Associates 17
Conclusion & Questions


                          Finally!




                         Questions?


                     Porter-Roth Associates 18

Weitere ähnliche Inhalte

Ähnlich wie Archival Storage Strategies & Technologies for Electronic Documents

PDF/Archive - Preserving Electronic Documents
PDF/Archive - Preserving Electronic DocumentsPDF/Archive - Preserving Electronic Documents
PDF/Archive - Preserving Electronic DocumentsBetsy Fanning
 
File Formats for Preservation
File Formats for PreservationFile Formats for Preservation
File Formats for PreservationStephen Gray
 
PRESENTATION: Challenges of Digitization (November 2012)
PRESENTATION: Challenges of Digitization (November 2012)PRESENTATION: Challenges of Digitization (November 2012)
PRESENTATION: Challenges of Digitization (November 2012)Adlib - The PDF Experts
 
PDF/A: A Preservation Format
PDF/A: A Preservation Format PDF/A: A Preservation Format
PDF/A: A Preservation Format Geof Huth
 
The importance of standards
The importance of standardsThe importance of standards
The importance of standardsiText Group nv
 
E resources
E resourcesE resources
E resourcesavid
 
20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years
20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years
20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 YearsJesse Wilkins
 
Different file types
Different file typesDifferent file types
Different file typesDeftPDF
 
A strategic view of document and digital object management
A strategic view of document and digital object managementA strategic view of document and digital object management
A strategic view of document and digital object managementDerek Keats
 
Open Document Format
Open Document FormatOpen Document Format
Open Document FormatKhan Mostafa
 
A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML
A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML
A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML Alexandro Colorado
 
Digitization of Physical Assets
Digitization of Physical AssetsDigitization of Physical Assets
Digitization of Physical AssetsDaniel Novak
 
Approaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic dataApproaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic dataChris Freeland
 

Ähnlich wie Archival Storage Strategies & Technologies for Electronic Documents (20)

PDF/Archive - Preserving Electronic Documents
PDF/Archive - Preserving Electronic DocumentsPDF/Archive - Preserving Electronic Documents
PDF/Archive - Preserving Electronic Documents
 
File Formats for Preservation
File Formats for PreservationFile Formats for Preservation
File Formats for Preservation
 
PRESENTATION: Challenges of Digitization (November 2012)
PRESENTATION: Challenges of Digitization (November 2012)PRESENTATION: Challenges of Digitization (November 2012)
PRESENTATION: Challenges of Digitization (November 2012)
 
PDF/A: A Preservation Format
PDF/A: A Preservation FormatPDF/A: A Preservation Format
PDF/A: A Preservation Format
 
Introduction to Document Management
Introduction to Document ManagementIntroduction to Document Management
Introduction to Document Management
 
PDF/A: A Preservation Format
PDF/A: A Preservation Format PDF/A: A Preservation Format
PDF/A: A Preservation Format
 
Beyond TIFF and JPEG2000
Beyond TIFF and JPEG2000Beyond TIFF and JPEG2000
Beyond TIFF and JPEG2000
 
The importance of standards
The importance of standardsThe importance of standards
The importance of standards
 
E resources
E resourcesE resources
E resources
 
20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years
20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years
20110428 ARMA Amarillo Managing Your Records in 5, 50, 500 Years
 
Different file types
Different file typesDifferent file types
Different file types
 
What is PDF/A?
What is PDF/A?What is PDF/A?
What is PDF/A?
 
A strategic view of document and digital object management
A strategic view of document and digital object managementA strategic view of document and digital object management
A strategic view of document and digital object management
 
Star 2013-pdfa-pdfa
Star 2013-pdfa-pdfaStar 2013-pdfa-pdfa
Star 2013-pdfa-pdfa
 
Open Document Format
Open Document FormatOpen Document Format
Open Document Format
 
A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML
A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML
A Technical Comparison: ISO/IEC 26300 vs Microsoft Office Open XML
 
Digitization of Physical Assets
Digitization of Physical AssetsDigitization of Physical Assets
Digitization of Physical Assets
 
Pdfa 2 rome-fanning
Pdfa 2 rome-fanningPdfa 2 rome-fanning
Pdfa 2 rome-fanning
 
Right Solution for PDF
Right Solution for PDFRight Solution for PDF
Right Solution for PDF
 
Approaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic dataApproaches to preserving digitized taxonomic data
Approaches to preserving digitized taxonomic data
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Archival Storage Strategies & Technologies for Electronic Documents

  • 1. Archival Storage Strategies & Technologies AIIM Presentation January 25, 2006 Porter-Roth Associates 1
  • 2. Bud Porter-Roth Porter-Roth Associates 415-381-6217 budpr@erms.com http://www.erms.com Porter-Roth Associates 2
  • 3. Agenda Introduction The Preservation Problem Recommendations Porter-Roth Associates 3
  • 4. Introduction Disaster Recovery Basic Need Compliance Porter-Roth Associates 4
  • 5. Introduction Paper e-Mail Servers Business Files Systems Electronic Photographs Microfilm Document Repositories Flash Drives PDAs Imaging Repositories Local Drives Video Libraries File Systems Web Servers Porter-Roth Associates 5
  • 6. The Preservation Problem The problem is actually two separate, sort of unrelated issues: Hardware and software to store and read documents Hardware, OS, applications The format that the documents are in Word, PDF, XML Porter-Roth Associates 6
  • 7. The Preservation Problem The “Problem” in brief Software formats change and become non-supported Software formats fall out of favor over time and disappear Hardware drives change and become non-supported Storage media changes overtime and becomes obsolete Floppy disks Optical disks (WORM, CD, DVD) Tape (many flavors of) Portable storage media like the “Memory Stick” in use today With all of the above issues, for digital documents, it means that there is a strong chance that you will be forced to convert something to something else over time – as a, in the foreseeable future, continuing process. Porter-Roth Associates 7
  • 8. The Preservation Problem TIFF (Tagged Image File Format) usually with ITG Group 4 compression JPEG (Joint Photographic Experts Group) GIF (Graphic Interchange Format) PNG (Portable Network Graphics) Native file formats (Word, Excel, etc) also known as “Born Digital” documents PDF, PDF/A, PDF/X Many other proprietary electronic formats Paper Film Porter-Roth Associates 8
  • 9. The Preservation Problem What is the best option for preserving electronic documents over archival time spans? (Disregarding the hardware storage issues) TIFF? A “digital picture” of your page Widely adopted standard for document imaging Not human readable without the software No access to underlying text without OCR XML? A format description of the page – a style sheet Good for describing logical structure, but not appearance Many incompatible domain-specific schemas Native Format (e.g., MS Word)? Several ubiquitous, but closed proprietary formats Can you spell WordPerfect? PDF? PDF/A? Microsoft Metro renamed XPS? Porter-Roth Associates 9
  • 10. Desirable Properties of a Format Device independence Can be reliably and consistently rendered without regard to the hardware/software platform Self-contained Contains all resources necessary for rendering Self-documenting Contains its own description Transparency Amenable to direct analysis with basic tools Porter-Roth Associates 10
  • 11. Adobe PDF and PDF/A PDF is a ubiquitous open format for electronic documents Proprietary, but with publicly available specification Companies, other than Adobe, make PDF products Many statutory, regulatory, and institutional policies mandate the retention of PDF-based documents over multiple generations of technology The feature-rich nature of PDF can complicate preservation efforts Porter-Roth Associates 11
  • 12. PDF/A PDF/A is intended to address three primary issues: Define a file format that preserves the static visual appearance of electronic documents over time Provide a framework for recording metadata about electronic documents Provide a framework for defining the logical structure and semantic properties of electronic documents Porter-Roth Associates 12
  • 13. PDF/A PDF/A constraints include: Audio and video content are forbidden Javascript and executable file launches are prohibited All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering Colorspaces specified in a device-independent manner Encryption is disallowed Use of standards-based metadata is mandated Porter-Roth Associates 13
  • 14. PDF/A However… PDF/A alone does not guarantee preservation PDF/A alone does not guarantee exact replication of source material The intent of PDF/A is not to claim that PDF-based solutions are the best way to preserve electronic documents But once you have decided to use a PDF-based approach, PDF/A defines an archival profile of PDF that is more amenable to long-term preservation Porter-Roth Associates 14
  • 15. PDF/A ….Nevertheless PDF/A may not be the last preservation format you will use or need However, proper application of PDF/A should result in reliable, predictable, and unambiguous access to the full information content of electronic documents Porter-Roth Associates 15
  • 16. Microsoft XPS XPS is an abbreviation for the XML Paper Specification The XML Paper Specification describes the XPS Document format. A document in XPS Document format (XPS Document) is a paginated representation of electronic paper described in an XML-based format. The XPS Document format is an open, cross-platform document format that allows customers to effortlessly create, share, print, and archive paginated documents. XPS Documents use a file container that conforms to the Open Packaging Conventions. The new file formats in the next version of the Microsoft Office System, codenamed Office quot;12,quot; also use the Open Packaging Conventions for organizing data into files, allowing businesses to be able to manage Office quot;12quot; documents and XPS Documents in the same manner. The XPS Document format is both a fixed-layout document interchange format, a native Windows Vista spool file format, and a PDL (Page Description Language, used by printing devices). http://www.microsoft.com/whdc/xps/default.mspx Porter-Roth Associates 16
  • 17. Recommendations This is still a wild frontier, with no certain outcome or single standard “The good thing about standards is that there are so many of them….” When in doubt about long-term storage of vital documents, paper or film is still a good answer Beware of new technologies, even ones that are “standards” TIFF, JPEG, PDF, PDF/A are recommended. The weight of in-place document formats will mean that change will be very slow and may stop change unless a dramatic “out of the blue” technology appears Porter-Roth Associates 17
  • 18. Conclusion & Questions Finally! Questions? Porter-Roth Associates 18