NISO Two-Part Webinar: Sustainable Information
Part 2: Digital Preservation of Audio-Visual Content
About the Webinar
Audio-visual resources in digital formats present even more preservation challenges than digital text resources do. Reformatting information to a common file format can be difficult and may require specialists to ensure that it is done with no loss of integrity. Digital text may still be usable if reformatted imperfectly (e.g., skewed but still readable pages), but even small errors in digital A/V files can render the material unusable.
This webinar will share the experiences of several projects that are working to ensure that A/V files can be preserved with full integrity.
Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Planning for Video Preservation Services at Harvard
Andrea Goethals, Manager of Digital Preservation and Repository Services, Harvard University Library
David Ackerman, Head of Media Preservation, Harvard University Library
AXF: Finally a Storage and Preservation Standard for the Ages
Brian Campanotti, Chief Technical Officer, Front Porch Digital
An Open-Source Preservation Solution: Hydra/Blacklight
Tom Cramer, Chief Technology Strategist & Associate Director, Digital Library Systems & Services, Stanford University Libraries
1. NISO Two-Part Webinar
Sustainable Information, Part 2:
Digital Preservation for Audio-Visual Content
Wednesday, December 17, 2014
Speakers:
Andrea Goethals, Manager of Digital Preservation and Repository Services, Harvard University Library
David Ackerman, Head of Media Preservation, Harvard University Library
Brian Campanotti, Chief Technical Officer, Front Porch Digital
Tom Cramer, Chief Technology Strategist & Associate Director, Digital Library Systems & Services, Stanford University Libraries
http://www.niso.org/news/events/2014/webinars/text_preservation/
2. Planning for Video Preservation Services at Harvard
NISO Webinar, Dec. 17, 2014
David Ackerman & Andrea Goethals, Harvard Library
3. Agenda
• Preserving video – analysis and decisions (Andrea)
• Reformatting video – workflows and challenges (David)
5. DRS Format Requests (2004 – Present)
• Video: 23%
• Vector Graphics: 16%
• Office Documents: 14%
• DNG: 6%
• 3D Models: 5%
• Software: 6%
• Other Still Images: 6%
• Databases: 6%
• Ebooks: 5%
• Datasets: 5%
• Web Sites: 3%
• Other OCR Text: 1%
• Newspaper: 2%
• GIS: 2%
Chart last updated 12/8/2014
6. Building Blocks Across Harvard
• Service providers
– Preservation Services (Digital Preservation, Media Preservation)
– IT (HUIT / Library Technology Services / Digital Video Services)
• Infrastructure
– Digital Repository Service
– Media Preservation’s digitization studio
• Users, collectors, creators
– Harvard repositories and schools
– HarvardX, DCE and other RTL video generators
– Current & future researchers, teachers, learners
7. Building Blocks – Beyond
• Kaltura (video management and delivery)
• MediaSite (lecture capture)
• 3Play Media (video captioning)
• AVPreserve
8. Format Analysis: Criteria
Availability online
Backward/Forward Compatibility
Community/3rd Party Support
Complexity
Compression
Cost
Developer/Corporate Support
Domain Specificity
Ease of Identification
Ease of Validation
Error-tolerance
Expertise Available
Geographic Spread
Institutional Policies
Legal Restrictions
Lifetime
Metadata Support
Rendering Software Available
Revision Rate
Specifications Available
Specification Quality
Standardization
Storage Space
Technical Dependencies
Technical Protection Mechanism
Ubiquity
Value
Viruses
9. Format Analysis: Criteria
Availability online → Browser Support
Backward/Forward Compatibility
Community/3rd Party Support
Complexity → Level of Format Complexity
Compression → Degree to which Compression is Understood
Cost → Cost to Maintain Environment for Access and Processing
Developer/Corporate Support
Domain Specificity
Ease of Identification
Ease of Validation → Accurate Validation
Error-tolerance
Expertise Available
Geographic Spread
Institutional Policies
Legal Restrictions → Legal Restrictions Affecting Use Now or Long-Term
Lifetime
Metadata Support → Descriptive Metadata Support; Technical Metadata Support
Rendering Software Available → Quantity and Availability of Rendering Software
Revision Rate
Specifications Available
Specification Quality → Degree to Which Specification is Complete and Understandable
Standardization → Standardized
Storage Space → Storage Requirements Relative to Other Similar Formats
Technical Dependencies → Dependence on Particular HW/SW
Technical Protection Mechanism → Support for Technical Protection Mechanisms
Ubiquity → Widespread Use by Consumers; Widespread Use by Professionals
Value
Viruses → Malware
10. Format Analysis: Criteria
Criteria as refined on slide 9, plus:
Dependency on a Single Organization or Company
Archival Use
Ability to Encode in True Lossless Compression
Ability to Encode in Visually Lossless Compression
Max Chroma Subsampling
Max Resolution
Highest Bit Resolution
Highest Supported Bitrate
Compression Ratio
11. Format Analysis: Criteria
Criteria from slide 10, each rated at one of three levels:
High importance
Medium importance
Low importance
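Slide 11 rates each criterion at one of three importance levels, but the deck does not say how the ratings were combined into format decisions. One common approach is an importance-weighted average of per-criterion scores. The sketch below assumes numeric weights of 3, 2 and 1 and a free-form numeric score per criterion; the weights, function names and any example formats are hypothetical and do not come from the Harvard analysis.

```python
# Minimal weighted-scoring sketch. The 3/2/1 weights are an assumption;
# the slides only name the three importance levels.
IMPORTANCE_WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def weighted_score(importance: dict, scores: dict) -> float:
    """Importance-weighted average of one format's per-criterion scores.

    importance: criterion name -> "high" | "medium" | "low"
    scores:     criterion name -> numeric score for this format
    Criteria with no score for this format are skipped.
    """
    total = 0.0
    weight_sum = 0
    for criterion, level in importance.items():
        if criterion not in scores:
            continue
        weight = IMPORTANCE_WEIGHTS[level]
        total += weight * scores[criterion]
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no criteria were scored for this format")
    return total / weight_sum


def rank_formats(importance: dict, candidates: dict) -> list:
    """Return candidate format names, highest weighted score first."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(importance, candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical ratings and scores, for illustration only.
    importance = {
        "Specifications Available": "high",
        "Ability to Encode in True Lossless Compression": "high",
        "Geographic Spread": "low",
    }
    candidates = {
        "Format X": {"Specifications Available": 5,
                     "Ability to Encode in True Lossless Compression": 5,
                     "Geographic Spread": 2},
        "Format Y": {"Specifications Available": 3,
                     "Ability to Encode in True Lossless Compression": 0,
                     "Geographic Spread": 5},
    }
    print(rank_formats(importance, candidates))
```

With these made-up numbers, Format X scores 32/7 and Format Y scores 2.0, so Format X ranks first; a high-importance criterion moves the result three times as much as a low-importance one.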
12. Format Analysis: Criteria
Cost to Maintain Environment for Access and Processing
Expertise Available
Legal Restrictions Affecting Use Now or Long-Term
Quantity and Availability of Rendering Software
Specifications Available
Dependence on Particular HW/SW
Widespread Use by Consumers
Widespread Use by Professionals
Dependency on a Single Organization or Company
Ability to Encode in True Lossless Compression
Ability to Encode in Visually Lossless Compression
Max Chroma Subsampling
Max Resolution
Highest Bit Resolution
Highest Supported Bitrate
Compression Ratio
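Several of these criteria (chroma subsampling, bit resolution, bitrate, compression ratio) are linked by simple arithmetic. The raw data rate of uncompressed video is width × height × average samples per pixel × bits per sample × frame rate, and a codec's compression ratio is that raw rate divided by its encoded bitrate. The sketch below works this out under stated simplifications: it ignores audio, container overhead, and the padding that some 10-bit packings (such as QuickTime's v210) add, so real uncompressed files will be somewhat larger. The 720×486 frame size in the example is an illustration, not a figure from the slides.

```python
# Average samples per pixel: one luma sample per pixel plus two chroma
# planes stored at reduced resolution.
SAMPLES_PER_PIXEL = {
    "4:4:4": 3.0,
    "4:2:2": 2.0,  # chroma halved horizontally
    "4:2:0": 1.5,  # chroma halved horizontally and vertically
    "4:1:1": 1.5,  # chroma quartered horizontally
}


def uncompressed_bitrate(width: int, height: int, bit_depth: int,
                         subsampling: str, frame_rate: float) -> float:
    """Approximate raw video data rate in bits per second.

    Ignores audio, container overhead and sample-packing padding.
    """
    bits_per_frame = width * height * SAMPLES_PER_PIXEL[subsampling] * bit_depth
    return bits_per_frame * frame_rate


def compression_ratio(raw_bps: float, encoded_bps: float) -> float:
    """How many times smaller the encoded stream is than the raw stream."""
    return raw_bps / encoded_bps


if __name__ == "__main__":
    ntsc_rate = 30000 / 1001  # ~29.97 frames per second
    raw = uncompressed_bitrate(720, 486, 10, "4:2:2", ntsc_rate)
    print(f"{raw / 1e6:.1f} Mb/s")
    print(f"{raw / 8 * 3600 / 1e9:.1f} GB per hour")
```

For 10-bit 4:2:2 standard-definition video this prints roughly 209.7 Mb/s and 94.4 GB per hour, which is why storage space and compression ratio sit alongside lossless capability in the criteria list.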
13. Preferred Formats
• Archival formats
– Uncompressed in QT, 8 or 10 bit
– JPEG 2000 in MXF or QT, lossless recommended
– DV in QT (only if from DV tape), many variations
– MPEG-2 in MPEG-2 or QT
• Delivery formats
– H.264 in QT, many profiles
14. Accepted Formats
• Archival formats
– DNxHD in MXF or QT
– ProRes in QT
• Delivery formats
– Theora in QT or Matroska
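The preferred and accepted lists on slides 13 and 14 amount to a small lookup table keyed by codec and container. The sketch below encodes them that way so an ingest step could flag files that fall outside the policy. The table layout and the `classify` function are hypothetical; only the DV-from-tape condition is modelled, while the 8/10-bit, lossless-recommendation and H.264-profile qualifiers on the slides are left out.

```python
# Codec/container pairs transcribed from the "Preferred Formats" and
# "Accepted Formats" slides; container labels are kept as the slides
# wrote them ("QT" = QuickTime). The encoding itself is illustrative.
POLICY = {
    "archival": {
        "preferred": {
            ("Uncompressed", "QT"),
            ("JPEG 2000", "MXF"), ("JPEG 2000", "QT"),
            ("DV", "QT"),
            ("MPEG-2", "MPEG-2"), ("MPEG-2", "QT"),
        },
        "accepted": {("DNxHD", "MXF"), ("DNxHD", "QT"), ("ProRes", "QT")},
    },
    "delivery": {
        "preferred": {("H.264", "QT")},
        "accepted": {("Theora", "QT"), ("Theora", "Matroska")},
    },
}


def classify(codec: str, container: str, use: str,
             from_dv_tape: bool = False) -> str:
    """Return "preferred", "accepted" or "not listed" for a codec/container pair.

    use is "archival" or "delivery". DV is preferred only when it came
    from DV tape; the slides do not classify DV from other sources.
    """
    if codec == "DV" and not from_dv_tape:
        return "not listed"
    tiers = POLICY[use]
    for tier in ("preferred", "accepted"):
        if (codec, container) in tiers[tier]:
            return tier
    return "not listed"
```

For example, `classify("ProRes", "QT", "archival")` returns `"accepted"`, while DV in QT returns `"not listed"` unless `from_dv_tape=True` is passed.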
15. Metadata Analysis
• Technical metadata
– Chose EBU Core 1.5 (aligns well with AES-60; structure mirrors MediaInfo’s output)
– Considered PBCore
• Source metadata
– Chose a revised UTVideoSrc (native suitability to physical media, right amount of detail)
– Considered EBU Core
• Process history
– Chose a revised reVTMD (specific, simple, sufficient)
16. Tool Analysis
• Chose: MediaInfo
– Raw output could map to metadata schemas
– Currently supported
– Widely adopted
• Others considered:
– ExifTool
– FFProbe
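One reason given for choosing MediaInfo is that its raw output could be mapped to metadata schemas. As a rough sketch of that idea, the code below reads the JSON report that recent MediaInfo releases can emit (`mediainfo --Output=JSON <file>`) and copies a few Video-track fields into a flat record. The sample report is hypothetical and trimmed; the source field names reflect MediaInfo's JSON output as we understand it and should be checked against your MediaInfo version; the target keys are simplified, EBUCore-inspired labels, not actual EBUCore 1.5 element names.

```python
import json

# Hypothetical mapping from MediaInfo Video-track fields to simplified,
# EBUCore-inspired keys (illustrative labels, not the EBUCore schema).
VIDEO_FIELD_MAP = {
    "Format": "videoFormat",
    "Width": "width",
    "Height": "height",
    "FrameRate": "frameRate",
    "BitDepth": "bitDepth",
    "ChromaSubsampling": "chromaSubsampling",
}


def map_video_tracks(mediainfo_json: str) -> list:
    """Return one mapped dict per Video track in a MediaInfo JSON report."""
    report = json.loads(mediainfo_json)
    tracks = report.get("media", {}).get("track", [])
    mapped = []
    for track in tracks:
        if track.get("@type") != "Video":
            continue  # skip General, Audio and other track types
        mapped.append({target: track[source]
                       for source, target in VIDEO_FIELD_MAP.items()
                       if source in track})
    return mapped


# Trimmed, hypothetical report in the shape of MediaInfo's JSON output.
SAMPLE_REPORT = """
{"media": {"@ref": "example.mov", "track": [
  {"@type": "General", "Format": "MPEG-4"},
  {"@type": "Video", "Format": "ProRes", "Width": "1920", "Height": "1080",
   "FrameRate": "29.970", "BitDepth": "10", "ChromaSubsampling": "4:2:2"},
  {"@type": "Audio", "Format": "PCM"}
]}}
"""

if __name__ == "__main__":
    print(map_video_tracks(SAMPLE_REPORT))
```

MediaInfo reports values as strings, so a production mapping would also need type conversion, unit handling and rules for fields that have no schema equivalent.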
17. Video Content Model
[Diagram: a video OBJECT = 1 Object Descriptor + 1..n Video Files, with two further groups of 0..n Video Files and two HAS_SOURCE relationships; the object holds 1 metadata file and 1 or more derivative video files.]
18. Video Object & Auxiliary Objects
• OBJECT (content model = VIDEO): 1 or more derivative video files
• OBJECT (content model = TEXT; object-level role = VIDEO EDIT DECISION LIST): 1 text file
• OBJECT (content model = AUDIO; object-level role = DOUBLE SYSTEM AUDIO): 1 or more derivative audio files
• OBJECT (content model = TEXT; object-level role = CLOSED CAPTION DATA): 1 text file
• OBJECT (content model = TEXT; object-level role = SUBTITLE DATA): 1 text file
• OBJECT (content model = STILL IMAGE; object-level role = POSTER FRAME): 1 or more derivative image files
• OBJECT (content model = DISK IMAGE; object-level role = TBD): TBD files
Relationships shown between the video object and the auxiliary objects: HAS_DOCUMENTATION, HAS_LARGER_CONTEXT, HAS_SUPPLEMENT (twice)
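The video object model above pairs each object with a content model, an optional object-level role, a file-count rule and a small set of named relationships. The sketch below is a hypothetical in-memory stand-in (the `RepoObject` class and `validate` function are illustrative, not part of the DRS) that enforces the file counts stated on the slide and restricts links to the three relationship names shown. Because the diagram does not say which relationship connects which auxiliary object, the link used in the example is arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (minimum, maximum) file counts per content model, from the slide:
# "1 or more" files for VIDEO/AUDIO/STILL IMAGE, "1 text file" for TEXT.
# DISK IMAGE is marked TBD on the slide, so it has no rule here.
FILE_COUNT_RULES = {
    "VIDEO": (1, None),
    "AUDIO": (1, None),
    "STILL IMAGE": (1, None),
    "TEXT": (1, 1),
}

# Relationship names shown on the slide.
RELATIONSHIPS = {"HAS_DOCUMENTATION", "HAS_LARGER_CONTEXT", "HAS_SUPPLEMENT"}


@dataclass
class RepoObject:
    """Hypothetical in-memory stand-in for a repository object."""
    content_model: str
    role: Optional[str] = None
    files: List[str] = field(default_factory=list)
    links: List[Tuple[str, "RepoObject"]] = field(default_factory=list)

    def link(self, relationship: str, target: "RepoObject") -> None:
        """Attach a related object, allowing only the slide's relationships."""
        if relationship not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {relationship}")
        self.links.append((relationship, target))


def validate(obj: RepoObject) -> List[str]:
    """Return a list of file-count problems (empty if none)."""
    rule = FILE_COUNT_RULES.get(obj.content_model)
    if rule is None:
        return []  # no rule defined, e.g. DISK IMAGE (still TBD)
    minimum, maximum = rule
    count = len(obj.files)
    problems = []
    if count < minimum:
        problems.append(f"{obj.content_model}: needs at least {minimum} file(s), has {count}")
    if maximum is not None and count > maximum:
        problems.append(f"{obj.content_model}: allows at most {maximum} file(s), has {count}")
    return problems


if __name__ == "__main__":
    video = RepoObject("VIDEO", files=["access_copy.mp4"])
    captions = RepoObject("TEXT", role="CLOSED CAPTION DATA",
                          files=["captions_1.txt", "captions_2.txt"])
    # Which relationship links captions to the video is illustrative only.
    video.link("HAS_SUPPLEMENT", captions)
    for obj in (video, captions):
        print(obj.content_model, obj.role, validate(obj))
```

Here the caption object fails validation because a TEXT object is described as holding exactly one text file.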
28. Phase 1
• Video Reformatting Service
• Enhanced DRS to support:
– Ingest of Video (including enhancing FITS to identify formats and extract metadata)
– Metadata editing
– Video storage & preservation
– Basic video delivery service
29. Phase 2
• Citations
• User annotations
• Closed captioning
• Multi-lingual audio
• Descriptive audio
• Playlists created by faculty, students, librarians
• Other deposit streams (e.g., from Kaltura to the DRS)
31. Hydra-Blacklight: An Open Source Stack for AV Preservation (and More)
December 2014
Tom Cramer
Chief Technology Strategist
Stanford University Libraries
@tcramer
32. What Is Hydra?
• A robust repository fronted by feature-rich, tailored applications and workflows (“heads”)
➭ One body, many heads
• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs
• A community of developers and adopters extending and enhancing the core
➭ If you want to go fast, go alone. If you want to go far, go together.
33. Fundamental Assumption #1
No single system can provide the full range of repository-based solutions for a given institution’s needs,
…yet sustainable solutions require a common repository infrastructure.
42. A Note on Ruby on Rails
• Rapid application development for web applications: “Convention over configuration”
– 10x productivity
• Supportable: MVC (Model-View-Controller) and the Rails framework make code well-structured and predictable
• Testable: RSpec and Cucumber provide powerful, automatable testing tools
• Learnable: Stanford went from 1 to 8 Ruby-savvy developers in one year (no new hires)
– 1-week learning curve to basic proficiency
43. A Note on Fedora
• Flexible, Extensible, Durable Object Repository Architecture
– Flexible: model and store any content types
– Extensible: easy to augment with apps and services
– Durable: foundation of preservation repository
• Proven, sustained and successful digital repository
– 100s of adopters; 13 years of development, 4 releases
– Vibrant community & funding under DuraSpace
• Fedora 4.0 released this month; co-evolving with Hydra
44. Fedora 4 Preservation-Friendly Feature Set
• Auditing, versioning & fixity services
• Clustering & scalability
• Event-driven architecture
• Advanced storage capabilities
– Including support for very large files
• “Projection” over remote file stores
• Native RDF support
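Fixity checking, listed first among these features, comes down to recomputing a file's checksum and comparing it with the value recorded at ingest. The stand-alone sketch below shows that idea with Python's standard `hashlib`; it is not Fedora's fixity API, and the SHA-256 default and 8 MiB chunk size are arbitrary choices. Reading in chunks keeps memory use flat, which matters for the very large files typical of video.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hex digest of a file, read in fixed-size chunks so large video
    files never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's current digest matches the recorded one."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A scheduled audit would store the digest at ingest, call `fixity_ok` periodically, and log a preservation event whenever it returns `False`.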
45. A Note on Blacklight
• Repository-agnostic, feature-rich, content-aware, turnkey access interface
• Vibrant, multi-institutional, open source community in its own right
• Can be used independently or as the first component of Hydra
• 100s of adopters worldwide; ~450 members of the blacklight-development list
46. Rock & Roll Hall of Fame: Blacklight for Catalog, EAD and Media
51. Fundamental Assumption #2
No single institution can resource the development of a full range of solutions on its own,
…yet each needs the flexibility to tailor solutions to local demands and workflows.
52. Hydra Philosophy -- Community
• An open architecture, with many contributors to a common core
• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs
• A community of developers and adopters extending and enhancing the core
• “If you want to go fast, go alone. If you want to go far, go together.”
One body, many heads
53. Community
• Conceived & executed as a distributed, collaborative, open source effort from the start
• Initially a joint development project between Stanford, University of Virginia, and University of Hull
• Hydra Partners are the backbone of the project
• Coalition of the willing
• No fees or dues
• Apache-style consensus and governance
• Steering Group provides administration and continuity, and serves as a backstop when needed
• But no central planning, no Project Director, no “one” architect
54. Hydra Partners…
…are individuals, institutions, corporations or other groups that have committed to contributing to the Hydra community; they not only use the Hydra technical framework, but also add to it in at least one of many ways: code, analysis, design, support, funding, or other resources.
Hydra Partners collectively advance the project and the community for the benefit of all participants.
https://wiki.duraspace.org/display/hydra/Hydra+Community+Framework
56. Code Licensing
• All Hydra code is available under the Apache License, Version 2.0
• All code contributions are managed through Contributor License Agreements:
– Individual, so each developer is clear about what they are contributing
– Corporate, so each institution is clear about what it is contributing
• Code contributors retain ownership of their IP
– …and grant a non-exclusive license to the project and its users
67. HydraDAM2
• NEH just funded Indiana University & WGBH for a 2nd round of HydraDAM development
• Exercise HydraDAM on Fedora 4
• RDF-based data models
• Flexible storage
• Integrate HydraDAM (back-end) with Avalon and OpenVault (front-ends)
• Integrate with mass digitization workflows
• 2-year effort
69. NISO Webinar • December 17, 2014
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2014/webinars/av_preservation/
70. Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU