5. What is it?
• An unprecedented and historic collection
of American public radio and television
content
• Dates back through the 1950s
• Preserved and made available to the
public
9. Initial Collection
• 40,000 hours of digital material
initially from over 100 stations
– 5,000 hours from born digital files
• 2.5 million inventory records from
120 stations
• Identified over 3 million items kept
at stations, archives, producers,
university collections across the
country
10. Challenges of born digital media
• Varying file formats
• File failure when moving the files from one
location to another – file corruption
• File naming
• Wrappers, codecs, bits, and more
11. Current Status
• Accession the 40,000 hours of digitized
files into the LOC systems
• Launched a website for public access to
the 2.5 million records from the inventory
project
• Public access to all proxy files on location
at WGBH and LOC
12. Launched New Projects
• NET Catalog
– Build a national inventory of titles created for
National Educational Television (pre-PBS)
• National Digital Stewardship Residency
– Expand the NDSR program to include
geographically diverse residencies at
organizations with public media collections
13. Newly Funded
• Collaboration with Pop-Up Archive
– Create transcripts of all 40,000 hours using
speech to text tools
– Create metadata games to improve
transcripts via crowdsourcing
– Create audio fingerprint database
14. Long term goals
• Grow the collection by adding new
inventory records and digitized materials
• Help public media organizations with
archiving, digitizing, and access to their
collections
• Build a consortium for preservation and
access of public media archive content
15. And PBCore!
• Responsible for further development of
PBCore
• AMIA PBCore Advisory Subcommittee
– Schema
– Documentation
– Education
– Website
– Communications
17. PBCore
• Continue the development of PBCore as a
standard for media materials
• Re-engage the PBCore community for input
in the continued development
• Outreach to new adopters of PBCore
• Collaborate with EBUCore to develop an
RDF ontology for PBCore
18. Next steps
• Allow public online access to as many of
the proxy files as possible, rights permitting
– Check out website Nov 1!!!!
• Develop Sustainability plan
– Fundraise!
• Enhance website
22. Website Development
• Initial launch in April 2015, with access to 2.5
million inventory records
• Responsive design, accessible on desktop, mobile
and tablet
• Further developing site to allow for streaming
video and audio in our Online Reading Room
• Now, all video and audio is accessible at WGBH &
Library of Congress through implementation of IP
restrictions
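The IP restriction mentioned above can be illustrated with a minimal Python sketch. The slides do not say how the site implements it, so everything here is an assumption: the function name `is_on_site` is hypothetical, and the networks are reserved documentation-only ranges standing in for the real on-site networks.

```python
import ipaddress

# Hypothetical on-site networks. These are documentation-only ranges
# (RFC 5737), not the real WGBH or Library of Congress networks.
ON_SITE_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_on_site(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowed on-site network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are treated as off-site.
        return False
    return any(address in network for network in ON_SITE_NETWORKS)
```

A web application could call a check like this before deciding whether to serve a proxy file or only the metadata record.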
28. Archival Management System
(AMS) github.com/avpreserve/ams
• Developed by AVPreserve
• LAMP stack application
• Licensed under GPLv3
• Create, store and manage descriptive, technical and preservation
metadata
• Stations can log-in and access their metadata and stream content
• Digitization vendor sends technical metadata to server using BagIt
specification; preservation metadata sent to server through Google
Spreadsheets API
• Provides PBCore and CSV import and export
• PBCore and PREMIS REST API
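As a rough illustration of what receiving a BagIt delivery involves, the sketch below checks the payload files of a bag against its checksum manifest. It is a simplified reading of the BagIt manifest layout (one `checksum path` pair per line in `manifest-<algorithm>.txt`), not the AMS code; the function name `verify_bag_manifest` is hypothetical, and tag manifests and encoded path characters are ignored.

```python
import hashlib
from pathlib import Path
from typing import List


def verify_bag_manifest(bag_dir: Path, algorithm: str = "sha256") -> List[str]:
    """Return the payload paths that are missing or whose checksum differs."""
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    failures: List[str] = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each manifest line is "<checksum> <relative path>".
        expected, relative_path = line.split(maxsplit=1)
        relative_path = relative_path.strip()
        target = bag_dir / relative_path
        if not target.is_file():
            failures.append(relative_path)
            continue
        # Reads the whole file into memory; fine for a sketch, not for large video files.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list means every file named in the manifest is present and unchanged.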
29. To fully catalog the AAPB collection, it would take one person
32 years working full time. So now we’re focusing on
normalization & MVC.
• AAPB Phases of Cataloging
– Inventory (item-level)
– Normalization: formatting dates, titles, splitting out types of titles;
formatting existing data at a high level in CSV
– Minimum Viable Cataloging (MVC): one-by-one cataloging, 15
minutes per item, spend most time reviewing opening and closing
credits; adding dates, titles, creators, contributors, publishers,
copyright information, topic, and genre (format). This will take
approx. 6 years.
– Full Cataloging: I’ll come back as a volunteer and do this after I retire
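The date formatting done in the normalization phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of input formats is hypothetical (the slides do not say which formats appear in the inventory data), and `normalize_date` is an invented helper name.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats; the real inventory data may use others.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b %d, %Y"]


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None
```

Values that return None would be set aside for a cataloger to review rather than guessed at.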
30. Metadata
• Titles
• Contributing
Organization
• Identifiers
• Description
• Date
• Asset Type
• Genres
• Creators
• Contributors
• Publishers
• Media Type
• Copyright
• Duration
This is the metadata we expose to users on the public-facing website at americanarchive.org.
37. Steps to Determining ORR Access
• October 2014 Rights Meeting: reviewed many of the challenges
facing access to moving image and sound archives. Read about it
here: https://americanarchivepb.wordpress.com/2014/11/04/rightsmeeting/
• Reviewed existing agreements with stations
• Sent quit claims to donors – 75% signed
• Series-level “bucket analysis” – identified buckets, or categories, of
content, i.e., news magazine, live music performance, interviews
with public figures, event coverage, etc.
Decision: ORR or On Location
40. Access and Use:
Place
• Massachusetts: 27%
• Maryland: 11%
• Minnesota: 8%
• New York: 7%
• California: 6%
• New Jersey: 5%
• Montana: 4%
• South Carolina: 4%
• Illinois: 4%
• Wisconsin: 3%
• North Carolina: 2%
• Connecticut: 2%
• 38 states and 2 non-
states (DC and Guam)
submitted materials for
digitization
• Average percentage of
total per state (and non-
states): 1.5%
• 12 states (on right)
contributed above
average amounts
• 12 states did not
participate
41. Access and Use:
Place: By Region
Region Assets %
• Northeast 19,202 31.1
• Mid-Atlantic 14,974 24.2
• Midwest 11,262 18.2
• West 8,187 13.3
• South 7,545 12.2
• Pacific (AK, HI, GU) 905 1.5
42. Access and Use: Time
Dates of creation or broadcast in 45% of 62,000 records
• nearly 1,000 files from the 1950s
• around 3,400 from the 1960s
• 2,900 from the 1970s
• 6,300 from the 1980s
• 7,800 from the 1990s
• and 6,800 from the 21st century
43. AAPB documents
• national history
• regional history
• local history
• news
• public affairs
• civic affairs
• religion
• education
• environmental issues
• music
• art
• literature
• filmmaking
• dance
• poetry
44. AAPB can be of value for scholarship because of ...
Geographical breadth
• to uncover ways that
national and global
processes played out on
the local scene
Chronological reach
• to document change (or
stasis) over time
45. Scholars’ complaints
• “I have long been frustrated ... gaining
access to the vast audiovisual record of my
period”
• “content [is] held in relative obscurity by the
TV and radio networks and the public TV
stations”
• “programs remain almost impossible to
locate and access ... locked in the collections
of its many member stations”
• “Working to document recent American
history without access to the pictures has
been a real challenge”
• “key historical moments and events are lost
to us forever”
46. AAPB can be of value for scholarship because ...
• scholarship pertaining to the period of 1973 onwards is “limited,
fragmentary, and politically conflicted”
• for the 1980s, “the archival and monographic work … has not yet
been done”
• accounts about the 1990s and later have “not really been history”
Kim Phillips-Fein, “1973 to the Present,”
in American History Now (2011)
47. The Importance of Local History ...
• “emphasis on diversity”
• “the history of the nation is many
different stories, no one of which
can be considered the ‘main’
story”
• a “skepticism about finding
common definitions of American
nationalism or discovering
common values” among many
historians of the 1960s and 1970s
History from the
bottom up
(quotes from
Alan Brinkley)
48. The Importance of Local History for ...
• relating “national experiences to larger processes and
local resolutions.”
Thomas Bender
Rethinking American History in a Global Age (2002)
49. The Importance of Local Civil Rights History ...
• “publication of local and state studies ... marked a major shift in the
field”
• “called into question many of the top-down generalizations”
• “studying the importance of the movement’s local, indigenous base
fundamentally alters our picture of the movement and its
significance”
• a “bottom-up perspective” can expose “students to a world beyond
their immediate experience”
Emilye Crosby, ed.
Civil Rights History from the Ground Up (2011)
50. Civil Rights Movement radio stories in AAPB from ...
• Charleston, SC 1960
• Tallahassee, FL 1959-61
• McComb, MS 1961
• Baton Rouge, LA 1961
• Monroe, NC 1961
• Birmingham, AL 1963
• Yellow Springs, OH 1963
• Savannah, GA 1963
• St. Augustine, FL 1964
• MS Freedom Summer 1964
• Natchez, MS 1965
51. Improving Scholarly Access to AAPB
• Create transcripts
• Enlist subject specialists to determine
searchable topics, key words, and phrases
• Enlist IT specialists to query transcripts using
subject specialist vocabulary
• Enhance display to relate programs from a
variety of localities and time periods
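Querying transcripts with a specialist vocabulary could look something like the sketch below. It is a minimal illustration, not a description of any AAPB tool: the function name `find_terms` is hypothetical, and plain whole-phrase matching stands in for whatever search technology would actually be used.

```python
import re
from typing import Dict, List


def find_terms(transcripts: Dict[str, str], vocabulary: List[str]) -> Dict[str, List[str]]:
    """Map each vocabulary term to the IDs of the transcripts that mention it."""
    hits: Dict[str, List[str]] = {}
    for term in vocabulary:
        # Case-insensitive match on the whole phrase, bounded by word edges.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits[term] = [
            transcript_id
            for transcript_id, text in transcripts.items()
            if pattern.search(text)
        ]
    return hits
```

Given transcripts keyed by program identifier, the result groups programs under each specialist term, which is one way to relate programs from different localities and time periods.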
Editor's Notes
It’s a collection of radio and TV materials created by or for public TV and radio in the US, dating back to the 1950s, to be preserved for historical purposes and for access by the public.
Who are we? WGBH is Boston’s public television station. We produce fully one third of the content broadcast on PBS, including the series you see here, as well as Downton Abbey and Sherlock. In addition to television, we have 2 radio stations and a large, award-winning Interactive department that is the number one producer for the sites you’ll find on PBS.org. As you can see, we produce a wide variety of programming, from public affairs, to history and science, to children’s programming, arts, culture, drama and how-tos. We have been on the air since 1951 with radio and 1955 with television.
At heart and through our mission we are an educational and cultural institution. We originated out of a consortium of academic universities in the Boston area. Because we have produced so much we have a large archive of educational programming that is of interest to scholars and researchers, in addition to the public.
Project is a collaboration between the Library of Congress and WGBH. The Library will oversee the long term preservation of the digital files.
Our vision
CPB then funded a large digitization project. Stations that participated in the inventory had the opportunity to choose items to be digitized: items important to them, or items whose content they could identify only by digitizing and then watching or listening to them. CPB chose a single vendor, Crawford Media, to do all the digitization. Tapes were sent to Crawford in Atlanta. In addition, about 5,000 hours of already digital content was identified to be added to the collection. So in the end, the initial collection consists of 40,000 hours of content from about 97 organizations. In 2013 CPB chose the collaboration between the Library of Congress and WGBH to be the future stewards of the Archive.
Roughly 5,000 of the 40,000 hours are not digitized tapes but born-digital files. CPB wanted to test and develop a workflow for growing the collection with digital files. Since these files were coming from a number of different sources, this really was a difficult task and we are still working out some of the kinks. The digital files were transcoded into the same format as the files digitized from analog tapes. At WGBH, pulling digital files from our DAM system and various other sources, we had a 50% file failure rate. We tracked down a number of sources of the file corruption and corrected for them. But in the end, there are a number of files for which we don’t know where the corruption happened, because no checksum had been created at the beginning. A lesson: create a checksum (fixity check) for your files before moving them anywhere, and verify it each time they are moved.
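The lesson about checksums can be made concrete with a short Python sketch. It is one possible way to do a fixity check around a file move, not the workflow WGBH used; the helper names `sha256_of` and `copy_with_fixity` and the choice of SHA-256 are assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large media files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source: Path, destination: Path) -> str:
    """Copy a file and raise an error if the copy's checksum differs from the source's."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Fixity mismatch copying {source} to {destination}")
    return before
```

Storing the returned checksum alongside the file makes it possible to repeat the comparison after every later move and to tell at which step a file changed.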
NET was the precursor to PBS, and about 30 stations were part of a loose network for distributing programs. Many of the programs are on film or 2” or 1” tape and are at risk of becoming obsolete or deteriorating. Collections of the titles produced during this time are spread around the country. With this project we aim to build a national title catalog and identify where copies exist so that we can collaborate on how best to preserve the best copy.
The NDSR project is an extension of the current NDSR fellowship programs in Washington, DC, NYC and Boston. The program aims to give recent graduates a paid fellowship position, more specific education in digital preservation, and hands-on experience at a host institution working on a digital preservation project. The AAPB NDSR program will place 8 paid fellows at institutions across the country that curate, collect, preserve, or create public media content and will focus on digital preservation projects for A/V materials.
As part of the American Archive we are also revitalizing PBCore – yes, that’s a metadata schema geared toward a/v collections.
Initial launch in February, access 2.5 million records, further developing site to allow streaming video. At this point, video is accessible on location at WGBH and the Library of Congress by implementing IP restrictions. In developing the site, we have one digital archive developer and a designer. Responsive design, accessible on desktop, mobile and tablet. Most of the users so far have accessed through desktop. Featuring content on homepage that 1) has good records, available online,
AMS – metadata creation and management, stations can log in and access their material, stores technical and preservation metadata, allows digitization vendor to import technical and preservation metadata, provides PBCore and PREMIS REST API