Presentation – Victoria Sloyan – 07/07/2010

                            Archiving Digital Audio Files

Introduction

Hello and welcome to my presentation “Archiving digital audio files”.

I am the trainee for the futureArch Project, which aims to expand the Bodleian’s
capability to archive and preserve born-digital material. As part of this we have been
developing a procedure for extracting digital files from deposited media such as
floppy disks and moving them into our secure repository. This procedure involves the
use of forensic imaging equipment to create bit-by-bit duplicates of files, but it is only
suitable for data files. Therefore my project brief was to research an effective way of
extracting and preserving audio files.

Audio material requires its own method for three reasons:
   1. Unlike data disks, audio CDs do not hold the audio data within a file system,
      and so forensic imaging kits have difficulty creating an image file.
   2. Audio files have different metadata. Some of the metadata fields are the same,
      such as ‘title’ and ‘creation date’, but other metadata is specific to audio files,
      such as ‘duration’ and ‘file format’.
   3. The final difference involves delivery to users: the way you access a word
      document is different from the way you access a sound recording.

Therefore, the overall aim of this project can be divided into three objectives:
   1. First is selecting the most appropriate format to store the audio in. There are
       many formats available, such as MP3, FLAC and WAVE, so the first decision
       to be made was which to use.
   2. Secondly, since imaging audio disks does not work, I had to find a way to
       extract the files from their original media and move them onto the secure
       server.
   3. Finally, I had to devise an effective way of delivering digital audio to users.


Selecting a format

The most important consideration when choosing a format is finding one which is
uncompressed. Basically, formats fall into three categories. Uncompressed formats
are, as the name suggests, uncompressed, which means sound and silence are encoded
at the same bit/time rate. Lossless compression is where the file is compressed to
shrink the file size without discarding any audio data, so the original can be
reconstructed exactly when the file is decoded. Lossy compression is where audio
information is permanently discarded to shrink the file. This can significantly
reduce the file size, and often the reduction in quality is unnoticeable by ear,
but the original data cannot be recovered. For archiving purposes you want an exact
replica of the original data that can be read without a decoding step, therefore an
uncompressed format was chosen.
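
As a rough illustration of the difference between lossless and lossy handling, here
is a minimal Python sketch. It is not an audio codec: zlib stands in for a lossless
compressor, and dropping the low bits of each byte stands in for a lossy one. The
sample bytes and function names are made up for the example.

```python
import zlib


def lossless_round_trip(pcm: bytes) -> bytes:
    """Compress and then decompress with zlib (a general-purpose lossless compressor)."""
    return zlib.decompress(zlib.compress(pcm))


def lossy_round_trip(pcm: bytes) -> bytes:
    """Crude stand-in for lossy encoding: throw away the low 4 bits of every byte."""
    return bytes(b & 0xF0 for b in pcm)


# Hypothetical sample data standing in for raw audio bytes.
samples = bytes(range(256)) * 4

restored = lossless_round_trip(samples)
degraded = lossy_round_trip(samples)

print("lossless copy identical:", restored == samples)
print("lossy copy identical:   ", degraded == samples)
```

The lossless copy matches the input byte for byte, whereas the discarded bits in the
lossy copy cannot be recovered afterwards.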

The second consideration is to find an open format in order to aid accessibility,
because an openly documented format means you do not have to worry about licensing
and legal issues. Also, a standardised format is preferred, as this tends to mean it
has been thoroughly reviewed and is more likely to be longer lasting.

After evaluating all the formats I concluded the best one to use is WAVE. Crucially, it
is an uncompressed format. It is also open and standardised, and it is
recommended by the International Association of Sound and Audiovisual Archives,
the Library of Congress and the British Library Sound Archive.


Capture audio and extract metadata

So, once the format was decided I needed a way of extracting files from a CD and the
best way is one you may well be familiar with: ripping, although the technical term
for it is ‘digital audio extraction’. However, normal ripping, like that done by
Windows Media Player, is what is known as ‘fast ripping’, whereas for archiving we
wanted a ‘secure ripper’. The main difference between the two is that secure rippers
perform various validation tests to ensure maximum accuracy.

There are quite a few secure ripping programmes available but the one I decided to
use is Exact Audio Copy. It is well regarded by audio professionals, it works with
Windows OS, and it is free to download and rips to WAVE.

Here is a screen shot of EAC.
   • Down the left-hand side you can see the available ripping options including
        ripping as an MP3 and burning to a CD, but for my project I was only interested
        in the first icon – ripping as a WAVE file.
   • Along the toolbar there are various options to further specify the format. For
        instance you can state whether the recording is ripped in mono or stereo.
   • You can also specify the sample rate to use, although the maximum sample
        rate available when ripping to WAVE is 44.1 kHz. This is because the Red Book
        audio CD standard specifies a sample rate of 44.1 kHz with
        a 16-bit depth, thus there is little benefit to ripping audio at a higher rate.
        Moreover, from an archiving perspective, the British Library stipulates that
        audio transferred from one medium to another should retain the same sample
        rate.
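
To give a feel for what those figures mean for storage, the following Python sketch
works out the size of uncompressed audio at the Red Book settings. The 44.1 kHz and
16-bit values come from the text above; the two-channel (stereo) figure is the usual
Red Book layout, and the function name is simply chosen for this example.

```python
SAMPLE_RATE_HZ = 44_100  # Red Book sample rate
BIT_DEPTH = 16           # bits per sample
CHANNELS = 2             # stereo, the usual Red Book layout


def pcm_bytes(duration_seconds: float,
              sample_rate: int = SAMPLE_RATE_HZ,
              bit_depth: int = BIT_DEPTH,
              channels: int = CHANNELS) -> int:
    """Size in bytes of uncompressed PCM audio, excluding the WAVE header."""
    return int(duration_seconds * sample_rate * (bit_depth // 8) * channels)


bytes_per_second = pcm_bytes(1)
bytes_per_minute = pcm_bytes(60)

print(bytes_per_second)  # 176400
print(bytes_per_minute)  # 10584000
```

So one minute of CD audio ripped to WAVE takes roughly 10.6 MB, which is why file
size becomes a concern when delivering audio to users.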

Once EAC has ripped the files it will produce a report recording the rip result. Under
each track it will say either OK or Finished. If it says Finished you know the rip was
achieved but the resulting file is not identical to the original. This could occur for
several reasons, the most common being if the disk is dirty or scratched.
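
One simple way to check whether two rips of the same track hold identical audio is to
compare checksums of the audio data. The Python sketch below illustrates the idea
using the standard `wave` and `hashlib` modules. It is only an illustration and is
not a description of how EAC performs its own validation; the helper names and the
in-memory test files are made up for the example.

```python
import hashlib
import io
import wave


def audio_digest(wav_file) -> str:
    """SHA-256 of the audio frames only, so header differences are ignored."""
    with wave.open(wav_file, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    return hashlib.sha256(frames).hexdigest()


def make_wav(frames: bytes) -> io.BytesIO:
    """Build a 16-bit, 44.1 kHz stereo WAVE in memory as a stand-in for a rip."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(44_100)
        wav.writeframes(frames)
    buffer.seek(0)
    return buffer


# Hypothetical data: two identical reads and one with a single altered byte.
good = b"\x01\x02\x03\x04" * 100
bad = b"\xff" + good[1:]

same = audio_digest(make_wav(good)) == audio_digest(make_wav(good))
differs = audio_digest(make_wav(good)) != audio_digest(make_wav(bad))

print("two clean reads match:", same)
print("damaged read detected:", differs)
```

In practice `audio_digest` could be given file paths instead of in-memory buffers,
since `wave.open` accepts either.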

Once the disk has been ripped, metadata needs recording in a spreadsheet. This is
based on the futureArch project’s metadata spreadsheet for digital files, but has been
modified to suit audio material. Here you can see all the fields that need to be
completed. The ones in bold are ones that have been added for audio files.
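
Some of the audio-specific fields, such as duration, can be read directly from the
WAVE file rather than typed in by hand. The following is a minimal Python sketch
using the standard `wave` module. The field names are invented for this example and
are not the actual column headings of the futureArch spreadsheet.

```python
import io
import wave


def wave_metadata(wav_file) -> dict:
    """Read basic technical metadata from a WAVE file (path or file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        frame_count = wav.getnframes()
        sample_rate = wav.getframerate()
        return {
            "file_format": "WAVE",
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": sample_rate,
            "duration_seconds": frame_count / sample_rate,
        }


# Hypothetical test file: one second of 16-bit stereo silence held in memory.
demo = io.BytesIO()
with wave.open(demo, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44_100)
    wav.writeframes(b"\x00" * 44_100 * 4)  # 4 bytes per stereo 16-bit frame
demo.seek(0)

metadata = wave_metadata(demo)
print(metadata)
```

Descriptive fields such as title still need to be entered manually, since a plain
WAVE file ripped from a CD does not normally carry them.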


Delivering audio

Archives exist not only to preserve material, but also to make it available to
researchers. Therefore, audio files need to be delivered in an efficient and effective
way. There are two potential problems with audio files. Firstly, they can be extremely
large, particularly if they are in an uncompressed format, and secondly, the quality of
the recording can be quite poor. So, in order to combat these two issues I created two
versions of each file: I already had the master file, so I processed this to create a
processed WAVE file and an optimised MP3 file. The table illustrates the intended
use for each derivative. MP3 files are lossy; therefore the file size is significantly
reduced.

Processing was done using Audacity, which looks like this. The processing done to a
file will depend on its content and quality, but the tools I most often used were:
     • Silencing and cutting to trim the beginnings and ends of recordings and
         remove long pauses.
     • Noise removal tool to either remove or reduce the volume of background
         noise, like high-pitched hissing on poor-quality recordings.
It is very important to record every change that is made to the master file, so I created
a Process History Spreadsheet to record these changes. This includes the name of the
original file (22cd), all actions done to it in detail (such as a two-second noise cut at
22:54) and the name of the resulting file (22cd_mp3).
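
A process history of this kind could also be kept as a CSV file that opens in any
spreadsheet program. The Python sketch below shows one possible layout. The three
column names and the `record_change` helper are assumptions made for this example,
not the actual structure of the Process History Spreadsheet; the example row uses
the file names and action mentioned above.

```python
import csv
import io

FIELDS = ["original_file", "action", "resulting_file"]  # hypothetical column names


def record_change(log, original_file: str, action: str, resulting_file: str) -> None:
    """Append one processing step to the log, writing the header row if the log is empty."""
    writer = csv.DictWriter(log, fieldnames=FIELDS)
    if log.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "original_file": original_file,
        "action": action,
        "resulting_file": resulting_file,
    })


# An in-memory log keeps the example self-contained.
log = io.StringIO()
record_change(log, "22cd", "two second noise cut at 22:54", "22cd_mp3")

print(log.getvalue())
```

Recording one row per action, rather than one row per file, keeps the order of the
changes clear if a file is processed several times.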

After processing was finished, each file was exported from Audacity, first in the
WAVE format and then in the MP3 format.

Once the files are processed, the MP3 versions and possibly the processed WAVE
files can be made available to listen to in the reading room. The master WAVE files
will be stored in the repository and will not be touched by users. All digital material,
both data and audio, will be accessed via a specific laptop in the reading room. This
laptop will have a specially designed interface similar in feel to an internet browser
and audio will be streamed, so accessing audio will be a similar experience to using
something like MySpace.


Conclusion

So, to sum up very briefly, if we go back to the three aims you can see I’ve pretty
much answered them:
    1. The best format to use is WAVE
    2. The way to capture audio is by securely ripping it
    3. Audio will be processed and compressed and will be accessed by streaming
        the files through a self-contained interface within the reading room.
