2. Are there standards for digitization or digital archiving?
Yes, but limited to certain aspects only.
3. ISO/TR 13028:2010 - Information and documentation - Implementation guidelines for digitization of records.
Not applicable to: technical specifications for the digital capture of records; technical specifications for the long-term preservation of digital records; or digitization of existing archival holdings for preservation purposes, etc.
4. ISO 19005-1:2005; ISO 19005-2:2011; ISO 19005-3:2012 (under development) - Document management - Electronic document file format for long-term preservation.
Specifies how to use the Portable Document Format (PDF) for long-term preservation of electronic documents.
Standard is known as PDF/A.
5. Unlike preservation microfilming and photocopying, there are no formal standards that govern the capture, processing, and storage of digital images.
There are, however, a number of projects and publications that have set forth best practices for creating high-quality digital images, access systems, and storage systems.
6. Digitization, also known as imaging or scanning, is the means of converting hard-copy, or non-digital, records into digital format.
Hard-copy or non-digital records include audio, visual, image or text.
Digitization may also be undertaken by taking digital photographs of the source records, where appropriate.
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
7. Digital preservation is a process by which digital data is preserved in digital form in order to ensure the usability, durability and intellectual integrity of the information contained therein.
A more precise definition is: the storage, maintenance, and accessibility of a digital object over the long term, usually as a consequence of applying one or more digital preservation strategies.
These strategies may include technology preservation, technology emulation or data migration.
Source: The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (2002).
8. Born Digital - digital materials which are created and retained in digital form.
May or may not have a non-digital equivalent.
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
9. Digital Repository / Archive - a digital repository is where digital content, assets, are stored and can be searched and retrieved for later use.
A repository supports mechanisms to import, export, identify, store and retrieve digital assets.
Putting digital content into a repository enables staff and institutions to then manage and preserve it, and therefore derive maximum value from it.
Digital repositories may include research outputs and journal articles, theses, e-learning objects and teaching materials or research data.
Source: Digital Repositories: Helping universities and colleges. JISC, August 2005.
10. Master - a faithful digital reproduction of a document, optimized for longevity and for production of a range of delivery versions (derivatives).
Masters are captured at the highest practicable quality or resolution and stored for long-term usage.
Typically, masters are stored in an off-line mode on tape or CD and are accessed only for the production of derivative images.
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
11. Derivative - an image created from the master image, through some kind of image editing process to create a user or working copy.
The process usually involves a loss of information to reduce the size by sampling it to a lower resolution, using lossy compression techniques, or altering an image using image processing techniques.
Typically, derivatives are made for purposes such as web access, including “thumbnail” images, or as “reference” or “service” images that should fit completely within an average monitor.
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
12. Digital images - electronic snapshots taken of a scene or scanned from documents, such as photographs, manuscripts, printed texts, and artwork.
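As a rough illustration of how a derivative loses information by sampling to a lower resolution, the sketch below halves a small grayscale image by averaging each 2x2 block of pixels. The function name `downsample_2x` and the sample pixel values are assumptions made for this example; real workflows would use an imaging tool rather than nested lists.

```python
def downsample_2x(pixels):
    """Halve the resolution of a grayscale image by averaging 2x2 blocks.

    `pixels` is a list of rows of 0-255 values; width and height must be even.
    """
    out = []
    for y in range(0, len(pixels), 2):
        row = []
        for x in range(0, len(pixels[0]), 2):
            block_sum = (pixels[y][x] + pixels[y][x + 1]
                         + pixels[y + 1][x] + pixels[y + 1][x + 1])
            # Four source values collapse into one; the detail is not recoverable.
            row.append(block_sum // 4)
        out.append(row)
    return out


# Hypothetical 4x4 "master" image.
master = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [10, 20, 200, 210],
    [30, 40, 220, 230],
]
derivative = downsample_2x(master)
print(derivative)  # [[0, 255], [25, 215]]
```

The derivative holds a quarter of the pixels of the master, which is why derivatives are produced from masters and not the other way around.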
The digital image is sampled and mapped as a grid of dots or picture elements (pixels).
Each pixel is assigned a tonal value (black, white, shades of gray or color), which is represented in binary code (zeros and ones).
13. Resolution - a measure of the ability to capture detail in the original work.
The spatial frequency at which a digital image is sampled (the sampling frequency) is often a good indicator of resolution.
Dots-per-inch (dpi) or pixels-per-inch (ppi) are common and synonymous terms used to express resolution for digital images.
14. Pixel Dimensions - the horizontal and vertical measurements of an image expressed in pixels.
May be determined by multiplying both the width and the height by the dpi.
Example: an 8" x 10" document scanned at 300 dpi has the pixel dimensions of 2,400 pixels (8" x 300 dpi) by 3,000 pixels (10" x 300 dpi).
15. Bit Depth - determined by the number of bits used to define each pixel.
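The arithmetic in this example can be sketched in Python. The function name `pixel_dimensions` is an assumption made for illustration, and rounding to whole pixels is a choice of this sketch.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a document scanned at the given dpi."""
    # Each dimension in inches is multiplied by the sampling rate (dpi).
    return round(width_in * dpi), round(height_in * dpi)


print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
```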
The greater the bit depth, the greater the number of tones (grayscale or color) that can be represented.
Digital images may be produced in black and white (bitonal), grayscale, or color.
16. Bit Depth
A bitonal image is represented by pixels consisting of 1 bit each, which can represent two tones (typically black and white), using the values 0 for black and 1 for white or vice versa.
A grayscale image is composed of pixels represented by multiple bits of information, typically ranging from 2 to 8 bits or more.
17. Bit Depth
A color image is typically represented by a bit depth ranging from 8 to 24 or higher.
With a 24-bit image, the bits are often divided into three groupings: 8 for red, 8 for green, and 8 for blue. Combinations of those bits are used to represent other colors.
A 24-bit image offers 16.7 million (2^24) color values.
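The relationship between bit depth and the number of representable tones is a power of two, which a short Python sketch makes concrete. The function name `tone_count` is an assumption for illustration.

```python
def tone_count(bit_depth):
    """Number of distinct tonal values a pixel of the given bit depth can hold."""
    return 2 ** bit_depth


for bits in (1, 8, 24):
    print(bits, tone_count(bits))
# 1 2
# 8 256
# 24 16777216
```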
18. File Size - calculated by multiplying the surface area of a document (height x width) to be scanned by the bit depth and the dpi^2.
Because image file size is represented in bytes, which are made up of 8 bits, divide this figure by 8.
Formula 1 for File Size: FS = (height x width x bit depth x dpi^2) / 8
19. File Size
Example: Compute the file size of a US-Letter size page captured in 8-bit Grayscale at 100dpi.
FS = (8.5 x 11 x 8 x 100^2) / 8
FS = 935,000 bytes.
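Formula 1 can be checked with a few lines of Python. This is a minimal sketch of the uncompressed size only; the function name `file_size_bytes` is an assumption, and real files add header and metadata overhead that the formula ignores.

```python
def file_size_bytes(height_in, width_in, bit_depth, dpi):
    """Uncompressed image size in bytes from document size, bit depth and dpi."""
    # Area in square inches times dpi squared gives the pixel count;
    # multiplying by the bit depth gives the total number of bits.
    total_bits = height_in * width_in * bit_depth * dpi ** 2
    return total_bits / 8  # 8 bits per byte


# US-Letter page (8.5" x 11"), 8-bit grayscale, 100 dpi
print(file_size_bytes(11, 8.5, 8, 100))  # 935000.0
```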
20. File Size
If the pixel dimensions are given, multiply them by each other and the bit depth to determine the number of bits in an image file.
Formula 2 for File Size: FS = (pixel dimensions x bit depth) / 8
21. File Size
Example: Compute the file size of a 24-bit image captured with a digital camera with pixel dimensions of 2,048 x 3,072.
FS = (2048 x 3072 x 24)/8
FS = 18,874,368 bytes.
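Formula 2 translates directly into Python. As before, this is a sketch of the uncompressed size; the function name `file_size_from_pixels` is an assumption, and rounding up to a whole byte is a choice made here for bit counts that are not multiples of 8.

```python
import math


def file_size_from_pixels(width_px, height_px, bit_depth):
    """Uncompressed image size in bytes from pixel dimensions and bit depth."""
    total_bits = width_px * height_px * bit_depth
    return math.ceil(total_bits / 8)  # 8 bits per byte


# 24-bit digital camera image, 2,048 x 3,072 pixels
print(file_size_from_pixels(2048, 3072, 24))  # 18874368
```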
22. Compression - algorithms designed to reduce the size of the image for storage or transmission.
Lossless schemes (e.g., ITU-T.6) abbreviate the binary code without discarding any information, so that when the image is "decompressed" it is bit for bit identical to the original. Most often used with bitonal scanning of textual material.
Lossy schemes (e.g., JPEG) utilize a means for averaging or discarding the least significant information, based on an understanding of visual perception. Typically used with tonal images.
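The defining property of a lossless scheme, that decompression returns data bit for bit identical to the original, can be demonstrated with Python's standard `zlib` module. Note the substitution: `zlib` implements DEFLATE, not ITU-T.6, and is used here only because it ships with Python. The synthetic byte string standing in for scan data is also an assumption of this sketch.

```python
import zlib

# Synthetic stand-in for bitonal scan data: long runs of identical bytes,
# as found in pages that are mostly white with some black text.
original = bytes([255] * 900 + [0] * 100) * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed is far smaller
print(restored == original)            # True: nothing was discarded
```

A lossy scheme such as JPEG would fail the final equality check by design, since it discards information it judges least significant.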
23. File Formats - consist of both the bits that comprise the image and header information on how to read and interpret the file.
File formats vary in terms of resolution, bit depth, color capabilities, and support for compression and metadata.
24. Optical Character Recognition (OCR) - a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera into editable and searchable data.
Source: http://finereader.abbyy.com/about_ocr/whatis_ocr/
25. Quality (usability, functionality)
Persistence (long-term access)
Interoperability (e.g., across platforms and software environments)
Storage Space (file size)
Storage Hardware
Storage Media (e.g., DVDs, CDs)
26. Master copies should be created to the highest technical standards achievable.
Image formats should be open (non-proprietary), with published technical specifications available in the public domain.
Image formats should be widely supported by many software applications and operating systems.
27. Digitize an original or first generation (i.e., print rather than microfilm) of the source material to achieve the best quality image possible.
Create backup copies of all files on servers and storage media (e.g., DVDs) and have an off-site backup strategy.
Create meaningful metadata for image files or collections.
28. Prior to digitization, third-party copyright or other constraints inherent in the record should be considered and resolved.
OCR should be performed on all digital reproductions where the content is primarily textual and computer processed. Collections that are photographic in nature and those not computer processed do not require OCR.
Plan for future technological developments and migration.
29. Tagged Image File Format (TIFF)
Extensions: .tif, .tiff
Bit-depths: 1-bit bitonal; 4- or 8-bit grayscale or palette color; up to 64-bit color.
Compression: Uncompressed
◦Lossless: ITU-T.6, LZW, etc.
◦Lossy: JPEG
Standard/ Proprietary: De facto standard.
Web Support: plug-in or external application.
Supports multiple images/file (multi-page).
30. Joint Photographic Experts Group (JPEG) / JPEG File Interchange Format (JFIF)
Extensions: .jpg, .jpeg, .jif, .jfif
Bit-depths: 8-bit grayscale; 24-bit color.
Compression: Lossless; Lossy: JPEG.
Standard/ Proprietary: JPEG: ISO 10918-1/2; JFIF: de facto standard.
Web Support: Native since Microsoft Internet Explorer 2, Netscape Navigator 2.
31. JP2-JPX / JPEG 2000
Extensions: .jp2, .jpx, .j2k, .j2c
Bit-depths: supports up to 2^14 channels, each with 1-38 bits; gray or color.
Compression: Uncompressed
◦Lossless/Lossy: Wavelet.
Standard/ Proprietary: JPEG: ISO/IEC 15444 parts 1-6, 8-11.
Web Support: Plug-in.
32. Portable Document Format (PDF)
Extension: .pdf
Bit-depths: 4-bit grayscale; 8-bit color; up to 64-bit color support.
Compression: Uncompressed
◦Lossless: ITU-T.6, LZW, JBIG
◦Lossy: JPEG
Standard/ Proprietary: De facto standard.
Web Support: Plug-in or external application.
Contains OCR text layer.
34. DjVu
High quality image compression technique:
◦Scanned bitonal: 300dpi: 5-40K per page (3-10 times better than TIFF/G4).
◦5-10 times better than JPEG or PDF
36. Image Masters
◦TIFF
◦JPEG (if using digital cameras)
Derivatives / Deliverables
◦Text/ Documents: PDF, DjVu
◦Photographs: PNG, DjVu
37. Black and White
◦File Format: TIFF
◦Compression: Uncompressed or Lossless compressed using CCITT Group 4 (ITU-T.6)
◦Resolution and Bit Depth: 600 dpi, bitonal
Grayscale
◦File Format: TIFF
◦Compression: Uncompressed or Lossless compressed using LZW or JPEG2000
◦Resolution and Bit Depth: 300 dpi, 8-bit grayscale
38. Color
◦File Format: TIFF
◦Compression: Uncompressed or Lossless Compressed using LZW or JPEG2000
◦Resolution and Bit Depth: 300 dpi, 24-bit color
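The three master specifications above could be recorded as a small lookup table and used to check scan settings before capture. This is only a sketch: the names `MASTER_PROFILES` and `meets_profile` are hypothetical, and treating the listed dpi as a minimum while requiring an exact bit depth is an assumption of this example, not something the recommendations state.

```python
# Recommended master settings, transcribed from the slides above.
MASTER_PROFILES = {
    "black_and_white": {"format": "TIFF", "dpi": 600, "bit_depth": 1},
    "grayscale": {"format": "TIFF", "dpi": 300, "bit_depth": 8},
    "color": {"format": "TIFF", "dpi": 300, "bit_depth": 24},
}


def meets_profile(kind, file_format, dpi, bit_depth):
    """Check scan settings against the recommended master profile.

    Assumption: dpi is treated as a minimum, bit depth must match exactly.
    """
    profile = MASTER_PROFILES[kind]
    return (
        file_format.upper() == profile["format"]
        and dpi >= profile["dpi"]
        and bit_depth == profile["bit_depth"]
    )


print(meets_profile("grayscale", "tiff", 300, 8))  # True
print(meets_profile("color", "JPEG", 300, 24))     # False
```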
41. Sheet Feed Scanner
◦Uses the same basic technology as flatbeds, but maximizes throughput, usually at the expense of quality.
◦Designed for high-volume scanning
42. Overhead Scanner
◦High-speed book scanner.
◦Sometimes referred to as “Planetary scanner”
◦Bound volumes can be placed face up for scanning
43. V-Shaped Book Scanner
◦Uses Digital SLR Cameras and a unique v-shaped, auto-adjusting book cradle and platen to capture sharp images at up to 700 pages an hour.
◦Natively captures flat images. No need for page curvature correction.
46. Document(s) or other materials are captured in digital form using a scanner or digital camera.
Guidelines and Procedures:
◦Pre-scanning
Preparing item level inventory list
◦Copyright Statement
Should accompany each digital file.
If accessed from the web, copyright statement can be displayed on the website (if the same rights apply to all items on the site).
47. Image editing (if necessary)
◦Compression of files, sharpening of images, deskewing, image rotation, cropping, deleting and reordering pages.
Optical Character Recognition
Creating Derivatives
Adding Watermarks
Adding Security (e.g., restrictions on copying, printing, or extraction, and password protection)
Creation of metadata describing the scanned materials.
48. What to look for when checking digital images for quality:
◦Missing pages.
◦Incorrect order of pages.
◦Pages of different sizes.
◦Readability of text.
◦Black or white areas on some parts of the page that cover the content.
◦Image not the correct size
◦Image in wrong resolution
◦Image in wrong file format
49. What to look for when checking digital images for quality:
◦Image in wrong mode or bit-depth
◦Overall light problems (e.g., too dark)
◦Loss of detail in highlights or shadows
◦Poor contrasts
◦Uneven tone or flares
◦Missing scan lines or dropped-out pixels
◦Lack of sharpness
◦Excessive sharpening
◦Image in wrong orientation
50. What to look for when checking digital images for quality:
◦Image not centered or skewed
◦Incomplete or cropped images
◦Excessive noise (see dark areas)
◦Misaligned color channels
◦Image processing and scanner artifacts (e.g., extraneous lines, noise, banding)
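Some of the checks in this list, such as missing pages, can be partly automated when files follow a predictable naming scheme. The sketch below assumes a hypothetical scheme of `page_0001.tif`, `page_0002.tif`, and so on; the function name `missing_pages` is also an assumption. It only finds gaps between the lowest and highest page numbers present, so pages missing from the start or end of a volume still need a manual check.

```python
import re


def missing_pages(filenames):
    """Return page numbers absent from files named like 'page_0001.tif'."""
    numbers = set()
    for name in filenames:
        match = re.fullmatch(r"page_(\d+)\.tif", name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    # Any number between the first and last page that has no file is a gap.
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]


scans = ["page_0001.tif", "page_0002.tif", "page_0004.tif", "page_0006.tif"]
print(missing_pages(scans))  # [3, 5]
```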
51. The process of getting the scanned images to the user through computer networks/Web, monitors, and printers.
Delivery Methods
◦Removable Storage Devices
◦Optical Media (CDs, DVDs)
◦Static Web Pages
◦Digital Repositories
53. Strategies for storage and backup may include:
◦Dedicated server or shared storage solution.
Database Systems
File-based Systems (FTP, WebDAV, Shared Folders)
◦Writing the digitized records to magnetic tape.
◦Writing the digitized records to optical media (e.g., CD, DVD).