Presentation given by Tim Keefe, Head of Digital Resources and Imaging Services (DRIS) at Trinity College Dublin, on March 15th, 2016 in the Royal Irish Academy, Dublin, as part of the DRI Training Series 'Preparing Your Collection for DRI'. This seminar introduces attendees to the basics of digitising heritage material, efficient workflows and some information on equipment requirements, as well as file format compatibility with DRI.
Tim Keefe - DRI Training Series: 2. Digitising Your Collection
1. DRI Training: Preparing Your Collection for DRI
2. Digitising Your Collection
Digital Imaging – Introduction, components, process.
Tim Keefe, Head of Digital Resources and Imaging Services, Trinity
College Dublin
keefet@tcd.ie
2. Questions we all need to ask?
When beginning a digitization project it is
easy to ignore the basic questions, those
questions that we all assume we know the
answers to … however these questions are
often the most important, and need to
be addressed formally.
3. Questions to ask
What is the purpose of this project?
What is the scope of the digitization
activity?
What is the intended lifetime of the
digital files?
Who is the intended audience?
4. Purpose
What is the purpose of this project?
Why are we digitizing the material
Need/Trend
Access
Research
Education
Who are the champions for this project
Local
External
Who or what are the barriers to the
implementation of this project
Human
Resource
Procedural/Political
5. Scope
What is the scope of the digitization activity
What is to be digitized
What is not to be digitized
Why?
Who is likely to demand operation outside of
these criteria
6. Intended Audience
Who is the intended audience for the digital
resources
What are their needs
How will they access the material
Who else will be interested
Are you prepared for a new audience (known or
unknown) to self select to become the primary audience
Do you wish to prevent any audience from having access
to the resources
7. Image Lifetime
What is the intended lifetime for the
digital records
This question is critical to the appropriate
development of the digitization activity
Significant resource implications
Significant planning implications
Significant digitization process implications
8. So Why Digitize?
Access
Electronic mediums provide the most dynamic access
Digital data structures offer the opportunity for truly
dynamic new research and educational models offering
unique new capabilities to existing methodologies
Preservation
Digital files designed to proper specifications can be true
surrogates for delicate source materials for all but a
handful of advanced research needs
Manipulation
Non Linear
Digital resources allow for easy modification to image
characteristics
Digital files easily cross medium boundaries providing
opportunities for new use models
9. Problems with digitization
Pace of technological change is constantly
increasing the digital attributes bar
Not human readable
Lack of best practices / attribute recommendations
Long term digital preservation is a newly
emerging field, solutions just beginning to emerge
Much more complex than having IS Services make a
backup copy
Extremely costly activity
TCO not well understood, few models
10. Capture for What?
In TCD we designate the capture activity on
the object intent
Capturing for Content
Speed and cost most important
Quality less important
Capturing the Object
Quality most important
Meeting the needs of the researcher… researching
anything
11. Components
The primary components of an average
imaging system:
Digital capture device
Light source if not included in the capture system
Optics if not included in the capture system
Color Calibration System
Image Capture/Image Processing Computer
System(s)
Software packages
Data Storage Systems
13. Digital Capture Systems
Flatbed
Reflective/transmissive capabilities
Infrared dust and scratch removal systems (ICE)
Linear/Tri linear or CCD systems
Low productivity
Inclusive of software
14. Digital Capture Systems
Flatbed (limitations)
Works best with two-dimensional materials.
Not recommended for use with fragile or tightly
bound material.
Limited scan area.
Very slow
16. Digital Capture Systems
Digital Photographic systems
35mm format
CCD / CMOS digital capture sensors
Full Frame or Reduced frame sensors
1.5 to 1.33 avg. magnification values
High productivity
Limited resolution
Limited bit depth (8-14 bit)
Cost effective
Good starting solution
18. Digital Capture Systems
Medium format (MF digital back)
CCD sensors
6 x 4.5cm to 6 x 7cm sensor size
With and without micro-lenses
High bit depth (16bit)
High productivity
High Cost
Requires high level of studio
photographic experience
Additional software needs.
Associated Equipment also expensive
19. Digital Capture Systems
Dedicated Book Scanning Systems
One size fits all… and all its limitations
Limited source material input
Material handling and support
Possible automation
page turning,
image management
Linear or CCD based
Digital Camera based
High to very high productivity
20. Digital Capture Systems
Dedicated Book Scanning Systems
Linear CCD based, generally with included
software. (flatbed in different form factor)
23. Computer Technology
What to buy
Image processing is one of the more intensive
computing tasks
Recommendation is to buy the fastest most
modern computer that you can afford right now
Memory requirements are often more critical than
processor speed (multi core technology is not yet
fully exploited by software)
Graphics Card often more important than processor
Have a minimum RAM of 4x your largest file
size… 8x recommended
Will cost 2-5x more than normal office computer
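The 4x / 8x RAM rule of thumb above can be turned into a quick estimate. The sketch below is illustrative only: the function names and the example pixel dimensions are assumptions, and it uses the uncompressed pixel data size as a stand-in for "largest file size".

```python
def uncompressed_size_bytes(width_px, height_px, channels=3, bits_per_channel=16):
    """Size of the raw pixel data for one image, ignoring headers and compression."""
    return width_px * height_px * channels * bits_per_channel // 8


def ram_guideline_gib(largest_file_bytes):
    """Return (minimum, recommended) RAM in GiB using the 4x and 8x rule of thumb."""
    gib = 1024 ** 3
    return 4 * largest_file_bytes / gib, 8 * largest_file_bytes / gib


# Hypothetical example: a 10,000 x 8,000 pixel, 16 bit per channel RGB master.
size = uncompressed_size_bytes(10_000, 8_000)
minimum, recommended = ram_guideline_gib(size)
print(f"largest file: {size / 1024 ** 2:.0f} MiB")
print(f"RAM: {minimum:.1f} GiB minimum, {recommended:.1f} GiB recommended")
```

The figure covers a single open image only; the operating system and imaging software need memory on top of this.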
24. Computer Technology
Consider the software needs of the digital
capture system you have chosen.
Is software for generating the files required by
your Project Scenario or device type?
Some MF camera systems require unique software
Will it be necessary to purchase additional
image editing software packages (e.g. Adobe
Creative Suite/ Photoshop) or file management
software (Lightroom, Bridge, etc.)
Many of these software packages are now
subscription based
25. Storage Technology
RAID (Redundant array of inexpensive disks)
Level 0 (striped) – Speed and performance increases
Data is broken up and is written across several disks, taking
advantage of multiple writing heads to improve data
throughput (often used for video processing)
Level 1 (mirrored) – Security through redundancy
Data is identically written to more than one disk, allowing
for backup protection should any single disk fail
The overall data storage volume of the system is halved
when a level one raid is activated
Local Hard drive (under the desk solution)
Low cost, lowest preservation (use only when required)
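As a rough illustration of the RAID trade-off described above, the sketch below compares usable capacity and tolerated disk failures for level 0 and level 1. The function names and disk sizes are assumptions made for the example, and only these two levels are modelled.

```python
def usable_capacity_tb(disk_tb, disk_count, level):
    """Usable capacity of an array of identical disks for RAID level 0 or 1."""
    if disk_count < 2:
        raise ValueError("an array needs at least two disks")
    if level == 0:
        return disk_tb * disk_count   # striped: every disk adds capacity
    if level == 1:
        return disk_tb                # mirrored: every disk holds the same data
    raise ValueError("only RAID levels 0 and 1 are modelled here")


def tolerated_disk_failures(disk_count, level):
    """Number of disks that can fail before data is lost."""
    return 0 if level == 0 else disk_count - 1


# Hypothetical array: two 4 TB disks.
for level in (0, 1):
    capacity = usable_capacity_tb(4, 2, level)
    failures = tolerated_disk_failures(2, level)
    print(f"RAID {level}: {capacity} TB usable, survives {failures} disk failure(s)")
```

With two disks, level 1 gives half the raw capacity, which matches the "halved" figure on the slide.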
27. File Types
Tiff (Tagged Image File Format)
Large file size
Standard format
Lossless compression LZW (and lossy options)
Jpeg (Joint Photographic Experts Group)
Smaller file sizes
Lossy compression in most cases but newest
versions support lossless (Rarely supported)
Standard format
Jpeg 2000 (Lossless and/or Lossy)
Multiple file sizes embedded within single
digital record
Emerging format (adoption very slow, caution)
28. File Types cont.
PDF (Portable Document Format - Adobe Acrobat)
Advanced Cross Platform Compatibility
Ability to support complex document generation
Text, images, notes, embedded graphics, etc.
Support for advanced printing
Support for sharing and dissemination
Standard file type
Caution as there are a wide variety of versions and variants
Digital preservation ISO standard: PDF/A (Acrobat type A files)
Adoption rate very low
Some believe that this standard had political / corporate influence
driving recommendation
GIF
Dying file format, not recommended
29. File Compression
Two basic types of compression: Lossy and Lossless
Lossy
Image structure is changed (damaged) by the compression
activity, but not in a perceptual way
Jpeg is the most common format using lossy compression
Every file save increases the damage
file conversion/save into a lossy format should always be the final step in
the digitization and image processing process
Large reduction in file size
30. File Saving
Save Order
When working with files that use or will use a
lossy compression (Jpeg) it is important that the
very last step in the process is the file save
Each save recompresses the data and causes
further image degradation
It is best practice to work in a lossless format such
as Tiff, and save out the final Jpeg as a last step.
This workflow will minimize the impact of the
compression artifacts
31. Compression cont.
Lossless
Image file structure is not changed in any way
by the compression activity
The Tiff file format with LZW compression is
the most widely used lossless compression
format
Note, the tiff file format can also be generated with
no compression or lossy compression
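The defining property of lossless compression, that the data comes back bit for bit identical, can be checked directly. The sketch below uses Python's built-in zlib module (DEFLATE) purely as a stand-in for a lossless codec; it is not LZW and does not write Tiff files, and the pixel data is synthetic.

```python
import zlib

# Synthetic 8 bit greyscale "scan": 64 rows of the same horizontal gradient.
row = bytes(range(256))
pixels = row * 64

# Compress and decompress with a lossless codec (DEFLATE, standing in for LZW).
compressed = zlib.compress(pixels, 9)
restored = zlib.decompress(compressed)

print(len(pixels), "bytes raw,", len(compressed), "bytes compressed")
print("identical after round trip:", restored == pixels)
```

However many times the data is compressed and decompressed this way, the pixel values do not change, which is why a lossless format is the safer working format.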
33. Resolution
This metric is generally stated as pixels per inch
(ppi), or the total number of individual picture
elements that will fit in a 1 x 1 inch sample
This is sometimes confused with dots per inch (dpi)
which is a printing specific metric
Spatial resolution requires dimensional
measurements and ppi sample rate
Screen resolution is 72 ppi (newest technology screens
now exceeding 125ppi)
High resolution commercial printing requires 300-650
ppi image files
General internet jpg files 72-150ppi
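The relationship between physical dimensions, ppi and pixel dimensions is simple arithmetic. A minimal sketch, with hypothetical function names and an example 8 x 10 inch original:

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel dimensions needed to sample an original at the given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def output_size_inches(width_px, height_px, ppi):
    """Physical size at which an image appears when rendered at the given ppi."""
    return width_px / ppi, height_px / ppi


# Hypothetical 8 x 10 inch original captured at 300 ppi.
width_px, height_px = pixel_dimensions(8, 10, 300)
print(width_px, "x", height_px, "pixels")

# The same pixels shown on a 72 ppi screen cover a much larger area.
screen_w, screen_h = output_size_inches(width_px, height_px, 72)
print(f"{screen_w:.1f} x {screen_h:.1f} inches at 72 ppi")
```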
34. Bit Depth
Bit depth is the number of samples provided
within each image channel (RGB, CMYK)
This term is often confused with dynamic range
They are not the same however there is an
interaction between them
The number of discrete steps between black and
white
35. Bit Depth
Bit depth is stated in the number of bits of data per
channel
The number of steps is 2 (binary measure) raised to the power of
the bit depth number, so 4 bit color will have 16 steps
between the black and white values
** note that bit depth is stated in either the number of bits
per channel as in 8 bit color or by the sum of all the
channels combined (R+G+B) = 24bit color… this can
be confusing
36. Bit Depth
8 bits per channel (or 24 bit color)
256 value steps in each channel
16.8 million possible colors
16 bit per channel (or 48 bit color)
65536 value steps in each channel
281.5 trillion possible colors
Many manufacturers talk about interim bit
depths (12- 14), but the final output is often
reduced to 8 bits per channel
you cannot add missing data by moving to a
higher bit depth
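The figures on the bit depth slides follow from powers of two, and the last point (no new data from moving to a higher bit depth) can be shown with a simple conversion. This is a sketch with hypothetical function names; the 8 to 16 bit mapping shown (multiplying by 257) is one common convention, assumed here for illustration.

```python
def steps_per_channel(bits):
    """Number of discrete values a single channel can hold."""
    return 2 ** bits


def possible_colors(bits_per_channel, channels=3):
    """Number of distinct colors across all channels combined."""
    return steps_per_channel(bits_per_channel) ** channels


def promote_8_to_16(value):
    """Map an 8 bit value (0..255) onto the 16 bit range (0..65535)."""
    return value * 257


for bits in (4, 8, 16):
    print(f"{bits} bit: {steps_per_channel(bits)} steps, {possible_colors(bits):,} colors")

# Converting every possible 8 bit value to 16 bit still yields only 256 distinct values.
promoted = {promote_8_to_16(v) for v in range(256)}
print(len(promoted), "distinct values after promotion")
```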
37. Dynamic range
Dynamic range is the ability of a sensor to
simultaneously capture dark detail, and light detail
This is an inherent weakness of digital capture
Decisions are made to set the device to support either a greater
tonal range of dark densities (more common) or light
Commonly confused with bit depth
They are separate characteristics despite all the contrary
information out there (much of it from reputable sources)… I
promise
Greater bit depth will not automatically provide greater
Dynamic Range (however improvements in bit depth often
accompany other sensor improvements that include increased
DR)
38. Dynamic Range
Clipping
Clipping is a failure state of a digital image as
the limited dynamic range of a device is
unable to correctly capture either very light
or very dark tones
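One simple way to flag possible clipping is to count how many values sit at the very bottom or top of the available range. The sketch below is an assumption about how such a check might look, not a method from the talk; the sample values are invented.

```python
def clipped_fractions(values, bits_per_channel=8):
    """Return the fraction of values at pure black and at the maximum value."""
    if not values:
        raise ValueError("no pixel values supplied")
    max_value = 2 ** bits_per_channel - 1
    shadows = sum(1 for v in values if v == 0)
    highlights = sum(1 for v in values if v == max_value)
    return shadows / len(values), highlights / len(values)


# Invented 8 bit sample with both ends of the histogram piled up.
sample = [0, 0, 12, 90, 140, 200, 255, 255, 255, 255]
shadow, highlight = clipped_fractions(sample)
print(f"shadows clipped: {shadow:.0%}, highlights clipped: {highlight:.0%}")
```

A value at 0 or at the maximum is not always a clipped tone, so a check like this only indicates which captures deserve a closer look.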
39. Color Mode
RGB (Red/Green/Blue color channels)
Additive color
Most common color mode for digital images
Mimics human visual system
40. Color Mode
CMYK (Cyan/Magenta/Yellow/Black)
Subtractive color
Commercial Printing standard
Most desktop color printers support RGB color files
(CMYK conversion is internally managed)
Limited color gamut
41. Color Mode
Lab color
Single luminance (grey scale)
channel and 2 opposing color
channels
Loosely represents the range of
human vision
Good for transforms
42. Color Profile Standards
The user defined color profile assigned to the
image files supports several informal standard
configurations
sRGB
Profile developed more than a decade ago by HP and
Microsoft. Represents the Gamut of an average CRT
monitor
Very Limited color palette
New output devices currently capable of exceeding this space
Most commonly used profile (usually the default if not stated)
43. Color Profile Standards
Adobe RGB 1998
Newer profile designed to support wider palette of
colors to support higher quality printing
Lower use than sRGB, but well recognized
Maintains a color appearance consistent with sRGB devices
ProPhoto RGB
A wide gamut color space designed for very high
quality printing of photographic images
Color appearance is highly inconsistent when used with devices
not color managed, or set to sRGB standards
Despite the benefits of this color space, its use is quite limited
due to the setup and management requirements
Caution in its use, as inaccurate color characteristics can occur
with improperly managed devices
45. The Controversy
Two primary schools of thought
The digital master image files should remain
untouched as they emerge from the capture
device and all subsequent processing should
occur only on the surrogates
Image processing will occur on the master
capture file with the intent of matching the
original source material as closely as
possible at the time of capture
46. Color Mode
RGB
Standard image space for files
Common, not likely to change
CMYK
Avoid this space for all but specific commercial
printing activities (even then try to ignore it)
Lab
Great for processing transforms that can benefit
from a luminance channel
Sharpening
Noise removal
No color profile
47. File Formats
Master
This is the high quality large image
generated from the capture device
Surrogates
These are secondary files generated from the
master file to be used for specific purposes
48. File format Sets
Master
Tiff
This is intended to be the highest quality image
Represents the asset derived from the € spent
Lossless compression recommended
Surrogates
Compressed Jpgs
File size reduced for easier management, and
dissemination, and to manage costs
Lossy compression is acceptable within the use cases
Often several sizes (Large, small, thumbnail)
Used for public display
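Producing several surrogate sizes from one master comes down to scaling the pixel dimensions while keeping the aspect ratio. The sketch below only plans the target dimensions; the size names and longest-edge values are hypothetical, and the actual resampling and Jpg encoding would be done in imaging software.

```python
def fit_within(width_px, height_px, longest_edge_px):
    """Scale dimensions so the longest edge matches the limit, never enlarging."""
    scale = longest_edge_px / max(width_px, height_px)
    if scale >= 1:
        return width_px, height_px
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Hypothetical surrogate set, keyed by longest edge in pixels.
SURROGATE_SIZES = {"large": 2000, "small": 800, "thumbnail": 200}

master = (6000, 8000)  # hypothetical master pixel dimensions
plan = {name: fit_within(*master, edge) for name, edge in SURROGATE_SIZES.items()}
for name, (w, h) in plan.items():
    print(f"{name}: {w} x {h}")
```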
49. Image Manipulations
Tone Scale
To adjust tone scale you need to push or pull
predetermined black and white values to defined
positions on the histogram
This requires the use of a calibrated reference target placed
within the image
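Pushing or pulling measured black and white values to defined positions is a linear remapping. The sketch below assumes 8 bit values; the measured patch values and the aim values are invented for illustration and are not taken from any particular reference target.

```python
def adjust_levels(value, measured_black, measured_white,
                  target_black, target_white, max_value=255):
    """Linearly remap a value so the measured patches land on their aim values."""
    scale = (target_white - target_black) / (measured_white - measured_black)
    mapped = target_black + (value - measured_black) * scale
    return min(max_value, max(0, round(mapped)))


# Invented readings from the target in the capture, and invented aim values.
MEASURED_BLACK, MEASURED_WHITE = 30, 225
AIM_BLACK, AIM_WHITE = 12, 242

for v in (30, 128, 225, 255):
    print(v, "->", adjust_levels(v, MEASURED_BLACK, MEASURED_WHITE, AIM_BLACK, AIM_WHITE))
```

Values that would fall outside the range after remapping are clamped, which is one way this adjustment can itself introduce clipping.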
50. Image Manipulations
Sharpening
Sharpening works by increasing the contrast between edges in an
image. This change in contrast fools the human visual system into
believing that the image is sharper
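The edge contrast effect can be seen on a single row of pixel values. The sketch below uses unsharp masking, one common sharpening technique (the slide does not say which method is used), with a simple 3 sample blur; the function names and values are assumptions for illustration.

```python
def box_blur(signal):
    """3 sample moving average; edge samples reuse their nearest neighbour."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]


def unsharp_mask(signal, amount=1.0, max_value=255):
    """Add back the difference between the signal and a blurred copy of it."""
    blurred = box_blur(signal)
    sharpened = [s + amount * (s - b) for s, b in zip(signal, blurred)]
    return [min(max_value, max(0, round(v))) for v in sharpened]


# One row of pixels crossing a soft edge from mid grey to lighter grey.
edge = [100, 100, 100, 150, 150, 150]
print(edge)
print(unsharp_mask(edge))
```

The flat areas are unchanged, while the two samples either side of the edge move apart, so the step looks more abrupt without any new detail being added.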
52. Cropping
Cropping
Cropping is the permanent removal of unwanted parts
of the image
Formally determine where the borders of your images
should be
For research purposes the entire page should be represented
For access and content related scanning cropping to the
textual areas of the page may be desired
Failure modes
What determines a crop or image capture that is unacceptable
requiring reprocessing or a new capture
Formalize this
53. Skew/Rotation
Skew/Rotation
When the source material is not perpendicular
to the edges of the digital image
Failure mode
Determine what percent is unacceptable
Formalize these criteria
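A skew failure criterion can be formalized as a measurement plus a threshold. The sketch below expresses skew as a percentage (rise over run of a page edge that should be horizontal); the coordinates and the 0.5 percent tolerance are invented, and each project would set its own limit.

```python
def skew_percent(x1, y1, x2, y2):
    """Skew of a nominally horizontal page edge, as rise over run in percent."""
    if x2 == x1:
        raise ValueError("the two points must be horizontally separated")
    return 100 * (y2 - y1) / (x2 - x1)


def skew_acceptable(percent, tolerance_percent=0.5):
    """True when the measured skew is within the project tolerance."""
    return abs(percent) <= tolerance_percent


# Invented pixel coordinates of the two ends of a page's top edge.
skew = skew_percent(120, 80, 2520, 104)
print(f"skew: {skew:.2f}%, acceptable: {skew_acceptable(skew)}")
```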
54. White Balance
White balance is a color balancing function used
to address the color differences imparted by
varying light sources.
The human visual system does this automatically in the
brain, removing the real color cast imparted by source
illuminant and giving us the perception that most lights
are white.
Think of the differences evident when you have a
desktop incandescent bulb in a room lit by fluorescent
This is also important in the environment where your image
processing occurs
55. White Balance
Most white balance is preset within the capture system, however fine
tuning or custom profiles can be applied in the processing stage
Neutral 18% grey references are used to generate a custom balance
When adjusting tone scale in Photoshop, neutral grey adjustment can be
used to correct White Balance inconsistencies
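The idea behind a grey reference balance is that a neutral patch should have equal red, green and blue values, so per channel gains are chosen to make it so. The sketch below is a simplified illustration with invented readings; real raw converters and Photoshop work with more sophisticated models than this direct scaling.

```python
def grey_balance_gains(grey_rgb):
    """Per channel gains that make the measured grey patch neutral (R = G = B)."""
    r, g, b = grey_rgb
    target = (r + g + b) / 3
    return target / r, target / g, target / b


def apply_gains(rgb, gains, max_value=255):
    """Apply the gains to one pixel, clamping to the valid range."""
    return tuple(min(max_value, round(c * k)) for c, k in zip(rgb, gains))


# Invented reading of the grey patch under warm light: too much red, too little blue.
grey_patch = (130, 118, 106)
gains = grey_balance_gains(grey_patch)

print("corrected patch:", apply_gains(grey_patch, gains))
print("corrected pixel:", apply_gains((200, 180, 150), gains))
```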
56. Quality Control/Assurance
Imaging and image processing are a highly
repetitive, human dependent set of processes
and are therefore highly susceptible to
regular error
57. Control vs. Assurance
Control consists of in-process activities to ensure
quality in the creation of the products
(digital images)
Assurance is focused on an evaluation of
the processes used and generally takes place
outside of the creation process
58. Quality Control
Processes built into the imaging work flow
to ensure that the creation of digital images
is
Consistent
Accurate
Repeatable
Often automated, these processes are
inherently part of the imaging workflow
59. Quality Assurance
The Quality Assurance Audit
Formal… informal just does not work
Existing toolsets developed for a variety of
manufacturing based industries are highly
effective
TQM
Six Sigma
Etc.
Takes place fully outside of the imaging
processes
60. Quality Assurance Testing
What to test for
Imaging
File structure metrics
Naming, page counts
System/Network (positioning, backup
etc.)
Metadata
Structure
Accuracy
Completeness
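Checks on naming and page counts are among the easiest to automate. The sketch below assumes a hypothetical naming scheme (object id, underscore, four digit page number, .tif) and invented filenames; it only inspects names, not file contents or metadata.

```python
import re

# Hypothetical naming scheme: <object id>_<4 digit page number>.tif
NAME_PATTERN = re.compile(r"^(?P<object_id>[A-Za-z0-9-]+)_(?P<page>\d{4})\.tif$")


def audit_filenames(filenames, expected_pages):
    """Report names that break the scheme and page numbers that are missing."""
    bad_names = []
    pages = set()
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            bad_names.append(name)
        else:
            pages.add(int(match.group("page")))
    missing = sorted(set(range(1, expected_pages + 1)) - pages)
    return {"bad_names": bad_names, "missing_pages": missing}


# Invented batch: page 3 is missing and one file breaks the naming scheme.
batch = ["ITEM01_0001.tif", "ITEM01_0002.tif", "ITEM01_0004.tif", "item01-5.TIF"]
report = audit_filenames(batch, expected_pages=4)
print(report)
```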
61. Color Management
One of the most critical, and often
ignored, components of a successful
digitization project is a well planned
color management strategy
62. Color Management
Within any imaging and processing system you need
to ensure that consistent color is displayed from
device to device, and that a file's color metrics are
electronically recognized
Technology Required
Capture reference targets
Color profiles / icc
63. Color Reference Targets
Allows a formal measured reference to be
associated with the image (future proofing)
64. Color Management Technology
Color meters (Basic screen calibration)
Absorptive measurements
Less dynamic than Spectrophotometers
Spectrophotometers (Advanced CM)
Can measure the intensity of light as a
function of the wavelength of the light
Light absorption
Diffuse
Specular
65. CM Standards
ICC (International Color Consortium)
Works through a Color Management Module (CMM) and a
standardized profile connection space
Not an ideal solution, but one that has been very
well adopted by most imaging related hardware
and software vendors
ColorSync (Apple Computer)
Apple solution to color management
Part of the Macintosh system software
Generally plays well with others, occasionally
some fiddling is necessary (ICC integrated)
Hands off approach
66. Further Reading and Resources
DRI and Digital File Format Choices Factsheet:
http://dri.ie/sites/default/files/files/dri-factsheets-file-formats.pdf
DRI Long-Term Digital Preservation Factsheet:
http://tinyurl.com/hbp28xe
Online Resources for Digitisation Projects:
http://dri.ie/digitisation-resources
- includes resources for Project Planning, File Formats, Audio &
Audiovisual, Hardware, Metadata & Vocabularies and Policy.
Trinity College Dublin Digital Collections Repository:
https://www.tcd.ie/Library/dris/digital.php