Sistemi Multimediali - DIS 2011
5.1 Types of Video Signals
Component video
• Component video: Higher‐end video systems make use of three separate
video signals for the red, green, and blue image planes. Each color channel
is sent as a separate video signal.
(a) Most computer systems use component video, with separate signals for the R, G,
and B components.
(b) For any color separation scheme, component video gives the best color
reproduction, since there is no "crosstalk" between the three channels.
(c) This is not the case for S-Video or composite video, discussed next.
Component video, however, requires more bandwidth and good
synchronization of the three components.
Li & Drew
Composite Video — 1 Signal
• Composite video: color ("chrominance") and intensity ("luminance") signals are
mixed into a single carrier wave.
a) Chrominance is a composition of two color components (I and Q, or U and V).
b) In NTSC TV, e.g., I and Q are combined into a chroma signal, and a color subcarrier is then
employed to put the chroma signal at the high-frequency end of the band shared with the
luminance signal.
c) The chrominance and luminance components can be separated at the receiver end, and then
the two color components can be further recovered.
d) When connecting to TVs or VCRs, composite video uses only one wire; the video color signals
are mixed, not sent separately. The audio and sync signals are additions to this one signal.
• Since color and intensity are wrapped into the same signal, some interference
between the luminance and chrominance signals is inevitable.
S‐Video — 2 Signals
• S-Video (separated video, or super-video, e.g., in S-VHS): as a compromise,
it uses two wires, one for luminance and another for a composite
chrominance signal.
• As a result, there is less crosstalk between the color information and the
crucial gray-scale information.
• The reason for placing luminance into its own part of the signal is that
black-and-white information is most crucial for visual perception.
– In fact, humans are able to differentiate spatial resolution in grayscale images
with much higher acuity than for the color part of color images.
– As a result, we can send less accurate color information than must be sent for
intensity information — we can only see fairly large blobs of color, so it makes
sense to send less color detail.
5.2 Analog Video
• An analog signal f(t) samples a time-varying image. So-called
"progressive" scanning traces through a complete picture (a frame)
row-wise for each time interval.
• In TV, and in some monitors and multimedia standards as well,
another system, called "interlaced" scanning, is used:
a) The odd-numbered lines are traced first, and then the even-numbered
lines are traced. This results in "odd" and "even" fields — two fields
make up one frame.
b) In fact, the odd lines (starting from 1) end up at the middle of a line
at the end of the odd field, and the even scan starts at a half-way point.
• Table 5.2 gives a comparison of the three major analog
broadcast TV systems.

Table 5.2: Comparison of Analog Broadcast TV Systems

            Frame Rate   # of Scan   Total Channel    Bandwidth Allocation (MHz)
TV System   (fps)        Lines       Width (MHz)      Y     I or U   Q or V
NTSC        29.97        525         6.0              4.2   1.6      0.6
PAL         25           625         8.0              5.5   1.8      1.8
SECAM       25           625         8.0              6.0   2.0      2.0
5.3 Digital Video
• The advantages of digital representation for video are many.
For example:
(a) Video can be stored on digital devices or in memory, ready to
be processed (noise removal, cut and paste, etc.), and
integrated into various multimedia applications;
(b) Direct access is possible, which makes nonlinear video editing
achievable as a simple, rather than a complex, task;
(c) Repeated recording does not degrade image quality;
(d) Ease of encryption and better tolerance to channel noise.
CCIR Standards for Digital Video
• CCIR is the Consultative Committee for
International Radio, and one of the most
important standards it has produced is
CCIR-601, for component digital video.
– This standard has since become standard ITU-R-601,
an international standard for professional
video applications
— adopted by certain digital video formats, including the
popular DV video.
HDTV (High Definition TV)
• The main thrust of HDTV (High Definition TV) is not to increase the
"definition" in each unit area, but rather to increase the visual field,
especially in its width.
(a) The first generation of HDTV was based on an analog technology developed
by Sony and NHK in Japan in the late 1970s.
(b) MUSE (MUltiple sub-Nyquist Sampling Encoding) was an improved NHK HDTV
with hybrid analog/digital technologies that was put in use in the 1990s. It has
1,125 scan lines, interlaced (60 fields per second), and a 16:9 aspect ratio.
(c) Since uncompressed HDTV will easily demand more than 20 MHz bandwidth,
which will not fit in the current 6 MHz or 8 MHz channels, various
compression techniques are being investigated.
(d) It is also anticipated that high-quality HDTV signals will be transmitted using
more than one channel, even after compression.
• A brief history of HDTV evolution:
(a) In 1987, the FCC decided that HDTV standards must be compatible with the
existing NTSC standard and be confined to the existing VHF (Very High
Frequency) and UHF (Ultra High Frequency) bands.
(b) In 1990, the FCC announced a very different initiative, i.e., its preference for a
full-resolution HDTV, and it was decided that HDTV would be simultaneously
broadcast with the existing NTSC TV and eventually replace it.
(c) Witnessing a boom of proposals for digital HDTV, the FCC made a key decision
to go all-digital in 1993. A "grand alliance" was formed that included four main
proposals, by General Instruments, MIT, Zenith, and AT&T, and by Thomson,
Philips, Sarnoff, and others.
(d) This eventually led to the formation of the ATSC (Advanced Television Systems
Committee) — responsible for the standard for TV broadcasting of HDTV.
(e) In 1995 the U.S. FCC Advisory Committee on Advanced Television Service
recommended that the ATSC Digital Television Standard be adopted.
• The standard supports the video scanning formats shown in
Table 5.4. In the table, "I" means interlaced scan and "P"
means progressive (non-interlaced) scan.

Table 5.4: Advanced Digital TV formats supported by ATSC

# of Active       # of Active
Pixels per Line   Lines         Aspect Ratio   Picture Rate
1,920             1,080         16:9           60I 30P 24P
1,280             720           16:9           60P 30P 24P
704               480           16:9 & 4:3     60I 60P 30P 24P
640               480           4:3            60I 60P 30P 24P
• For video, MPEG-2 is chosen as the compression
standard. For audio, AC-3 is the standard. It supports
the so-called 5.1-channel Dolby surround sound, i.e.,
five surround channels plus a subwoofer channel.
• The salient differences between conventional TV and
HDTV:
(a) HDTV has a much wider aspect ratio of 16:9 instead of
4:3.
(b) HDTV moves toward progressive (non-interlaced) scan.
The rationale is that interlacing introduces serrated edges
to moving objects and flickers along horizontal edges.
• The FCC planned to replace all analog
broadcast services with digital TV broadcasting by
the year 2009. The services provided include:
– SDTV (Standard Definition TV): the current NTSC TV
or higher.
– EDTV (Enhanced Definition TV): 480 active lines or
higher, i.e., the third and fourth rows in Table 5.4.
– HDTV (High Definition TV): 720 active lines or higher.
6.1 Digitization of Sound
What is Sound?
• Sound is a wave phenomenon like light, but it is macroscopic
and involves molecules of air being compressed and
expanded under the action of some physical device.
(a) For example, a speaker in an audio system vibrates back and
forth and produces a longitudinal pressure wave that we
perceive as sound.
(b) Since sound is a pressure wave, it takes on continuous values,
as opposed to digitized ones.
(c) Even though such pressure waves are
longitudinal, they still have ordinary wave
properties and behaviors, such as reflection
(bouncing), refraction (change of angle when
entering a medium with a different density),
and diffraction (bending around an obstacle).
(d) If we wish to use a digital version of sound
waves, we must form digitized representations
of audio information.
Digitization
• Digitization means conversion to a stream of
numbers — preferably integers, for efficiency.
• Fig. 6.1 shows the 1-dimensional nature of
sound: amplitude values depend on a 1D
variable, time. (Note that images depend
instead on a 2D set of variables, x and y.)
• The graph in Fig. 6.1 has to be made digital in both time and
amplitude. To digitize, the signal must be sampled in each
dimension: in time, and in amplitude.
(a) Sampling means measuring the quantity we are interested in, usually
at evenly spaced intervals.
(b) The first kind of sampling — using measurements only at evenly spaced
time intervals — is simply called sampling. The rate at which it is
performed is called the sampling frequency (see Fig. 6.2(a)).
(c) For audio, typical sampling rates are from 8 kHz (8,000 samples per
second) to 48 kHz. This range is determined by the Nyquist theorem,
discussed later.
(d) Sampling in the amplitude or voltage dimension is called
quantization. Fig. 6.2(b) shows this kind of sampling.
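The two discretization steps can be sketched in a few lines of Python (a toy illustration; the function and parameter names are ours, not from the text):

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits):
    """Sample a unit-amplitude sine at evenly spaced time intervals
    (sampling), then map each amplitude to a signed integer with the
    given number of bits (quantization)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    pcm = []
    for n in range(n_samples):
        t = n / sample_rate_hz            # sampling: discrete time
        x = math.sin(2 * math.pi * freq_hz * t)
        pcm.append(round(x * levels))     # quantization: discrete amplitude
    return pcm

# a 1 kHz tone sampled at 8 kHz and quantized to 8 bits
pcm = sample_and_quantize(1000, 8000, 80, 8)
```

Both steps lose information: sampling discards what happens between the chosen instants, and quantization rounds each amplitude to the nearest representable level.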
Signal to Noise Ratio (SNR)
• The ratio of the power of the correct signal to the power of the noise is
called the signal-to-noise ratio (SNR) — a measure of the
quality of the signal.
• The SNR is usually measured in decibels (dB), where 1 dB is
a tenth of a bel. The SNR value, in units of dB, is defined in
terms of base-10 logarithms of squared voltages, as
follows:

    SNR = 10 log10 (V_signal^2 / V_noise^2) = 20 log10 (V_signal / V_noise)    (6.2)
a) The power in a signal is proportional to the
square of the voltage. For example, if the
signal voltage V_signal is 10 times the noise voltage,
then the SNR is 20 × log10(10) = 20 dB.
b) In terms of power, if the power from ten
violins is ten times that from one violin
playing, then the ratio of power is 10 dB, or
1 B.
c) To remember: Power — 10; Signal Voltage — 20.
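The 10-vs-20 rule follows directly from Eq. (6.2); a quick check in Python (helper name is ours):

```python
import math

def snr_db(v_signal, v_noise):
    """Eq. (6.2): power goes as voltage squared, so 10*log10 of the
    power ratio equals 20*log10 of the voltage ratio."""
    return 20 * math.log10(v_signal / v_noise)
```

A signal voltage 10 times the noise voltage gives 20 dB, while a power ratio of 10 (voltage ratio of sqrt(10)) gives 10 dB, i.e. 1 B.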
Audio Filtering
• Prior to sampling and A/D conversion, the audio signal is usually filtered
to remove unwanted frequencies. The frequencies kept depend on the
application:
(a) For speech, typically the range from 50 Hz to 10 kHz is retained; other frequencies
are blocked by the use of a band-pass filter that screens out lower and higher
frequencies.
(b) An audio music signal will typically contain from about 20 Hz up to 20 kHz.
(c) At the D/A converter end, high frequencies may reappear in the output —
because of sampling and then quantization, the smooth input signal is replaced by
a series of step functions containing all possible frequencies.
(d) So at the decoder side, a lowpass filter is used after the D/A circuit.
Audio Quality vs. Data Rate
• The uncompressed data rate increases as more bits are used for
quantization. Stereo doubles the bandwidth needed to transmit a digital
audio signal.

Table 6.2: Data rate and bandwidth in sample audio applications

            Sample Rate   Bits per   Mono /       Data Rate (uncompressed)   Frequency Band
Quality     (kHz)         Sample     Stereo       (kB/sec)                   (kHz)
Telephone   8             8          Mono         8                          0.200-3.4
AM Radio    11.025        8          Mono         11.0                       0.1-5.5
FM Radio    22.05         16         Stereo       88.2                       0.02-11
CD          44.1          16         Stereo       176.4                      0.005-20
DAT         48            16         Stereo       192.0                      0.005-20
DVD Audio   192 (max)     24 (max)   6 channels   1,200 (max)                0-96 (max)
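The uncompressed rates in Table 6.2 are simply sample rate × bits per sample × channels; a sketch (here 1 kB means 1000 bytes, matching the table's figures; the function name is ours):

```python
def pcm_data_rate_kb(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in kB/sec (1 kB = 1000 bytes)."""
    return sample_rate_hz * bits_per_sample * channels / 8 / 1000

cd_rate = pcm_data_rate_kb(44100, 16, 2)   # CD: 44.1 kHz, 16 bits, stereo
```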
6.2 MIDI: Musical Instrument Digital Interface
• Use the sound card's default sounds: use a simple
scripting language and hardware setup called MIDI.
• MIDI Overview
(a) MIDI is a scripting language — it codes "events" that stand for
the production of sounds. E.g., a MIDI event might include
values for the pitch of a single note, its duration, and its
volume.
(b) MIDI is a standard adopted by the electronic music industry for
controlling devices, such as synthesizers and sound cards, that
produce music.
(c) The MIDI standard is supported by most
synthesizers, so sounds created on one
synthesizer can be played and manipulated on
another synthesizer and sound reasonably close.
(d) Computers must have a special MIDI interface,
but this is incorporated into most sound cards.
The sound card must also have both D/A and A/D
converters.
MIDI Concepts
• MIDI channels are used to separate messages.
(a) There are 16 channels, numbered from 0 to 15. The
channel forms the last 4 bits (the least significant bits) of
the message.
(b) Usually a channel is associated with a particular
instrument: e.g., channel 1 is the piano, channel 10 is the
drums, etc.
(c) Nevertheless, one can switch instruments midstream, if
desired, and associate another instrument with any
channel.
• System messages
(a) There are several other types of messages, e.g., a general message
for all instruments indicating a change in tuning or timing.
(b) If the first 4 bits are all 1s, then the message is
interpreted as a system common message.
• The way a synthetic musical instrument responds to a
MIDI message is usually by simply ignoring any "play
sound" message that is not for its channel.
– If several messages are for its channel, then the instrument
responds, provided it is multi-voice, i.e., can play more
than a single note at once.
• It is easy to confuse the term voice with the term timbre — the
latter is MIDI terminology for just what instrument is trying to
be emulated, e.g., a piano as opposed to a violin: it is the quality of
the sound.
(a) An instrument (or sound card) that is multi-timbral is one that is
capable of playing many different sounds at the same time, e.g., piano,
brass, drums, etc.
(b) On the other hand, the term voice, while sometimes used by
musicians to mean the same thing as timbre, is used in MIDI to mean
every different timbre and pitch that the tone module can produce at
the same time.
• Different timbres are produced digitally by using a patch — the set
of control settings that define a particular timbre. Patches are
often organized into databases, called banks.
Hardware Aspects of MIDI
• The MIDI hardware setup consists of a 31.25 kbps serial
connection. Usually, MIDI-capable units are either
input devices or output devices, not both.
• A traditional synthesizer is shown in Fig. 6.10.

Fig. 6.10: A MIDI synthesizer
• The physical MIDI ports consist of 5-pin connectors for
IN and OUT, as well as a third connector called THRU.
(a) MIDI communication is half-duplex.
(b) MIDI IN is the connector via which the device receives all
MIDI data.
(c) MIDI OUT is the connector through which the device
transmits all the MIDI data it generates itself.
(d) MIDI THRU is the connector by which the device echoes
the data it receives from MIDI IN. Note that it is only the
MIDI IN data that is echoed by MIDI THRU — all the data
generated by the device itself is sent via MIDI OUT.
Structure of MIDI Messages
• MIDI messages can be classified into two types: channel
messages and system messages, as in Fig. 6.12:
Fig. 6.12: MIDI message taxonomy
• A. Channel messages: can have up to 3 bytes:
a) The first byte is the status byte (the opcode, as it were); it has its most significant bit set to 1.
b) The 4 low-order bits identify which channel this message belongs to (allowing 16 possible channels).
c) The 3 remaining bits hold the message type. For a data byte, the most significant bit is set to 0.
• A.1. Voice messages:
a) This type of channel message controls a voice, i.e., sends information specifying which note to
play or to turn off, and encodes key pressure.
b) Voice messages are also used to specify controller effects such as sustain, vibrato, tremolo, and
the pitch wheel.
c) Table 6.3 lists these operations.
Table 6.3: MIDI voice messages

Voice Message        Status Byte   Data Byte1        Data Byte2
Note Off             &H8n          Key number        Note Off velocity
Note On              &H9n          Key number        Note On velocity
Poly. Key Pressure   &HAn          Key number        Amount
Control Change       &HBn          Controller num.   Controller value
Program Change       &HCn          Program number    None
Channel Pressure     &HDn          Pressure value    None
Pitch Bend           &HEn          MSB               LSB

(** &H indicates hexadecimal, and 'n' in the status byte hex
value stands for a channel number. All values are in 0..127,
except Controller number, which is in 0..120.)
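The status-byte layout of Table 6.3 can be decoded with simple bit operations (a sketch; the function name is ours):

```python
OPCODES = {0x8: "Note Off", 0x9: "Note On", 0xA: "Poly. Key Pressure",
           0xB: "Control Change", 0xC: "Program Change",
           0xD: "Channel Pressure", 0xE: "Pitch Bend"}

def parse_status_byte(status):
    """Split a channel-message status byte: the MSB must be 1, the
    high nibble is the message type, the low 4 bits are the channel."""
    assert status & 0x80, "status bytes have the most significant bit set"
    return OPCODES[status >> 4], status & 0x0F
```

For example, `parse_status_byte(0x93)` identifies a Note On message addressed to channel 3.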
General MIDI
• General MIDI is a scheme for standardizing the assignment
of instruments to patch numbers.
a) A standard percussion map specifies 47 percussion sounds.
b) Where a "note" appears on a musical score determines which percussion instrument is being
struck: a bongo drum, a cymbal, etc.
c) Other requirements for General MIDI compatibility: the MIDI device must support all 16 channels; a
device must be multitimbral (i.e., each channel can play a different instrument/program); a
device must be polyphonic (i.e., each channel is able to play many voices); and there must be
a minimum of 24 dynamically allocated voices.
• General MIDI Level 2: an extended General MIDI has recently been defined, with a
standard .smf ("Standard MIDI File") format defined — including extra
character information, such as karaoke lyrics.
MIDI to WAV Conversion
• Some programs, such as early versions of
Premiere, cannot include .mid files — instead,
they insist on .wav format files.
a) Various shareware programs exist for approximating
a reasonable conversion between MIDI and WAV
formats.
b) These programs essentially consist of large lookup
files that try to substitute pre-defined or shifted WAV
output for MIDI messages, with inconsistent success.
7.1 Introduction
• Compression: the process of coding that will
effectively reduce the total number of bits
needed to represent certain information.

Fig. 7.1: A General Data Compression Scheme.
Introduction (cont'd)
• If the compression and decompression processes
induce no information loss, then the compression
scheme is lossless; otherwise, it is lossy.
• Compression ratio:

    compression ratio = B0 / B1    (7.1)

B0 – number of bits before compression
B1 – number of bits after compression
7.2 Basics of Information Theory
• The entropy η of an information source with alphabet S =
{s1, s2, . . . , sn} is:

    η = H(S) = Σ_{i=1..n} p_i log2 (1/p_i)    (7.2)
             = − Σ_{i=1..n} p_i log2 p_i      (7.3)

p_i – probability that symbol s_i will occur in S.
log2 (1/p_i) – indicates the amount of information (self-information
as defined by Shannon) contained in s_i, which
corresponds to the number of bits needed to encode s_i.
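Eq. (7.3) translates directly into code (a minimal sketch; the function name is ours):

```python
import math

def entropy(probs):
    """Eq. (7.3): H(S) = -sum_i p_i log2 p_i.  Symbols with zero
    probability contribute nothing to the sum."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A uniform source over 256 symbols gives log2 256 = 8 bits per symbol, matching Eq. (7.4) on the next slide; a source that always emits the same symbol has entropy 0.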
Distribution of Gray-Level Intensities

Fig. 7.2: Histograms for Two Gray-level Images.

• Fig. 7.2(a) shows the histogram of an image with a uniform distribution of
gray-level intensities, i.e., p_i = 1/256 for all i. Hence, the entropy of this image
is:

    log2 256 = 8    (7.4)

• Fig. 7.2(b) shows the histogram of an image with two possible values. Its
entropy is 0.92.
Entropy and Code Length
• As can be seen in Eq. (7.3), the entropy η is a weighted sum
of the terms log2 (1/p_i); hence it represents the average amount of
information contained per symbol in the source S.
• The entropy η specifies the lower bound for the average
number of bits needed to code each symbol in S, i.e.,

    η ≤ l̄    (7.5)

l̄ – the average length (measured in bits) of the codewords
produced by the encoder.
7.3 Run-Length Coding
• Memoryless source: an information source that is
independently distributed. Namely, the value of the
current symbol does not depend on the values of the
previously appeared symbols.
• Instead of assuming a memoryless source, Run-Length Coding
(RLC) exploits memory present in the information source.
• Rationale for RLC: if the information source has the
property that symbols tend to form continuous groups,
then such a symbol and the length of the group can be
coded.
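A minimal run-length coder illustrating this rationale (a sketch; the function names are ours):

```python
def rle_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, length) pair."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((s, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Inverse of rle_encode: expand each pair back into a run."""
    return "".join(s * n for s, n in runs)
```

The coding is lossless: decoding the runs reproduces the original sequence exactly, and it pays off only when runs are long; a source with no repeats would actually grow.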
7.4 Variable-Length Coding (VLC)
Shannon-Fano Algorithm — a top-down approach
1. Sort the symbols according to the frequency count of their
occurrences.
2. Recursively divide the symbols into two parts, each with
approximately the same number of counts, until all parts contain
only one symbol.

An example: coding of "HELLO"

Symbol   H   E   L   O
Count    1   1   2   1

Frequency count of the symbols in "HELLO".
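The two steps above, sketched in Python. The "approximately equal counts" split is a judgment call when counts tie; this version (names ours) picks the split point that minimizes the count imbalance:

```python
def shannon_fano(counts):
    """Top-down Shannon-Fano coding for a dict {symbol: count}."""
    items = sorted(counts.items(), key=lambda kv: -kv[1])  # step 1: sort

    codes = {}
    def split(part, prefix):
        if len(part) == 1:
            codes[part[0][0]] = prefix or "0"
            return
        total = sum(c for _, c in part)
        running, best_diff, cut = 0, None, 1
        for k in range(1, len(part)):          # step 2: most even split
            running += part[k - 1][1]
            diff = abs(2 * running - total)
            if best_diff is None or diff < best_diff:
                best_diff, cut = diff, k
        split(part[:cut], prefix + "0")
        split(part[cut:], prefix + "1")

    split(items, "")
    return codes

codes = shannon_fano({"H": 1, "E": 1, "L": 2, "O": 1})
```

For "HELLO" this yields a 1-bit code for L and longer codes for the rarer symbols, 10 bits in total for the whole word.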
Huffman Coding
ALGORITHM 7.1 Huffman Coding Algorithm — a bottom-up approach
1. Initialization: put all symbols on a list sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
(1) From the list, pick the two symbols with the lowest frequency counts. Form a Huffman subtree
that has these two symbols as child nodes, and create a parent node.
(2) Assign the sum of the children's frequency counts to the parent and insert it into the list such
that the order is maintained.
(3) Delete the children from the list.
3. Assign a codeword for each leaf based on the path from the root.
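Algorithm 7.1 maps naturally onto a priority queue; a sketch using Python's heapq (the tie-breaking counter is our addition, to keep the heap from ever comparing tree nodes):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Bottom-up Huffman coding for a dict {symbol: count}."""
    tiebreak = count()
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)                       # step 1: sorted list
    while len(heap) > 1:                      # step 2: merge two lowest
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):                   # step 3: root-to-leaf paths
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"H": 1, "E": 1, "L": 2, "O": 1})
```

For "HELLO" every symbol ends up with a 2-bit code, again 10 bits in total, consistent with η ≤ l̄ < η + 1 (the entropy of "HELLO" is about 1.92 bits/symbol).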
Huffman Coding (cont'd)
In Fig. 7.5, new symbols P1, P2, P3 are created
to refer to the parent nodes in the Huffman
coding tree. The contents of the list are
illustrated below:

After initialization:   L H E O
After iteration (a):    L P1 H
After iteration (b):    L P2
After iteration (c):    P3
Properties of Huffman Coding
1. Unique prefix property: no Huffman code is a prefix of any other Huffman
code — this precludes any ambiguity in decoding.
2. Optimality: minimum-redundancy code — proved optimal for a given data
model (i.e., a given, accurate, probability distribution):
• The two least frequent symbols will have the same length for their Huffman
codes, differing only in the last bit.
• Symbols that occur more frequently will have shorter Huffman codes than
symbols that occur less frequently.
• The average code length for an information source S is strictly less than η + 1.
Combined with Eq. (7.5), we have:

    η ≤ l̄ < η + 1    (7.6)
7.7 Lossless Image Compression
• Approaches to differential coding of images:
– Given an original image I(x, y), using a simple difference operator
we can define a difference image d(x, y) as follows:

    d(x, y) = I(x, y) − I(x − 1, y)    (7.9)

or use the discrete version of the 2-D Laplacian operator to
define a difference image d(x, y) as

    d(x, y) = 4 I(x, y) − I(x, y − 1) − I(x, y + 1) − I(x + 1, y) − I(x − 1, y)    (7.10)

• Due to the spatial redundancy existing in normal images I, the
difference image d will have a narrower histogram and
hence a smaller entropy, as shown in Fig. 7.9.
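The entropy drop can be demonstrated on a toy ramp image (a sketch; names ours). The first column is stored verbatim so the transform stays invertible:

```python
import math

def difference_image(img):
    """d(x, y) = I(x, y) - I(x-1, y), Eq. (7.9); the first column is
    kept as-is so each row can be rebuilt by a running sum."""
    return [[row[x] - row[x - 1] if x > 0 else row[x]
             for x in range(len(row))] for row in img]

def entropy_of(values):
    """Empirical entropy of a list of values, in bits per symbol."""
    n = len(values)
    hist = {}
    for v in values:
        hist[v] = hist.get(v, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in hist.values())

img = [[2 * y + x for x in range(16)] for y in range(4)]  # smooth ramp
orig = [v for row in img for v in row]
diff = [v for row in difference_image(img) for v in row]
```

The original image uses many distinct gray levels, while the difference image is almost entirely 1s, so its histogram is far narrower and its entropy far smaller.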
Fig. 7.9: Distributions for Original versus Derivative Images. (a,b): original
gray-level image and its partial derivative image; (c,d): histograms for the original
and derivative images.
(This figure uses a commonly employed image called "Barb".)
8.1 Introduction
• Lossless compression algorithms do not deliver
compression ratios that are high enough. Hence,
most multimedia compression algorithms are
lossy.
• What is lossy compression?
– The compressed data is not the same as the original
data, but a close approximation of it.
– It yields a much higher compression ratio than
lossless compression.
8.2 Distortion Measures
• The three most commonly used distortion measures in image compression are:
– mean square error (MSE) σ²:

    σ² = (1/N) Σ_{n=1..N} (x_n − y_n)²    (8.1)

where x_n, y_n, and N are the input data sequence, the reconstructed data sequence, and the length of the
data sequence, respectively.
– signal-to-noise ratio (SNR), in decibel units (dB):

    SNR = 10 log10 (σ_x² / σ_d²)    (8.2)

where σ_x² is the average squared value of the original data sequence and σ_d² is the MSE.
– peak signal-to-noise ratio (PSNR):

    PSNR = 10 log10 (x_peak² / σ_d²)    (8.3)
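Eqs. (8.1) and (8.3) in a few lines of Python (a sketch; names ours):

```python
import math

def mse(x, y):
    """Eq. (8.1): mean square error between input x and reconstruction y."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr_db(x, y, peak=255):
    """Eq. (8.3): peak signal-to-noise ratio in dB, for 8-bit data
    by default (peak value 255)."""
    return 10 * math.log10(peak ** 2 / mse(x, y))
```

A reconstruction that is off by ±1 everywhere has MSE 1, giving a PSNR of 20 log10 255, about 48 dB.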
Spatial Frequency and DCT
• Spatial frequency indicates how many times pixel
values change across an image block.
• The DCT formalizes this notion with a measure of how
much the image contents change as a function of
the number of cycles of a cosine wave per block.
• The role of the DCT is to decompose the original signal
into its DC and AC components; the role of the IDCT is
to reconstruct (re-compose) the signal.
Definition of DCT:
Given an input function f(i, j) over two integer variables i and j
(a piece of an image), the 2D DCT transforms it into a new
function F(u, v), with integers u and v running over the same
range as i and j. The general definition of the transform is:

    F(u, v) = (2 C(u) C(v) / √(MN)) · Σ_{i=0..M−1} Σ_{j=0..N−1} cos((2i+1)uπ / 2M) · cos((2j+1)vπ / 2N) · f(i, j)    (8.15)

where i, u = 0, 1, . . . , M − 1; j, v = 0, 1, . . . , N − 1; and the
constants C(u) and C(v) are determined by

    C(ξ) = √2/2   if ξ = 0,
           1      otherwise.    (8.16)
2D Discrete Cosine Transform (2D DCT):

    F(u, v) = (C(u) C(v) / 4) · Σ_{i=0..7} Σ_{j=0..7} cos((2i+1)uπ / 16) · cos((2j+1)vπ / 16) · f(i, j)    (8.17)

where i, j, u, v = 0, 1, . . . , 7, and the constants C(u) and C(v) are determined
by Eq. (8.16).

2D Inverse Discrete Cosine Transform (2D IDCT):
The inverse function is almost the same, with the roles of f(i, j) and F(u, v)
reversed, except that now C(u) C(v) must stand inside the sums:

    f̃(i, j) = Σ_{u=0..7} Σ_{v=0..7} (C(u) C(v) / 4) · cos((2i+1)uπ / 16) · cos((2j+1)vπ / 16) · F(u, v)    (8.18)

where i, j, u, v = 0, 1, . . . , 7.
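Eq. (8.17) transcribed literally (slow, O(N⁴) per block, but useful for checking; names ours):

```python
import math

def C(xi):
    """Eq. (8.16): sqrt(2)/2 for index 0, else 1."""
    return math.sqrt(2) / 2 if xi == 0 else 1.0

def dct2_8x8(f):
    """Direct 2D DCT of an 8x8 block, straight from Eq. (8.17)."""
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(f[i][j]
                    * math.cos((2 * i + 1) * u * math.pi / 16)
                    * math.cos((2 * j + 1) * v * math.pi / 16)
                    for i in range(8) for j in range(8))
            F[u][v] = C(u) * C(v) / 4 * s
    return F

# a constant block puts all of its energy into the DC coefficient F(0, 0)
F = dct2_8x8([[100] * 8 for _ in range(8)])
```

For the constant block of 100s, F(0, 0) = 800 and every AC coefficient is zero, illustrating the DC/AC decomposition described above.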
The Cosine Basis Functions
• Functions Bp(i) and Bq(i) are orthogonal if

    Σ_i [Bp(i) · Bq(i)] = 0   if p ≠ q    (8.22)

• Functions Bp(i) and Bq(i) are orthonormal if they are orthogonal and

    Σ_i [Bp(i) · Bq(i)] = 1   if p = q    (8.23)

• It can be shown that:

    Σ_{i=0..7} cos((2i+1)pπ / 16) · cos((2i+1)qπ / 16) = 0   if p ≠ q

    Σ_{i=0..7} (C(p)/2) cos((2i+1)pπ / 16) · (C(q)/2) cos((2i+1)qπ / 16) = 1   if p = q
2D Separable Basis
• The 2D DCT can be separated into a sequence of two
1D DCT steps:

    G(i, v) = (1/2) C(v) Σ_{j=0..7} cos((2j+1)vπ / 16) · f(i, j)    (8.24)

    F(u, v) = (1/2) C(u) Σ_{i=0..7} cos((2i+1)uπ / 16) · G(i, v)    (8.25)

• It is straightforward to see that this simple change saves
many arithmetic steps. The number of iterations
required is reduced from 8 × 8 to 8 + 8.
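Eqs. (8.24)-(8.25) as a row pass followed by a column pass, reusing one 8-point routine for both (a sketch; names ours):

```python
import math

def C(xi):
    """Eq. (8.16): sqrt(2)/2 for index 0, else 1."""
    return math.sqrt(2) / 2 if xi == 0 else 1.0

def dct_1d_8(x):
    """One 8-point DCT pass, the common core of Eqs. (8.24) and (8.25)."""
    return [0.5 * C(u) * sum(x[i] * math.cos((2 * i + 1) * u * math.pi / 16)
                             for i in range(8))
            for u in range(8)]

def dct2_separable(f):
    """Row pass (Eq. 8.24) followed by column pass (Eq. 8.25)."""
    G = [dct_1d_8(row) for row in f]                 # G(i, v)
    F = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        col = dct_1d_8([G[i][v] for i in range(8)])  # transform over i
        for u in range(8):
            F[u][v] = col[u]
    return F

F = dct2_separable([[100] * 8 for _ in range(8)])
```

The result matches the direct 2D form (e.g., DC coefficient 800 for a constant block of 100s), and because the 8x8 DCT with these constants is orthonormal, the transform preserves the block's total energy.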
9.1 The JPEG Standard
• JPEG is an image compression standard that was developed
by the "Joint Photographic Experts Group". JPEG was
formally accepted as an international standard in 1992.
• JPEG is a lossy image compression method. It employs a
transform coding method using the DCT (Discrete Cosine
Transform).
• An image is a function of i and j (or conventionally x and y)
in the spatial domain. The 2D DCT is used as one step in
JPEG in order to yield a frequency response, a
function F(u, v) in the spatial frequency domain, indexed
by two integers u and v.
Observations for JPEG Image Compression
• The effectiveness of the DCT transform coding
method in JPEG relies on 3 major observations:
Observation 1: useful image contents change
relatively slowly across the image, i.e., it is unusual
for intensity values to vary widely several times in
a small area, for example, within an 8×8 image
block.
• Much of the information in an image is repeated,
hence "spatial redundancy".
Observations for JPEG Image Compression (cont'd)
Observation 2: psychophysical experiments suggest that
humans are much less likely to notice the loss of very high
spatial-frequency components than the loss of lower-
frequency components.
• The spatial redundancy can be reduced by largely reducing
the high spatial-frequency contents.
Observation 3: visual acuity (accuracy in distinguishing closely
spaced lines) is much greater for gray ("black and white")
than for color.
• Chroma subsampling (4:2:0) is used in JPEG.
DCT on image blocks
• Each image is divided into 8 × 8 blocks. The 2D
DCT is applied to each block image f(i, j), with
output being the DCT coefficients F(u, v) for each
block.
• Using blocks, however, has the effect of isolating
each block from its neighboring context. This is
why JPEG images look choppy ("blocky") when a
high compression ratio is specified by the user.
Quantization

    F̂(u, v) = round( F(u, v) / Q(u, v) )    (9.1)

• F(u, v) represents a DCT coefficient, Q(u, v) is a "quantization matrix" entry,
and F̂(u, v) represents the quantized DCT coefficient, which
JPEG will use in the succeeding entropy coding.
– The quantization step is the main source of loss in JPEG compression.
– The entries of Q(u, v) tend to have larger values towards the lower right corner.
This aims to introduce more loss at the higher spatial frequencies — a practice
supported by Observations 1 and 2.
– Tables 9.1 and 9.2 show the default Q(u, v) values, obtained from psychophysical
studies with the goal of maximizing the compression ratio while minimizing
perceptual losses in JPEG images.
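Eq. (9.1) and its approximate inverse on the decoder side (a sketch; the uniform Q matrix here is a stand-in for illustration, not one of the JPEG default tables):

```python
def quantize(F, Q):
    """Eq. (9.1): F_hat(u, v) = round(F(u, v) / Q(u, v)) - the lossy step."""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

def dequantize(F_hat, Q):
    """Decoder side: scale back; the rounding error is not recoverable."""
    return [[F_hat[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

Q = [[16] * 8 for _ in range(8)]   # stand-in quantization matrix
F = [[800 if (u, v) == (0, 0) else 10 for v in range(8)] for u in range(8)]
F_hat = quantize(F, Q)
F_rec = dequantize(F_hat, Q)
```

Coefficients that are exact multiples of their Q entry survive the round trip unchanged (800 → 50 → 800), while the small AC values are rounded (10 → 1 → 16): that rounding is exactly where JPEG loses information.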
9.1.2 Four Commonly Used JPEG Modes
• Sequential mode — the default JPEG mode,
implicitly assumed in the discussions so far.
Each gray-level image or color image
component is encoded in a single left-to-right,
top-to-bottom scan.
• Progressive mode.
• Hierarchical mode.
• Lossless mode.
9.2 The JPEG2000 Standard
• Design Goals:
– To provide a better rate-distortion tradeoff and
improved subjective image quality.
– To provide additional functionalities lacking in the
current JPEG standard.
• The JPEG2000 standard addresses the following
problems:
– Lossless and lossy compression: there is currently
no standard that can provide superior lossless
compression and lossy compression in a single
bitstream.
– Low bit-rate compression: the current JPEG standard
offers excellent rate-distortion performance at mid
and high bit-rates. However, at bit-rates below 0.25
bpp, subjective distortion becomes unacceptable. This
is important if we hope to receive images on our web-
enabled ubiquitous devices, such as web-aware
wristwatches and so on.
– Large images: the new standard will allow image
resolutions greater than 64K by 64K without tiling. It
can handle image sizes up to 2^32 − 1.
– Single decompression architecture: the current JPEG
standard has 44 modes, many of which are
application-specific and not used by the majority of
JPEG decoders.
73. Sistemi Mul+mediali ‐ DIS 2011
– Transmission in Noisy Environments: The new standard will provide improved error resilience for transmission in noisy environments such as wireless networks and the Internet.
– Progressive Transmission: The new standard provides seamless quality and resolution scalability from low to high bit-rate. The target bit-rate and reconstruction resolution need not be known at the time of compression.
– Region of Interest Coding: The new standard allows the specification of Regions of Interest (ROI), which can be coded with higher quality than the rest of the image. One might like to code the face of a speaker with more quality than the surrounding furniture.
– Computer-Generated Imagery: The current JPEG standard is optimized for natural imagery and does not perform well on computer-generated imagery.
– Compound Documents: The new standard offers metadata mechanisms for incorporating additional non-image data as part of the file. This might be useful for including text along with imagery, as one important example.
• In addition, JPEG2000 is able to handle up to 256 channels of information, whereas the current JPEG standard is only able to handle three color channels.
Properties of JPEG2000 Image Compression
• Uses the Embedded Block Coding with Optimized Truncation (EBCOT) algorithm, which partitions each subband (LL, LH, HL, HH) produced by the wavelet transform into small blocks called "code blocks".
• A separate scalable bitstream is generated for each code block, providing improved error resilience.
Fig. 9.7: Code block structure of EBCOT.
Region of Interest Coding in JPEG2000
• Goal:
– Particular regions of the image may contain important information and thus should be coded with better quality than others.
• Usually implemented using the MAXSHIFT method, which scales up the coefficients within the ROI so that they are placed into higher bit-planes.
• During the embedded coding process, the resulting bits are placed in front of the non-ROI part of the image. Therefore, given a reduced bit-rate, the ROI will be decoded and refined before the rest of the image.
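A toy sketch of the MAXSHIFT idea (not the actual JPEG2000 code path): shift ROI coefficients up by s bits, with s chosen so that even the smallest nonzero ROI magnitude lands in a higher bit-plane than the largest background magnitude. Since embedded coding emits bit-planes most-significant first, ROI bits then come out first. The flat coefficient list and helper name are our simplifications.

```python
# Toy illustration of MAXSHIFT: ROI coefficient magnitudes are scaled
# by 2**s so every nonzero ROI coefficient exceeds the largest
# background coefficient. The decoder can identify ROI coefficients
# simply as those >= 2**s and shift them back down.

def maxshift_scale(coeffs, roi_mask):
    """coeffs: list of int coefficient magnitudes;
    roi_mask: list of bools, True where the coefficient is in the ROI."""
    background = [c for c, in_roi in zip(coeffs, roi_mask) if not in_roi]
    max_bg = max(background, default=0)
    s = max_bg.bit_length()            # smallest s with 2**s > max_bg
    scaled = [c << s if in_roi else c
              for c, in_roi in zip(coeffs, roi_mask)]
    return scaled, s
```

Any nonzero ROI coefficient c satisfies c << s >= 2**s > max_bg, so the ROI bit-planes sit strictly above all background bit-planes.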
Fig. 9.13 (Cont'd): Comparison of JPEG and JPEG2000. (b) JPEG (left) and JPEG2000 (right) images compressed at 0.75 bpp. (c) JPEG (left) and JPEG2000 (right) images compressed at 0.25 bpp.
9.3 The JPEG-LS Standard
• JPEG-LS is the current ISO/ITU standard for lossless or "near-lossless" compression of continuous-tone images.
• It is part of a larger ISO effort aimed at better compression of medical images.
• Uses the LOCO-I (LOw COmplexity LOssless COmpression for Images) algorithm proposed by Hewlett-Packard.
• Motivated by the observation that complexity reduction is often more important than the small increases in compression offered by more complex algorithms.
Main Advantage: Low complexity!
10.1 Introduction to Video Compression
• A video consists of a time-ordered sequence of frames, i.e., images.
• An obvious solution to video compression would be predictive coding based on previous frames. Compression proceeds by subtracting images: subtract in time order and code the residual error.
• It can be done even better by searching for just the right parts of the image to subtract from the previous frame.
10.2 Video Compression with Motion Compensation
• Consecutive frames in a video are similar — temporal redundancy exists.
• Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image. The difference between the current frame and other frame(s) in the sequence will be coded — small values and low entropy, good for compression.
• Steps of video compression based on Motion Compensation (MC):
1. Motion Estimation (motion vector search).
2. MC-based Prediction.
3. Derivation of the prediction error, i.e., the difference.
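Why the difference codes so well can be seen in a tiny sketch: two similar "frames" (invented 1-D sample data here) produce a residual with only a couple of distinct small values, hence low entropy.

```python
# Minimal illustration of temporal redundancy: the frame-to-frame
# residual has far smaller values (and lower entropy) than the frames
# themselves. The 1-D "frames" are invented sample data.
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits/symbol) of a list of values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

prev_frame = [100, 102, 104, 110, 120, 130, 128, 125]
curr_frame = [100, 102, 105, 111, 120, 131, 128, 126]  # slightly changed

residual = [c - p for c, p in zip(curr_frame, prev_frame)]
# residual == [0, 0, 1, 1, 0, 1, 0, 1] -- two symbols, 1 bit/symbol,
# versus 3 bits/symbol for the 8 distinct values of curr_frame itself.
```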
Motion Compensation
• Each image is divided into macroblocks of size N × N.
– By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
• Motion compensation is performed at the macroblock level.
– The current image frame is referred to as the Target Frame.
– A match is sought between the macroblock in the Target Frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference frame(s)).
– The displacement of the reference macroblock to the target macroblock is called a motion vector MV.
– Figure 10.1 shows the case of forward prediction, in which the Reference frame is taken to be a previous frame.
10.3 Search for Motion Vectors
• The difference between two macroblocks can then be measured by their Mean Absolute Difference (MAD):

MAD(i, j) = (1/N²) ∑_{k=0}^{N−1} ∑_{l=0}^{N−1} |C(x + k, y + l) − R(x + i + k, y + j + l)|   (10.1)

N — size of the macroblock,
k and l — indices for pixels in the macroblock,
i and j — horizontal and vertical displacements,
C(x + k, y + l) — pixels in the macroblock in the Target frame,
R(x + i + k, y + j + l) — pixels in the macroblock in the Reference frame.
• The goal of the search is to find a vector (i, j) as the motion vector MV = (u, v), such that MAD(i, j) is minimum:

(u, v) = { (i, j) | MAD(i, j) is minimum, i ∈ [−p, p], j ∈ [−p, p] }   (10.2)
Sequential Search
• Sequential search: sequentially search the whole (2p + 1) × (2p + 1) window in the Reference frame (also referred to as Full search).
– A macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel, and their respective MAD is then derived using Eq. (10.1).
– The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame.
– The sequential search method is very costly — assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost of obtaining a motion vector for a single macroblock is (2p + 1)² · N² · 3, i.e., O(p²N²).
PROCEDURE 10.1 Motion-vector: sequential-search
begin
  min_MAD = LARGE_NUMBER; /* Initialization */
  for i = −p to p
    for j = −p to p
    {
      cur_MAD = MAD(i, j);
      if cur_MAD < min_MAD
      {
        min_MAD = cur_MAD;
        u = i; /* Get the coordinates for MV. */
        v = j;
      }
    }
end
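Procedure 10.1 together with Eq. (10.1) can be sketched as runnable Python; the 2-D-list frame representation, the (row, column) indexing, and the function names are our choices, and the caller must keep the search window inside the Reference frame.

```python
# Runnable sketch of Procedure 10.1 (sequential/full search).
# Frames are 2-D lists of pixel values; (x, y) is the top-left corner
# (row, column) of the target macroblock; p is the search range.

def mad(C, R, x, y, i, j, N):
    """Eq. (10.1): mean absolute difference for displacement (i, j)."""
    total = 0
    for k in range(N):
        for l in range(N):
            total += abs(C[x + k][y + l] - R[x + i + k][y + j + l])
    return total / (N * N)

def sequential_search(C, R, x, y, N, p):
    """Return the MV (u, v) minimizing MAD over the (2p+1) x (2p+1) window."""
    min_mad, u, v = float("inf"), 0, 0
    for i in range(-p, p + 1):
        for j in range(-p, p + 1):
            cur = mad(C, R, x, y, i, j, N)
            if cur < min_mad:
                min_mad, u, v = cur, i, j
    return u, v
```

On a target frame that is an exact translate of the reference, the search recovers the translation (the matching displacement gives MAD = 0).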
2D Logarithmic Search
• Logarithmic search: a cheaper version that is suboptimal but still usually effective.
• The procedure for 2D Logarithmic Search of motion vectors takes several iterations and is akin to a binary search:
– As illustrated in Fig. 10.2, initially only nine locations in the search window are used as seeds for a MAD-based search; they are marked as '1'.
– After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step-size (offset) is reduced to half.
– In the next iteration, the nine new locations are marked as '2', and so on.
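The iteration above can be sketched as follows. mad_fn(i, j) stands for an evaluation of Eq. (10.1) at candidate displacement (i, j); the initial offset of ⌈p/2⌉ and the first-found tie-breaking are our assumptions, and the result may be suboptimal when the MAD surface is not unimodal.

```python
# Sketch of 2D logarithmic search: repeatedly evaluate nine seed
# positions (center plus 8 neighbors at +/- offset), move the center
# to the best one, and halve the offset until it reaches 1.

def logarithmic_search(mad_fn, p):
    """Return an (often suboptimal) MV within [-p, p] x [-p, p]."""
    ci, cj = 0, 0                      # current search center
    offset = (p + 1) // 2              # initial step size, ceil(p/2)
    while True:
        candidates = [(ci + di, cj + dj)
                      for di in (-offset, 0, offset)
                      for dj in (-offset, 0, offset)
                      if abs(ci + di) <= p and abs(cj + dj) <= p]
        ci, cj = min(candidates, key=lambda c: mad_fn(*c))
        if offset == 1:
            return ci, cj
        offset = (offset + 1) // 2     # halve the step size
```

Because the center (0, 0) is among the first nine seeds and the search only ever moves to a candidate with smaller-or-equal MAD, the result is never worse than the zero-motion candidate.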
Hierarchical Search
• The search can benefit from a hierarchical (multiresolution) approach in which an initial estimation of the motion vector can be obtained from images with a significantly reduced resolution.
• Figure 10.3: a three-level hierarchical search in which the original image is at Level 0, images at Levels 1 and 2 are obtained by down-sampling from the previous levels by a factor of 2, and the initial search is conducted at Level 2. Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.
Hierarchical Search (Cont'd)
• Given the estimated motion vector (uk, vk) at Level k, a 3 × 3 neighborhood centered at (2·uk, 2·vk) at Level k − 1 is searched for the refined motion vector.
• The refinement is such that at Level k − 1 the motion vector (uk−1, vk−1) satisfies:

2uk − 1 ≤ uk−1 ≤ 2uk + 1,  2vk − 1 ≤ vk−1 ≤ 2vk + 1

• Let (xk0, yk0) denote the center of the macroblock at Level k in the Target frame. The procedure for hierarchical motion vector search for the macroblock centered at (x00, y00) in the Target frame can be outlined as follows:
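The outlined procedure can be sketched in Python as below, assuming 2 × 2 averaging for the down-sampling, a full search with range p/4 at Level 2, and the ±1 refinement above at Levels 1 and 0. The helper names are ours, and the caller must keep all search windows inside the frames.

```python
# Sketch of a 3-level hierarchical motion-vector search. Frames are
# 2-D lists indexed [row][col]; (x, y) is the top-left corner of the
# N x N target macroblock at Level 0; p is the Level-0 search range.

def downsample(img):
    """Average each 2x2 block into one pixel (factor-2 reduction)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4
             for c in range(w)] for r in range(h)]

def mad(C, R, x, y, i, j, N):
    return sum(abs(C[x+k][y+l] - R[x+i+k][y+j+l])
               for k in range(N) for l in range(N)) / (N * N)

def best_mv(C, R, x, y, N, candidates):
    return min(candidates, key=lambda c: mad(C, R, x, y, c[0], c[1], N))

def hierarchical_search(C0, R0, x, y, N, p):
    # Build the pyramid: Level 0 (full resolution), Levels 1 and 2.
    Cs, Rs = [C0], [R0]
    for _ in range(2):
        Cs.append(downsample(Cs[-1]))
        Rs.append(downsample(Rs[-1]))
    # Full search at Level 2 with range p/4 on the N/4 x N/4 block.
    p2 = p // 4
    cands = [(i, j) for i in range(-p2, p2 + 1) for j in range(-p2, p2 + 1)]
    u, v = best_mv(Cs[2], Rs[2], x // 4, y // 4, N // 4, cands)
    # Refine at Levels 1 and 0: 3x3 neighborhood around (2u, 2v).
    for level in (1, 0):
        f = 2 ** level
        u, v = 2 * u, 2 * v
        cands = [(u + di, v + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
        u, v = best_mv(Cs[level], Rs[level], x // f, y // f, N // f, cands)
    return u, v
```

The cost is dominated by the small full search at Level 2 plus two 9-candidate refinements, instead of a (2p + 1)² full search at full resolution.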
10.4 H.261
• H.261: an early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
– The standard was designed for videophone, video conferencing and other audiovisual services over ISDN.
– The video codec supports bit-rates of p × 64 kbps, where p ranges from 1 to 30 (hence also known as p * 64).
– It requires that the delay of the video encoder be less than 150 msec so that the video can be used for real-time bidirectional video conferencing.
ITU Recommendations & H.261 Video Formats
• H.261 belongs to the following set of ITU recommendations for visual telephony systems:
1. H.221 — Frame structure for an audiovisual channel supporting 64 to 1,920 kbps.
2. H.230 — Frame control signals for audiovisual systems.
3. H.242 — Audiovisual communication protocols.
4. H.261 — Video encoder/decoder for audiovisual services at p × 64 kbps.
5. H.320 — Narrow-band audiovisual terminal equipment for p × 64 kbps transmission.
H.261 Frame Sequence
• Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames):
– I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence "Intra".
– P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed — not just from a previous I-frame).
– Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
– To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.
• Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels, i.e., p = 15.
Intra-frame (I-frame) Coding
Fig. 10.5: I-frame Coding.
• Macroblocks are of size 16 × 16 pixels for the Y frame, and 8 × 8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8 × 8 blocks.
• For each 8 × 8 block a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan and entropy coding.
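The zigzag scan mentioned above visits the 8 × 8 quantized coefficients along anti-diagonals, alternating direction, so that low-frequency coefficients (and most nonzero values) come first. A minimal sketch of the ordering:

```python
# Generate the zigzag scan order for an n x n coefficient block:
# walk the anti-diagonals r + c = s for s = 0 .. 2n-2, reversing
# direction on even diagonals so the path zigzags.

def zigzag_order(n=8):
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()   # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order
```

In flat (row * 8 + col) indices the scan begins 0, 1, 8, 16, 9, 2, ..., which is the familiar JPEG zigzag pattern; grouping trailing high-frequency zeros together is what makes the subsequent run-length/entropy coding effective.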
Inter-frame (P-frame) Predictive Coding
• Figure 10.6 shows the H.261 P-frame coding scheme based on motion compensation:
– For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier.
– After the prediction, a difference macroblock is derived to measure the prediction error.
– Each of these 8 × 8 blocks goes through DCT, quantization, zigzag scan and entropy coding procedures.
• The P-frame coding encodes the difference macroblock (not the Target macroblock itself).
• Sometimes, a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level.
– The MB itself is then encoded (treated as an Intra MB) and in this case it is termed a non-motion-compensated MB.
• For a motion vector, the difference MVD is sent for entropy coding:

MVD = MV_Preceding − MV_Current   (10.3)
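The differential coding of Eq. (10.3) can be sketched as below. Treating (0, 0) as the predecessor of the first motion vector is our assumption for the sketch; the point is that neighboring MVs are similar, so the MVDs are small and entropy-code well.

```python
# Sketch of differential motion-vector coding per Eq. (10.3):
# MVD = MV_preceding - MV_current. The first MV is differenced
# against (0, 0) in this sketch.

def encode_mvds(mvs):
    """mvs: list of (u, v) motion vectors -> list of MVD pairs."""
    prev, mvds = (0, 0), []
    for mv in mvs:
        mvds.append((prev[0] - mv[0], prev[1] - mv[1]))
        prev = mv
    return mvds

def decode_mvds(mvds):
    """Invert Eq. (10.3): MV_current = MV_preceding - MVD."""
    prev, mvs = (0, 0), []
    for d in mvds:
        mv = (prev[0] - d[0], prev[1] - d[1])
        mvs.append(mv)
        prev = mv
    return mvs
```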
11.1 Overview
• MPEG: Moving Pictures Experts Group, established in
1988 for the development of digital video.
• It is appropriately recognized that proprietary interests
need to be maintained within the family of MPEG
standards:
– Accomplished by defining only a compressed bitstream
that implicitly defines the decoder.
– The compression algorithms, and thus the encoders, are
completely up to the manufacturers.
11.2 MPEG-1
• MPEG-1 adopts the CCIR601 digital TV format, also known as SIF (Source Input Format).
• MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
– 352 × 240 for NTSC video at 30 fps
– 352 × 288 for PAL video at 25 fps
– It uses 4:2:0 chroma subsampling
• The MPEG-1 standard is also referred to as ISO/IEC 11172. It has five parts: 11172-1 Systems, 11172-2 Video, 11172-3 Audio, 11172-4 Conformance, and 11172-5 Software.
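A quick back-of-the-envelope calculation (8-bit samples assumed) shows why compression is needed even at these modest resolutions:

```python
# Raw data rate of 352 x 240 video at 30 fps with 4:2:0 chroma
# subsampling and 8-bit samples. 4:2:0 carries 1.5 samples per pixel
# on average: one Y sample plus quarter-resolution Cb and Cr.

width, height, fps = 352, 240, 30
bits_per_sample = 8

raw_bps = width * height * fps * bits_per_sample * 3 // 2
print(raw_bps)   # 30412800 bits/s, i.e. about 30.4 Mbps uncompressed
```

MPEG-1 was targeted at roughly 1.5 Mbps (the single-speed CD-ROM rate), so the codec must achieve on the order of a 20:1 reduction from this raw rate.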
Motion Compensation in MPEG-1
• Motion Compensation (MC) based video encoding in H.261 works as follows:
– In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best matching MB from the previously coded I or P frame — prediction.
– Prediction error: the difference between the MB and its matching MB, sent to DCT and its subsequent encoding steps.
– The prediction is from a previous frame — forward prediction.
Fig 11.1: The Need for Bidirectional Search.
The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.
Motion Compensation in MPEG-1 (Cont'd)
• MPEG introduces a third frame type — B-frames — and its accompanying bi-directional motion compensation.
• The MC-based B-frame coding idea is illustrated in Fig. 11.2:
– Each MB from a B-frame will have up to two motion vectors (MVs), one from the forward and one from the backward prediction.
– If matching in both directions is successful, then two MVs will be sent and the two corresponding matching MBs are averaged (indicated by '%' in the figure) before comparing to the Target MB for generating the prediction error.
– If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used, from either the forward or backward prediction.
11.3 MPEG-2
• MPEG-2: for higher-quality video at a bit-rate of more than 4 Mbps.
• Defines seven profiles aimed at different applications:
– Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, Multiview.
– Within each profile, up to four levels are defined (Table 11.5).
– The DVD video specification allows only four display resolutions: 720 × 480, 704 × 480, 352 × 480, and 352 × 240 — a restricted form of the MPEG-2 Main profile at the Main and Low levels.
Table 11.5: Profiles and Levels in MPEG-2

Level       Simple    Main      SNR Scal.  Spat. Scal.  High      4:2:2     Multiview
            Profile   Profile   Profile    Profile      Profile   Profile   Profile
High                  *                                 *
High 1440             *                    *            *
Main        *         *         *                       *         *         *
Low                   *         *

Table 11.6: Four Levels in the Main Profile of MPEG-2

Level       Max Resolution   Max fps   Max pixels/sec   Max coded Data Rate (Mbps)   Application
High        1,920 × 1,152    60        62.7 × 10^6      80                           film production
High 1440   1,440 × 1,152    60        47.0 × 10^6      60                           consumer HDTV
Main        720 × 576        30        10.4 × 10^6      15                           studio TV
Low         352 × 288        30        3.0 × 10^6       4                            consumer tape equiv.
Supporting Interlaced Video
• MPEG-2 must support interlaced video as well, since this is one of the options for digital broadcast TV and HDTV.
• In interlaced video each frame consists of two fields, referred to as the top-field and the bottom-field.
– In a Frame-picture, all scanlines from both fields are interleaved to form a single frame, then divided into 16 × 16 macroblocks and coded using MC.
– If each field is treated as a separate picture, then it is called a Field-picture.
Fig. 11.6: Field pictures and Field-prediction for Field-pictures in MPEG-2. (a) Frame-picture vs. Field-pictures, (b) Field Prediction for Field-pictures.
Five Modes of Predictions
• MPEG-2 defines Frame Prediction and Field Prediction as well as five prediction modes:
1. Frame Prediction for Frame-pictures: identical to MPEG-1 MC-based prediction methods in both P-frames and B-frames.
2. Field Prediction for Field-pictures: a macroblock size of 16 × 16 from Field-pictures is used. For details, see Fig. 11.6(b).
3. Field Prediction for Frame-pictures: the top-field and bottom-field of a Frame-picture are treated separately. Each 16 × 16 macroblock (MB) from the target Frame-picture is split into two 16 × 8 parts, each coming from one field. Field prediction is carried out for these 16 × 8 parts in a manner similar to that shown in Fig. 11.6(b).
4. 16 × 8 MC for Field-pictures: each 16 × 16 macroblock (MB) from the target Field-picture is split into top and bottom 16 × 8 halves. Field prediction is performed on each half. This generates two motion vectors for each 16 × 16 MB in the P-Field-picture, and up to four motion vectors for each MB in the B-Field-picture. This mode is good for finer MC when motion is rapid and irregular.
5. Dual-Prime for P-pictures: first, Field prediction from each previous field with the same parity (top or bottom) is made. Each motion vector mv is then used to derive a calculated motion vector cv in the field with the opposite parity, taking into account the temporal scaling and vertical shift between lines in the top and bottom fields. For each MB the pair mv and cv yields two preliminary predictions. Their prediction errors are averaged and used as the final prediction error. This mode mimics B-picture prediction for P-pictures without adopting backward prediction (and hence with less encoding delay). This is the only mode that can be used for either Frame-pictures or Field-pictures.
Alternate Scan and Field DCT
• Techniques aimed at improving the effectiveness of the DCT on prediction errors, applicable only to Frame-pictures in interlaced videos:
– Due to the nature of interlaced video, consecutive rows in the 8 × 8 blocks are from different fields, so there exists less correlation between them than between alternate rows.
– Alternate scan recognizes the fact that in interlaced video the vertically higher spatial frequency components may have larger magnitudes, and thus allows them to be scanned earlier in the sequence.
• In MPEG-2, Field_DCT can also be used to address the same issue.