3. Data Compression
• It is the process of reducing the amount of
data required to represent a given quantity of
information.
• Save time and help to optimize resources
i. In sending data over communication line:
less time to transmit and less storage to
host.
• Reduce length of data transmission time over
the network.
3
4. Types of Data Compression
1. Lossless Compression.
2. Lossy Compression.
4
5. Data Compression Algorithm
• Lossless Compression
If the compressed data is reconstructed and
the original data is obtain without any loss of
information then this technique is called
lossless compression.
5
6. Types of Data Compression
• Lossy Compression
If the compressed data is reconstructed and
the approximation of original data is obtain
and loss of information occurs then this
technique is called lossy compression.
The original message can never be recovered
exactly as it was before it was compressed.
6
7. Types of Data Compression
Lossless
Original Data
Compressed
Data
Original Data
Approximated
7
9. Lossless Algorithm
• Huffman Encoding
There’s no unique Huffman code.
Algorithm:
①
②
③
Make a leaf node for each code symbol
Add the generation probability of each symbol to the leaf node
Take the two leaf nodes with the smallest probability and connect them into
a new node
Add 1 or 0 to each of the two branches
The probability of the new node is the sum of the probabilities of the
two connecting nodes
If there is only one node left, the code construction is completed. If not, go
back to (2)
9
12. Lossless Algorithm
• Lempel Ziv Walsh (LZW) Encoding
• It is dictionary based encoding
• Basic idea:
Create a dictionary or a code table of strings used during
communication.
If both sender and receiver have a copy of the dictionary, then
previously-encountered strings can be substituted by their index
in the dictionary.
• LZW to compress text, executable code and similar
data files to about one half their original size.
12
13. LZW Coding
Have 2 phases:
Building an indexed dictionary
Compressing a string of symbols
• Algorithm:
Extract the smallest substring that cannot be found in the
remaining uncompressed string.
Store that substring in the dictionary as a new entry and
assign it an index value
Substring is replaced with the index found in the dictionary
Insert the index and the last character of the substring into
the compressed string
13
18. Lossless Algorithm
• Run Length Encoding
• Simplest method of compression.
• Replace consecutive repeating occurrences of
a symbol by 1 occurrence of the symbol itself,
then followed by the number of occurrences.
18
19. Run Length Coding
• The method can be more efficient if the data
uses only 2 symbols (0s and 1s) in bit patterns
and 1 symbol is more frequent than another.
19
20. Lossy Algorithm
• Used for compressing images and video files (our
eyes cannot distinguish subtle changes, so lossy data
is acceptable).
• These methods are cheaper, less time and space.
• Several methods:
JPEG: compress pictures and graphics
MPEG: compress video
MP3: compress audio
20
21. Lossy Algorithm
• JPEG Encoding
• Used to compress pictures and graphics.
• In JPEG, a grayscale picture is divided into 8x8
pixel blocks to decrease the number of calculations.
• Basic idea:
Change the picture into a linear (vector) sets of numbers that
reveals the redundancies.
The redundancies is then removed by one of lossless
21
23. JPEG Encoding
• DCT(Discrete Cosine Transform)
• Image is divided into blocks of 8*8 pixels.
• For grey-scale images, pixel is represented by 8-bits.
• For color images, pixel is represented by 24-bits or
three 8-bit groups.
23
24. JPEG Encoding
• DCT takes an 8*8 matrix and produces another 8*8
matrix and its values represent by this equation:
T[i][j] = 0.25 C(i) C(j) ∑ ∑ P[x][y] Cos(2x+1)iπ/16
* Cos (2y+1)jπ/16
i = 0, 1, …7, j = 0, 1, …7
C(i) = 1/√2, i =0
= 1 otherwise
T contains values called ‘Spatial frequencies’
24
25. JPEG Encoding
• T[0][0] is called the DC coefficient, related to
average values in the array, Cos 0 = 1
• Other values of T are called AC coefficients, cosine
functions of higher frequencies
Case 1: all P’s are same, image of single color with no variation
at all, AC coefficients are all zeros.
Case 2: little variation in P’s, many, not all, AC coefficients are
zeros.
Case 3: large variation in P’s, a few AC coefficients are zeros.
25
29. JPEG Encoding
• Quantization
• Provides an way of ignoring small differences in an
image that may not be perceptible.
• Another array Q is obtained by dividing each
element of T by some number and rounding-off to
nearest integer ( The loss is occur in this step)
29
31. JPEG Encoding
• Dividing T-elements by the same number is not
practical, may result in too much loss.
• Retain the effects of lower frequencies as much as
possible.
• To define a quantization array Q, then
Q[i][j] = Round (T[i][j] ÷ U[i][j]),
i = 0, 1,.........7, j = 0, 1, …7
31
33. JPEG Encoding
• Compression
Quantized values are read from the matrix and redundant
0s are removed.
To cluster the 0s together, the matrix is read diagonally
in an zigzag fashion. The reason is if the matrix doesn’t
have fine changes, the bottom right corner of the matrix
is all 0s.
JPEG usually uses lossless run-length encoding at the
compression phase.
33
36. MPEG Encoding
Used to compress video.
Basic idea:
Each video is a rapid sequence of a set of frames. Each
frame is a spatial combination of pixels, or a picture.
Compressing video =
spatially compressing each frame
+
temporally compressing a set of frames.
36
37. MPEG Encoding
• Spatial Compression
Each frame is spatially compressed by JPEG.
• Temporal Compression
Redundant frames are removed.
Usually little difference from one to the next frame
is suitable for compression.
{For example, in a static scene in which someone is
talking, most frames are the same except for the
segment around the speaker’s lips, which changes
from one frame to the next}.
Three frames: I, P, B.
37
38. MPEG Encoding
• I (intra-picture) frame: Just a JPEG encoded
image.
• P (predicted) frame: Encoded by computing
the differences between a current and a
previous frame.
• B (bidirectional) frame: Similar to P-frame
except that it is interpolated between previous
and future frame.
38
39. MPEG Encoding
• order in which to transmit: I – P – I
P is sandwiched between groups of B
P is the difference from the previous I-frame
B is interpolated from the nearest I and P
order in which to display I – B – B – P – B – B – I
39