Data Compression
Muhammad Raza Master (B12101085)
Muhammad Ali Mehmood (B12101065)
Syed Faraz Naqvi (B12101123)
Department of Computer Science, University of Karachi
• Image compression
• Audio compression
• Video compression
• All sorts of data compression
TREE
• Internal node: sum of its children’s frequencies, plus references to its left (0) and right (1) subtrees in the binary tree
• Leaf node: character, frequency, and its 0/1 position in the binary tree
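The two node layouts above can be sketched as one small Python class. This is a minimal illustration, assuming "B-Tree" on the slide means the binary tree's 0/1 child links; `Node`, `merge` and `codes` are hypothetical names, not taken from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A Huffman-tree node (hypothetical class name).

    A leaf stores a character and its frequency; an internal node stores the
    sum of its children's frequencies and links to its left (0) and right (1)
    subtrees.
    """
    freq: int
    char: Optional[str] = None      # set only on leaves
    left: Optional[Node] = None     # edge labelled 0
    right: Optional[Node] = None    # edge labelled 1

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def merge(a: Node, b: Node) -> Node:
    """New internal node whose frequency is the sum of its children's."""
    return Node(freq=a.freq + b.freq, left=a, right=b)


def codes(node: Node, prefix: str = "") -> dict[str, str]:
    """Each leaf's code is the sequence of 0/1 edge labels from the root."""
    if node.is_leaf():
        return {node.char: prefix or "0"}  # a one-leaf tree still needs 1 bit
    table = codes(node.left, prefix + "0")
    table.update(codes(node.right, prefix + "1"))
    return table


# Frequencies a=2, q=1, r=2: q and r are merged first, then joined with a.
root = merge(Node(2, "a"), merge(Node(1, "q"), Node(2, "r")))
print(root.freq)     # 5
print(codes(root))   # {'a': '0', 'q': '10', 'r': '11'}
```

The internal node never stores a character; its frequency is always derived from its children, which is exactly what `merge` enforces.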
APPLICATIONS
• Finding an object with a certain property in a collection of objects of a certain type
• Storing the items of a list so that an item can be easily located
• Efficiently encoding a set of characters by bit strings
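The second application, storing items so that any one of them can be located quickly, is what a binary search tree provides: smaller keys go in the left subtree and larger keys in the right, so a search follows a single path down from the root. A minimal sketch with hypothetical names `BSTNode`, `insert` and `search`; the keys are taken from the example tree used for the traversals.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BSTNode:
    key: int
    left: Optional[BSTNode] = None    # keys smaller than self.key
    right: Optional[BSTNode] = None   # keys larger than self.key


def insert(root: Optional[BSTNode], key: int) -> BSTNode:
    """Store key so that smaller keys go left and larger keys go right."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                       # an equal key is already stored


def search(root: Optional[BSTNode], key: int) -> bool:
    """Locate key by following a single path down from the root."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


root = None
for k in [25, 15, 50, 10, 22, 35, 70]:
    root = insert(root, k)
print(search(root, 22), search(root, 23))   # True False
```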
TRAVERSING A TREE
• IN-ORDER TRAVERSAL
• PRE-ORDER TRAVERSAL
• POST-ORDER TRAVERSAL
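Example tree (root 25; each line lists a node's left and right child):
25 → 15, 50
15 → 10, 22
50 → 35, 70
10 → 4, 12
22 → 18, 24
35 → 31, 44
70 → 66, 90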
Step  Pre-Order                   In-Order                    Post-Order
1.    Visit the root              Traverse the left subtree   Traverse the left subtree
2.    Traverse the left subtree   Visit the root              Traverse the right subtree
3.    Traverse the right subtree  Traverse the right subtree  Visit the root
Pre-Order: 25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90
In-Order: 4, 10, 12, 15, 18, 22, 24, 25, 31, 35, 44, 50, 66, 70, 90
Post-Order: 4, 12, 10, 18, 24, 22, 15, 31, 44, 35, 66, 90, 70, 50, 25
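The three traversals can be checked against the example tree with a few lines of Python. A minimal sketch; the class `Node` and the helper `with_leaves` are hypothetical names, and the tree shape is read off the example figure (it is consistent with all three listed sequences).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def preorder(n: Optional[Node]) -> list[int]:
    """Visit the root, then the left subtree, then the right subtree."""
    return [] if n is None else [n.key] + preorder(n.left) + preorder(n.right)


def inorder(n: Optional[Node]) -> list[int]:
    """Traverse the left subtree, visit the root, traverse the right subtree."""
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


def postorder(n: Optional[Node]) -> list[int]:
    """Traverse the left subtree, then the right subtree, then visit the root."""
    return [] if n is None else postorder(n.left) + postorder(n.right) + [n.key]


def with_leaves(key: int, a: int, b: int) -> Node:
    """Node `key` whose children are the leaves `a` and `b`."""
    return Node(key, Node(a), Node(b))


# The example tree: root 25 with subtrees rooted at 15 and 50.
tree = Node(25,
            Node(15, with_leaves(10, 4, 12), with_leaves(22, 18, 24)),
            Node(50, with_leaves(35, 31, 44), with_leaves(70, 66, 90)))

print("Pre-Order: ", preorder(tree))
print("In-Order:  ", inorder(tree))
print("Post-Order:", postorder(tree))
```

Because this tree is a binary search tree, the in-order sequence comes out sorted.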
Introduction
• Devised by Dr. David Huffman (1952)
• One of the earliest data compression algorithms
• An example of ‘LOSSLESS DATA COMPRESSION’
• A binary tree is used to construct the Huffman code
Basic Idea
The most frequently occurring character gets the shortest code.
Save bits by encoding frequently used characters with fewer bits than rarely used characters.
HUFFMAN(X)
• Compute the frequency f(c) of each character c in X.
• Let Q be an empty priority queue.
• Insert every character c into Q as a singleton tree with key f(c).
• while Q.SIZE() > 1 do
  – f1 ← Q.MIN-KEY()
  – T1 ← Q.REMOVE-MIN()
  – f2 ← Q.MIN-KEY()
  – T2 ← Q.REMOVE-MIN()
  – Let T be a new tree with left subtree T1 and right subtree T2
  – Q.INSERT(T, f1 + f2)
• Return Q.REMOVE-MIN()
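The pseudocode maps almost line for line onto Python, using the standard `heapq` module as the priority queue Q. A minimal sketch, assuming a non-empty input string; `huffman` and `code_table` are hypothetical names, and trees are represented as nested tuples (a leaf is a character, an internal node is a `(T1, T2)` pair). Ties between equal keys are broken by insertion order, so the exact bit patterns may differ from a hand-built table, but every Huffman tree for the same frequencies gives the same total length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman(x: str):
    """HUFFMAN(X): return the tree as nested tuples (leaf = character)."""
    f = Counter(x)                              # f(c) for each character c in X
    if not f:
        raise ValueError("X must be non-empty")
    tie = count()                               # unique tie-breaker: trees are never compared
    q = [(fc, next(tie), c) for c, fc in f.items()]   # singleton trees with key f(c)
    heapq.heapify(q)
    while len(q) > 1:                           # while Q.SIZE() > 1
        f1, _, t1 = heapq.heappop(q)            # f1, T1 <- MIN-KEY, REMOVE-MIN
        f2, _, t2 = heapq.heappop(q)            # f2, T2 <- MIN-KEY, REMOVE-MIN
        heapq.heappush(q, (f1 + f2, next(tie), (t1, t2)))  # INSERT(T, f1 + f2)
    return heapq.heappop(q)[2]                  # return Q.REMOVE-MIN()


def code_table(tree, prefix: str = "") -> dict:
    """Label left edges 0 and right edges 1; a leaf's code is its path."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}            # a one-symbol input still needs 1 bit
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


text = "it was the best of times it was the worst of times\n"   # '.' written as LF
codes = code_table(huffman(text))
print(sum(len(codes[c]) for c in text))        # 176
```

Pairing each key with a running counter keeps `heapq` from ever comparing two trees, which would raise a `TypeError` when a tuple meets a string.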
Example: it was the best of times it was the worst of times.
(The full stop is encoded as LF.)
Symbol Count
LF 1
b 1
r 1
f 2
h 2
m 2
a 2
w 3
o 3
i 4
e 5
s 6
t 8
space 11
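The counts can be reproduced with `collections.Counter`. A short sketch; the variable names are illustrative, and symbols with equal counts may print in a different order than in the table.

```python
from collections import Counter

sentence = "it was the best of times it was the worst of times."
# As on the slide, the full stop is counted as the LF symbol.
counts = Counter(sentence.replace(".", "\n"))

names = {"\n": "LF", " ": "space"}
for symbol, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(names.get(symbol, symbol), n)
```

The 14 counts add up to the 51 symbols of the sentence.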
Symbol Bits
LF 101010
b 101011
r 10100
f 11000
h 11001
m 11010
a 11011
w 0010
o 0011
i 1011
e 000
s 100
t 111
space 01
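Encoding the sentence with this table, then decoding the result, is a quick check that the code works. A minimal sketch; `CODE`, `encode` and `decode` are illustrative names, and LF again stands for the full stop. Because no code word is a prefix of another, the decoder can emit a symbol as soon as its buffered bits match a code word.

```python
# Code table from the slide; "\n" stands for LF (the full stop).
CODE = {
    "\n": "101010", "b": "101011", "r": "10100", "f": "11000",
    "h": "11001", "m": "11010", "a": "11011", "w": "0010",
    "o": "0011", "i": "1011", "e": "000", "s": "100",
    "t": "111", " ": "01",
}


def encode(text: str) -> str:
    """Concatenate the code word of every symbol."""
    return "".join(CODE[c] for c in text)


def decode(bits: str) -> str:
    """Read bits until they form a code word; prefix-freeness makes that word unique."""
    lookup = {word: sym for sym, word in CODE.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in lookup:
            out.append(lookup[word])
            word = ""
    return "".join(out)


text = "it was the best of times it was the worst of times\n"
bits = encode(text)
print(len(bits), 8 * len(text))   # 176 408
assert decode(bits) == text
```

At 8 bits per character the 51-symbol sentence would take 408 bits; with this table it takes 176.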
m = HumeraTariq
Symbol Bits
H 0000
u 0001
m 0010
e 0011
r 10
a 11
T 0100
i 0101
q 011
Compressed Bit-stream
C(m) = 00000001001000111011010011100101011
The length of the encoded bit-stream is the sum, over all letters, of the number of occurrences times the number of bits per occurrence. A letter's number of bits equals the distance of its leaf from the root, so:
Length of compressed bit-stream = Σ frequency × distance
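The formula is a one-line sum. A minimal sketch, with `encoded_length` as a hypothetical name, using the leaf depths of the Huffman tree for m = HumeraTariq given in the worked example:

```python
from collections import Counter


def encoded_length(freq, depth):
    """Sum over characters c of f(c) times the distance of c's leaf from the root."""
    return sum(freq[c] * depth[c] for c in freq)


m = "HumeraTariq"
freq = Counter(m)                       # r and a occur twice, the rest once
depth = dict.fromkeys("HumeTi", 4)      # leaf depths of the Huffman tree for m
depth.update(q=3, r=2, a=2)
print(encoded_length(freq, depth))      # 35
```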
E.g.: m = HumeraTariq
• At distance:
– 4: six leaves (‘H’, ‘u’, ‘m’, ‘e’, ‘T’, ‘i’, with total frequency 6)
– 3: one leaf (‘q’, with frequency 1)
– 2: two leaves (‘r’ and ‘a’, with total frequency 4)
• Length of compressed bit-stream = Σ frequency × distance
• Total = 4·6 + 3·1 + 2·4 = 35, the length of the compressed bit-stream, as expected.
Let d be the number of distinct symbols and n the length of the input.
Huffman’s algorithm runs in O(n + d log d) time.
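A sketch of where the bound comes from, assuming Q is a binary heap and counting only the construction of the code (writing out the encoded bits adds time proportional to the compressed length). Each loop iteration removes two trees and inserts one, so Q shrinks from d trees to 1 in d − 1 iterations:

```latex
\begin{aligned}
\text{counting } f(c) \text{ over the input} &: O(n)\\
\text{building } Q \text{ from } d \text{ singleton trees} &: O(d) \text{ (bottom-up heap construction)}\\
d-1 \text{ iterations, each } 2\times\text{REMOVE-MIN} + 1\times\text{INSERT} &: (d-1)\cdot O(\log d) = O(d\log d)\\
\text{total} &: O(n + d + d\log d) = O(n + d\log d)
\end{aligned}
```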
It can be applied to any byte stream, not just text.
A milestone on the way to later methods such as LZW compression.
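Because only symbol frequencies are needed, the same construction works when the symbols are the 256 possible byte values. A minimal sketch under stated assumptions: it computes code lengths (enough to measure the compressed size) rather than writing out bits, `huffman_code_lengths` is a hypothetical name, and the sample bytes are made up for illustration.

```python
from __future__ import annotations

import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Huffman code length (leaf depth) for every byte value present in data."""
    freq = Counter(data)                      # byte value (0-255) -> count
    if len(freq) == 1:                        # a single distinct byte still needs 1 bit
        return {b: 1 for b in freq}
    # Heap entries: (weight, unique tie-breaker, byte values in that subtree)
    heap = [(w, i, [b]) for i, (b, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for b in s1 + s2:                     # merging moves both subtrees one level down
            depth[b] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return depth


data = bytes([0, 0, 0, 0, 255, 255, 7, 42])   # arbitrary, non-text bytes
lengths = huffman_code_lengths(data)
freq = Counter(data)
print(lengths)                                                 # {0: 1, 255: 2, 7: 3, 42: 3}
print(sum(freq[b] * lengths[b] for b in freq), 8 * len(data))  # 14 64
```

Tracking only depths keeps the sketch short; a real compressor would also need the code words themselves and a way to ship the code table alongside the bits.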
REFERENCES
• Sedgewick, R., and Wayne, K. Algorithms, 4th edition.
• https://blog.itu.dk/BADS-F2009/files/2009/04/46-huffman.pdf
• Rosen, K. H. Discrete Mathematics and Its Applications, 7th edition.