4. Background
● In information theory, coding refers to methods that represent data as bit sequences (sequences of 0's and 1's)
● Encoding is a method of taking data structures and mapping them to bit sequences
● Decoding is a method of taking bit sequences and outputting the corresponding data structure
5. Example: Standard ASCII & Unicode
● Standard ASCII encodes each character as a 7-bit sequence
● Using 7 bits allows us to encode 2^7 = 128 possible characters
● Unicode has three common encoding forms: UTF-8 (uses 8-bit code units, one to four per character), UTF-16 (uses 16-bit code units, one or two per character), and UTF-32 (uses one 32-bit code unit per character)
● UTF stands for Unicode Transformation Format
● Python 2.X's Unicode support: “Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled.”
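The bit counts above can be checked directly. This is a minimal sketch, assuming Python 3 (where `str.encode` returns a `bytes` object); the helper name `encoded_bits` and the three sample characters are illustrative choices, not part of the slides.

```python
# Number of distinct 7-bit patterns available to standard ASCII.
ascii_capacity = 2 ** 7
print(ascii_capacity)  # 128

def encoded_bits(text, encoding):
    """Return how many bits `text` occupies in the given encoding."""
    return 8 * len(text.encode(encoding))

# The "-le" codec variants are used so that no byte-order mark is prepended.
for ch in ("A", "\u00e9", "\u20ac"):  # 'A', e with acute accent, euro sign
    print(repr(ch),
          encoded_bits(ch, "utf-8"),
          encoded_bits(ch, "utf-16-le"),
          encoded_bits(ch, "utf-32-le"))
```

UTF-8 needs 8, 16, and 24 bits for these three characters, while UTF-32 always needs 32.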
6. Two Types of Codes
● There are two types of codes: fixed-length and variable-length
● Fixed-length codes (e.g., ASCII, UTF-32) encode every character with the same number of bits
● Variable-length codes (e.g., Morse, Huffman) encode characters with varying numbers of bits: more frequent symbols are encoded with fewer bits
7. Example: Fixed-Length Code
● A – 000, B – 001, C – 010, D – 011
● E – 100, F – 101, G – 110, H – 111
● AADF = 000000011101
● The encoding of AADF is 12 bits
8. Example: Variable-Length Code
● A – 0, B – 100, C – 1010, D – 1011
● E – 1100, F – 1101, G – 1110, H – 1111
● AADF = 0010111101
● The encoding of AADF is 10 bits
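The two code tables can be compared on the same message. A small sketch, assuming Python 3; the names `FIXED`, `VARIABLE`, and `encode` are introduced here for illustration only.

```python
# Code tables copied from the fixed-length and variable-length examples.
FIXED = {"A": "000", "B": "001", "C": "010", "D": "011",
         "E": "100", "F": "101", "G": "110", "H": "111"}
VARIABLE = {"A": "0", "B": "100", "C": "1010", "D": "1011",
            "E": "1100", "F": "1101", "G": "1110", "H": "1111"}

def encode(message, table):
    """Concatenate the code of every symbol in `message`."""
    return "".join(table[symbol] for symbol in message)

fixed_bits = encode("AADF", FIXED)
variable_bits = encode("AADF", VARIABLE)
print(fixed_bits, len(fixed_bits))        # 000000011101 12
print(variable_bits, len(variable_bits))  # 0010111101 10
```

The variable-length table wins here because A, the symbol with the shortest code, appears twice.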
9. End of Character in Variable-Length Code
● One of the challenges in variable-length codes is knowing where one character ends and the next one begins
● Morse uses a special character (separator code)
● Prefix coding is another solution: no character's code is a prefix of any other character's code, so a decoder can recognize the end of each character without a separator
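The prefix property can be tested mechanically. The following is a sketch, assuming Python 3; `is_prefix_free` is a hypothetical helper, and `ambiguous_code` is a made-up counterexample rather than a code from the slides.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes)
    return not any(a != b and b.startswith(a)
                   for a in words for b in words)

# The variable-length code for A..H used in the earlier example.
variable_code = ["0", "100", "1010", "1011", "1100", "1101", "1110", "1111"]

# Hypothetical code that is NOT prefix-free: "0" is a prefix of "01".
ambiguous_code = ["0", "01", "11"]

print(is_prefix_free(variable_code))   # True
print(is_prefix_free(ambiguous_code))  # False
```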
10. Huffman Code
● Huffman code is a variable-length code that takes advantage of the relative frequencies of characters
● Huffman code is named after David Huffman, the researcher who discovered it
● A Huffman code is represented as a binary tree whose leaves are individual characters and their frequencies
● Each non-leaf node holds the set of characters in all of its subnodes and the sum of their relative frequencies
12. Using Huffman Tree to Encode/Decode Characters
● For the tree on the previous slide, these are the encodings:
A is encoded as 0
B is encoded as 100
C is encoded as 1010
D is encoded as 1011
E is encoded as 1100
F is encoded as 1101
G is encoded as 1110
H is encoded as 1111
15. Constructing Leaves
### a leaf is a tuple whose first element is the symbol
### represented as a string and whose second element is
### the symbol's frequency
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    ### the parentheses let the condition span several lines
    return (isinstance(x, tuple) and
            len(x) == 2 and
            isinstance(x[0], str) and
            isinstance(x[1], int))
16. Constructing Leaves
### return the character (symbol) of the leaf
def get_leaf_symbol(leaf):
    return leaf[0]

### return the frequency of the leaf's character
def get_leaf_freq(leaf):
    return leaf[1]
17. Constructing Huffman Trees
### A Non-Leaf node (internal node) is represented as
### a list of four elements:
### 1. left branch
### 2. right branch
### 3. list of symbols
### 4. combined frequency of symbols
[left_branch, right_branch, symbols, frequency]
19. Accessing Huffman Trees
def get_symbols(huff_tree):
    if is_leaf(huff_tree):
        return [get_leaf_symbol(huff_tree)]
    else:
        return huff_tree[2]

def get_freq(huff_tree):
    if is_leaf(huff_tree):
        return get_leaf_freq(huff_tree)
    else:
        return huff_tree[3]
20. Constructing Huffman Trees
### A Huffman tree is constructed from its left branch, which can
### be a Huffman tree or a leaf, and its right branch, another
### Huffman tree or a leaf. The new tree has the combined symbols of
### the left branch and the right branch and the sum of the
### frequencies of the left branch and the right branch
def make_huffman_tree(left_branch, right_branch):
    return [left_branch,
            right_branch,
            get_symbols(left_branch) + get_symbols(right_branch),
            get_freq(left_branch) + get_freq(right_branch)]
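To see the constructors at work, the sketch below rebuilds a tree that matches the encodings listed on slide 12. It assumes Python 3 and repeats compact versions of the helpers so that it runs on its own. The frequencies A: 8 and B: 3 come from the slides; the frequencies of C through H are assumed to be 1 each, which is consistent with the subtree totals shown in the examples but is not stated in this section.

```python
def make_leaf(symbol, freq):
    return (symbol, freq)

def is_leaf(x):
    # Simplified check: leaves are tuples, internal nodes are lists.
    return isinstance(x, tuple) and len(x) == 2

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

# Subtree {B, C, D}: B on the left, {C, D} on the right.
bcd = make_huffman_tree(
    make_leaf("B", 3),
    make_huffman_tree(make_leaf("C", 1), make_leaf("D", 1)))

# Subtree {E, F, G, H}: frequencies of 1 are an assumption.
efgh = make_huffman_tree(
    make_huffman_tree(make_leaf("E", 1), make_leaf("F", 1)),
    make_huffman_tree(make_leaf("G", 1), make_leaf("H", 1)))

sample_tree = make_huffman_tree(make_leaf("A", 8),
                                make_huffman_tree(bcd, efgh))

print(get_symbols(sample_tree))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
print(get_freq(sample_tree))     # 17
```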
27. Symbol Encoding
1. Given a symbol s and a Huffman tree ht, set current_node to the root node and encoding to an empty list (you can also check if s is in the root node's symbol list and, if not, signal error)
2. If current_node is a leaf, return encoding
3. Check if s is among the symbols of current_node's left branch or right branch
4. If in the left, add 0 to encoding, set current_node to the root of the left branch, and go to step 2
5. If in the right, add 1 to encoding, set current_node to the root of the right branch, and go to step 2
6. If in neither branch, signal error
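The six steps above can be sketched as a loop. This is one possible reading, assuming Python 3 and the tuple/list node representation from the earlier slides; `encode_symbol` is a name chosen here, the helpers are compact re-statements so the block runs on its own, and the frequencies of C through H (1 each) are an assumption.

```python
def is_leaf(x):
    return isinstance(x, tuple)          # leaves are (symbol, freq) tuples

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def encode_symbol(s, ht):
    """Return the list of bits (0s and 1s) that encodes symbol s in tree ht."""
    if s not in get_symbols(ht):                  # optional check from step 1
        raise ValueError("symbol %r is not in the tree" % (s,))
    encoding = []
    current_node = ht
    while not is_leaf(current_node):              # step 2
        left, right = current_node[0], current_node[1]
        if s in get_symbols(left):                # step 4
            encoding.append(0)
            current_node = left
        elif s in get_symbols(right):             # step 5
            encoding.append(1)
            current_node = right
        else:                                     # step 6
            raise ValueError("symbol %r is in neither branch" % (s,))
    return encoding

mk = make_huffman_tree
sample_tree = mk(("A", 8),
                 mk(mk(("B", 3), mk(("C", 1), ("D", 1))),
                    mk(mk(("E", 1), ("F", 1)), mk(("G", 1), ("H", 1)))))

print(encode_symbol("B", sample_tree))  # [1, 0, 0]
print(encode_symbol("D", sample_tree))  # [1, 0, 1, 1]
```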
28. Example
● Encode B with the sample Huffman tree
● Set current_node to the root node
● B is in current_node's right branch, so add 1 to encoding & recurse into the right branch (current_node is set to the root of the right branch, {B, C, D, E, F, G, H}: 9)
● B is in current_node's left branch, so add 0 to encoding & recurse into the left branch (current_node is {B, C, D}: 5)
● B is in current_node's left branch, so add 0 to encoding & recurse into the left branch (current_node is B: 3)
● current_node is a leaf, so return 100 (value of encoding)
29. Message Encoding
● Given a sequence of symbols message and a Huffman tree ht
● Concatenate the encoding of each symbol in message from left to right
● Return the concatenation of encodings
30. Example
● Encode ABBA with the sample Huffman tree
● Encoding for A is 0
● Encoding for B is 100
● Encoding for B is 100
● Encoding for A is 0
● Concatenation of encodings is 01001000
31. Message Decoding
1. Given a sequence of bits message and a Huffman tree ht, set current_node to the root and decoding to an empty list
2. If current_node is a leaf, add its symbol to decoding and set current_node to ht's root
3. If current_node is ht's root and message has no more bits, return decoding
4. If message has no more bits & current_node is not ht's root (the message ended in the middle of a code), signal error
5. If message's current bit is 0, set current_node to its left child, consume the bit, & go to step 2
6. If message's current bit is 1, set current_node to its right child, consume the bit, & go to step 2
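The decoding steps above can be sketched as follows. This is an illustrative version, assuming Python 3, a bit string such as "0100" as input, and a tree whose root is an internal node; `decode` is a name chosen here, and the frequencies of C through H (1 each) are an assumption.

```python
def is_leaf(x):
    return isinstance(x, tuple)          # leaves are (symbol, freq) tuples

def get_symbols(tree):
    return [tree[0]] if is_leaf(tree) else tree[2]

def get_freq(tree):
    return tree[1] if is_leaf(tree) else tree[3]

def make_huffman_tree(left, right):
    return [left, right,
            get_symbols(left) + get_symbols(right),
            get_freq(left) + get_freq(right)]

def decode(bits, ht):
    """Decode a string of '0'/'1' characters into a string of symbols."""
    decoding = []
    current_node = ht
    for bit in bits:
        if bit not in ("0", "1"):
            raise ValueError("not a bit: %r" % (bit,))
        # 0 selects the left child (index 0), 1 the right child (index 1).
        current_node = current_node[0] if bit == "0" else current_node[1]
        if is_leaf(current_node):
            decoding.append(current_node[0])
            current_node = ht            # restart at the root
    if current_node is not ht:
        raise ValueError("message ended in the middle of a code")
    return "".join(decoding)

mk = make_huffman_tree
sample_tree = mk(("A", 8),
                 mk(mk(("B", 3), mk(("C", 1), ("D", 1))),
                    mk(mk(("E", 1), ("F", 1)), mk(("G", 1), ("H", 1)))))

print(decode("0100", sample_tree))      # AB
print(decode("01001000", sample_tree))  # ABBA
```

Because the code is prefix-free, the decoder never needs to look ahead: reaching a leaf always marks the end of a character.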
32. Example
● Decode 0100 with the sample Huffman tree
● Read 0, go left to A:8, add A to decoding & reset current_node to the root
● Read 1, go right to {B, C, D, E, F, G, H}: 9
● Read 0, go left to {B, C, D}: 5
● Read 0, go left to B: 3
● Add B to decoding & reset current_node to the root
● No more bits & current_node is the root, so return AB
34. List Comprehension
● List comprehension is a syntactic construct in some programming languages for building lists from list specifications
● List comprehension derives its conceptual roots from the set-former (set-builder) notation in mathematics
[Y for X in LIST]
● List comprehension is also available in other programming languages such as Common Lisp, Haskell, and OCaml
35. Set-Former Notation Example
$\{\, 4x \mid x \in \mathbb{N},\ x^{2} < 100 \,\}$
$4x$ is the output function
$x$ is the variable
$\mathbb{N}$ is the input set
$x^{2} < 100$ is the predicate
36. Set-Former Notation Examples
$\{\, x \in \{a, b\}^{*} \mid |x| \le 3 \,\}$ is the set of all strings over $\{a, b\}$ whose length is 0, 1, 2, or 3.
$\{\, a^{n} b^{n} \mid n \ge 1 \,\}$ is the set of non-empty strings over $\{a, b\}$ such that a's precede b's and the number of a's is equal to the number of b's.
$\{\, xy \mid x \in \{a, b\},\ y \in \{aa, cc\} \,\}$ is the set of strings where a or b is followed by aa or cc.
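Each of the three sets above maps onto a list comprehension. A sketch, assuming Python 3; the variable names are illustrative, and the infinite set of strings a^n b^n is truncated to n <= 3 so that the list is finite.

```python
from itertools import product

alphabet = ("a", "b")

# All strings over {a, b} of length 0, 1, 2, or 3.
short_strings = ["".join(p)
                 for n in range(4)
                 for p in product(alphabet, repeat=n)]

# Strings a^n b^n for n >= 1, truncated to n <= 3.
an_bn = ["a" * n + "b" * n for n in range(1, 4)]

# a or b followed by aa or cc.
pairs = [x + y for x in alphabet for y in ("aa", "cc")]

print(len(short_strings))  # 15
print(an_bn)               # ['ab', 'aabb', 'aaabbb']
print(pairs)               # ['aaa', 'acc', 'baa', 'bcc']
```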
37. For-Loop Implementation
### building the list of the set-former example with a for-loop
### (the infinite set N is truncated to 0..200)
>>> rslt = []
>>> for x in xrange(201):
        if x ** 2 < 100:
            rslt.append(4 * x)
>>> rslt
[0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
38. List Comprehension Equivalent
### building the same list with list comprehension
>>> s = [ 4 * x for x in xrange(201) if x ** 2 < 100]
>>> s
[0, 4, 8, 12, 16, 20, 24, 28, 32, 36]
39. For-Loop
### building list of squares of even numbers in [0, 10]
### with for-loop
>>> rslt = []
>>> for x in xrange(11):
        if x % 2 == 0:
            rslt.append(x**2)
>>> rslt
[0, 4, 16, 36, 64, 100]
40. List Comprehension Equivalent
### building the same list with list comprehension
>>> [x ** 2 for x in xrange(11) if x % 2 == 0]
[0, 4, 16, 36, 64, 100]
41. For-Loop
## building list of squares of odd numbers in [0, 10]
>>> rslt = []
>>> for x in xrange(11):
        if x % 2 != 0:
            rslt.append(x**2)
>>> rslt
[1, 9, 25, 49, 81]
42. List Comprehension Equivalent
## building list of squares of odd numbers in [0, 10]
## with list comprehension
>>> [x ** 2 for x in xrange(11) if x % 2 != 0]
[1, 9, 25, 49, 81]
44. For-Loop
>>> rslt = []
>>> for x in xrange(6):
        if x % 2 == 0:
            for y in xrange(6):
                if y % 2 != 0:
                    rslt.append((x, y))
>>> rslt
[(0, 1), (0, 3), (0, 5), (2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]
45. List Comprehension Equivalent
>>> [(x, y) for x in xrange(6) if x % 2 == 0
            for y in xrange(6) if y % 2 != 0]
[(0, 1), (0, 3), (0, 5), (2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)]
47. List Comprehension with Matrices
● List comprehension can be used to scan rows and columns in matrices
>>> matrix = [
        [10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]
    ]
### extract all rows
>>> [r for r in matrix]
[[10, 20, 30], [40, 50, 60], [70, 80, 90]]
48. List Comprehension with Matrices
>>> matrix = [
[10, 20, 30],
[40, 50, 60],
[70, 80, 90]
]
### extract column 0
>>> [r[0] for r in matrix]
[10, 40, 70]
49. List Comprehension with Matrices
>>> matrix = [
[10, 20, 30],
[40, 50, 60],
[70, 80, 90]
]
### extract column 1
>>> [r[1] for r in matrix]
[20, 50, 80]
50. List Comprehension with Matrices
>>> matrix = [
[10, 20, 30],
[40, 50, 60],
[70, 80, 90]
]
### extract column 2
>>> [r[2] for r in matrix]
[30, 60, 90]
51. List Comprehension with Matrices
### turn matrix columns into rows
>>> rslt = []
>>> for c in xrange(len(matrix)):
        rslt.append([matrix[r][c] for r in xrange(len(matrix))])
>>> rslt
[[10, 40, 70], [20, 50, 80], [30, 60, 90]]
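The column-to-row conversion can also be written without an explicit for-loop. Two possible sketches, assuming Python 3 (hence `range` rather than `xrange`): a nested list comprehension, and the built-in `zip` with argument unpacking. The variable names are illustrative.

```python
matrix = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

# Nested comprehension: the outer loop walks column indices,
# the inner loop collects that column's entry from every row.
transposed = [[row[c] for row in matrix] for c in range(len(matrix[0]))]

# zip(*matrix) pairs up the i-th elements of all rows; each
# resulting tuple is converted back to a list.
transposed_zip = [list(col) for col in zip(*matrix)]

print(transposed)      # [[10, 40, 70], [20, 50, 80], [30, 60, 90]]
print(transposed_zip)  # [[10, 40, 70], [20, 50, 80], [30, 60, 90]]
```

Using `len(matrix[0])` for the column count also handles matrices that are not square.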
52. List Comprehension with Matrices
● List comprehension can work with any iterable (e.g., the items of a dictionary)
>>> d = {'a' : 'A', 'bb' : 'BB', 'ccc' : 'CCC'}
>>> [(item[0], item[1], len(item[0]+item[1]))
     for item in d.items()]
[('a', 'A', 2), ('ccc', 'CCC', 6), ('bb', 'BB', 4)]
53. List Comprehension
● If the expression inside [ ] is a tuple, the parentheses are required
>>> cubes = [(x, x**3) for x in xrange(5)]
>>> cubes
[(0, 0), (1, 1), (2, 8), (3, 27), (4, 64)]
● Sequences can be unpacked in list comprehension
>>> sums = [x + y for x, y in cubes]
>>> sums
[0, 2, 10, 30, 68]
54. List Comprehension
● for-clauses in list comprehensions can iterate over any sequence:
>>> rslt = [c * n for c in 'math' for n in (1, 2, 3)]
>>> rslt
['m', 'mm', 'mmm', 'a', 'aa', 'aaa', 't', 'tt', 'ttt', 'h', 'hh', 'hhh']
55. List Comprehension & Loop Variables
● In Python 2.X, the loop variables used in list comprehensions (and in regular for-loops) remain bound after execution; in Python 3, only the variables of regular for-loops do
>>> for i in [1, 2, 3]: print i
1
2
3
>>> i + 4
7
>>> [j for j in xrange(10) if j % 2 == 0]
[0, 2, 4, 6, 8]
>>> j * 2
18
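The session above reflects Python 2.X. As a point of comparison, the sketch below shows the same experiment under Python 3, where a list comprehension has its own scope; the flag name `leaked` is introduced here for illustration.

```python
# A regular for-loop variable is still bound after the loop ends.
for i in [1, 2, 3]:
    pass
print(i + 4)  # 7

# In Python 3 the comprehension variable j is local to the comprehension.
evens = [j for j in range(10) if j % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8]

try:
    j  # looking up j outside the comprehension
    leaked = True
except NameError:
    leaked = False
print(leaked)  # False on Python 3
```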
56. When To Use List Comprehension
● For-loops are easier to understand and debug
● List comprehensions may be harder to understand
● List comprehensions are usually faster than equivalent for-loops that build a list with append
● List comprehensions are worth using to speed up simpler tasks
● For-loops are worth using when logic gets complex
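The speed difference can be measured rather than taken on faith. A rough sketch, assuming Python 3 and the standard `timeit` module; the function names, the input size, and the repetition count are arbitrary choices, and the timings depend on the machine and interpreter version.

```python
import timeit

def squares_loop(n):
    """Squares of even numbers below n, built with a for-loop."""
    rslt = []
    for x in range(n):
        if x % 2 == 0:
            rslt.append(x ** 2)
    return rslt

def squares_comprehension(n):
    """The same list, built with a list comprehension."""
    return [x ** 2 for x in range(n) if x % 2 == 0]

# Time 200 calls of each version on the same input size.
loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comprehension(1000), number=200)

print("for-loop:      %.4f s" % loop_time)
print("comprehension: %.4f s" % comp_time)
```

Both functions return the same list, so the only thing being compared is how the list is built.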