The document discusses different methods of data compression based on modeling redundancy in data. It provides three examples: (1) exploiting a linear pattern in data points to compress to 2 bits per sample instead of 5; (2) assigning shorter codes to more frequent symbols in a sequence to compress from 3 bits per symbol to 2.58; and (3) entropy coding, which assigns codes based on symbol probabilities to maximize compression. The goal is to remove redundancy while preserving information content.
2. The development of data compression algorithms for a variety of data can be divided into two phases:
Modeling
Coding
In the modeling phase we try to extract information about any redundancy that exists in the data and describe that redundancy in the form of a model.
3. A description of the model and a “description” of how the data differ from the model are coded, generally using a binary alphabet.
The difference between the data and the model is often referred to as the residual.
The following three examples look at three different ways that data can be modeled; in each case we then use the model to obtain compression.
4. If binary representations of these numbers are to be transmitted or stored, we would need to use 5 bits per sample.
By exploiting the structure in the data, we can represent the sequence using fewer bits.
Suppose we plot this data, as shown in the following figure.
5. We see that the data seem to fall on a straight line.
A model for the data could therefore be a straight line given by the equation Ān = n + 8, where n is the sample index.
6. To examine how the data differ from the model, we compute the difference (or residual):
En = An − Ān = 0 1 0 −1 1 −1 0 1 −1 −1 1 1
The residual sequence consists of only three numbers: −1, 0, and 1.
If we assign the code 00 to −1, 01 to 0, and 10 to 1, we need 2 bits to represent each element of the residual sequence.
Therefore, we can obtain compression by transmitting or storing the parameters of the model and the residual sequence.
The encoding can be exact if the required compression is to be lossless, or approximate if the compression can be lossy.
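The following is a minimal Python sketch of this first example. The data values are not listed in the text above; they are reconstructed here from the given residuals under the assumed straight-line model Ān = n + 8, so treat them as an illustration rather than the slides' exact figure.

```python
# Sketch of the straight-line-model + residual-coding example.
# The data values are reconstructed from the residuals given above,
# assuming the model A_bar(n) = n + 8 with n starting at 1.
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

def model(n):
    # Straight-line model: predicted value for sample index n.
    return n + 8

# Residual: how each sample differs from the model's prediction.
residuals = [a - model(n) for n, a in enumerate(data, start=1)]
assert residuals == [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

# Only three residual values occur, so 2 bits per element suffice.
codebook = {-1: "00", 0: "01", 1: "10"}
encoded = "".join(codebook[e] for e in residuals)
print(len(encoded), "bits vs", 5 * len(data), "bits uncoded")  # 24 vs 60
```

Note that the full transmission must also include the model parameters, as the slide says; for a long sequence that overhead is negligible.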
8. In this example, the differences between consecutive values are transmitted instead of the values themselves, so the number of distinct values is reduced.
Fewer bits are required to represent each number, and compression is achieved.
The decoder adds each received value to the previous decoded value to obtain the reconstruction corresponding to the received value.
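A minimal sketch of the decoder just described: each received value is a difference that is added to the previously decoded value. The difference stream used here is illustrative, not taken from the slides.

```python
# Decoder for difference (residual) coding: accumulate each received
# difference onto the previously decoded value.
def decode_differences(first_value, diffs):
    decoded = [first_value]
    for d in diffs:
        decoded.append(decoded[-1] + d)
    return decoded

# Illustrative difference stream (not from the slides): the differences
# take few distinct values, so they need fewer bits than the raw samples.
print(decode_differences(27, [1, 1, -1, -2, 1, 2]))
# -> [27, 28, 29, 28, 26, 27, 29]
```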
9. The sequence is made up of eight different symbols; with a fixed-length code, we need to use 3 bits per symbol.
Suppose instead we assign a codeword with only a single bit to the symbol that occurs most often, and correspondingly longer codewords to symbols that occur less often.
11. If we substitute the codes for each symbol, we will use 106 bits to encode the entire sequence.
As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol.
This means we have obtained a compression ratio of 1.16:1.
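The sketch below illustrates this kind of variable-length coding. The message and the prefix-free codebook are illustrative stand-ins (the slides' actual 41-symbol sequence and code table are not reproduced here); the point is simply that frequent symbols get short codewords.

```python
from collections import Counter

# Illustrative message and prefix-free codebook (not the slides' example):
# the most frequent symbol gets the shortest codeword.
message = "aaaaabbbccd"
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}

counts = Counter(message)
total_bits = sum(counts[s] * len(codebook[s]) for s in counts)
fixed_bits = 2 * len(message)  # 4 distinct symbols -> 2 bits each, fixed-length

print(total_bits / len(message), "bits/symbol")        # ~1.82 instead of 2
print("compression ratio:", fixed_bits / total_bits)   # 22/20 = 1.1
```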
12. Information
Amount of Information
Entropy
Maximum Entropy
Condition for maximum entropy
13. Compression is achieved by removing data redundancy while preserving information content.
Entropy is a measure of the information content of a group of bytes (a message).
Messages with higher entropy carry more information than messages with lower entropy.
Data with low entropy permit a larger compression ratio than data with high entropy.
14. How to determine the entropy:
Find the probability p(x) of each symbol x in the message.
The entropy H(x) contributed by symbol x is H(x) = −p(x) · log2 p(x).
The average entropy of the entire message is the sum over all n distinct symbols: H = −Σ p(xi) · log2 p(xi).
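A minimal Python sketch of this calculation, estimating each p(x) from the symbol's relative frequency in the message (the sample message is illustrative):

```python
import math
from collections import Counter

def entropy(message):
    # Estimate p(x) for each symbol from its relative frequency,
    # then sum -p(x) * log2 p(x) over all distinct symbols.
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy("aaaaabbbccd"))  # average information (bits) per symbol
```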