Pbl1
1.
2.
Data compression (bit-rate reduction) involves encoding
information using fewer bits than the original
representation.
The process of reducing the size of a data file is popularly
referred to as data compression, although its formal name is
source coding (coding done at the source of the data before
it is stored or transmitted).
Compression can be either lossy or lossless.
Lossless compression reduces bits by identifying and
eliminating statistical redundancy. No information is lost in
lossless compression.
Lossy compression reduces bits by identifying unnecessary
information and removing it.
3.
The lossless data compression method essentially has two steps: Analyze the files
and then eliminate the redundant data found within them.
For example, if a file compressor analyzed and eliminated all the repeated words in
a document file, the result would be a document with about 60 percent fewer
words. Such is the case with compressed files. Your application analyzes the file
and removes all the equivalent superfluous data bits, and shrinks the overall size of
the file.
However, if you attempted to read the article with the omitted words, it wouldn't
make any sense. Therefore, the file compression applications insert placeholders
where those eliminated words were.
When you extract the file, the application automatically restores the repeated words
to their places, making the file readable. Because no data is lost, this method is
called lossless compression.
4.
Lossy data compression is the converse of lossless data compression. In these
schemes, some loss of information is acceptable. Dropping nonessential detail from
the data source can save storage space.
Lossy data compression schemes are informed by research on how people perceive
the data in question.
For example, the human eye is more sensitive to subtle variations in luminance than
it is to variations in color.
JPEG image compression works in part by rounding off nonessential bits of
information. There is a corresponding trade-off between information lost and the
size reduction.
A number of popular compression formats exploit these perceptual differences,
including those used in music files, images, and video.
5.
Data compression works by finding patterns in data that occur frequently,
and changing their representation to something short, so that the total
amount of data is reduced without sacrificing any useful information.
For example, suppose you have a stream of data that consists of only ones
and zeros, like this: 10100001010101001. And suppose that you know that
this data stream usually contains a lot more zeros than ones; that is, a
stream is more likely to be 100000100000101000000 than
111110111011011111.
In this case, you can develop a way of abbreviating the zeros so that they
take up less space. You can define A as representing a one, B as
representing a single zero, and C as representing four consecutive zeros.
Now suppose you have a data stream like this:
100000100100000000000001010010000
6.
After you encode it, it will look like ACBABBACCCBABABBAC. Notice that
this is shorter than the original, because your encoding method helped
abbreviate long strings of consecutive zeros. This is data compression.
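The A/B/C scheme described here can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the encoder greedily prefers C (four zeros) over B (one zero).

```python
# Toy encoder for the slide's scheme: A = "1", B = "0", C = "0000".
DECODE_TABLE = {"A": "1", "B": "0", "C": "0000"}


def encode(bits: str) -> str:
    """Greedily replace runs of four zeros with C, single zeros with B."""
    out = []
    i = 0
    while i < len(bits):
        if bits[i] == "1":
            out.append("A")
            i += 1
        elif bits.startswith("0000", i):
            out.append("C")
            i += 4
        elif bits[i] == "0":
            out.append("B")
            i += 1
        else:
            raise ValueError(f"not a bit: {bits[i]!r}")
    return "".join(out)


def decode(symbols: str) -> str:
    """Expand each symbol back into the bits it stands for."""
    return "".join(DECODE_TABLE[s] for s in symbols)


stream = "100000100100000000000001010010000"
encoded = encode(stream)
restored = decode(encoded)
```

Decoding the result gives back the original stream, so the scheme is lossless. The saving comes entirely from the long runs of zeros.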
In order for data compression to work, the data stream must not be random.
There has to be some sort of pattern in it, or you can't compress it. For
example, if the stream contains ones and zeros, but there's no pattern, and
neither ones nor zeroes are more common, then you can't compress the data
stream, because there's nothing predictable about it.
If you want a more formal definition, data compression consists of a way of
encoding a set of input messages into a set of output messages such that the
most common input messages encode to the shortest output messages, and the
least common input messages encode to the longest output messages. As long
as the input messages are not randomly distributed, this will result in an output
stream that is shorter than the input stream. It's all information theory.
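One well-known way to give the most common messages the shortest codes is Huffman coding. The slides do not name it, so the sketch below is only one possible illustration of the formal definition; the function name and the sample message are invented for this example.

```python
import heapq
from collections import Counter


def huffman_code(message: str) -> dict:
    """Return a {symbol: bit string} table built with Huffman's algorithm."""
    counts = Counter(message)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


message = "aaaaaaaabbbc"  # 'a' is common, 'c' is rare
codes = huffman_code(message)
encoded_bits = "".join(codes[ch] for ch in message)
```

For this message the common symbol "a" gets a one-bit code while "b" and "c" get two bits each, so the 12 characters fit in 16 bits.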
7.
The objective of image compression is to
reduce irrelevance and redundancy of the
image data in order to be able to store
or transmit data in an efficient form.
Image compression may be lossy or lossless.
8.
Lossless compression is preferred for archival purposes and
often for medical imaging, technical drawings, clip art, or
comics.
Lossless compression is possible because most real-world data
has statistical redundancy.
For example, an image may have areas of colour that do not
change over several pixels; instead of coding "red pixel, red
pixel, ..." the data may be encoded as "279 red pixels". This is
a basic example of run-length encoding; there are many
schemes to reduce file size by eliminating redundancy.
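The "279 red pixels" idea can be sketched as follows. This is a minimal illustration with invented function names and a made-up row of pixels; real image formats pack the runs into bytes instead of Python tuples.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse each run of equal values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in runs for _ in range(count)]


row = ["red"] * 279 + ["blue"] * 3
runs = rle_encode(row)
```

Here `runs` holds two pairs instead of 282 separate pixel values, and `rle_decode(runs)` reproduces the row exactly.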
9. Methods for lossless image compression are:
Run-length encoding: the default method in PCX and one of the available methods in BMP, TGA, and TIFF
DPCM and predictive coding
Entropy encoding
Adaptive dictionary algorithms such as LZW, used in GIF and TIFF
Deflate, used in PNG, MNG, and TIFF
Chain codes
10.
Lossy compression methods, especially when used at low bit
rates, introduce compression artifacts.
Lossy methods are especially suitable for natural images such
as photographs in applications where minor (sometimes
imperceptible) loss of fidelity is acceptable to achieve a
substantial reduction in bit rate. The lossy compression that
produces imperceptible differences may be called visually
lossless.
Lossy image compression can be used in digital cameras, to
increase storage capacities with minimal degradation of
picture quality. Similarly, DVDs use the lossy MPEG-2 Video
codec for video compression.
11.
Methods for lossy compression:
Reducing the color space to the most common colors in the image. The
selected colors are specified in the color palette in the header of the
compressed image. Each pixel just references the index of a color in the
color palette. This method can be combined with dithering to
avoid posterization.
In contrast to lossless compression, which retains the integrity of the
original file, the so-called lossy data compression method scans the file
being compressed to determine what information the file can do without. It
then eliminates those bits completely, with no method to retrieve that data.
This method is akin to taking a picture with your camera phone, opening a
photo-editing app, cropping off the edges of the picture and then sending it
to a friend. The recipient of that message cannot restore the pixels you
cropped off before you sent the image. Such is the case with lossy
compression. While this method is more effective at reducing the size of
the file, you won't be able to restore the file to its original state when you
extract the file on the back end of the process.
12.
What is image compression
coding?
Storing the image as a bit-stream that is as compact
as possible, and displaying the decoded image on
the monitor as exactly as possible.
13.
The image file is converted into a series of
binary data, which is called the bit-stream.
The decoder receives the encoded bit-stream and decodes it to reconstruct the
image.
The total data quantity of the bit-stream is
less than the total data quantity of the
original image.
14.
15.
16. GIF – Graphics Interchange Format
Compressed without losing any of the
original data (lossless)
Limited to 256 colors
The LZW patents that once covered it have expired
PNG – Portable Network Graphics
Up to 48 bits worth of color
Newer graphics format than GIF
17.
JPEG: Joint Photographic Experts Group – an
international standard since 1992.
Compresses the data but can lose some of
the original content (lossy).
Supports millions of colors.
Works with colour and greyscale images.
Up to 24-bit colour images (unlike GIF).
Targets photographic-quality images (unlike
GIF).
Suitable for many applications, e.g., satellite,
medical, general photography.
18.
19.
20.
“Audio compression is a way to reduce the
size of the audio file.”
A form of data compression designed to
reduce the size of audio files
Audio compression can be lossless or lossy
Audio compression algorithms are typically
referred to as audio codecs.
21. 2 types of Audio Compression
Lossless - allows one to preserve an exact copy
of one's audio files
Usage: For archival purposes, editing, audio
quality.
Lossy - makes irreversible changes and achieves far greater
compression; uses psychoacoustics to recognize
that not all data in an audio stream can be
perceived by the human auditory system.
Usage: distribution of streaming audio, or
interactive applications
22.
Codecs:
Lossless                                          Lossy
Free Lossless Audio Codec (FLAC)                  MP2: MPEG-1 Layer 2 audio codec
Apple Lossless                                    MP3: MPEG-1 Layer 3 audio codec
MPEG-4 ALS                                        MPC: Musepack
Monkey's Audio                                    Vorbis: Ogg Vorbis
Lossless Predictive Audio Compression (LPAC)      AAC: Advanced Audio Coding (MPEG-2 and MPEG-4)
Lossless Transform Audio Compression (LTAC)       WMA: Windows Media Audio
                                                  AC3: AC-3 or Dolby Digital A/52
23.
Moving Picture Experts Group
An ISO/IEC working group, established in
1988 to develop standards for digital audio
and video formats.
Its standards cover high-fidelity audio
and video compression.
24.
MPEG-1
Designed for bit rates up to 1.5 Mbit/s.
Used to compress video; designed
especially for Video CD (VCD).
MPEG-2
Designed for bit rates between 1.5 and 15 Mbit/s.
Similar to MPEG-1, but it can be used for more
applications.
Transmission rates are more than double
those of MPEG-1.
Works with HDTV and DVD.
25.
MPEG-4
Designed especially for the Internet.
Provides greater audio and video interactivity than
previous MPEG versions.
It allows developers to control objects in a scene
independently.
MPEG-4 can represent natural and synthesized sound,
and also supports natural textures, images,
photographs, natural video and animated video.
27.
MP3
The file extension, and the name of the file type, for
MPEG-1 Layer 3 audio.
A popular audio file format that can be opened in
Windows Media Player and many other players.
WAV
A format for sound files developed by Microsoft,
with a .wav file extension.
28.
Ogg
An audio format comparable
to other formats used to store and play
digital music, but different in that it is free,
open and unpatented.
Strictly, Ogg is the container; the audio in it is
compressed with Vorbis, a compression
scheme designed to be contained in Ogg.
29.
WMA
Short for Windows Media Audio
WMA is a Microsoft file format for encoding
digital audio files. It is similar to MP3, though it is
claimed to compress files more than MP3 does.
WMA files, which use the ".wma" file
extension, can be compressed to different sizes to
match many different connection speeds, or
bandwidths.
30.
31.
Once a video signal is digital, it requires a large amount
of storage space and transmission bandwidth.
To reduce the amount of data, several strategies are
employed that compress the information without
negatively affecting the quality of the image.
Storing and transmitting uncompressed raw video is not
an efficient technique because it needs large amounts of
storage and bandwidth.
Digital Versatile Disc (DVD), DSS, and internet video
all use compressed digital data, because uncompressed video takes a lot of space to store
and a large bandwidth to transmit.
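A rough calculation shows how large raw video is. The frame size (720×480) and 24 bits per pixel are assumptions chosen for illustration, the rate of 30 frames per second is the typical figure given in these slides, and the function name is invented.

```python
def raw_video_bits_per_second(width, height, bits_per_pixel, fps):
    """Bit rate of uncompressed video: every pixel of every frame is stored."""
    return width * height * bits_per_pixel * fps


# Assumed example: 720x480 frames, 24 bits per pixel, 30 frames per second.
bps = raw_video_bits_per_second(720, 480, 24, 30)
megabits_per_second = bps / 1_000_000
gigabytes_per_hour = bps * 3600 / 8 / 1_000_000_000
```

Under these assumptions raw video needs about 249 Mbit/s, or roughly 112 GB per hour. That is more than 16 times the 15 Mbit/s upper design rate quoted for MPEG-2.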
32.
Video compression is used for
these applications because compressed data needs less storage space and less
bandwidth to transmit.
With efficient compression techniques, a significant reduction in
file size can be achieved with little or no adverse effect on the
visual quality. The video quality can be affected if the file size is
further lowered by raising the compression level for a given
compression technique.
Videos are sequences of images displayed at a high rate. Each of
these images is called a frame.
The human eye cannot notice small changes between frames, such as a
slight difference in color.
Typically 30 frames are displayed on the screen every second.
33.
Video compression standards do not require the
encoding of all the details. Some of the less
important video details are lost, because lossy
compression is used for its ability to reach very high
compression ratios.
Compression is less efficient during sequences of fast movement,
because fewer MBs stay in the same position from frame to
frame. Users may notice video artifacts during
these sequences if the file is overcompressed.
34.
To accomplish this, an application known as a “codec”
analyzes the video frame by frame, and breaks each frame
down into square blocks known as “macro blocks.”
In the MPEG standards a macro block (MB) typically covers 16×16 pixels. The
codec then analyzes each frame, checking for changes in
the MBs.
Areas where the MBs do not change for several frames in a
row are noted and further analyzed.
If the video compression codec determines that these areas
can be removed from some of the frames, it does so, thus
reducing overall file size.
35.
36.
Intra frame ( I )
-Typically about 12 frames between I frames
-every MB of the frame is coded using spatial redundancy
Predictive frame ( P )
-Encode from previous I or P reference frame
-most of the MBs of the frame are coded exploiting
temporal redundancy in the past
Bi-directional frames ( B )
-Encode from previous and future I or P frames
-most of the MBs of the frame are coded exploiting
temporal redundancy in the past and in the future
37.
38. Lossy
Lossy compression reduces file size by a considerably greater
amount than lossless compression, but loses both information
and quality.
The compressed file has less data in it than the original file.
It can lose a relatively large amount of data before you start
to notice a difference.
Lossy compression makes up for the loss in quality by
producing comparatively small files.
For example, DVDs are compressed using the MPEG-2 format, which can make files 15 to 30 times smaller, but
we still tend to perceive DVDs as having a high-quality
picture.
39. Lossless
Lossless compression is exactly what it sounds like:
compression where none of the information is lost.
It produces a less compressed file, but maintains the original
quality.
It reduces the file size by encoding image information more
efficiently.
If file size is not an issue, using lossless compression will
result in a perfect-quality picture.
For example, a video editor transferring files from one
computer to another using a hard drive might choose to use
lossless compression to preserve quality while he or she is
working.
40.
Start by encoding the first frame using a still image
compression method.
It should then encode each successive frame by
identifying the differences between the frame and its
predecessor, and encoding these differences. If the
frame is very different from its predecessor it should be
coded independently of any other frame.
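The procedure on this slide can be sketched as follows. This is a simplified illustration, not how any real codec stores data: frames are plain lists of pixel values, key frames are stored uncompressed instead of with a still-image method, and the 50% threshold and all names are assumptions.

```python
def encode_video(frames, max_changed_fraction=0.5):
    """Encode the first frame whole, later frames as differences.

    A frame that differs from its predecessor in more than
    max_changed_fraction of its pixels is stored whole again.
    """
    encoded = []
    previous = None
    for frame in frames:
        if previous is None:
            encoded.append(("key", list(frame)))
        else:
            diffs = [(i, new) for i, (new, old)
                     in enumerate(zip(frame, previous)) if new != old]
            if len(diffs) > max_changed_fraction * len(frame):
                encoded.append(("key", list(frame)))   # too different
            else:
                encoded.append(("delta", diffs))       # changed pixels only
        previous = frame
    return encoded


def decode_video(encoded):
    """Rebuild every frame from key frames and deltas."""
    frames = []
    for kind, payload in encoded:
        if kind == "key":
            frame = list(payload)
        else:
            frame = list(frames[-1])
            for index, value in payload:
                frame[index] = value
        frames.append(frame)
    return frames


clip = [[0, 0, 0, 0], [0, 1, 0, 0], [9, 9, 9, 9]]
encoded_clip = encode_video(clip)
```

In this clip the second frame changes one pixel and is stored as a single difference. The third frame changes every pixel, so it is coded independently.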
41. Intraframe
Intra frame video compression considers frames one at
a time, seeing them only as still images. It can analyze
brightness and color and search for areas that can be
optimized, but it does not compare macro blocks between frames.
Occurs within individual frames
Designed to minimize the duplication of data in each
picture (spatial redundancy)
42. Interframe
Inter frame compression is a brute-force method that
often requires significantly more CPU time than intra
frame, but it can achieve a better balance between file
size and quality loss.
Compression between frames
Designed to minimize data redundancy in successive
pictures (temporal redundancy)
43.
Flow Control and Buffering
Temporal Compression
-Adjacent frames highly correlated
Spatial Compression
-Nearby pixels often correlated(as in still images)
Discrete Cosine Transform (DCT)
Vector Quantization (VQ)
Fractal Compression
Discrete Wavelet Transform (DWT).
44. Example:
AVI: Audio Video Interleave
-used to store audio and video data in a file
-uses the .avi file extension
JPEG 2000: Compression standard for still images
-Lower latency
-Supports both lossless and lossy compression
MPEG-2 & MPEG-4: Video compression standards
-widely used for DVDs and digital television
broadcasting
-used to encode video before transmission
45.
ISO/IEC (the International Organization for Standardization and the
International Electrotechnical Commission) has a working group called the Moving
Picture Experts Group, or MPEG. MPEG is responsible, for example, for the
familiar compression formats MPEG-1, MPEG-2 and MPEG-4.
The ITU-T is the standardization sector of the International Telecommunication
Union, a United Nations agency. Some popular ITU-T compression
formats include H.261 and H.264.
There are other compression formats, such as Intel Indeo and RealVideo (based
on the ITU-T H.263 codec). These are just as useful as the ones standardized
by the international groups, although some video sharing websites won’t accept
them.
There are also a few different formats to consider when exporting for the web:
MPEG-4 (which includes .M4V files), MPEG-2, H.264, DivX, QuickTime,
Windows Media Video (WMV), etc.
It’s important not to get video compression formats mixed up with media
container formats. A media container is a file format that holds data that has
been compressed using a video compression format. So the media container is
the end product of video compression.
46.
Step 1: Add Video File
Click the +FILE button in the upper left of the program
interface. Choose the video you want to convert in the Add
File dialog box and press Open.
47.
Step 2: Choose the Format or Device Preset
Choose the desired video format or target mobile device from the list of
presets. You can also use the Search function to quickly find the format or
device you need. Next, choose the output folder for the compressed videos by
clicking Browse and selecting the desired destination. By default, the output
video will be saved in C:\Users\%your username%\Videos\Movavi Library.
48.
Step 3: Define Quality and Size Values
Return to the source file list and click on the value displayed in
the Quality/Size column. A dialog box will open. Move the slider bar to adjust the
output file size and bitrate to meet your needs. Note that the output video size value is
only an estimate; the actual size of the converted video file may differ slightly.
49. Step 4: Start the Video Compression
Press the Convert button to start the compression process. After the operation is
complete, the output folder with the converted video will open automatically.
51. Abstract Syntax Notation One (ASN.1) is a standard and notation that
describes rules and structures for representing, encoding, transmitting,
and decoding data in telecommunications and computer networking.
52. The notation provides a certain number of pre-defined
basic types such as:
integers (INTEGER),
booleans (BOOLEAN),
character strings (IA5String, UniversalString...),
bit strings (BIT STRING),
etc.,
and makes it possible to define constructed types such
as:
structures (SEQUENCE),
lists (SEQUENCE OF),
choice between types (CHOICE),
etc.
53.
ASN.1 is used to send information in any form, anywhere it needs to be
communicated digitally. ASN.1 only covers the structural aspects of
information; there are no operators to handle the values once they are
defined or to make calculations with them. It is therefore not a programming
language.
One of the main reasons for the success of ASN.1 is that this notation is
associated with several standardized encoding rules such as the BER
(Basic Encoding Rules) or, more recently, the PER (Packed Encoding
Rules), which prove useful for applications with restricted
bandwidth.
Encoding rules describe how the values defined in ASN.1 should be
encoded for transmission regardless of machine, programming
language, or how it is represented in an application program.
ASN.1's encodings are more streamlined than many competing
notations, enabling rapid and reliable transmission of extensible
messages. This is an advantage for wireless broadband.
Because ASN.1 has been an international standard since 1984, its
encoding rules are mature and have a long track record of reliability and
interoperability.
ASN.1 is widely used in industry sectors where efficient (low-bandwidth,
55. ASN.1's abstract syntax is similar in form to that of any high level programming language.
For example, consider the following C structure:
struct Student {
    char name[50];  /* "Foo Bar" */
    int grad;       /* Grad student? (yes/no) */
    float gpa;      /* 1.1 */
    int id;         /* 1234567890 */
    char bday[8];   /* mm/dd/yy */
};
Its ASN.1 counterpart is:
Student ::= SEQUENCE {
    name OCTET STRING, -- 50 characters
    grad BOOLEAN,      -- comments begin with
    gpa  REAL,         -- a double hyphen
    id   INTEGER,
    bday OCTET STRING  -- birthday
}
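Encoding rules such as BER turn values of these types into tag-length-value bytes. The sketch below is a minimal illustration only: it covers just BOOLEAN, INTEGER, OCTET STRING and SEQUENCE, supports only contents shorter than 128 bytes, and leaves out REAL, so it cannot encode a full Student value. All function names are invented for this example.

```python
def _tlv(tag: int, content: bytes) -> bytes:
    """Tag byte, one length byte (short form only), then the content."""
    if len(content) > 127:
        raise ValueError("this sketch supports short-form lengths only")
    return bytes([tag, len(content)]) + content


def ber_boolean(value: bool) -> bytes:
    return _tlv(0x01, b"\xff" if value else b"\x00")


def ber_integer(value: int) -> bytes:
    # Shortest two's-complement form that still leaves room for the sign bit.
    magnitude = value if value >= 0 else ~value
    length = magnitude.bit_length() // 8 + 1
    return _tlv(0x02, value.to_bytes(length, "big", signed=True))


def ber_octet_string(value: bytes) -> bytes:
    return _tlv(0x04, value)


def ber_sequence(*members: bytes) -> bytes:
    """A SEQUENCE is the concatenation of its already-encoded members."""
    return _tlv(0x30, b"".join(members))


student_id = ber_integer(1234567890)
name = ber_octet_string(b"Foo Bar")
```

Each encoded value carries its own type tag and length, which is what lets a receiver decode it regardless of machine or programming language.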
56.
ASN.1 has been adopted in the communications protocol specifications of:
Telecommunications, including 3GPP mobile phones
Intelligent Transport Systems (ITS)
Internet voice communications (VoIP)
Multimedia standards
Security-related systems, including smart cards and certificates, the
basis for e-commerce
Embedded systems communications
Air traffic control
57.
The eXternal Data Representation (XDR) is a standard for the description and
encoding of data. XDR uses a language to describe data formats, but the
language is used only for describing data and is not a programming
language. Protocols such as Remote Procedure Call (RPC) and the Network
File System (NFS) use XDR to describe their data formats.
XDR is an alternative to ASN.1. XDR is much simpler than ASN.1, but less
powerful. For instance:
◦ XDR uses implicit typing. Communicating peers must know the type of any
exchanged data. In contrast, ASN.1 uses explicit typing; it includes type
information as part of the transfer syntax.
◦ In XDR, all data is transferred in units of 4 bytes. Numbers are transferred
in network order, most significant byte first.
◦ Strings consist of a 4 byte length, followed by the data (and, if needed,
zero padding up to a multiple of 4 bytes). Contrast this with ASN.1.
◦ Defined types include: integer, enumeration, boolean, floating point, fixed
length array, structures, plus others.
One advantage that XDR has over ASN.1 is that current implementations of
ASN.1 execute significantly slower than XDR.
58. Suppose there is a user named "john" who wants to store his Lisp program
"sillyprog", which contains just the data "(quit)". His file would be
encoded as follows:
OFFSET  HEX BYTES    ASCII  COMMENTS
------  -----------  -----  --------
 0      00 00 00 09  ....   -- length of filename = 9
 4      73 69 6c 6c  sill   -- filename characters
 8      79 70 72 6f  ypro   -- ... and more characters ...
12      67 00 00 00  g...   -- ... and 3 zero-bytes of fill
16      00 00 00 02  ....   -- filekind is EXEC = 2
20      00 00 00 04  ....   -- length of interpretor = 4
24      6c 69 73 70  lisp   -- interpretor characters
28      00 00 00 04  ....   -- length of owner = 4
32      6a 6f 68 6e  john   -- owner characters
36      00 00 00 06  ....   -- length of file data = 6
40      28 71 75 69  (qui   -- file data bytes ...
44      74 29 00 00  t)..   -- ... and 2 zero-bytes of fill
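The byte layout of this example can be reproduced with Python's standard `struct` module. The helper names below are invented for this sketch; only the rules stated in these slides are used (4-byte big-endian numbers, and a 4-byte length followed by data zero-padded to a multiple of 4 bytes).

```python
import struct


def xdr_uint(value: int) -> bytes:
    """Unsigned 32-bit integer, most significant byte first."""
    return struct.pack(">I", value)


def xdr_opaque(data: bytes) -> bytes:
    """4-byte length, the data, then zero fill to a multiple of 4 bytes."""
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


EXEC = 2  # filekind value used in the example

encoded_file = (
    xdr_opaque(b"sillyprog")   # filename
    + xdr_uint(EXEC)           # filekind
    + xdr_opaque(b"lisp")      # interpretor
    + xdr_opaque(b"john")      # owner
    + xdr_opaque(b"(quit)")    # file data
)
```

The result is 48 bytes long and matches the hex dump row by row. No type information is transmitted, so the receiver must already know the order and types of the fields.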
59.
MIME (Multipurpose Internet Mail
Extensions) is a standard that expands
upon the limited capabilities of email, and
in particular allows documents (such as
images, sound, and text) to be inserted in a
message.
60. MIME adds the following features to email
service:
Sending multiple attachments with a
single message;
Unlimited message length;
Use of character sets other than ASCII;
Use of rich text (layouts, fonts, colors, etc.);
Binary attachments (executables, images,
audio or video files, etc.), which may be
divided if needed.
61. MIME uses special header directives to describe the format
used in a message body, so that the email client can interpret
it correctly:
MIME-Version: This is the version of the MIME standard
used in the message. Currently only version 1.0 exists.
Content-Type: Describes the data's type and subtype. It
can include a "charset" parameter, separated by a semicolon,
defining which character set to use.
Content-Transfer-Encoding: Defines the encoding used in
the message body
Content-ID: Represents a unique identification for each
message segment
Content-Description: Gives additional information about
the message content.
Content-Disposition: Defines the attachment's settings, in
particular the name associated with the file, using the
attribute filename.
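These headers can be seen in practice with Python's standard `email` package, which fills them in automatically. The subject, body text, attachment bytes and the file name `data.bin` are made up for this sketch.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
# A text body with a non-ASCII character, so a charset is required.
msg.set_content("Café menu attached.")
# A small binary attachment with a file name.
msg.add_attachment(
    b"\x00\x01\x02",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

body = msg.get_body(preferencelist=("plain",))
attachment = next(msg.iter_attachments())

mime_version = msg["MIME-Version"]                        # MIME-Version
container_type = msg.get_content_type()                   # Content-Type
body_charset = body.get_content_charset()                 # charset parameter
attachment_cte = attachment["Content-Transfer-Encoding"]  # body encoding
attachment_name = attachment.get_filename()               # Content-Disposition
```

Printing `msg.as_string()` shows the full message, with one set of Content-* headers for each part.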
62.
63.
64.
Encryption is a method used to enhance the
security of a file or message by scrambling
the contents so that it can be read only by
someone who has the right key to
unscramble it. For example, the information
used for transactions such as online
purchases (e.g., address, phone number, and
credit card number) is usually encrypted to
help keep it safe.
65.
66.
Symmetric keys: one and the same key is used to
encrypt and decrypt the information transmitted.
67.
Asymmetric keys: the sender uses the receiver’s public key to
encrypt, and the receiver uses their own private key to decrypt.
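The difference between the two kinds of key can be shown with toy examples. Neither is secure: the XOR cipher and the tiny RSA numbers below are illustrations chosen for this sketch, not real encryption, and real systems should use a vetted cryptography library. Note that `pow(e, -1, m)` needs Python 3.8 or later.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: the same key both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Symmetric: sender and receiver share one secret key.
shared_key = b"k"
secret = xor_cipher(b"credit card", shared_key)
recovered = xor_cipher(secret, shared_key)

# Asymmetric (toy RSA with tiny primes): the public key (e, n) encrypts,
# and only the private key (d, n) decrypts.
p, q = 61, 53
n = p * q                          # public modulus
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

message = 65
ciphertext = pow(message, e, n)    # anyone can do this with the public key
plaintext = pow(ciphertext, d, n)  # only the private-key holder can undo it
```

In the symmetric case both sides must keep the same key secret. In the asymmetric case the receiver can publish `(e, n)` freely and keep only `d` private.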
68.
Preserve confidentiality of the file or
message.
Save money on extra protection software as
the machine that uses the encrypted message
does not have to be secured.
69.
If the key to unlock the encrypted file is lost,
the data can no longer be decrypted and
is effectively lost as well.
Overall performance of the machine that uses
the data will decrease, since encryption takes
energy, processing time and computing power.
The encrypted message can be difficult to use, as
some limitations have been placed on it.