  1. 2. Multimedia
  2. Objectives 2  To understand the terminologies of multimedia systems.  To study protocols used in multimedia applications.  To know different hardware and software components required to run multimedia.  To evaluate multimedia services that satisfy user requirements.
  3. Introduction 3  The way we utilize audio and video has evolved as a result of recent technological advancements.  In the past, we would listen to an audio broadcast on the radio and watch a video show on the television.  People nowadays desire to utilize the Internet for audio and video services in addition to text and image communications.  This chapter focuses on programs that provide audio and video services via the Internet.
  4. 4  Audio and video services may be divided into three main categories:
  5. Streaming Stored Audio/Video 5  In this approach, the files are compressed and stored on a server.  A client downloads the files through the Internet.  This is sometimes called on-demand audio/video.  E.g.  Stored audio files: songs, symphonies, books on tape, and popular lectures.  Stored video files: movies, TV shows, and music video clips.
  6. Streaming Live Audio/Video 6  Streaming live audio/video refers to the broadcasting of radio and TV programs through the Internet.  A user listens to broadcast audio and video through the Internet.  E.g. Internet Radio. Some radio stations solely transmit their programming via the Internet, while others broadcast them both over the Internet and over the air.
  7. Interactive Audio/Video 7  Interactive audio/video refers to the use of the Internet for interactive audio/video applications.  E.g. Internet telephony and Internet teleconferencing.
  8. Digitizing Audio and Video 8  Before audio or video signals can be transmitted over the Internet, they must first be digitized.  Digitizing Audio  Digitizing Video
  9. Digitizing Audio 9  When sound is fed into a microphone, an electrical analog signal is produced that represents the amplitude of the sound as a function of time.  These signals are called analog audio signals.
  10. 10  An analog signal, such as audio, can be digitized to produce a digital signal.  According to the Nyquist theorem, if the highest frequency of the signal is f, we need to sample the signal 2f times per second.
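The sampling rule above can be turned into a short calculation. This is a minimal sketch: the 4,000 Hz voice bandwidth, the 8 bits per sample, and the CD parameters (44,100 samples per second, 16 bits, two channels) are assumed illustrative values that do not come from these slides, and the function names are hypothetical.

```python
def nyquist_rate(max_freq_hz):
    """Minimum sampling rate (samples/s) for a signal whose highest frequency is max_freq_hz."""
    return 2 * max_freq_hz


def pcm_bit_rate(samples_per_second, bits_per_sample, channels=1):
    """Uncompressed bit rate in bits per second."""
    return samples_per_second * bits_per_sample * channels


# Assumed example values (not taken from the slides).
voice_rate = nyquist_rate(4000)                  # 4 kHz voice bandwidth
voice_bps = pcm_bit_rate(voice_rate, 8)          # 8 bits per sample
cd_bps = pcm_bit_rate(44_100, 16, channels=2)    # CD-style stereo audio

print(voice_rate, voice_bps, cd_bps)
```

Under these assumptions the results match the uncompressed rates quoted later for speech (64 kbps) and CD-quality music (about 1.411 Mbps).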
  11. Digitizing Video 11  A video is made up of a series of frames. If the frames are displayed on the screen fast enough, we get the impression of motion.  The reason is that our eyes cannot distinguish the rapidly flashing frames as individual ones.
  12. 12  There is no standard number of frames per second; 25 frames per second is common (it is the rate of European PAL television, whereas North American NTSC television uses about 30).  A frame must be refreshed to avoid a situation known as flickering (a visible change in brightness).  In the television industry, each frame is repainted twice.  This means 50 frames must be sent, or 25 frames if there is memory at the sender site, with each frame repainted from the memory.
  13. 13  Each frame is divided into a grid of small picture elements, or pixels.  On black-and-white television, each 8-bit pixel represents one of 256 gray levels. On color television, each pixel is 24 bits, with 8 bits for each primary color (red, green, and blue).  We can calculate the number of bits per second for a specific resolution.  At one of the lowest resolutions, a color frame is 1024 × 768 pixels. This means we need  2 × 25 × 1024 × 768 × 24 ≈ 944 Mbps.
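The arithmetic on this slide can be checked with a few lines of code. This is only a sketch; the function name and parameter names are hypothetical, and the numbers are the ones given above.

```python
def raw_video_bit_rate(frames_per_second, repaints, width, height, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return repaints * frames_per_second * width * height * bits_per_pixel


# Values from the slide: 25 frames/s, each repainted twice, 1024 x 768 pixels, 24 bits per pixel.
rate = raw_video_bit_rate(25, 2, 1024, 768, 24)
print(f"{rate} bps, about {rate / 1e6:.0f} Mbps")
```

A rate of this size is the motivation for the compression techniques that follow.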
  14. Audio and Video Compression 14  Compression is required when sending audio or video over the Internet.
  15. 1. Audio Compression 15  Both speech and music can benefit from audio compression. For speech, we need to compress a 64-kbps digitized signal; for music, a 1.411-Mbps signal.  There are two kinds of techniques for audio compression:  Predictive Encoding  Perceptual Encoding
  16. Predictive Encoding 16  Instead of encoding all of the sampled values, predictive encoding encodes the differences between the samples.  This type of compression is normally used for speech.  Several standards have been defined, such as GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps).
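The idea of encoding differences rather than sample values can be illustrated with plain delta coding. This is a deliberately simplified sketch, not the method used by GSM, G.729, or G.723.1, which rely on far more elaborate prediction; the function names and sample values are hypothetical.

```python
def delta_encode(samples):
    """Keep the first sample, then store only the change from the previous sample."""
    if not samples:
        return []
    encoded = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the samples by accumulating the stored differences."""
    samples = []
    current = 0
    for index, value in enumerate(encoded):
        current = value if index == 0 else current + value
        samples.append(current)
    return samples


samples = [100, 102, 105, 105, 103]   # made-up sample values
encoded = delta_encode(samples)
print(encoded)
print(delta_decode(encoded) == samples)
```

Because neighboring speech samples tend to be close in value, the differences are small numbers that need fewer bits than the samples themselves.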
  17. Perceptual Encoding: MP3 17  The perceptual encoding approach is the most popular compression technique used to generate CD-quality audio.  This kind of audio requires at least 1.411 Mbps, which cannot be sent without compression via the Internet.  This method is used by MP3 (MPEG audio layer 3), which is part of the MPEG standard.
  18. 18  Perceptual audio coding is a type of audio signal compression that exploits the limitations of human hearing.  Perceptual encoding is based on psychoacoustics, the science of how people perceive sound.  The idea relies on flaws in our auditory system: some sounds can mask other sounds. Masking can occur in both frequency and time.
  19. 19  Frequency masking: A loud sound in one frequency range can partially or totally mask a softer sound in another frequency range.  E.g. We cannot hear the words of a person sitting beside us in a room where an orchestra is playing loudly.
  20. 20  Temporal masking: A loud sound can numb our hearing for a short time even after the sound has stopped.
  21. 21  MP3 compresses audio signals by using frequency and temporal masking. MP3 has three different data rates: 96 kbps, 128 kbps, and 160 kbps.  The rate is determined by the frequency range of the original analog audio.
  22. Video Compression 22  Video is comprised of multiple frames, and each of the frames is an image.  Video can be compressed by compressing the images.  The market is dominated by two standards:  Joint Photographic Experts Group (JPEG) and  Moving Picture Experts Group (MPEG).  Images are compressed using the Joint Photographic Experts Group (JPEG).  Video is compressed using the Moving Picture Experts Group (MPEG).
  23. Image Compression: JPEG 23  In a grayscale picture, each pixel can be represented by an 8-bit integer (256 levels).  If the picture is in color, each pixel can be represented by 24 bits (3 × 8 bits), with each 8 bits representing red, green, or blue (RGB).
  24. 24  In JPEG, a grayscale image is split into blocks of 8 × 8 pixels.  The goal of splitting the image into blocks is to reduce the number of computations, since the number of mathematical operations for each picture is the square of the number of units.
  25. 25 Figure 2.2 JPEG grayscale
  26. 26  JPEG's entire concept is to convert the image into a linear (vector) set of numbers that shows the redundancies.  Using one of the text compression methods, the redundancies (lack of changes) may then be eliminated.
  27. 27 JPEG Process
  28. Discrete Cosine Transform (DCT) 28  During this phase, each block of 64 pixels is transformed using the discrete cosine transform (DCT).  The transformation changes the 64 values so that the relative relationships between pixels are kept but the redundancies are revealed.
  29. 29  We present the transformation outcomes for three different situations:  Case 1: Uniform Gray Scale  Case 2: Two Sections  Case 3: Gradient Gray Scale
  30. Case 1: Uniform Gray Scale 30  In this case, we have a grayscale block with a value of 20 for each pixel.  When we perform the transformation, we get a nonzero value for the first element (upper left corner); all the remaining elements are 0.  The value of T(0,0) is the average (multiplied by a constant) of the P(x,y) values and is called the dc value (direct current, borrowed from electrical engineering).
  31. 31  The remaining values are called ac values, in which T(m,n) represents changes in the pixel values. As shown in Figure the rest of the values are 0s. Case 1: Uniform Gray Scale
  32. Case 2: Two Sections 32  In the second example, we have a block that has two distinct uniform greyscale sections.  The pixel values have changed significantly (from 20 to 50).  We receive a dc value as well as nonzero ac values when we perform the transformations.  However, the dc value is surrounded by just a few nonzero values. As per Figure 2.5, the majority of the values are zero.
  33. 33 Case 2: Two Sections
  34. Case 3: Gradient Gray Scale 34  In the third case, we have a block whose values change gradually.  That is, there is no sharp difference between the values of neighboring pixels.  When we do the transformations, we obtain a dc value along with several nonzero ac values as shown in Figure
  35. 35 Case 3: Gradient Gray Scale
  36. 36 From the cases above we can conclude that:  The transformation creates table T from table P.  The dc value is the average value (multiplied by a constant) of the pixels.  The ac values are the changes.  Lack of changes in neighboring pixels creates 0s.
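The first two cases can be reproduced with a direct implementation of the two-dimensional DCT. This is a sketch under stated assumptions: it uses the common 8 × 8 DCT-II normalization (a factor of 1/4 and a factor of 1/√2 for index 0), and it assumes that the first index of P(x,y) and T(m,n) is the row. Neither detail is spelled out in the slides, and the function name is hypothetical.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Direct 2-D DCT-II of an N x N block (slow but easy to follow)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    table = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * m * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * n * math.pi / (2 * N)))
            table[m][n] = 0.25 * c(m) * c(n) * total
    return table


# Case 1: every pixel is 20.
uniform = [[20] * N for _ in range(N)]
T1 = dct_2d(uniform)

# Case 2: left half 20, right half 50.
two_sections = [[20] * 4 + [50] * 4 for _ in range(N)]
T2 = dct_2d(two_sections)

print(round(T1[0][0], 3), round(T2[0][0], 3))
print([round(v, 1) for v in T2[0]])
```

With this normalization the dc value is eight times the average pixel value, and in the uniform case every ac value is zero apart from floating-point rounding.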
  37. Quantization 37  Quantization is the process of reducing the number of bits needed to store an integer value by reducing the precision of the integer.  In earlier quantization, we dropped the fraction from each value and kept the integer part.  Here, the number is divided by a constant, and the fraction is then dropped.  This further reduces the number of bits required.
  38. 38  A quantizing table (8 x 8) is used in most implementations to specify how to quantize each value.  The divisor is determined by the value's position in the T table.  This is done to optimize the number of bits and 0s for each specific application.
  39. 39  The quantizing step is the only part of the process that cannot be reversed.  We lose some information here that cannot be recovered.  In fact, the only reason JPEG is called lossy compression is this quantization phase.
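The divide-and-drop step, and the reason it cannot be undone, can be seen in a tiny example. This is a sketch only: it uses a 2 × 2 table instead of 8 × 8 to stay readable, and the transformed values and divisors are made up for illustration.

```python
def quantize(table, divisors):
    """Divide each value by the divisor at the same position and drop the fraction."""
    return [[int(value / q) for value, q in zip(t_row, q_row)]
            for t_row, q_row in zip(table, divisors)]


def dequantize(quantized, divisors):
    """Best-effort inverse: multiply back by the same divisors."""
    return [[value * q for value, q in zip(v_row, q_row)]
            for v_row, q_row in zip(quantized, divisors)]


# Hypothetical transformed values and quantizing table.
T = [[160.0, 37.5],
     [-12.3, 4.9]]
Q = [[16, 11],
     [12, 14]]

quantized = quantize(T, Q)
restored = dequantize(quantized, Q)
print(quantized)
print(restored)
```

The restored table differs from the original wherever a fraction was dropped, which is the information loss described above. Small values quantize to 0, which helps the compression step that follows.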
  40. Compression 40  The values are read from the table after quantization, and redundant 0s are eliminated.  To cluster the 0s together, the table is read diagonally in a zigzag way rather than row by row or column by column.  The reason is that if the picture changes smoothly, the bottom right corner of the T table is all 0s.  Figure depicts the process of reading the table.
  41. 41 Reading the Table
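The zigzag reading order can be sketched as follows. The example uses a 3 × 3 table with made-up values for brevity, and the final step simply drops the trailing zeros as a stand-in for the real entropy coding, which is more involved; the function names are hypothetical.

```python
def zigzag_order(n):
    """Return (row, column) pairs of an n x n table in zigzag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + column indexes one anti-diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()               # even diagonals are walked bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_read(table):
    """Read a square table in zigzag order."""
    return [table[i][j] for i, j in zigzag_order(len(table))]


def drop_trailing_zeros(values):
    """Simplified stand-in for eliminating the redundant 0s at the end."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end]


table = [[10, 3, 0],
         [-1, 0, 0],
         [0, 0, 0]]   # hypothetical quantized values

sequence = zigzag_read(table)
print(sequence)
print(drop_trailing_zeros(sequence))
```

Reading the same table row by row would give 10, 3, 0, -1, 0, ..., leaving a zero in the middle of the nonzero values; the zigzag order pushes all the zeros to the end.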
  42. Video Compression: MPEG 42  A motion picture is a fast sequence of frames, each of which represents an image.  To put it another way, a frame is a spatial combination of pixels, whereas a video is a temporal combination of frames transmitted one after the other.  Compressing video means spatially compressing each frame and temporally compressing a set of frames.
  43. 43  Spatial Compression  JPEG is used to compress each frame's spatial data. Each frame is an image that may be compressed separately.
  44. 44 Temporal Compression  Duplicate frames are eliminated during temporal compression.  We get 50 frames per second when we watch television.  However, the majority of the frames in a sequence are nearly identical.  E.g. When someone is speaking, the majority of the frame remains the same from one frame to the next, with the exception of the segment of the frame around the lips, which varies from one frame to the next.
  45. 45  For temporal data compression, the MPEG method divides frames into three types:  I-frames  P-frames  B-frames
  46. I-Frames (Intracoded Frame) 46  An I-frame is an independent frame that is not related to any other frame (neither the frame sent before it nor the frame sent after it).  I-frames are not constructed from other frames.  They appear at regular intervals (e.g., every ninth frame is an I-frame).  An I-frame must appear periodically to handle a sudden change in the picture that the preceding and following frames cannot show.
  47. 47  A viewer may tune in at any moment when a video is shown.  If there is only one I-frame at the start of the show, late viewers will not get a complete picture.
  48. P-Frames (Predicted Frame) 48  It is related to the previous I-frame or P-frame.  Each P-frame only contains the differences from the previous frame.  E.g. if an object is moving quickly, the new changes may not be recorded in a P-frame. P-frames can only be built from previous I- or P-frames.  P-frames carry significantly less information than other frame types and even fewer bits after compression.
  49. B-Frames (Bidirectional Frame) 49  It is related to the I-frame or P-frame that comes before and after it. Each B-frame is relative to the past and future. It should be noted that a B-frame is never related to another B-frame.  Figure depicts a sample frame sequence.
  50. 50 MPEG Frames
  51. 51  Figure depicts the construction of I-Frames, P-Frames, and B-frames from a series of seven frames. MPEG Frame Construction
  52. Streaming Stored Audio/Video 52  In this section, we discuss different approaches for downloading stored audio/video files from a Web server.
  53. First Approach: Using a Web Server 53  A compressed audio/video file can be downloaded just like a text file.  To download the file, the client (browser) can use HTTP services and send a GET message.  The Web server can then send the compressed file to the browser.  The browser can play the file using a helper application, usually called a media player.  This method is very simple and does not involve any streaming.  This method is depicted in Figure 2.10.
  54. 54 Using a Web Server
  55. Drawbacks 55 This method has several drawbacks.  Even after compression, an audio/video file is usually quite large.  Audio and video files can take many megabits to store.  The file must be completely downloaded before it can be played.  With today's data rates, the user will have to wait a few seconds or even tens of seconds before the file can be played.
  56. Second Approach: Using a Web Server with Metafile 56  This approach involves connecting the media player directly to the Web server and downloading the audio/video file.  The audio/video file and a metafile containing information about the audio/video file are both stored on the Web server.  The steps in this approach are depicted in Figure
  57. 57 Using a Web Server with a Metafile
  58. 58 1. The HTTP client accesses the Web server by using the GET message. 2. The information about the metafile comes in the response. 3. The metafile is passed to the media player. 4. The media player uses the URL in the metafile to access the audio/video file. 5. The Web server responds.
  59. Third Approach: Using a Media Server 59  The issue with the second approach is that both the browser and the media player rely on HTTP services.  HTTP is intended to operate over TCP.  This is appropriate for retrieving the metafile but not the audio/video file.  The reason for this is that TCP retransmits a lost or damaged segment, which goes against the streaming philosophy.
  60. 60  TCP and its error control must be dropped in favor of UDP.  HTTP, however, accesses the Web server, and the Web server itself is designed to work with TCP.  We therefore need a separate server, a media server, to handle the audio and video files.
  61. 61 Using a Media Server
  62. 62 1. The HTTP client accesses the Web server by using a GET message. 2. The information about the metafile comes in the response. 3. The metafile is forwarded to the media player. 4. The media player uses the URL in the metafile to access the media server to download the file. 5. The media server responds.
  63. Fourth Approach: Using a Media Server and RTSP 63  The Real-Time Streaming Protocol (RTSP) is a control protocol that was created to enhance the functionality of the streaming process.  We can control the playback of audio/video using RTSP.  RTSP is an out-of-band control protocol similar to FTP's second connection.  A media server and RTSP are depicted in Figure
  64. 64 Using a Media Server and RTSP
  65. 65 1. The HTTP client accesses the Web server by using a GET message. 2. The information about the metafile comes in the response. 3. The metafile is passed to the media player. 4. The media player sends a SETUP message to create a connection with the media server. 5. The media server responds.
  66. 66 6. The media player sends a PLAY message to start playing (downloading). 7. The audio/video file is downloaded by using another protocol that runs over UDP. 8. The connection is broken by using the TEARDOWN message. 9. The media server responds.
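Steps 4 to 9 above can be sketched by formatting the three RTSP requests the media player sends. This is a minimal illustration that only builds the request text and opens no network connection. The request layout (request line, CSeq header, Transport and Session headers) follows RTSP 1.0; the URL, session identifier, and client ports are hypothetical values chosen for the example.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format a text RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


URL = "rtsp://media.example.com/clip"   # hypothetical media server URL
SESSION = "12345678"                    # hypothetical ID; in practice the server returns it after SETUP

setup = rtsp_request("SETUP", URL, 1,
                     {"Transport": "RTP/AVP;unicast;client_port=5004-5005"})
play = rtsp_request("PLAY", URL, 2, {"Session": SESSION})
teardown = rtsp_request("TEARDOWN", URL, 3, {"Session": SESSION})

for message in (setup, play, teardown):
    print(message)
```

The control messages travel separately from the media itself, which is why RTSP is described as an out-of-band protocol: the audio/video data flows over a different protocol on UDP.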
  67. Streaming Live Audio/Video 67  Streaming live audio/video is similar to the broadcasting of audio and video by radio and television stations.  The only difference is that the station broadcasts through the Internet instead of over the air.
  68. 68  Streaming stored audio/video and streaming live audio/video are both sensitive to delay, and neither can accept retransmission.  However, there is a difference.  The communication in the first application is unicast and on-demand.  The communication in the second is multicast and live.
  69. 69  Live streaming is better suited to IP multicast services and protocols like UDP and RTP.  At present, however, live streaming still uses TCP and multiple unicasting rather than multicasting.
  70. Real-Time Interactive Audio/Video 70  In Real-Time Interactive Audio/Video people interact with each other in real-time.  E.g. Internet phone or voice over IP and Video conferencing.
  71. Characteristics 71  We discuss several characteristics of real-time audio/video communication. 1. Time Relationship 2. Timestamp 3. Playback Buffer 4. Ordering 5. Multicasting 6. Translation 7. Mixing
  72. Time Relationship 72  Real-time data on a packet-switched network requires that the time relationship between the packets of a session be preserved.  For example, assume that a real-time video server creates live video images and sends them online.  The video is digitized and packetized.  There are only three packets, and each packet holds 10 s of video information.
  73. 73 Time Relationship
  74. 74  But what if the packets arrive with different delays?  Assume the three packets are sent at 00:00:00, 00:00:10, and 00:00:20, and that the  first packet arrives at 00:00:01 (1-s delay),  the second at 00:00:15 (5-s delay),  and the third at 00:00:27 (7-s delay).  If the receiver begins to play the first packet at 00:00:01, it will finish at 00:00:11.  The next packet, however, has not yet arrived; it arrives 4 seconds later.
  75. 75  As the video is viewed at the remote site, there is a gap between the first and second packets, and between the second and third.  This is referred to as jitter.  Varying delay between packets causes jitter in real-time data.  The situation is depicted in Figure
  76. 76  For comparison, consider the ideal case in the same example, in which a real-time video server sends three packets, each holding 10 s of video.  The first packet starts at 00:00:00, the second at 00:00:10, and the third at 00:00:20.
  77. 77  Assume that each packet takes 1 second to reach its destination (equal delay).  The first packet can be played back at 00:00:01, the second at 00:00:11, and the third at 00:00:21.  Although there is a 1-s time difference between what the server sends and what the client sees on the computer screen, the action is happening in real time.  The time relationship between the packets is preserved. The 1-s delay is not important.
  78. 78 Jitter
  79. Timestamp 79  One solution to jitter is the use of a timestamp: we time-stamp the packets and separate the arrival time from the playback time.  If each packet carries a timestamp that shows the time it was produced relative to the first (or previous) packet, the receiver can add this time to the time at which it starts the playback.
  80. 80  In other words, the receiver knows when to play each packet.  Consider the previous example, where the first packet has a timestamp of 0, the second has a timestamp of 10, and the third has a timestamp of 20.  If the receiver begins playing the first packet at 00:00:08, the second is played at 00:00:18 and the third at 00:00:28.  There are no gaps between packets. The situation is depicted in Figure
  81. 81 Timestamp
  82. Playback Buffer 82  To separate the arrival time from the playback time, we need a buffer to store the data until it is played back.  The buffer is known as a playback buffer.  When a session starts (the first bit of the first packet arrives), the receiver delays playing the data until a certain threshold is reached.  In the preceding example, the first bit of the first packet arrives at 00:00:01; the threshold is 7 s, and the playback time is 00:00:08.  The threshold is measured in time units of data.  Playback does not begin until the time units of data reach the threshold value.
  83. 83  The data is stored in the buffer at a variable rate, but it is extracted and played back at a constant rate.  The amount of data in the buffer shrinks or expands, but there is no jitter as long as the delay is less than the time it takes to playback the threshold amount of data.  For our example, Figure depicts the buffer at various times.
  84. 84 Playback Buffer
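The running example (arrivals at 1 s, 15 s, and 27 s, timestamps 0, 10, and 20, threshold 7 s) can be checked with a short calculation. This is a sketch with hypothetical function and variable names; times are in seconds from 00:00:00.

```python
def playback_schedule(arrivals, timestamps, playback_start):
    """For each packet, return (scheduled play time, True if it arrives too late)."""
    schedule = []
    for arrival, timestamp in zip(arrivals, timestamps):
        play_time = playback_start + timestamp
        schedule.append((play_time, arrival > play_time))
    return schedule


arrivals = [1, 15, 27]      # arrival times from the example
timestamps = [0, 10, 20]    # timestamps carried by the packets

# Playing as soon as the first packet arrives: later packets miss their slot (jitter).
no_buffer = playback_schedule(arrivals, timestamps, playback_start=1)

# Waiting for the 7-s threshold: playback starts at 8 s and every packet is on time.
with_buffer = playback_schedule(arrivals, timestamps, playback_start=1 + 7)

print(no_buffer)
print(with_buffer)
```

The buffer trades a larger start-up delay for smooth playback: as long as no packet is delayed by more than the buffered amount, every packet is already present when its play time comes.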
  85. Ordering 85  One more feature is required in addition to time relationship information and timestamps for real-time traffic.  Each packet requires a sequence number.  If a packet is lost, the timestamp alone will not alert the receiver.  Let's pretend the timestamps are 0, 10, and 20.  The receiver receives only two packets with timestamps 0 and 20 if the second packet is lost.
  86. 86  The receiver assumes the packet with the timestamp 20 is the second packet, which was sent 20 seconds after the first.  The receiver has no way of knowing whether or not the second packet was lost.  To deal with this situation, you'll need a sequence number to order the packets.
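A sequence number makes the loss visible where the timestamp alone does not. The sketch below assumes packets arrive in order and without duplicates, and treats the sequence number as a 16-bit counter that wraps around; the function name and the packet numbering (1, 2, 3) are hypothetical.

```python
def find_missing(received_seqs, modulus=1 << 16):
    """List the sequence numbers skipped between consecutive received packets.

    Assumes in-order arrival with no duplicates; modulus handles counter wraparound.
    """
    missing = []
    for prev, cur in zip(received_seqs, received_seqs[1:]):
        gap = (cur - prev) % modulus
        for step in range(1, gap):
            missing.append((prev + step) % modulus)
    return missing


# Packets 1, 2, 3 carry timestamps 0, 10, 20; packet 2 is lost in transit.
received = [(1, 0), (3, 20)]           # (sequence number, timestamp)
lost = find_missing([seq for seq, _ in received])
print(lost)

# Wraparound case: the packet numbered 0 is missing between 65535 and 1.
print(find_missing([65535, 1]))
```

With only the timestamps 0 and 20 the receiver cannot tell a lost packet from a 20-s pause; the jump in sequence number from 1 to 3 settles it.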
  87. Multicasting 87  Audio and video conferencing rely heavily on multimedia.  The data is distributed using multicasting methods because the traffic can be heavy.  Two-way communication between receivers and senders is required for conferencing.
  88. Translation 88  A translator is a computer that can change the format of a high-bandwidth video signal to a lower-quality narrow-bandwidth signal.  This is required, for example, when a source generates a high-quality video signal at 5 Mbps and sends it to a recipient with a bandwidth of less than 1 Mbps.  For the recipient to receive the signal, a translator must decode it and encode it again at a lower quality that requires less bandwidth.
  89. Mixing 89  When multiple sources can send data at the same time (as in a video or audio conference), the traffic consists of multiple streams.  Data from the various sources can be mixed to converge the traffic to a single stream.  A mixer mathematically combines signals from various sources to produce a single signal.
  90. Support from Transport Layer Protocol 90  Some of the procedures used by real-time applications are better implemented in the transport layer protocol.  Let's take a look at which of the existing transport layer protocols is appropriate for this type of traffic.
  91. 91  TCP and UDP are the two main transport layer protocols. TCP is not suitable for interactive traffic.  It has no provision for time-stamping, and it does not support multicasting.  TCP's error control mechanism is also unsuitable for interactive traffic, because retransmission of a lost or corrupted packet is not wanted.  Retransmission upsets the whole idea of time-stamping and playback.  Today's audio and video signals contain so much redundancy (even with compression) that we can simply ignore a lost packet.  The listener or viewer at the remote location may not even notice it.
  92. 92  For interactive multimedia traffic, UDP is more suitable than TCP.  UDP supports multicasting and has no retransmission strategy.  However, UDP has no provision for time-stamping, sequencing, or mixing.  The Real-time Transport Protocol (RTP), a separate transport protocol, provides these missing features and is used to compensate for UDP's shortcomings.
  93. RTP (Real-time Transport Protocol) 93  The Real-time Transport Protocol (RTP) is a protocol designed to handle real-time Internet traffic.  RTP lacks a delivery mechanism (multicasting, port numbers, and so on).  It must be used in conjunction with UDP. RTP acts as a bridge between UDP and the application program.  RTP's primary contributions are time-stamping, sequencing, and mixing capabilities.  RTP's position in the protocol suite is sketched in Figure
  94. RTP 94
  95. RTP-Packet Format 95  The format is simple and broad enough to cover a wide range of real-time applications.  If an application requires additional data, it adds it to the beginning of its payload.  The RTP packet header is shown in Figure
  96. 96 RTP packet header format
  97. 97  Ver (2 bits): It defines the version number. The current version is 2.  P (1 bit): If this field is set to 1, it indicates the presence of padding at the end of the packet. The value of the last byte in the padding defines the length of the padding. There is no padding if the value of the P field is 0.  X (1 bit): If this field is set to 1, it indicates an extension header between the basic header and the data. If this field is set to 0, there is no extension header.
  98. 98  Contributor Count (4 bits): It gives the number of contributors. Because the field is 4 bits, it holds a value between 0 and 15, so there can be at most 15 contributors.  M (1 bit): It is used by the application as a marker. It indicates, for example, the end of its data.  Payload Type (7 bits): It gives the type of payload. Several payload types are defined; Table 2.1 describes some of them and their applications.
  99. 99 Payload types
  100. 100  Sequence Number (16 bits): This field is used to number the RTP packets. The first packet's sequence number is chosen at random, and it is increased by one for each subsequent packet. The receiver uses the sequence number to detect lost or out-of-order packets.  Timestamp (32 bits): This field indicates the time relationship between the packets. The first packet's timestamp is a random number. The value for each subsequent packet is the sum of the preceding timestamp plus the time the first byte is produced.
  101. 101  Synchronization Source Identifier (32 bits): In the case of only one source, this field defines the source. If there are multiple sources, the mixer serves as the synchronization source, while the other sources serve as contributors. The source identifier's value is a random number chosen by the source.  Contributor Identifier (32 bits): Each of these 32-bit identifiers (up to 15 in total) defines a source. When there are multiple sources in a session, the mixer serves as the synchronization source, while the remaining sources serve as contributors.
  102. 102  Although RTP is itself a transport layer protocol, the RTP packet is not directly encapsulated in an IP datagram. Instead, RTP is treated like an application program and is encapsulated in a UDP user datagram.  RTP does not have a well-known port assigned to it.  The port can be chosen on demand, with only one restriction: the port number must be even.  RTP's companion, Real-time Transport Control Protocol (RTCP), uses the next number (an odd number).  RTP uses a temporary even-numbered UDP port.
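The header fields and the port rule described above can be sketched with Python's struct module. This is an illustration, not a full RTP implementation: it handles only the fixed 12-byte header plus contributor identifiers, ignores padding and extension contents, and all example values (payload type, sequence number, identifiers, port) are hypothetical.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc,
                    marker=0, padding=0, extension=0, contributors=(), version=2):
    """Build the RTP header: Ver, P, X, contributor count, M, payload type, then the 16/32-bit fields."""
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | len(contributors)
    byte1 = (marker << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    for identifier in contributors:
        header += struct.pack("!I", identifier)
    return header


def parse_rtp_header(data):
    """Split the header bytes back into named fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    count = byte0 & 0x0F
    contributors = tuple(
        struct.unpack("!I", data[12 + 4 * i:16 + 4 * i])[0] for i in range(count)
    )
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "contributor_count": count,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "contributors": contributors,
    }


def rtcp_port_for(rtp_port):
    """RTP uses an even port; RTCP uses the next (odd) one."""
    if rtp_port % 2 != 0:
        raise ValueError("RTP port must be even")
    return rtp_port + 1


header = pack_rtp_header(payload_type=0, seq=4711, timestamp=160,
                         ssrc=0x1234ABCD, contributors=(1, 2))
fields = parse_rtp_header(header)
print(len(header), fields)
print(rtcp_port_for(5004))
```

Packing with the "!" prefix writes the fields in network byte order with no alignment padding, so the fixed part of the header is exactly 12 bytes and each contributor identifier adds 4 more.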
  103. RTCP (Real-time Transport Control Protocol) 103  Real-time Transport Control Protocol (RTCP) is a protocol whose messages control the flow and quality of data and allow the recipient to send feedback to the source or sources.  Figure depicts the five types of messages supported by RTCP. The number next to each box denotes the message's type.
  104. RTCP-Message Types 104
  105. Sender Report 105  The active senders in a conference send the sender report periodically to report transmission and reception statistics for all RTP packets sent during the interval.  The sender report includes an absolute timestamp in NTP format, which counts the seconds elapsed since midnight UTC on January 1, 1900.  The absolute timestamp enables the receiver to synchronize different RTP streams.  It is especially important when both audio and video are transmitted.
  106. Receiver Report 106  The receiver report is intended for passive participants who do not send RTP packets.  The report informs the sender and other recipients about the service's quality.
  107. Source Description Message 107  A source description message is sent by the source on a regular basis to provide additional information about itself.  The name, e-mail address, phone number, and address of the source's owner or controller can be included in this information.
  108. Bye Message 108  To close a stream, a source sends a bye message. It enables the source to announce its departure from the conference. Other sources can detect the absence of a source, but this message is a direct announcement.
  109. Application-Specific Message 109  An application-specific message is a packet for an application that wants to use message types not defined in the standard. It enables the creation of new message types.
  110. 110  UDP Port  RTCP, like RTP, uses a temporary port. RTCP uses the odd-numbered UDP port that follows the port number selected for RTP.
  111. Voice Over IP 111  Voice over IP or Internet telephony is a real-time interactive audio/video application.  The concept here is to use the Internet as a telephone network with some added features.  This application allows two parties to communicate over a packet-switched Internet.  SIP and H.323 are two protocols designed specifically for this type of communication.  They are discussed briefly here.
  112. SIP (Session Initiation Protocol) 112  Session Initiation Protocol (SIP) is an application layer protocol created by the IETF.  It establishes, manages, and terminates a multimedia session (call).  It can be used to create two-party, multi-party, or multicast sessions.  SIP is designed to be independent of the underlying transport layer; it can run on UDP, TCP, or SCTP.
  113. Messages 113  SIP, like HTTP, is a text-based protocol.  Six messages are used in SIP, shown in Figure.
  114. 114  A header and a body are included in each SIP message. The header is made up of several lines that describe the message's structure, caller capability, media type, and other details.  SIP messages are described as follows.  INVITE: The caller initializes a session with the INVITE message.  ACK: After the callee answers the call, the caller sends an ACK message for confirmation.  BYE: The BYE message terminates a session.  OPTIONS: The OPTIONS message queries a machine about its capabilities.  CANCEL: The CANCEL message cancels an already started initialization process.  REGISTER: The REGISTER message makes a connection when the callee is not available.
  115. Addresses 115  SIP is a very adaptable protocol. To identify the sender and receiver in SIP, an e-mail address, an IP address, a phone number, and other types of addresses can be used.  However, the address must be in SIP format. Some common formats are shown in Figure
  116. 116 SIP formats
  117. SIP Session  A basic SIP session comprises three modules: Establishing, Communicating, and Terminating.  Figure depicts a simple SIP session. 117
  118. 118  Establishing a Session  Establishing a session in SIP requires a three-way handshake. To initiate communication, the caller sends an INVITE message via UDP, TCP, or SCTP. If the callee agrees to begin the session, the callee sends a reply message. The caller sends an ACK message to confirm that a reply code has been received.
  119. 119  Communicating  After the session is established, the caller and callee can communicate via two temporary ports.  Terminating the Session  The session can be ended by either party sending a BYE message.
  120. Tracking the Callee 120  SIP has a mechanism (similar to DNS) for determining the IP address of the terminal where the callee is seated.  SIP employs the concept of registration to carry out this tracking.  Some servers are designated as registrars by SIP.  At any given time, a user is registered with at least one registrar server, which is aware of the callee's IP address.
  121. 121  When a caller needs to communicate with the callee, the caller can use the e-mail address in the INVITE message instead of the IP address.  The message is routed through a proxy server.  The proxy server sends a lookup message to the registrar server that has the callee's information.  When the proxy server receives a reply message from the registrar server, it inserts the newly discovered IP address of the callee into the caller's INVITE message.  This message is then delivered to the callee.  The procedure is depicted in Figure
  122. 122
  123. H.323 123  Architecture  H.323 is a standard developed by the ITU that allows telephones on the public telephone network to communicate with computers connected to the Internet (referred to as terminals in H.323).  The general architecture of H.323 is depicted in Figure
  124. H.323 Architecture 124
  125. 125  A gateway is a device that connects the Internet to the telephone network.  In general, a gateway is a five-layer device that can convert a message from one protocol stack to another.  The gateway here does exactly that: it converts a telephone network message into an Internet message.  The gatekeeper server on the local area network plays the role of the registrar server, as discussed for SIP.
  126. Protocols  To establish and maintain voice (or video) communication, H.323 employs several protocols.  These protocols are depicted in Figure 126
  127. 127  H.323 uses G.711 or G.723.1 for compression.  It employs the H.245 protocol, which allows the parties to negotiate the compression method.  The Q.931 protocol is used to establish and terminate connections.  For registration with the gatekeeper, another protocol called H.225, or RAS (Registration, Administration, Status), is used.
  128. 128
  129. 129  Let us use a simple example to demonstrate the operation of telephone communication using H.323.  Figure 2.27 depicts the steps that a terminal takes to communicate with a telephone. 1. The gatekeeper receives a broadcast message from the terminal. The gatekeeper responds by providing its IP address. 2. The terminal and gatekeeper communicate via H.225, which is used to negotiate bandwidth. 3. Q.931 is used to establish a connection between the terminal, gatekeeper, gateway, and telephone.
  130. 130 4. To negotiate the compression method, the terminal, gatekeeper, gateway, and telephone use H.245 to communicate. 5. RTP is used by the terminal, gateway, and telephone to exchange audio under the management of RTCP. 6. To terminate the communication, the terminal, gatekeeper, gateway, and telephone use Q.931.