Estimation of bitlength of transformed quantized residue

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 3, Issue 3, October - December (2012), pp. 168-183
IJCET
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2012): 3.9580 (Calculated by GISI) ©IAEME
www.jifactor.com

ESTIMATION OF BITLENGTH OF TRANSFORMED-QUANTIZED
RESIDUE COEFFICIENTS WITH CONTEXT INFORMATION AND
ITS SYNTAX ELEMENTS FOR MODE DECISION IN H.264 BASELINE
ENCODER
P. Essaki Muthu, Research Scholar, Dr. MGR Educational and Research Institute,
Chennai, INDIA
Dr. R M O Gemson, Professor, SCT Institute of Technology, Bangalore, INDIA
pessakimuthu@yahoo.com, mogratnam@rediffmail.com

ABSTRACT

To achieve the best coding efficiency, an H.264 encoder has to exhaustively evaluate all
mode combinations of intra and inter prediction when deciding the coding mode. The
computational complexity and the time taken for mode decision and encoding are higher than
in any previous standard. The RD-Cost based mode decision that is generally used is
time-consuming and computationally complex. This paper proposes estimating the bitlength
(rate) from the Context based Adaptive Variable Length Coding (CAVLC) and Exp-Golomb
coding tables during RD-Cost calculation, avoiding real entropy encoding. The
experimental results demonstrate that the proposed method reduces computational
complexity by at least 40% without compromising coding efficiency or performance.

Index Terms – Context based Adaptive Variable Length Coding (CAVLC), H.264, Rate
Distortion Cost (RD-Cost)

SECTION 1: INTRODUCTION

The newest international video coding standard, H.264/Advanced Video Coding (AVC), has
been approved by ITU-T as Recommendation H.264 and by ISO/IEC as the MPEG-4 Part 10
AVC International Standard [1]. H.264/AVC achieves significantly better
performance in both peak signal-to-noise ratio (PSNR) and visual quality at the same bit rate
compared with prior video coding standards. H.264/AVC can save up to 39%, 49%, and 64%
of the bit rate when compared with MPEG-4, H.263, and MPEG-2, respectively [2]. An H.264 encoder
compresses raw video into a compressed stream by Intra prediction (I-frame), Inter
prediction (P-frame) and Bidirectional Inter prediction (B-frame). H.264 uses an important
technique called Lagrangian rate-distortion optimization (RDO), performed for intra and inter
mode prediction. The RDO technique is very time-consuming, and it makes the
computational complexity of H.264/AVC dramatically high.


H.264 uses an entropy encoder (CAVLC/CABAC) to encode the transformed-quantized
residue coefficients (which may be positive or negative) and an Exp-Golomb encoder
to encode the syntax elements into the bitstream. The mode decision in Intra frames and
motion estimation in Inter frames are finalized based on the Rate-Distortion criterion (RD-Cost).
RD-Cost is calculated as the weighted sum of the quality distortion and the rate [3].
The RD-Cost is calculated as follows:
RD-Cost = D + λ·R (1)
where D is the quality distortion, defined by the Sum of Absolute Difference (SAD), Sum of
Squared Error (SSE) or Sum of Absolute Transform Difference (SATD) between the original
block and the reconstructed block; R is the rate, defined as the total number of bits for
encoding the residue coefficients and syntax elements of the block; and λ is the Lagrangian
factor, which varies with the Quantization Parameter (QP). The distortion ‘D’ is otherwise known as
the error ‘E’, which is the difference between the original and reconstructed blocks.
In the Baseline H.264 Encoder, the CAVLC Encoder generates the bitstream for the
given residue coefficients and the Exp-Golomb Encoder generates the bitstream for the
given syntax elements. Alongside, the number of bits in the bitstream is
counted and used as the rate in the RD-Cost calculation for mode decision and motion estimation.
The bitstreams are generated for all nine modes of each 4x4 sub-macroblock and kept
separately for future use. Out of the bitstreams generated for all possible modes, only one
is finalized based on minimum RD-Cost and sent to the Network Abstraction Layer as the
finalized bitstream. In Intra 16x16 Prediction, the same method is followed for all four modes
and the mode decision is based on minimum RD-Cost. In an Inter frame, the bitstreams are
generated for all possible block sizes (P_16x16, P_16x8, P_8x16, P_8x8 and sub-P_8x8).
The mode decision is based on minimum RD-Cost among I_16x16, I_4x4, P_16x16,
P_16x8, P_8x16 and P_8x8.
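
The minimum RD-Cost selection described above can be sketched as follows. This is only an illustrative sketch of Equation (1): the function names and the distortion, rate and λ values are hypothetical and do not come from the paper or from any encoder.

```python
def rd_cost(distortion, rate_bits, lam):
    """Equation (1): RD-Cost = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """Return the mode with the smallest RD-Cost.

    candidates maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical (distortion, rate) pairs, for illustration only.
modes = {
    "I_4x4": (1200, 96),
    "I_16x16": (1500, 40),
    "P_16x16": (900, 120),
}
chosen = best_mode(modes, lam=5.0)
```

Only the rate term R needs the bit count; the bitstream itself is needed just for the mode that wins.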
Instead of generating the entire bitstream and counting the number of bits for all the modes,
several methods have been proposed to obtain an approximate estimate of the bits needed for
coding the transformed-quantized residue coefficients and their syntax elements. Many researchers
[4 – 7] have proposed methods to reduce the computational complexity of RD-Cost
calculation.
In [5], the bitrate is estimated using the standard deviation of the transform coefficients, which
yields an approximate bitrate. In [6], the approximate bits are estimated based on five different
types of CAVLC symbols. Tu et al. [7] proposed a transform-domain bit-rate estimation,
but the estimate is still approximate, which results in bitrate deviation and quality
degradation.
During RD-Cost calculation, it is unnecessary to generate the bitstream for all possible modes, but
it is important to find the bitlength of the residue and syntax elements for all possible modes.
Once the mode is decided, the required bitstream can be generated by the CAVLC Encoder and
the Exp-Golomb Encoder.
The objective of this paper is to propose a reduced-complexity method that finds only the exact
bitlength, without generating the bitstreams for all the modes, for use in the RD-Cost
calculation that finalizes the mode.
This paper is organized in five sections. Section 2 explains the estimation of the bitlength of
residue coefficients and of various syntax elements such as macroblock type,
modes, motion vectors, etc. Section 3 gives the complexity analysis of the proposed method
in comparison with the method followed by the JM/X264 reference software. The simulation results of
the proposed method and the existing method are presented in Section 4. Finally, Section 5
concludes the paper by providing the comparisons and inferences.



SECTION 2: BITLENGTH ESTIMATION OF RESIDUE COEFFICIENTS AND ITS
SYNTAX ELEMENTS

This section explains the estimation of bitlength of Residue coefficients in sub-section 2.1
and the estimation of various syntax elements in sub-section 2.2.

2.1) RESIDUE COEFFICIENTS:
The Residue coefficients are broadly classified into Luma and Chroma Coefficients. The
Luma Coefficients are categorized into Luma_4x4, Luma DC and Luma AC. The Chroma
Coefficients are generally divided into Chroma DC and Chroma AC Coefficients. In Intra
16x16 Prediction, Luma DC and Luma AC Coefficients are generated. Intra and Inter
Chroma Predictions will generate Chroma DC and Chroma AC Coefficients. The other
predictions such as Intra 4x4 and Luma Inter Predictions generate Luma_4x4 Coefficients.

2.1.1) TYPES OF LUMA AND CHROMA RESIDUE COEFFICIENTS
The types of Luma and Chroma Residue Coefficients are explained hereafter.

2.1.1.1) LUMA RESIDUE COEFFICIENTS:
For a macroblock of residue coefficients, there are sixteen 4x4 sub-macroblocks of
residue coefficients when the prediction is done on 4x4 blocks. Each 4x4 sub-macroblock
is called Luma_4x4. There are 16 coefficients in each sub-macroblock.
In the case of 16x16 prediction, the residue coefficients are divided into two groups,
called Luma DC and Luma AC Coefficients [8]. In each 4x4 sub-macroblock, the top-left
coefficient is considered the DC coefficient and the remaining 15 coefficients are considered
AC coefficients. There will be one 4x4 block of Luma DC coefficients and sixteen 4x4 blocks of Luma AC
coefficients. In each 4x4 block of Luma AC coefficients, the top-left coefficient is forced to zero
and is not counted in encoding. So there will be 16 coefficients in Luma DC and 15
coefficients in each Luma AC sub-macroblock.

2.1.1.2) CHROMA RESIDUE COEFFICIENTS:
In the case of Chroma prediction, the residue coefficients are divided into two groups,
called Chroma DC and Chroma AC Coefficients. From each 4x4 sub-macroblock, the
top-left coefficient is considered the DC coefficient and the remaining 15 coefficients are
considered AC coefficients. If the Chroma subsampling is 4:4:4, there will be 4x4 Chroma DC
coefficients and sixteen 4x4 blocks of Chroma AC coefficients. If the Chroma subsampling
is 4:2:2, there will be 2x4 Chroma DC coefficients and eight 4x4 blocks of Chroma AC coefficients.
If the Chroma subsampling is 4:2:0, there will be 2x2 Chroma DC coefficients and four 4x4
blocks of Chroma AC coefficients. In each 4x4 block of Chroma AC coefficients, the top-left coefficient
is forced to zero. So there will be 15 coefficients in each Chroma AC sub-macroblock.

2.1.2) PROPOSED METHOD TO FIND RESIDUAL BITLENGTH

The method of calculating the number of bits for a 4x4 block of residue coefficients is
explained hereafter. The 16 coefficients of a 4x4 block are read in zig-zag
order, as shown in Figure 1, and written into an array as follows:
(0,0) – (0,1) – (1,0) – (2,0) – (1,1) – (0,2) – (0,3) – (1,2) – (2,1) – (3,0) – (3,1) – (2,2) – (1,3)
– (2,3) – (3,2) – (3,3)



(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

Figure 1 Zig-Zag manner of arrangement

In the case of Luma_4x4 and Luma DC, there are 16 coefficients in a 4x4 sub-macroblock,
so the total number of coefficients is tot_coef = 16. The Luma AC and Chroma
AC blocks have only 15 coefficients, which are arranged as follows:
(0,1) – (1,0) – (2,0) – (1,1) – (0,2) – (0,3) – (1,2) – (2,1) – (3,0) – (3,1) – (2,2) – (1,3) – (2,3)
– (3,2) – (3,3)
The total number of coefficients is tot_coef = 15 for Luma AC and Chroma AC.
Chroma DC coefficients of 2x2 size (in case of 4:2:0), 2x4 size (in case of 4:2:2) or 4x4 size (in case
of 4:4:4) are also arranged in zig-zag order. So, tot_coef = 4 (4:2:0) or 8 (4:2:2) or 16
(4:4:4).
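
As a minimal sketch of the zig-zag re-arrangement, the scan order listed above can be held in a lookup table. The function name and the `skip_dc` flag are illustrative assumptions; the 4x4 block is the one used in the example of Section 2.1.2.6.

```python
# (row, column) positions in the zig-zag order listed in the text.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3),
]


def zigzag_scan(block, skip_dc=False):
    """Flatten a 4x4 block (list of 4 rows) into zig-zag order.

    skip_dc=True drops position (0,0), giving the 15-coefficient
    arrangement used for Luma AC and Chroma AC blocks.
    """
    order = ZIGZAG_4X4[1:] if skip_dc else ZIGZAG_4X4
    return [block[r][c] for r, c in order]


block = [
    [-2, -2, 3, -1],
    [1, 0, 8, 4],
    [10, -4, -1, 1],
    [1, 5, 0, 0],
]
scanned = zigzag_scan(block)
# scanned == [-2, -2, 1, 10, 0, 3, -1, 8, -4, 1, 5, -1, 4, 1, 0, 0]
```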
The parameters by which the bitlength is estimated are
1. Number of nonzero coefficients of top and left sub-macroblocks (nC)
2. Chroma Sub Sampling (4:4:4 / 4:2:2 / 4:2:0)

The codes whose bitlengths are to be estimated are listed below.
1. (T1,TC,nC) code – Number of nonzero coefficients (TC) with respect to number of
Trailing Ones (T1) with reference to context information (nC)
2. T1 code – The pattern of Trailing Ones (T1)
3. Levels’ code – The nonzero coefficients (Levels)
4. (TZ,TC) code – Number of zeros embedded in the nonzero coefficients (TZ) with
respect to number of nonzero coefficients (TC)
5. RB code – The number of consecutive embedded zeros (RB) in between non zero
coefficients with respect to embedded zeros/zeros left

The sum of bitlengths of all the above five parameters will give the bitlength of the given
4x4 sub-macroblock.

2.1.2.1) FINDING BITLENGTH OF (T1,TC,nC) CODE
The number of nonzero coefficients in the given array of residue coefficients is
counted as TC [0 ≤ TC ≤ tot_coef]. For this array (or sub-macroblock), nCP = TC is returned.
Note that the nCP of Luma DC or Chroma DC is not used for the present sub-macroblock;
it is used for the estimation of the neighbouring sub-macroblocks that follow.
The zero coefficients at the end of the array are removed one by one until the last nonzero coefficient
is reached. This reduced array may still have zeros between the first coefficient and the last
nonzero coefficient.
If the last nonzero coefficient is not equal to ‘±1’, then Trailing Ones (T1) = 0. If the last
nonzero coefficient is equal to ‘±1’, then the number of consecutive ones (+1 or -1), counted from the last
nonzero coefficient towards the first nonzero coefficient, is taken as T1. The count is restricted
to 3, i.e., the maximum value of T1 is 3: only the last three ‘±1’s, including the last nonzero
coefficient, are considered. Any other ‘±1’ present in the array is treated as
a level, not as a T1.
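
The counting of TC and T1 can be sketched as below. One assumption is labeled here because the text does not spell it out: zeros lying between trailing ones are skipped rather than ending the run, which is how CAVLC counts trailing ones. The function name is illustrative.

```python
def count_tc_t1(coeffs):
    """Return (TC, T1) for a zig-zag ordered coefficient array.

    TC is the number of nonzero coefficients. T1 is the number of
    consecutive +/-1 values found when reading the nonzero coefficients
    from the end of the array, capped at 3.
    """
    tc = sum(1 for c in coeffs if c != 0)
    t1 = 0
    for c in reversed(coeffs):
        if c == 0:
            continue  # assumption: embedded zeros do not break the run
        if abs(c) == 1 and t1 < 3:
            t1 += 1
        else:
            break  # a larger level, or a fourth +/-1, ends the count
    return tc, t1


example = [-2, -2, 1, 10, 0, 3, -1, 8, -4, 1, 5, -1, 4, 1, 0, 0]
tc, t1 = count_tc_t1(example)  # (13, 1), as in the example of Section 2.1.2.6
```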
For Luma_4x4, Luma DC, Luma AC, Chroma DC (4:4:4) and Chroma AC, the bitlength
of (T1, TC, nC) will be estimated based on nC. The value of nC is calculated as below.

\[
nC =
\begin{cases}
\left[ \dfrac{nC_L + nC_T}{2} \right], & \text{both available} \\
nC_L, & nC_T \text{ is not available} \\
nC_T, & nC_L \text{ is not available} \\
0, & \text{both are not available}
\end{cases}
\qquad (2)
\]

where nCL is the number of nonzero coefficients in the left sub-macroblock and nCT is the
number of nonzero coefficients in the top sub-macroblock. Note that nCP has no role here.
Based on nC, the columns will be selected from Table A.1.

\[
\text{column} =
\begin{cases}
1, & 0 \le nC < 2 \\
2, & 2 \le nC < 4 \\
3, & 4 \le nC < 8 \\
4, & 8 \le nC \\
5, & \text{Chroma DC 4:2:0} \\
6, & \text{Chroma DC 4:2:2}
\end{cases}
\qquad (3)
\]

The bitlength of the (T1, TC, nC) code is then read from Table A.1. For example, if T1 = 2, TC =
10 and nC = 1, then the bitlength of the (T1, TC, nC) code = 14.
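
Equations (2) and (3) can be sketched as two small helpers. The function names and the use of `None` for an unavailable neighbour are illustrative. The square brackets of Equation (2) are read here as rounding half up, i.e. (nCL + nCT + 1) >> 1; this reading is an assumption about the notation, chosen because it agrees with the averaging rule of the H.264 standard.

```python
def context_nc(nc_left=None, nc_top=None):
    """Equation (2): context value nC from the left and top neighbours.

    None marks a neighbour that is not available.
    """
    if nc_left is not None and nc_top is not None:
        return (nc_left + nc_top + 1) >> 1  # assumed rounding, see lead-in
    if nc_left is not None:
        return nc_left
    if nc_top is not None:
        return nc_top
    return 0


def table_column(nc, chroma_dc=None):
    """Equation (3): column of Table A.1.

    chroma_dc is "4:2:0" or "4:2:2" for those Chroma DC blocks, else None.
    """
    if chroma_dc == "4:2:0":
        return 5
    if chroma_dc == "4:2:2":
        return 6
    if nc < 2:
        return 1
    if nc < 4:
        return 2
    if nc < 8:
        return 3
    return 4


nc = context_nc(nc_left=8, nc_top=6)  # 7
column = table_column(nc)             # 3
```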

2.1.2.2) FINDING BITLENGTH OF T1 CODE
The bitlength of pattern of Trailing Ones is the number of Trailing Ones, i.e., T1.

2.1.2.3) FINDING BITLENGTH OF LEVELS’ CODE

Bitlength of levels is initialized to zero. If TC > T1, then the remaining nonzero
coefficients are called levels. The estimation of their bitlengths is explained below.
X ≫ N means \lfloor X / 2^N \rfloor, and X ≪ N means X multiplied by 2^N.

Step 1: Keeping the last nonzero coefficient (other than trailing ones) as ‘a’, if (TC > 10) and
(T1 < 3), Suffixlength = 1 or else Suffixlength = 0
Step 2: If (Suffixlength = 0)
(i) If (TC ≤ 3 or T1 < 3), |a| ← |a| – 1 and sign is kept same.
(ii) If |a| < 8, bitlength = 2 |a| – 1 + (a < 0). Go to Step 7.
(iii) ElseIf |a| < 16, bitlength = 19. Go to Step 7.
(iv) Else (If |a| ≥ 16), there is Diff = |a| – 16. Go to Step 6.
Step 3: Else (If Suffixlength = 1), change |a| ← |a| – 1 and sign is kept same.
Step 4: If (|a| – 1) ≥ [15 ≪ (Suffixlength – 1)], Diff = (|a| – 1) – [15 ≪ (Suffixlength – 1)]. Go
to Step 6.
Step 5: Else, bitlength = [(|a| – 1) ≫ (Suffixlength – 1)] + Suffixlength + 1. Go to Step 7.
Step 6: bitlength = 28 + 3 (Diff ≫ 11)
Step 7: bitlength of levels = bitlength of levels + bitlength
Step 8: Based on present nonzero coefficient |‘a’|, the new Suffixlength is found from Table
A.2.
(i) If the present nonzero coefficient ‘a’, is the last nonzero coefficient (other than
trailing ones), and if the present |a| > 3, then next Suffixlength = 2.
(ii) Else the new Suffixlength is found from Table A.2, based on present nonzero
coefficient |‘a’|.
(a) If the new Suffixlength = 2 and previous Suffixlength = 0, then next
Suffixlength = new Suffixlength (i.e., 2)



(b) ElseIf the new Suffixlength > previous Suffixlength, next Suffixlength =
previous Suffixlength + 1
(c) Else, the same previous Suffixlength is kept as new Suffixlength.
Step 9: If any nonzero coefficient (‘a’) is available next (reverse reading!), then go to Step 4.
Step 10: Stop
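
For comparison, the level bitlength can also be computed with the levelCode formulation of the CAVLC level coding rules. The sketch below is not a transcription of Steps 1 – 10: it uses the standard's suffix-length update instead of Table A.2, all names are illustrative, and it assumes baseline-range levels, so magnitudes that would need a prefix longer than 15 are outside its scope. On the example array of Section 2.1.2.6 it gives 51 bits, the same total as Table 1.

```python
def level_bits(level, suffix_length, first_and_t1_lt3):
    """Bit length of one level for the current suffix length."""
    level_code = 2 * abs(level) - (2 if level > 0 else 1)
    if first_and_t1_lt3:
        level_code -= 2  # first level after fewer than 3 trailing ones
    if suffix_length == 0:
        if level_code < 14:
            return level_code + 1  # unary prefix only
        if level_code < 30:
            return 19              # 15-bit prefix + 4-bit suffix
        return 28                  # escape: 16-bit prefix + 12-bit suffix
    if level_code < (15 << suffix_length):
        return (level_code >> suffix_length) + 1 + suffix_length
    return 28                      # escape: 16-bit prefix + 12-bit suffix


def levels_bitlength(coeffs):
    """Total bit length of the levels (trailing ones excluded)."""
    nonzero = [c for c in reversed(coeffs) if c != 0]
    tc = len(nonzero)
    t1 = 0
    while t1 < min(3, tc) and abs(nonzero[t1]) == 1:
        t1 += 1
    suffix_length = 1 if (tc > 10 and t1 < 3) else 0
    total = 0
    for i, level in enumerate(nonzero[t1:]):
        total += level_bits(level, suffix_length, i == 0 and t1 < 3)
        # Suffix-length update as in the standard's level decoding process.
        if suffix_length == 0:
            suffix_length = 1
        if abs(level) > (3 << (suffix_length - 1)) and suffix_length < 6:
            suffix_length += 1
    return total


example = [-2, -2, 1, 10, 0, 3, -1, 8, -4, 1, 5, -1, 4, 1, 0, 0]
total_level_bits = levels_bitlength(example)  # 51
```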

2.1.2.4) FINDING BITLENGTH OF (TZ, TC) CODE
If TC < tot_coef, by reading the array between the last nonzero coefficient and the first
coefficient, number of embedded zeros are counted and represented as TZ. For Chroma DC
(4:2:0), the bitlength of (TZ,TC) code is found in Table A.3 and for Chroma DC (4:2:2), the
bitlength of (TZ,TC) code is found in Table A.4. For Luma, Chroma DC (4:4:4) and Chroma
AC, the bitlength of (TZ,TC) code is calculated from Table A.5.

2.1.2.5) FINDING BITLENGTH OF RB CODE
If TZ > 0, then run-before (RB) is calculated. RB is the number of consecutive zeros
immediately preceding a nonzero coefficient, found by reading the given array from the last
nonzero coefficient towards the first coefficient. The bitlength of RB is calculated as follows:

Step 1: zerosLeft = TZ and starting from last nonzero coefficient. Bitlength of RB is
initialized to zero.
Step 2: The number of zeros present before that nonzero coefficient (RB) is calculated.
Step 3: Bitlength of RB = Bitlength of RB + Bitlength of (RB, zerosLeft) which is found
from Table A.6.
Step 4: zerosLeft = zerosLeft – RB
Step 5: If zerosLeft = 0 or number of coefficient remaining = zerosLeft, or the present
nonzero coefficient = first nonzero coefficient, STOP.
Step 6: Else, previous nonzero coefficient is found and process should go to Step 2.
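
The TZ count and the RB steps above can be sketched as follows, with Table A.6 stored column by column. The function names are illustrative, and the stopping rule is implemented in its usual CAVLC form: no RB code is counted for the first nonzero coefficient of the array, and the loop ends as soon as zerosLeft reaches 0.

```python
# Table A.6, indexed as RB_BITS[zerosLeft][RB] for zerosLeft = 1..6.
RB_BITS = {
    1: [1, 1],
    2: [1, 2, 2],
    3: [2, 2, 2, 2],
    4: [2, 2, 2, 3, 3],
    5: [2, 2, 3, 3, 3, 3],
    6: [2, 3, 3, 3, 3, 3, 3],
}


def run_before_bits(run_before, zeros_left):
    """Bit length of one (RB, zerosLeft) code from Table A.6."""
    if zeros_left > 6:
        return 3 if run_before < 7 else run_before - 3
    return RB_BITS[zeros_left][run_before]


def tz_and_rb_bits(coeffs):
    """Return (TZ, total bit length of all RB codes)."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    if not positions:
        return 0, 0
    tz = positions[-1] + 1 - len(positions)  # zeros before the last nonzero
    zeros_left = tz
    bits = 0
    # Walk from the last nonzero coefficient back to the second one.
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        bits += run_before_bits(run, zeros_left)
        zeros_left -= run
    return tz, bits


example = [-2, -2, 1, 10, 0, 3, -1, 8, -4, 1, 5, -1, 4, 1, 0, 0]
tz, rb_bits = tz_and_rb_bits(example)  # (1, 9)
```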

2.1.2.6) EXAMPLE

\[
\text{Residue coefficients} =
\begin{pmatrix}
-2 & -2 & 3 & -1 \\
1 & 0 & 8 & 4 \\
10 & -4 & -1 & 1 \\
1 & 5 & 0 & 0
\end{pmatrix}
\]

Assuming number of nonzero coefficients in left sub-macroblock, nCL = 8 and top sub-
macroblock, nCT = 6 and Chroma subsampling = 4:2:0

Re-arranged array is:
{-2, -2, 1, 10, 0, 3, -1, 8, -4, 1, 5, -1, 4, 1, 0, 0}

\[
nC = \left[ \frac{nC_L + nC_T}{2} \right] = \left[ \frac{8 + 6}{2} \right] = 7, \quad \text{so col\_num} = 3 \text{ is selected}
\]

T1 = 1, TC = 13
bitlength of (T1,TC,nC) code = 9
bitlength of T1 code =1



Table 1 Bitlength calculation of levels and zerosLeft

Level              Bitlength of Level   RB   zerosLeft   Bitlength of (RB, zerosLeft)
1 (Trailing one)   -                    0    1           1
4                  4                    0    1           1
-1                 3                    0    1           1
5                  5                    0    1           1
1                  3                    0    1           1
-4                 4                    0    1           1
8                  6                    0    1           1
-1                 4                    0    1           1
3                  4                    1    1           1
10                 6
1                  4
-2                 4
-2                 4
Total              51                                    9

bitlength of levels’ code = 51
bitlength of (TZ,TC) code = 3
bitlength of RB code = 9
So, total bitlength of residue coefficients
= 9 + 1 + 51 + 3 + 9 = 73

2.2) FINDING BITLENGTH OF SYNTAX ELEMENTS
The Syntax Elements corresponding to a macroblock layer (Ref: macroblock_layer [9])
coded in H.264 Baseline Encoder are listed as follows:
1. Macroblock type (mb_type)
2. Sub-macroblock type (sub_mb_type)
3. Intra Luma Prediction mode (intra_luma_pred_mode)
4. Intra Chroma Prediction mode (intra_chroma_pred_mode)
5. Motion Vector Difference (mvd_l0)
6. Coded Block Pattern (coded_block_pattern), and
7. Delta QP of Macroblock (mb_qp_delta)

2.2.1) MACROBLOCK TYPE (mb_type)
The type of Macroblock is represented as mb_type. In Intra frame, there are two types of
Macroblock: Intra 4x4 and Intra 16x16. In Inter Frame, there are Intra 4x4, Intra 16x16, Inter
16x16, Inter 16x8, Inter 8x16 and Inter 8x8 Macroblock. Each type will have its own code
[9].

For Intra Frame
mb_type = 0, Intra 4x4 Prediction (4)

mb_type = 12 x (cbp_luma ≠ 0) + 4 x cbp_chroma + Intra16x16PredMode + 1, Intra 16x16
Prediction (5)



The cbp_luma is the coded block pattern of luma residue coefficients. It is a 4-bit pattern,
in which the least significant bit (1st bit) represents the presence of at least one nonzero
coefficient in the first 8x8 sub-macroblock in the 16x16 Residue Macroblock. The 2nd least
significant bit represents 2nd 8x8 sub-macroblock and so on. The value of cbp_luma varies
from 0 to 15.
The cbp_chroma is the coded block pattern of chroma residue coefficients. It is a 2-bit
value. It is ‘0’ if all DC and AC coefficients of the Chroma Residue are zero.
It is ‘1’ if at least one DC coefficient is nonzero and all AC coefficients are zero. It
is ‘2’ if at least one of the AC coefficients is nonzero. The value of
cbp_chroma therefore varies from 0 to 2 and is never 3.
Intra16x16PredMode is the desired Intra 16x16 Prediction mode, i.e., 0 – Vertical, 1 –
Horizontal, 2 – DC, and 3 – Plane Prediction Mode.
For example, in Intra Frame, if the macroblock is decided with Intra 4x4 prediction, then
the mb_type of that macroblock is 0. If the macroblock is finalized with Intra 16x16
prediction, Vertical Prediction, coded block pattern luma is 0 and coded block pattern chroma
is 0, then mb_type is 1.

For Inter Frame
mb_type = 0, Inter 16x16 Prediction (6)
mb_type = 5, Intra 4x4 Prediction (10)
mb_type = 12 x (cbp_luma ≠ 0) + 4 x cbp_chroma + Intra16x16PredMode + 6, Intra 16x16
Prediction (11)

The bitlength of the macroblock type is calculated as

\[
\text{Bitlength of mb\_type} = 2 \times \left\lceil \log_2 \left( \left[ \frac{\text{mb\_type}}{2} \right] + 1 \right) \right\rceil + 1 \qquad (12)
\]
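
A short sketch of Equations (5), (11) and (12) follows. The function names are illustrative. Two assumptions are labeled: the square brackets in Equation (12) are read as rounding half up, and mb_type is taken to be coded as an unsigned Exp-Golomb code, whose length is 2⌊log2(v + 1)⌋ + 1 bits. Under that reading Equation (12) gives exactly the unsigned Exp-Golomb length, which the loop at the end checks.

```python
def mb_type_intra16x16(cbp_luma, cbp_chroma, pred_mode, inter_frame=False):
    """Equations (5) and (11): mb_type of an Intra 16x16 macroblock.

    The offset is 1 in an Intra frame and 6 in an Inter frame.
    """
    offset = 6 if inter_frame else 1
    return 12 * (cbp_luma != 0) + 4 * cbp_chroma + pred_mode + offset


def ue_bits(code_num):
    """Length of an unsigned Exp-Golomb code: 2*floor(log2(v + 1)) + 1."""
    return 2 * (code_num + 1).bit_length() - 1


def mb_type_bits_eq12(mb_type):
    """Equation (12), with [mb_type / 2] read as rounding half up."""
    half = (mb_type + 1) // 2
    # For integers n >= 0, ceil(log2(n + 1)) equals n.bit_length().
    return 2 * half.bit_length() + 1


# Vertical Intra 16x16 prediction with empty coded block patterns.
mb_type = mb_type_intra16x16(cbp_luma=0, cbp_chroma=0, pred_mode=0)  # 1
bits = mb_type_bits_eq12(mb_type)                                    # 3

# Equation (12) agrees with the Exp-Golomb length for every tested value.
agree = all(mb_type_bits_eq12(v) == ue_bits(v) for v in range(64))
```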

2.2.2) SUB-MACROBLOCK TYPE (sub_mb_type)
The type of 8x8 sub-macroblock is represented as sub_mb_type. There are four types
based on the block size, namely 8x8, 8x4, 4x8 and 4x4, in Inter Predicted frame.

\[
\text{bitlength of sub\_mb\_type} =
\begin{cases}
1, & \text{Inter 8x8 Prediction} \\
\ldots
\end{cases}
\qquad (13)
\]

There are four 8x8 sub-macroblocks in a macroblock. If the macroblock type is Inter 8x8
Prediction, only then sub-macroblock type will be coded.

2.2.3) INTRA LUMA PREDICTION MODE (intra_luma_pred_mode)
The mode by which the luma 4x4 sub-macroblock is predicted is called
intra_luma_pred_mode. There are 9 different luma 4x4 prediction modes, namely Vertical,
Horizontal, DC, Diagonal Down Left, Diagonal Down Right, Vertical Right, Horizontal
Down, Vertical Left and Horizontal Up Mode. The decided mode of a given 4x4 sub-
macroblock is predicted by the following equation.



Predicted_mode = min(modeLEFT, modeTOP) (14)
where modeLEFT is the mode of left 4x4 sub-macroblock and modeTOP is the mode of top 4x4
sub-macroblock

If any of the modes is not available, then it is replaced with DC mode (mode = 2). The
bitlength of intra_luma_pred_mode is calculated as follows.

Bitlength of intra_luma_pred_mode
= 1, if intra_luma_pred_mode = Predicted_mode
= 4, if intra_luma_pred_mode ≠ Predicted_mode(15)
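
Equations (14) and (15) can be sketched as below. The names are illustrative, `None` marks an unavailable neighbour, and the substitution of DC mode follows the wording of the text above (only the missing neighbour is replaced).

```python
DC_MODE = 2  # mode number of DC prediction


def predicted_mode(mode_left=None, mode_top=None):
    """Equation (14): the smaller of the left and top neighbour modes.

    A neighbour that is not available (None) is replaced by DC mode.
    """
    left = DC_MODE if mode_left is None else mode_left
    top = DC_MODE if mode_top is None else mode_top
    return min(left, top)


def intra_luma_pred_mode_bits(mode, mode_left=None, mode_top=None):
    """Equation (15): 1 bit if the mode equals the prediction, else 4."""
    return 1 if mode == predicted_mode(mode_left, mode_top) else 4


same = intra_luma_pred_mode_bits(0, mode_left=0, mode_top=1)   # 1
other = intra_luma_pred_mode_bits(3, mode_left=0, mode_top=1)  # 4
```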

2.2.4) INTRA CHROMA PREDICTION MODE (intra_chroma_pred_mode)
The mode by which the chroma macroblock is predicted is called
intra_chroma_pred_mode. Based on the intra_chroma_pred_mode, the bitlength is calculated
as follows.

Bitlength of intra_chroma_pred_mode
= 1, DC mode
= 3, Horizontal mode
= 3, Vertical mode
= 5, Plane Prediction mode (16)

2.2.5) MOTION VECTOR DIFFERENCE (mvd_l0)
There are two motion vector difference components per block in a Macroblock. The
bitlength of a motion vector difference component is given below.

\[
\text{Bitlength of mvd\_l0} = 2 \times \left\lceil \log_2 \left( |\text{mvd\_l0}| + 1 \right) \right\rceil + 1 \qquad (17)
\]

2.2.6) CODED BLOCK PATTERN (coded_block_pattern)
Coded Block Pattern is a 6-bit pattern. The most significant 2-bits represent coded block
pattern of chroma components, i.e., cbp_chroma. The last 4-bits represent coded block
pattern of luma components, i.e., cbp_luma. The bitlength of Coded Block Pattern is selected
from Table A.7 based on the type of Macroblock (i.e., Intra or Inter Macroblock).

2.2.7) DELTA QP OF MACROBLOCK (mb_qp_delta)
The equation (17) can be used to find the bitlength of Delta QP of Macroblock.

\[
\text{Bitlength of mb\_qp\_delta} = 2 \times \left\lceil \log_2 \left( |\text{mb\_qp\_delta}| + 1 \right) \right\rceil + 1 \qquad (18)
\]
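
Equations (17) and (18) can be checked against the signed Exp-Golomb mapping, in which a value v is first mapped to codeNum = 2|v| − (v > 0) and then coded as an unsigned Exp-Golomb code. The sketch below assumes that mvd_l0 and mb_qp_delta are coded this way; the function names are illustrative.

```python
def signed_bits_closed_form(value):
    """Equations (17)/(18): 2 * ceil(log2(|v| + 1)) + 1."""
    # For integers n >= 0, ceil(log2(n + 1)) equals n.bit_length().
    return 2 * abs(value).bit_length() + 1


def signed_bits_via_code_num(value):
    """Length obtained by mapping v to codeNum, then coding it unsigned."""
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def mvd_bits(mvd_x, mvd_y):
    """Bits for the two motion vector difference components of a block."""
    return signed_bits_closed_form(mvd_x) + signed_bits_closed_form(mvd_y)


# The closed form and the codeNum route agree over the tested range.
agree = all(
    signed_bits_closed_form(v) == signed_bits_via_code_num(v)
    for v in range(-512, 513)
)
example_bits = mvd_bits(-3, 4)  # 5 + 7 = 12
```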

SECTION 3: COMPLEXITY ANALYSIS

Complexity here refers to computational complexity. There are two different types of
frames in the H.264 Baseline Encoder, namely I-frames and P-frames. The complexity analysis
for I-frames and P-frames is explained below.

3.1) COMPLEXITY ANALYSIS FOR I-FRAME
Reference encoder software such as JM [10] and X264 [11] has the following
complexity for a macroblock. In an Intra frame, the residue coefficients, their bitstreams and
the bitlengths of all nine modes of a sub-macroblock are calculated and stored until RD-Cost
calculation and finalization of the Intra 4x4 mode. Let ‘G’ be the complexity of generating the
bitstream and ‘B’ be the complexity of the bitlength calculation. So the total complexity of one sub-macroblock
is B + G. For a macroblock, there are 16 x 9 = 144 residue bitstreams generated by the CAVLC
Encoder and Exp-Golomb Encoder. Intra 4x4 Prediction therefore has a complexity of 144 (B + G).
Intra 16x16 Prediction has 17 x 4 = 68 residue bitstreams, so a complexity of 68 (B +
G). Intra Chroma Prediction has 2 x 5 x 4 = 40 residue bitstreams,
so a complexity of 40 (B + G). For an Intra Macroblock, the complexity is (144 + 68 + 40 = 252)
(B + G) = 252B + 252G.
The proposed method has the following complexity. Let B′ be the complexity of
bitlength estimation of a sub-macroblock in the proposed method.
In Intra 4x4 Prediction, for each sub-macroblock, the bitlengths for 9 modes are estimated,
but the bitstream is generated only for the finalized mode. So the complexity is 144B′ + 16 (B + G).
In Intra 16x16 Prediction, for each macroblock, the bitlengths for 4 modes are estimated.
In Intra Chroma Prediction, for each macroblock, the bitlengths for 4 modes are estimated.
For an Intra Macroblock, the complexity is (144 + 68 + 40) B′ + (16 + 17 + 10) (B + G) =
252B′ + 43 (B + G).
In the reference software, the bitlength is calculated as the
sum of the lengths of the suffix and prefix of each level, whereas in the proposed method the bitlength itself
is calculated directly. So B ≈ 2B′ is assumed. For bitstream generation, the reference software finds the codes from
Suffixlength and Prefixlength and converts them to a bitstream, so
complexity G is almost equal to complexity B, i.e., G = B.
Keeping B ≈ 2B′, G = B, G > 0, B > 0 and B′ > 0, the complexity of the reference software
is 252G + 252B = 1008B′ and that of the proposed method is 252B′ + 43 (B + G) = 424B′. This
indicates that the I-frame complexity of the proposed method is about 40% (424/1008 ≈ 42%) of that of the
reference software.

3.2) COMPLEXITY ANALYSIS FOR P-FRAME
In an Inter frame, each 8x8 sub-macroblock can have 4 different blocktypes (8x8 / 8x4 / 4x8
/ 4x4). For a macroblock, Inter 8x8 Prediction has 4 blocktype/Partition x 4
sub-macroblocks/blocktype x 4 Partitions = 64 sub-macroblocks. Each sub-macroblock has
(B + G) complexity, so Inter 8x8 Prediction has 64 (B + G) complexity. In addition to that,
Inter Chroma Prediction has 10 (B + G) complexity. Inter 8x16 Prediction, Inter 16x8
Prediction, Inter 16x16 Prediction and SKIP Prediction have 26 (B + G) complexity
each. For an Inter Macroblock, the complexity is (64 + 10 + 4x26 = 178) (B + G) = 178B +
178G = 712B′, where B′ is the complexity of bitlength estimation in the proposed method, with
B ≈ 2B′ and G = B as in Section 3.1.

The proposed method gives the following:
Inter 8x8 Prediction has 74B′ complexity. The others have 4x26B′ = 104B′
complexity. The bitstream is generated only once, with complexity (16+10) (B + G). For an
Inter Macroblock, the complexity is (74 + 104) B′ + 26 (B + G) = 178B′ + 26 (B + G) =
282B′.
Allowing Intra Prediction in an Inter Frame, the result is as follows.
(282B′ + 424B′) < (712B′ + 1008B′)
(706B′) < (1720B′) (19)



Equation (19) indicates that the P-frame complexity of the proposed method is about
40% (706/1720 ≈ 41%) of that of the reference software.
The complexity of bitlength estimation of syntax elements in the proposed method
(C′SyE) is half of that of the reference software (CSyE), as the reference software both computes the
bitlength and generates the bitstream, i.e., 2C′SyE = CSyE.
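
The arithmetic of Section 3 can be re-checked with a few lines. In this sketch the proposed method's per-sub-macroblock estimation cost is taken as the unit and labeled `B_PRIME`; that label is a naming choice made here to keep it apart from the reference software's B, and the relations B = 2B′ and G = B are the assumptions stated in Section 3.1.

```python
from fractions import Fraction

B_PRIME = 1      # proposed method: bitlength estimation cost (unit)
B = 2 * B_PRIME  # reference software: bitlength calculation cost
G = B            # reference software: bitstream generation cost

# I-frame macroblock (Section 3.1).
ref_intra = 252 * (B + G)                   # 1008
prop_intra = 252 * B_PRIME + 43 * (B + G)   # 424

# P-frame macroblock, Inter modes only (Section 3.2).
ref_inter = 178 * (B + G)                   # 712
prop_inter = 178 * B_PRIME + 26 * (B + G)   # 282

# Equation (19): Intra prediction allowed inside an Inter frame.
prop_total = prop_inter + prop_intra        # 706
ref_total = ref_inter + ref_intra           # 1720

ratio_intra = Fraction(prop_intra, ref_intra)  # about 0.42
ratio_total = Fraction(prop_total, ref_total)  # about 0.41
```

Both ratios are close to the 40% figure quoted in the text.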

SECTION 4: EXPERIMENTAL RESULTS

For a given 4x4 residue sub-macroblock, the bitlength is calculated for
Quantization Parameters (QP) varying from 0 to 51 and for different nC values (nC = 1, 2, 4,
8). The bitlength of the residue coefficients is first computed by the CAVLC method, i.e., by generating the
bitstream and counting the number of bits in it, and the time taken is noted.
The bitlength is then estimated by the proposed method, and the time taken is again noted. The
bitlengths were found to match those of the standard method (CAVLC) exactly.
The complexity of computing the
bitlength of residue coefficients is measured in terms of normalized time for different QP and
different context information (nC). The normalized time is calculated by dividing the measured
time by the maximum time taken by the CAVLC method. The normalized time plots for different
nC and different QPs are shown in Figures 2 – 5. They show the complexity
reduction in terms of normalized time. The complexity reduction is greater for lower QP
(QP < 40) than for higher QP (QP > 40).

Figure 2 Complexity comparison at nC = 1



SECTION 5: CONCLUSION

In the H.264 Baseline Encoder, the intra mode is decided by evaluating all the modes by
means of RD-Cost. During RD-Cost calculation, the bitstreams of all nine modes are generated, but
only one mode is finalized. Generating the bitstreams for all modes is not necessary; only
the bitlength is needed. This paper proposes a method to calculate the bitlength of
residue coefficients and syntax elements during the RD-Cost process. The estimated bitlengths of
residue coefficients and syntax elements match those of the CAVLC and Exp-Golomb
Encoders exactly, so the quality is unchanged, while the reduced complexity accelerates
hardware / software H.264 Encoders. Complexity may be computed in terms of arithmetic
and logical operations, memory usage or speed. Here it is measured in terms of
normalized time. The proposed method has at least 40% lower complexity than
reference software such as JM and X264. The tables (2 – 8) mentioned here may be used in
hardware applications. The proposed method can also reduce complexity in bitrate
adaptation and framerate adaptation.

ACKNOWLEDGMENT

The author thanks RiverSilica Technologies Private Limited, INDIA for their
encouragement and support.

REFERENCES

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC
video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology.,
Vol. 13, No. 7, pp. 560–576, Jul. 2003.
[2] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, “Performance
comparison of video standards using Lagrangian control,” in Proc. IEEE Int. Conf. Image
Process., 2002, pp. 501–504.
[3] Gary J. Sullivan, Senior Member, IEEE and Thomas Wiegand, “Video Compression—
From Concepts to the H.264/AVC Standard”, Proceedings of the IEEE, Pg.18, Vol. 93, No.
1, January 2005
[4] Yao-Chung Lin, Torsten Fink, Erwin Bellers, “Fast Mode Decision for H.264 based on
Rate-Distortion Cost Estimation”, IEEE International Conference on Acoustics, Speech and
Signal Processing, ICASSP 2007, Vol. 1, pp. I-1137 – I-1140, 2007
[5] Yu-Ming Lee, Yu-Ting Sun, and Yinyi Lin, “SATD-Based Intra Mode Decision for
H.264/AVC Video Coding”, IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 20, No. 3, March 2010
[6] Mohammed Golam Sarwer and Lai-Man Po, Member, IEEE, “Fast Bit Rate Estimation
for Mode Decision of H.264/AVC”, IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 17, No. 10, October 2007
[7] Yu-Kuang Tu, Jar-Ferr Yang and Ming-Ting Sun, “Efficient Rate-Distortion Estimation
for H.264/AVC Coders”, IEEE Transactions on Circuits and Systems for Video Technology,
Vol. 16, No. 5, May 2006
[8] Iain E.G. Richardson, “H.264 and MPEG-4 Video Compression”, John Wiley & Sons Ltd
[9] H.264 Standard – Advanced Video Coding for generic audiovisual services, ITU-T
Recommendation, 11/2007
[10] Joint Video Team, JM Reference Software V17.1 - iphome.hhi.de/suehring/tml/
[11] X264 Encoder Open Source - http://www.videolan.org/developers/x264.html



ANNEXURE A

Table A.1 Bitlength of (T1, TC, nC) code (columns 1 – 6 as selected by Equation (3))

T1 TC 1 2 3 4 5 6
0 0 1 2 4 6 2 1
0 1 6 6 6 6 6 7
0 2 8 6 6 6 6 7
0 3 9 7 6 6 6 9
0 4 10 8 7 6 6 9
0 5 11 8 7 6 - 10
0 6 13 9 7 6 - 11
0 7 13 11 7 6 - 12
0 8 13 11 8 6 - 13
0 9 14 12 8 6 - -
0 10 14 12 9 6 - -
0 11 15 12 9 6 - -
0 12 15 13 9 6 - -
0 13 16 13 10 6 - -
0 14 16 13 10 6 - -
0 15 16 14 10 6 - -
0 16 16 14 10 6 - -
1 1 2 2 4 6 1 2
1 2 6 5 5 6 6 7
1 3 8 6 5 6 7 7
1 4 9 6 5 6 8 9
1 5 10 7 5 6 - 10
1 6 11 8 6 6 - 11
1 7 13 9 6 6 - 12
1 8 13 11 7 6 - 12
1 9 14 11 8 6 - -
1 10 14 12 8 6 - -
1 11 15 12 9 6 - -
1 12 15 13 9 6 - -
1 13 15 13 9 6 - -
1 14 16 14 10 6 - -
1 15 16 14 10 6 - -
1 16 16 14 10 6 - -
2 2 3 3 4 6 3 3
2 3 7 6 5 6 7 7
2 4 8 6 5 6 8 7
2 5 9 7 5 6 - 9
2 6 10 8 6 6 - 10
2 7 11 9 6 6 - 11
2 8 13 11 7 6 - 12
2 9 13 11 7 6 - -
2 10 14 12 8 6 - -
2 11 14 12 8 6 - -
2 12 15 13 9 6 - -
2 13 15 13 9 6 - -
2 14 16 13 10 6 - -
2 15 16 14 10 6 - -
2 16 16 14 10 6 - -
3 3 5 4 4 6 6 5
3 4 6 4 4 6 7 6
3 5 7 5 4 6 - 7
3 6 8 6 4 6 - 7
3 7 9 6 4 6 - 10
3 8 10 7 5 6 - 11
3 9 11 9 6 6 - -
3 10 13 11 7 6 - -
3 11 14 11 8 6 - -
3 12 14 12 8 6 - -
3 13 15 13 9 6 - -

3 14 15 13 10 6 - -
3 15 16 13 10 6 - -
3 16 16 14 10 6 - -

Table A.2 Suffix length
Suffix length to Non zero
be set Coefficient
0 0
1 1-3
2 4-6
3 7-12
4 13-24
5 25-48
6 >48

Table A.3 Total zeros table for 2x2 Chroma DC (4:2:0)
TZ / TC 1 2 3
0 1 1 1
1 2 2 1
2 3 2 -
3 3 - -

Table A.4 Total zeros table for 2x4 Chroma DC block (4:2:2)
TZ/TC 1 2 3 4 5 6 7
0 1 3 3 3 2 2 1
1 3 2 3 2 2 2 1
2 3 3 2 2 2 1
3 4 3 2 2 2
4 4 3 3 3
5 4 3 3
6 4 1 3

Table A.5 Total zeros table for 4x4 blocks
TZ/TC 1 2 3 4 5 6 7 8
0 1 3 4 5 4 6 6 6
1 3 3 3 3 4 5 5 4
2 3 3 3 4 4 3 3 5
3 4 3 3 4 3 3 3 3
4 4 3 4 3 3 3 3 2
5 5 4 4 3 3 3 2 2
6 5 4 3 3 3 3 3 3
7 6 4 3 4 3 3 4 3
8 6 4 4 3 4 4 3 6
9 7 5 5 4 5 3 6
10 7 5 5 5 4 6
11 8 6 6 5 5
12 8 6 5 5
13 9 6 6
14 9 6
15 9

Table A.5 (continued) Total zeros table for 4x4 blocks
TZ/TC 9 10 11 12 13 14 15
0 6 5 4 4 3 2 1
1 6 5 4 4 3 2 1
2 4 3 3 2 1 1
3 2 2 3 1 2
4 2 2 1 3
5 3 2 3
6 2 4
7 5



Table A.6 Bitlength of (RB, zerosLeft)
RB \ zerosLeft 1 2 3 4 5 6 >6
0 1 1 2 2 2 2 3
1 1 2 2 2 2 3 3
2 - 2 2 2 3 3 3
3 - - 2 3 3 3 3
4 - - - 3 3 3 3
5 - - - - 3 3 3
6 - - - - - 3 3
7 - - - - - - 4
8 - - - - - - 5
9 - - - - - - 6
10 - - - - - - 7
11 - - - - - - 8
12 - - - - - - 9
13 - - - - - - 10
14 - - - - - - 11

Table A.7 Bitlength of coded block pattern
cbp Intra Inter cbp Intra Inter
0 5 1 24 11 11
1 9 3 25 11 11
2 9 5 26 9 11
3 9 7 27 5 11
4 11 5 28 9 11
5 9 7 29 5 11
6 11 9 30 7 11
7 7 7 31 3 9
8 11 5 32 11 5
9 11 9 33 11 9
10 9 7 34 11 9
11 7 7 35 9 9
12 9 7 36 11 9
13 7 9 37 9 9
14 7 9 38 11 11
15 3 7 39 7 9
16 9 3 40 11 9
17 11 11 41 11 11
18 11 11 42 9 9
19 9 11 43 7 9
20 11 11 44 9 9
21 9 11 45 7 9
22 11 11 46 9 11
23 5 11 47 1 7
