Reshmi S, Sreenesh Shasidharan / International Journal of Engineering Research and
                    Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue6, November- December 2012, pp.1271-1275
Porting and Optimization of ITU-T G.729.1 codec on SC3850 DSP core

Reshmi S (1), Sreenesh Shasidharan (2)
(1) M.Tech Student, Department of Electronics & Communication, Federal Institute of Science & Technology, Ankamaly
(2) Assistant Professor, Department of Electronics & Communication, Federal Institute of Science & Technology, Ankamaly


Abstract: This paper presents a methodology for optimizing the real-time implementation of the ITU-T G.729.1 codec on the SC3850 DSP core running at a 1 GHz clock speed. The main aim is an optimized fixed-point implementation of the G.729.1 speech coding algorithm on the SC3850 DSP core. Optimizations were done in C to reduce the execution time of the codec by exploiting the architectural features of the target device. First, the reference code published by ITU-T was modified to meet the requirements in C. Next, by exploiting the features of the SC3850 core, the critical parts of the code that consume the most cycles were modified to reduce their execution time. At each stage, the correctness of the implementation was verified against the test vectors provided by ITU-T. An improvement of more than 90 percent over the baseline codec was achieved by systematically applying these optimization techniques to the G.729.1 codec on the SC3850 core.

Keywords: bitrate, G.729.1 codec, MCPS, optimization, SC3850

I. INTRODUCTION
The demand for speech communication services is ever-increasing, and the number of service requests grows with the number of users. The most straightforward way for designers of communication systems to meet this need is to provide each user with a transmission channel of lower bandwidth. Compression is the most efficient technique for doing so. The objective of speech compression is to represent a digitized speech signal using as low a bit rate as possible while the reconstructed speech signal maintains an optimum level of perceptual quality.

VoIP is facing new challenges in terms of quality of service and network efficiency. Wideband audio codecs are extensively used to meet these challenges by allowing a smooth transition of voice quality from narrowband (300-3400 Hz) to wideband (50-7000 Hz). They provide more lifelike conversations with a higher level of fidelity and quality by exploiting the full range of human speech. The G.729.1 codec, an 8-32 kbit/s scalable wideband speech and audio codec, was standardized by ITU-T in May 2006 as the result of a standardization activity begun in January 2004. It is the first speech codec with an embedded scalable structure that is backward compatible with the existing ITU-T G.729 speech coding standard. Because of the scalable structure of the codec, speech quality improves as the bit rate increases. The scalable coding technology used by G.729.1 allows any communication component to select the bit rate during transmission simply by truncating the bitstream. The coder operates in several modes, and high-quality speech communication is obtained with the low-delay mode. The main applications of this codec are VoIP (IP telephony), including IP phones, VoIP handsets, voice recording equipment, audio/video conferencing, media servers/gateways, call center equipment, and voice messaging servers. G.729.1 is also known as G.729 Annex J and G.729EV, where EV denotes Embedded Variable (bit rate).

StarCore's variable-length execution set (VLES) architecture processors are widely used for codecs because of their lower power consumption, efficient compilability, and very compact code density. Merging VLES and DSP technologies onto a single core/system has helped meet the ever-increasing requirements of signal processing applications on several platforms. This paper discusses the implementation of the ITU-T G.729.1 fixed-point standard on the SC3850 digital signal processor.

II. OVERVIEW OF G.729.1 CODEC
The G.729.1 codec is an 8-32 kbit/s scalable wideband speech and audio coding algorithm interoperable with G.729, G.729A and G.729B [2]. The input, an analog audio signal, is sampled at 8 kHz or 16 kHz (default). The resulting signal is then quantized and converted to 16-bit linear PCM, which serves as the input to the encoder. The decoder output is also 16-bit linear PCM. The output bitstream of the encoder is scalable and is structured in 12 embedded layers. These layers correspond to the 12 available bit rates from 8 to 32 kbit/s, with the core layer interoperable with G.729. The output bandwidth of G.729.1 is 50-4000 Hz at 8 and 12 kbit/s and 50-7000 Hz from 14 to 32 kbit/s (in 2 kbit/s steps). The G.729.1 coder operates on
20 ms frames of input speech, corresponding to 320 samples at a sampling rate of 16000 samples per second. To remain consistent with G.729, which uses 10 ms frames and 5 ms subframes [3], two 10 ms CELP frames are processed per 20 ms frame. Hence the 20 ms frames are referred to as superframes.

A. Encoding principle
The encoder analyzes the input speech samples in order to extract the parameters of the speech synthesis model. Transmitting these parameters, rather than compressed speech samples, saves bandwidth and reduces noise. At the receiving side the parameters are used to synthesize and reconstruct the speech.

The G.729.1 encoder is illustrated in Figure 1. The encoder operates at the maximal bit rate of 32 kbit/s. The coder is organized in three stages: embedded Code-Excited Linear-Prediction (CELP) coding, Time-Domain Bandwidth Extension (TDBWE), and predictive transform coding [2]. Layers 1 and 2 are generated by the embedded CELP stage, which gives a narrowband output (50-4000 Hz) at 8 and 12 kbit/s. Layer 3 is generated by the TDBWE stage, producing a wideband output (50-7000 Hz) at 14 kbit/s. Layers 4 to 12 are generated by the TDAC stage, which operates in the Modified Discrete Cosine Transform (MDCT) domain at 16 to 32 kbit/s.

Figure 1. Block diagram of encoder

A 64-coefficient quadrature mirror filterbank (QMF) divides the input signal SWB(n') into a higher band and a lower band. The lower-band and higher-band signals are each decimated by a factor of 2. To remove the frequency components below 50 Hz, the decimated lower-band signal is passed through an elliptic high-pass filter (HPF) with a cut-off frequency of 50 Hz. The resulting signal is encoded by a narrowband embedded CELP encoder [3]. The higher-band signal is spectrally folded to obtain a more exact representation and is passed through an elliptic low-pass filter (LPF) with a 3 kHz cut-off frequency to remove the frequency components above 3 kHz. The resulting signal is then encoded by the parametric time-domain bandwidth extension (TDBWE) encoder. The time-domain aliasing cancellation (TDAC) encoder, a transform-based coder, jointly encodes the lower-band CELP difference signal and the higher-band signal SHB(n). In addition, the frame erasure concealment (FEC) encoder introduces parameter-level redundancy in the bitstream by transmitting signal classification information, energy information, and phase information [2].

B. Decoding principle
The G.729.1 decoder is illustrated in Figure 2. At the receiver, the transmitted parameters are decoded. The decoder operation depends on the received bit rate, that is, on the actual number of received layers. If the received bit rate is 8 or 12 kbit/s, the CELP decoder reconstructs a lower-band signal (50-4000 Hz). This signal is then post-filtered and post-processed by a high-pass filter to increase the subjective quality and reduce the coding error. The resulting signal is upsampled to 16 kHz using the QMF synthesis filterbank. If the received bit rate is 14 kbit/s, the TDBWE decoder reconstructs a higher-band signal SBWE(n), which, merged with the output of the narrowband enhancement layer (12 kbit/s), produces a wideband output of 50-7000 Hz. If the received bit rate is between 16 and 32 kbit/s, the TDAC decoder reconstructs both a lower-band difference signal and a higher-band signal. To mitigate the pre-echo and post-echo artefacts caused by transform coding, the resulting signal is shaped in the time domain. The CELP output is combined with the modified TDAC lower-band signal. The quality of the speech output can be improved by using the modified TDAC higher-band signal instead of the TDBWE output for the whole frequency range. The signals are then upsampled to 16 kHz and combined in the QMF filterbank.

Figure 2. Block diagram of decoder

III. FIXED POINT IMPLEMENTATION

A. Architecture
The StarCore SC3850 DSP core is used in mainstream DSP applications, such as wireless and wireline communications, on both the infrastructure and the subscriber sides. The SC3850 core has a variable-length execution set (VLES) execution
model, which maximizes parallelism by allowing multiple address generation units and data arithmetic logic units to execute multiple instructions in a single clock cycle [4]. Target markets for the SC3850 architecture include wireless and wireline base stations, wireless and wireline infrastructure, and broadband wireless. Key features of the SC3850 DSP core include the following [4]:

- Main core resources
    - 4 data ALU execution units
    - 2 integer and address generation units
    - Sixteen 40-bit data registers
    - Sixteen 32-bit address registers
- Instruction set
    - 16-bit instruction set, expandable to 32- and 48-bit instructions
    - Very high execution parallelism
    - Up to six instructions executed in a single clock cycle, statically scheduled
    - Up to 4 data ALU instructions and 2 memory access/integer instructions per cycle
- Data type support
    - Byte (8-bit), word (16-bit), and long (32-bit) data widths, supported by instructions and memory moves

B. Software
C was used for the high-level implementation. The encoder and decoder parameters were handled efficiently with the help of structures. SC3850 assembly language was used for low-level modifications. First, the encoder and decoder tester applications were developed in Visual Studio 2008 Express Edition by modifying the reference code. Later the code was loaded into CodeWarrior for the SC3850-specific implementation.

IV. OPTIMIZATION
Optimization is the process of reworking a software system so that it runs more efficiently and uses fewer resources. Optimization can target speed or memory. Here we concentrated on speed optimization.

The optimization tool used in this project was CodeWarrior 10.2.9. It is the latest version in the CodeWarrior family and provides a graphical user interface for managing software development projects. CodeWarrior 10.2.9 can be used to develop C, C++, and assembly language code targeted at many processors.

A. Optimization strategy
The strategy we followed in the optimization is as follows:
1. Calculate the initial MCPS of the code by running the project under worst-case conditions with all the optimization parameters enabled.
2. Profile all the functions in the implementation. Identify the critical functions that take the largest number of processor cycles by analyzing the profiler information.
3. Apply generic optimization techniques to improve the execution speed in C.
4. Analyze the disassembly of the code (the assembly generated by the compiler) to find areas where better assembly can be written.
5. After each optimization, calculate the MCPS and record the value.
6. If the performance of the code is not satisfactory, go back to step 2.

B. Profiling
Profile information for the reference code gives a precise estimate of the relative contribution of the different modules to the total computational complexity. Profiling information helped to identify the critical portions of the code, which were optimized to improve performance. CodeWarrior's simulator configuration was used for profiling.

C. MCPS Calculation
MCPS (million cycles per second) is a performance metric. It is the number of processor cycles, expressed in millions, that the codec needs to process one second of signal. It is obtained by multiplying the number of cycles required to execute one frame by the number of frames per second, as in (1).

\[ \mathrm{MCPS} = \frac{C \times f_s}{S \times N} \tag{1} \]

where C is the total number of cycles taken to process all N frames, f_s is the sampling rate, S is the number of samples per frame, and N is the total number of frames. C/N is thus the average number of cycles per frame and f_s/S the number of frames per second; the result is expressed in millions of cycles.

D. Optimization Techniques
We followed both target-dependent and target-independent optimization techniques in this project. Target-independent techniques are general techniques applied at the high level. Target-dependent optimization involves techniques that use the architectural features of the SC3850. The compiler optimization level was set to -O3 [5].
Some of the techniques followed are explained below [5].

1) Making use of intrinsics
An intrinsic function is a way to instruct the compiler to use a specific assembly language instruction [5]. Many functions in the reference code were replaced by intrinsics after studying their exact functionality. The replacement was done with proper care to maintain bit-exactness with the original ITU-T implementation. The StarCore C/C++ compiler substitutes the intrinsic function call with a set of designated assembly language instructions. Intrinsics improve the performance of the generated code and exploit architectural features of the SC3850 that cannot be reached from plain C.
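The kind of function that an intrinsic replaces can be sketched in portable C. The example is only an illustration under stated assumptions: the names sat_add32 and accumulate are hypothetical and are not taken from the reference code or from the StarCore compiler. It shows a saturating 32-bit addition written out in C, which costs a widening add and two comparisons per call; on a DSP target, a single saturating-add instruction selected through an intrinsic could produce the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable reference: 32-bit addition that saturates
 * at the limits of int32_t instead of wrapping around. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so the true sum fits */
    if (sum > INT32_MAX) return INT32_MAX;   /* clamp positive overflow */
    if (sum < INT32_MIN) return INT32_MIN;   /* clamp negative overflow */
    return (int32_t)sum;
}

/* Hypothetical call site: saturating accumulation of 16-bit products.
 * Each sat_add32 call here is a candidate for replacement by an intrinsic. */
static int32_t accumulate(const int16_t *x, const int16_t *y, int n)
{
    assert(n >= 0);
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        int32_t prod = (int32_t)x[i] * (int32_t)y[i];
        acc = sat_add32(acc, prod);
    }
    return acc;
}
```

Any such replacement has to return exactly the same value as the C version for every input, otherwise bit-exactness with the reference code is lost.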

2) Function inlining
Function inlining substitutes the body of the called function at the call site, thereby eliminating function call overhead. Inlining can be requested from the StarCore compiler using special #pragma directives. This technique improves execution time at the expense of larger code size. Small, frequently called functions are the best candidates for inlining in the C code. The criteria for selecting the best candidates depend on the number and type of parameters passed to the function, the data type of the returned value, and the register allocation in the resulting assembly code.

3) Fast 32-bit DPF format operations
The ITU-T G.729EV reference code was originally designed for 16-bit processors, which do not support 32-bit operations. The 32-bit operations were therefore done using a 32-bit double precision format (DPF) in which data is represented with a precision of 1 in 2^31. Some functions that originally took 32-bit parameters as separate higher and lower 16-bit half words and performed 16-bit operations on them were replaced with versions taking a single 32-bit DPF parameter. Instead of two separate DALU registers for the two 16-bit half words, only a single DALU register holding a 32-bit word is used. Thus, functions that originally received two pointers to two 16-bit arrays could now operate on a pointer to a single 32-bit array. The modifications were implemented in DPF format to maintain bit-exactness with the original ITU-T implementation. This optimization makes use of the processor's 32-bit capabilities and yielded a 50% reduction in MCPS and memory moves, since the number of parameters passed to these functions was halved.

4) Loop unrolling
Looping overhead, which increases the cycle count, can be reduced by loop unrolling. Unrolling a loop N times duplicates the loop body N times and reduces the iteration count by a factor of N. Since the SC3850 architecture contains four data ALUs, unrolling the larger loops (those with higher iteration counts) by a factor of N = 4 gave better performance. This significantly reduces the MCPS, since it allows multiple ALU execution units and multiple register moves to be used simultaneously. Unrolling the inner loops also reduces the looping overhead significantly, as the loop counter is updated less often and fewer branches are executed.

5) Loop merging
Combining two or more loops with the same iteration count into a single loop loads the ALUs more efficiently. Loop fusion combines multiple loops that reuse the same data into one loop. The advantage is that data can be reused, which reduces memory accesses, and that loop overhead is reduced.

6) Use of SIMD2 instructions
This optimization makes efficient use of the MAC registers and enables packed data movement, which results in fewer memory moves and a high degree of parallelism.

7) Making use of overflow flag
The SC3850 exception and mode register (EMR) has an overflow flag called DOVF, which is set when certain arithmetic instructions saturate. Overflow checks can be done efficiently by monitoring this flag, which can be accessed directly from assembly routines.

V. VERIFICATION AND TESTING
At each stage of optimization we verified the correctness of our implementation using the test vectors included as part of the G.729.1 recommendation. Each test vector includes an input file and an output file. For each input file, the output of an implementation must match the prescribed output in a bit-exact manner. This testing phase assured us that our implementation was fully compliant with the pseudo-code provided by ITU-T (formerly CCITT).

VI. RESULTS AND CONCLUSION
In this paper we presented a fixed-point optimized implementation of the G.729.1 codec (Annex J) on the SC3850 DSP core. First, the details of the implementation of this algorithm in C were discussed. Next, starting from our C code, optimization was carried out for real-time implementation. We suggested a strategy for the optimization process that can be used in similar work. Using the proposed strategy, a set of techniques for reducing the cycle consumption of the G.729.1 algorithm was introduced. These techniques included some methods based on the SC3850 architecture and some methods specific to the G.729.1 algorithm. The codec can readily be used for real-time applications. The MCPS consumption of the codec was reduced by at least 90%.

Table 1 shows the MCPS consumption of the codec before and after optimization for a sampling rate of 16 kHz.

G.729.1 codec   Initial MCPS   After adding intrinsics (MCPS)   Final MCPS   % improvement
Encoder         400            245                              28           93
Decoder         240            215                              22           90
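The MCPS calculation of equation (1) and the improvement column of Table 1 can be checked with a short C sketch. The function names mcps and improvement_percent are hypothetical, and the sketch assumes that the cycle count passed in is the total over all processed frames, as the profiler would report it for a complete run.

```c
#include <assert.h>
#include <stdint.h>

/* Equation (1): average cycles per frame times frames per second,
 * scaled to millions of cycles per second. */
static double mcps(uint64_t total_cycles, double fs_hz,
                   unsigned samples_per_frame, unsigned n_frames)
{
    assert(samples_per_frame > 0 && n_frames > 0);
    double cycles_per_frame  = (double)total_cycles / n_frames;
    double frames_per_second = fs_hz / samples_per_frame;
    return cycles_per_frame * frames_per_second / 1e6;
}

/* Relative reduction in MCPS, in percent, as reported in Table 1. */
static double improvement_percent(double initial_mcps, double final_mcps)
{
    assert(initial_mcps > 0.0);
    return 100.0 * (initial_mcps - final_mcps) / initial_mcps;
}
```

With 320 samples per frame at 16 kHz there are 50 frames per second, so a hypothetical run of 100 frames that takes 56,000,000 cycles in total gives mcps(56000000, 16000.0, 320, 100) = 28. For the figures in Table 1, improvement_percent(400, 28) is 93 and improvement_percent(240, 22) is about 90.8.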


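The loop unrolling technique of Section IV.D.4 can be illustrated with a dot product in portable C. This is a minimal sketch, not the codec's code: the function names are hypothetical, the iteration count is assumed to be a multiple of 4, and 64-bit accumulators are used so that no intermediate overflow or saturation occurs. The four independent accumulators are what would allow a compiler to schedule the four multiply-accumulates of one iteration on separate ALUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled version: one multiply-accumulate and one branch per element. */
static int64_t dot_rolled(const int16_t *x, const int16_t *y, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];
    return acc;
}

/* Unrolled by N = 4: the body is duplicated four times and the loop
 * runs n/4 times, so the counter update and branch occur a quarter as often. */
static int64_t dot_unrolled4(const int16_t *x, const int16_t *y, size_t n)
{
    assert(n % 4 == 0);              /* assumption of this sketch */
    int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (size_t i = 0; i < n; i += 4) {
        a0 += (int32_t)x[i]     * y[i];
        a1 += (int32_t)x[i + 1] * y[i + 1];
        a2 += (int32_t)x[i + 2] * y[i + 2];
        a3 += (int32_t)x[i + 3] * y[i + 3];
    }
    return a0 + a1 + a2 + a3;
}
```

Splitting the sum across several accumulators changes the order of the additions. That is harmless here because nothing saturates, but in bit-exact fixed-point code with saturating accumulation the reordering has to be checked against the test vectors.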
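The double precision format discussed in Section IV.D.3 can be sketched as follows. The sketch assumes the usual DPF convention in which a 32-bit value is held as a 16-bit high word and a 15-bit low word, so that value = hi * 2^16 + lo * 2. The helper names dpf_extract and dpf_compose are hypothetical, and the sketch is restricted to non-negative inputs so that it relies only on fully defined C behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Split a non-negative 32-bit value into DPF half words:
 * hi holds bits 31..16, lo holds bits 15..1 (bit 0 is dropped). */
static void dpf_extract(int32_t v, int16_t *hi, int16_t *lo)
{
    assert(v >= 0);                  /* restriction of this sketch */
    *hi = (int16_t)(v >> 16);
    *lo = (int16_t)((v >> 1) - ((int32_t)*hi << 15));
}

/* Rebuild the 32-bit value from its DPF half words. */
static int32_t dpf_compose(int16_t hi, int16_t lo)
{
    return (int32_t)hi * 65536 + (int32_t)lo * 2;
}
```

A function written for a 16-bit machine needs both half words for every 32-bit operand, that is, two parameters and two registers. On a core with native 32-bit arithmetic the same operand travels as one int32_t, which is the source of the halved parameter count described in the text. Note that the round trip loses the least significant bit, so dpf_compose applied to the parts of 0x12345679 returns 0x12345678; a native 32-bit replacement has to reproduce this behavior to stay bit-exact.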


                ACKNOWLEDGMENT
I gratefully acknowledge the contributions of the Federal Institute of Science and Technology, Ankamaly, for its support and assistance in the successful completion of this project.

                  REFERENCES

[1]   Wai C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, Mobile Media Laboratory, DoCoMo USA Labs, San Jose, California.
[2]   ITU-T Rec. G.729.1, "G.729 based Embedded Variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729," May 2006.
[3]   ITU-T Rec. G.729, "Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP)," March 1996.
[4]   MSC8156 Reference Manual: Six Core Digital Signal Processor, MSC8156RM, Rev. 2, June 2011.
[5]   C Code Optimization Examples for the StarCore SC3850 Core, Rev. 0, April 2010.




 
Ricardo
RicardoRicardo
Ricardo
 
Edad media
Edad mediaEdad media
Edad media
 
Manuweb09
Manuweb09Manuweb09
Manuweb09
 
Organização geografica da congregação no mundo
Organização geografica da congregação no mundoOrganização geografica da congregação no mundo
Organização geografica da congregação no mundo
 
Parthiban v
Parthiban vParthiban v
Parthiban v
 
Design Afetivo e Estética da Interação
Design Afetivo e Estética da InteraçãoDesign Afetivo e Estética da Interação
Design Afetivo e Estética da Interação
 
Avós são o máximo
Avós são o máximoAvós são o máximo
Avós são o máximo
 
Disrupt it
Disrupt itDisrupt it
Disrupt it
 

Ähnlich wie Gg2612711275

HIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPS
HIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPSHIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPS
HIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPSVLSICS Design
 
An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...Videoguy
 
Data Converters in SDR Platforms
Data Converters in SDR PlatformsData Converters in SDR Platforms
Data Converters in SDR Platformsidescitation
 
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...VLSICS Design
 
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...VLSICS Design
 
Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...
Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...
Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...VLSICS Design
 
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...VLSICS Design
 
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...VLSICS Design
 
Design and Implementation of Low Power High Speed Symmetric Decoder Structure...
Design and Implementation of Low Power High Speed Symmetric Decoder Structure...Design and Implementation of Low Power High Speed Symmetric Decoder Structure...
Design and Implementation of Low Power High Speed Symmetric Decoder Structure...Dr. Amarjeet Singh
 
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdmHybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdmIAEME Publication
 
A prototyping of software defined radio using qpsk modulation
A prototyping of software defined radio using qpsk modulationA prototyping of software defined radio using qpsk modulation
A prototyping of software defined radio using qpsk modulationIAEME Publication
 
Design and implementation of DADCT
Design and implementation of DADCTDesign and implementation of DADCT
Design and implementation of DADCTSatish Kumar
 
A 15 bit third order power optimized continuous time sigma delta modulator fo...
A 15 bit third order power optimized continuous time sigma delta modulator fo...A 15 bit third order power optimized continuous time sigma delta modulator fo...
A 15 bit third order power optimized continuous time sigma delta modulator fo...eSAT Publishing House
 
Design of Low Power Sigma Delta ADC
Design of Low Power Sigma Delta ADCDesign of Low Power Sigma Delta ADC
Design of Low Power Sigma Delta ADCVLSICS Design
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...IJECEIAES
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Editor IJARCET
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Editor IJARCET
 

Ähnlich wie Gg2612711275 (20)

HIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPS
HIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPSHIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPS
HIGH SPEED CONTINUOUS-TIME BANDPASS Σ∆ ADC FOR MIXED SIGNAL VLSI CHIPS
 
An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...
 
Data Converters in SDR Platforms
Data Converters in SDR PlatformsData Converters in SDR Platforms
Data Converters in SDR Platforms
 
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
 
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
EFFICIENT HARDWARE CO-SIMULATION OF DOWN CONVERTOR FOR WIRELESS COMMUNICATION...
 
Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...
Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...
Efficient Hardware Co-Simulation of Down Convertor for Wireless Communication...
 
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
 
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
VLSI Design of Low Power High Speed 4 Bit Resolution Pipeline ADC In Submicro...
 
Design and Implementation of Low Power High Speed Symmetric Decoder Structure...
Design and Implementation of Low Power High Speed Symmetric Decoder Structure...Design and Implementation of Low Power High Speed Symmetric Decoder Structure...
Design and Implementation of Low Power High Speed Symmetric Decoder Structure...
 
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdmHybrid ldpc and stbc algorithms to improve ber reduction in ofdm
Hybrid ldpc and stbc algorithms to improve ber reduction in ofdm
 
A prototyping of software defined radio using qpsk modulation
A prototyping of software defined radio using qpsk modulationA prototyping of software defined radio using qpsk modulation
A prototyping of software defined radio using qpsk modulation
 
Design and implementation of DADCT
Design and implementation of DADCTDesign and implementation of DADCT
Design and implementation of DADCT
 
A 15 bit third order power optimized continuous time sigma delta modulator fo...
A 15 bit third order power optimized continuous time sigma delta modulator fo...A 15 bit third order power optimized continuous time sigma delta modulator fo...
A 15 bit third order power optimized continuous time sigma delta modulator fo...
 
Design of Low Power Sigma Delta ADC
Design of Low Power Sigma Delta ADCDesign of Low Power Sigma Delta ADC
Design of Low Power Sigma Delta ADC
 
SDH presentation
SDH presentationSDH presentation
SDH presentation
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...
 
B0530714
B0530714B0530714
B0530714
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377
 
Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377Ijarcet vol-2-issue-7-2374-2377
Ijarcet vol-2-issue-7-2374-2377
 
Turbo encoder and decoder chip design and FPGA device analysis for communicat...
Turbo encoder and decoder chip design and FPGA device analysis for communicat...Turbo encoder and decoder chip design and FPGA device analysis for communicat...
Turbo encoder and decoder chip design and FPGA device analysis for communicat...
 

Gg2612711275

Reshmi S, Sreenesh Shasidharan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November-December 2012, pp. 1271-1275

Porting and Optimization of ITU-T G.729.1 codec on SC3850 DSP core

Reshmi S (1), Sreenesh Shasidharan (2)
(1) M.Tech Student, Department of Electronics & Communication, Federal Institute of Science & Technology, Ankamaly
(2) Assistant Professor, Department of Electronics & Communication, Federal Institute of Science & Technology, Ankamaly

Abstract: This paper presents a methodology for optimizing the real-time implementation of ITU's G.729.1 codec on the SC3850 DSP core running at a 1 GHz clock speed. The main aim is an optimized fixed-point implementation of the G.729.1 speech coding algorithm on the SC3850 DSP core. Optimizations were done in C to reduce the execution time of the codec by exploiting the architectural features of the target device. First, the reference code published by ITU-T was modified to meet the requirements in C. Next, by exploiting the features of the SC3850 core, the critical parts of the code that consume the most cycles were modified to reduce their execution time. At each stage, the correctness of the implementation was verified against the test vectors provided by ITU-T. The systematic application of these optimization techniques to the G.729.1 codec on the SC3850 core gave an improvement of more than 90 percent over the baseline codec.

Keywords: bitrate, G.729.1 codec, MCPS, optimization, SC3850

I. INTRODUCTION
The demand for speech communication services is ever-increasing, and the number of service requests grows with the number of users. The most straightforward way for designers of communication systems to meet this need is to provide each user with a transmission channel of lower bandwidth, and compression is the most efficient technique for doing so. The objective of speech compression is to represent a digitized speech signal with as low a bit rate as possible while the reconstructed speech signal maintains an optimum level of perceptual quality.

VoIP now faces new challenges in terms of quality of service and network efficiency. Wideband audio codecs are extensively used to meet these challenges by allowing a smooth transition of voice quality from narrowband (300-3400 Hz) to wideband (50-7000 Hz). They can provide more lifelike conversations, with a higher level of fidelity and quality, by exploiting the full range of human speech. The G.729.1 codec, an 8-32 kbit/s scalable wideband speech and audio codec, was standardized by ITU-T in May 2006 as the outcome of a standardization activity launched in January 2004. It is the first speech codec with an embedded scalable structure and is backward compatible with the existing ITU-T G.729 speech coding standard. Because of the scalable structure of the codec, speech quality improves as the bit rate increases. The scalable coding technology used by G.729.1 allows any communication component to select the bit rate during transmission, simply by truncating the bitstream. The coder operates in several modes, and high-quality speech communication is obtained with the low-delay mode. The main applications of this codec are VoIP (IP telephony) including IP phones, VoIP handsets, voice recording equipment, audio/video conferencing, media servers/gateways, call center equipment, and voice messaging servers. G.729.1 is also known as G.729 Annex J and G.729EV, where EV denotes Embedded Variable (bit rate).

StarCore's variable-length execution set (VLES) architecture processors are widely used for codecs because of their low power consumption, efficient compilability, and very compact code density. Merging VLES and DSP technologies onto a single core/system helps meet the ever-increasing requirements of signal processing applications on several platforms. This paper discusses the implementation of ITU's G.729.1 fixed-point standard on the SC3850 digital signal processor.

II. OVERVIEW OF G.729.1 CODEC
The G.729.1 codec is an 8-32 kbit/s scalable wideband speech and audio coding algorithm interoperable with G.729, G.729A and G.729B [2]. The analog audio input is sampled at 8 kHz or 16 kHz (default). The resulting signal is quantized and converted to 16-bit linear PCM, which serves as the input to the encoder. The decoder output is also 16-bit linear PCM. The output bitstream of the encoder is scalable and structured in 12 embedded layers. These layers correspond to the 12 available bit rates from 8 to 32 kbit/s, with the core layer interoperable with G.729. The output bandwidth of G.729.1 is 50-4000 Hz at 8 and 12 kbit/s and 50-7000 Hz from 14 to 32 kbit/s (in 2 kbit/s steps). The G.729.1 coder operates on
20 ms frames of input speech, corresponding to 320 samples at a sampling rate of 16000 samples per second. To remain consistent with G.729, which uses 10 ms frames and 5 ms subframes [3], two 10 ms CELP frames are processed per 20 ms frame. The 20 ms frames are therefore referred to as superframes.

A. Encoding principle
The encoder analyzes the input speech samples to extract the parameters of the speech synthesis model. Transmitting these parameters rather than the speech samples themselves saves bandwidth and reduces noise. At the receiving side the parameters are used to synthesize and reconstruct the speech.

The G.729.1 encoder is illustrated in Figure 1. The encoder operates at the maximal bit rate of 32 kbit/s. The coder is organized in three stages: embedded Code-Excited Linear-Prediction (CELP) coding, Time-Domain Bandwidth Extension (TDBWE), and predictive transform coding [2]. Layers 1 and 2 are generated by the embedded CELP stage, which gives a narrowband output (50-4000 Hz) at 8 and 12 kbit/s. Layer 3 is generated by the TDBWE stage, producing a wideband output (50-7000 Hz) at 14 kbit/s. Layers 4 to 12 are generated by the TDAC stage, which operates in the Modified Discrete Cosine Transform (MDCT) domain at 14 to 32 kbit/s.

Figure 1. Block diagram of encoder

A 64-coefficient quadrature mirror filterbank (QMF) divides the input signal SWB(n') into a higher band and a lower band, and each band is decimated by a factor of 2. To remove the frequency components below 50 Hz, the decimated lower-band signal is passed through an elliptic high-pass filter (HPF) with a cut-off frequency of 50 Hz. The resulting signal is encoded by a narrowband embedded CELP encoder [3]. The higher-band signal is spectrally folded and passed through an elliptic low-pass filter (LPF) with a 3 kHz cut-off frequency to remove the frequency components above 3 kHz. The resulting signal is then encoded by the parametric time-domain bandwidth extension (TDBWE) encoder. The time-domain aliasing cancellation (TDAC) encoder, a transform-based coder, jointly encodes the lower-band CELP difference signal and the higher-band signal SHB(n). In addition, the frame erasure concealment (FEC) encoder introduces parameter-level redundancy into the bitstream by transmitting signal classification, energy, and phase information [2].

B. Decoding principle
The G.729.1 decoder is illustrated in Figure 2. At the receiver, the transmitted parameters are decoded. The decoder operation depends on the received bit rate, that is, on the number of layers actually received. If the received bit rate is 8 or 12 kbit/s, the CELP decoder reconstructs a lower-band signal (50-4000 Hz). This signal is then post-filtered and post-processed by a high-pass filter to increase the subjective quality and reduce the coding error. The resulting signal is upsampled to 16 kHz using the QMF synthesis filterbank. If the received bit rate is 14 kbit/s, the TDBWE decoder also reconstructs a higher-band signal SBWE(n), which is combined with the 12 kbit/s CELP output to produce a wideband output of 50-7000 Hz. If the received bit rate is between 16 and 32 kbit/s, the TDAC decoder reconstructs both a lower-band difference signal and a higher-band signal. To mitigate the pre/post-echo artefacts caused by transform coding, the resulting signal is shaped in the time domain. The CELP output is combined with the modified TDAC lower-band signal. The quality of the speech output can be improved by using the modified TDAC higher-band signal instead of the TDBWE output for the whole frequency range. The signals are then upsampled to 16 kHz and combined in the QMF filterbank.

Figure 2. Block diagram of decoder

III. FIXED POINT IMPLEMENTATION
A. Architecture
The StarCore SC3850 DSP core is used in mainstream DSP applications, such as wireless and wireline communications, on both the infrastructure and the subscriber sides. The SC3850 core has a variable-length execution set (VLES) execution
model, which maximizes parallelism by allowing multiple address generation units and data arithmetic logic units to execute multiple instructions in a single clock cycle [4]. Target markets for the SC3850 architecture include wireless and wireline base stations, wireless and wireline infrastructure, and broadband wireless. Key features of the SC3850 DSP core include the following [4]:

• Main core resources
    4 data ALU execution units
    2 integer and address generation units
    Sixteen 40-bit data registers
    Sixteen 32-bit address registers
• Instruction set
    16-bit instruction set, expandable to 32- and 48-bit instructions
• Very high execution parallelism
    Up to six instructions executed in a single clock cycle, statically scheduled
    Up to 4 data ALU instructions and 2 memory access/integer instructions per cycle
• Data type support
    Byte (8-bit), word (16-bit), and long (32-bit) data widths, supported by instructions and memory moves

B. Software
C was used for the high-level implementation. The encoder and decoder parameters were handled efficiently with the help of structures. SC3850 assembly language was used for low-level modifications. The encoder and decoder tester applications were first developed in Visual Studio 2008 Express Edition by modifying the reference code. The code was then loaded into CodeWarrior for the SC3850-specific implementation.

IV. OPTIMIZATION
Optimization is the process of reworking a software system so that it runs more efficiently while using fewer resources. Optimization can target speed or memory. Here we concentrated on speed optimization.

The optimization tool used in this project was CodeWarrior 10.2.9, the latest version of the CodeWarrior family at the time of this work. It provides a graphical user interface for managing software development projects and can be used to develop C, C++, and assembly language code targeted at many processors.

A. Optimization strategy
The strategy we followed in the optimization is as follows:
1. Calculate the initial MCPS of the code by running the project under worst-case conditions with all the optimization parameters enabled.
2. Profile all the functions in the implementation. Identify the critical functions that take the largest number of processor cycles by analyzing the profiler information.
3. Apply generic optimization techniques to improve the execution speed in C.
4. Analyze the disassembly of the code (the assembly generated by the compiler) to find areas where better assembly can be written.
5. After each optimization, recalculate the MCPS and record the value.
6. If the performance of the code is not satisfactory, go back to step 2.

B. Profiling
Profile information of the reference code gives a precise estimate of the relative contribution of the different modules to the total computational complexity. The profiling information helped to identify the critical portions of the code, which were then optimized to improve performance. CodeWarrior's simulator configuration was used for profiling.

C. MCPS Calculation
MCPS (million cycles per second) is a performance metric. It is obtained by multiplying the number of cycles required to execute one frame by the number of frames per second, as in (1):

$$\mathrm{MCPS} = \frac{C \times f_s}{S \times N} \qquad (1)$$

where C is the total number of cycles taken to process all the frames, f_s is the sampling rate, S is the number of samples per frame, and N is the total number of frames. C/N is therefore the cycle count per frame and f_s/S the number of frames per second; the result is expressed in millions of cycles.

D. Optimization Techniques
We followed both target-independent and target-dependent optimization techniques in this project. Target-independent techniques are general techniques applied at the high level. Target-dependent optimization uses techniques that exploit the architectural features of the SC3850. The compiler optimization level was set to -O3 [5]. Some of the techniques followed are explained below [5].

1) Making use of intrinsics
An intrinsic function is a way to instruct the compiler to use a specific assembly language instruction [5]. Many functions in the reference code were replaced by intrinsics after studying their exact functionality. The replacement was done with care to maintain bit-exactness with the original ITU-T implementation. The StarCore C/C++ compiler substitutes the intrinsic function call with a set of designated assembly language instructions. Intrinsics improve the performance of the generated code and exploit architectural features of the SC3850 that cannot be reached from plain C.
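As a rough illustration of the intrinsic replacement described above, the sketch below gives a portable C version of a saturating 16x16 to 32-bit fractional multiply-accumulate in the style of the ITU-T basic operators. The names `sat_l_add`, `sat_l_mult`, `sat_l_mac`, and `dot_q15` are hypothetical and used only here. On the SC3850 a helper of this kind would be the sort of function swapped for a single compiler intrinsic, whose exact name should be taken from the StarCore compiler documentation rather than from this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit addition that saturates instead of wrapping. */
int32_t sat_l_add(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Hypothetical helper: fractional multiply, (a * b) << 1 with saturation.
 * The only overflowing case is -32768 * -32768. */
int32_t sat_l_mult(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    if (p == 0x40000000) return INT32_MAX;
    return p * 2;
}

/* Hypothetical helper: saturating multiply-accumulate built from the two above.
 * This is the kind of function a single MAC intrinsic could replace. */
int32_t sat_l_mac(int32_t acc, int16_t a, int16_t b)
{
    return sat_l_add(acc, sat_l_mult(a, b));
}

/* Dot product of two 16-bit vectors using the saturating MAC. */
int32_t dot_q15(const int16_t *x, const int16_t *y, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc = sat_l_mac(acc, x[i], y[i]);
    }
    return acc;
}
```

Keeping the saturation behaviour identical between the C helper and whatever instruction replaces it is what preserves bit-exactness against the ITU-T test vectors.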
2) Function inlining
Function inlining substitutes the code of the called function in place of the function call, thereby eliminating the call overhead. Inlining can be requested from the StarCore compiler using special #pragma directives. This technique improves execution time at the expense of larger code size. Small, frequently called functions are the best candidates for inlining in the C code. The choice of candidates depends on the number and type of parameters passed to the function, the data type of the returned value, and register allocation in the resulting assembly code.

3) Fast 32-bit DPF format operations
The ITU-T G.729EV reference code was originally designed for 16-bit processors that do not support 32-bit operations. The 32-bit operations were therefore done using a 32-bit double precision format (DPF), in which data is represented with a precision of 1 in 2^31. Some functions originally took their 32-bit parameters as two 16-bit half words (higher and lower) and performed 16-bit operations on them; these were replaced with versions that take a single 32-bit DPF parameter. Instead of two separate DALU registers for two 16-bit half words, only a single DALU register for a 32-bit word is now used. Functions that originally received two pointers to two 16-bit arrays can thus operate with a pointer to a single 32-bit array. The modifications are implemented in DPF format to maintain bit-exactness with the original ITU-T implementation. This optimization makes use of the processor's 32-bit capabilities and gave a 50% reduction in MCPS and memory moves, since the number of parameters passed to each function is halved.

4) Loop unrolling
Looping overhead, which increases the cycle count, can be reduced by loop unrolling. A loop is unrolled N times by duplicating the loop body N times and reducing the iteration count by a factor of N. Since the SC3850 architecture contains 4 ALUs, unrolling the bigger loops (those with a higher loop count) by a factor of N = 4 gave better performance. This significantly reduces the MCPS, since it maximizes the simultaneous use of the multiple ALU execution units and multiple register moves. Unrolling the inner loops reduces the looping overhead significantly, as the loop counter needs to be updated less often and fewer branches are executed.

5) Loop merging
Combining two or more loops with the same loop count into a single loop loads the ALUs more efficiently. Loop fusion is the process of combining multiple loops that reuse the same data into one loop. The advantage is that data can be reused, which reduces memory accesses, and that loop overhead is reduced.

6) Use of SIMD2 instructions
This optimization makes efficient use of the MAC registers and enables packed data movement, which results in fewer memory moves and a high degree of parallelism.

7) Making use of overflow flag
The SC3850 exception and mode register (EMR) has an overflow flag called DOVF, which is set if saturation results from certain arithmetic instructions. Overflow checks can be done efficiently by monitoring this flag, which can be accessed directly from assembly routines.

V. VERIFICATION AND TESTING
At each stage of optimization we verified the correctness of our implementation using the test vectors included as part of the G.729.1 recommendation. Each test vector includes an input file and an output file. For each input file, the output of an implementation should match the prescribed output in a bit-exact manner. This testing phase assured us that our implementation was fully compliant with the reference implementation provided by ITU-T.

VI. RESULTS AND CONCLUSION
In this paper we presented a fixed-point optimized implementation of the G.729.1 codec (Annex J) on the SC3850 DSP core. First, the details of the implementation of this algorithm in C were discussed. Next, starting from our C code, optimization was done for real-time implementation. We suggested a strategy for the optimization process that can be used in other, similar work. Using the proposed strategy, a set of techniques for reducing the cycle consumption of the G.729.1 algorithm was introduced. These techniques included some methods based on the SC3850 architecture and some methods specific to the optimization of the G.729.1 algorithm. The codec can readily be used for real-time applications. The MCPS consumption was reduced by 93% for the encoder and 90% for the decoder.

Table 1 shows the MCPS consumption of the codec before and after optimization for a sampling rate of 16 kHz.

G.729.1 codec   Initial MCPS   After adding intrinsics (MCPS)   Final MCPS   % improvement
Encoder         400            245                              28           93
Decoder         240            215                              22           90
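The figures in Table 1 can be cross-checked with a few lines of C. The sketch below is only illustrative: it reads C in equation (1) as the cycle total over N frames, adds a division by 10^6 to express the result in millions of cycles, and assumes the 20 ms frame of 320 samples at 16 kHz from Section II. The function names `mcps` and `improvement_percent` are hypothetical, and the cycle count in the usage note is a made-up value, not a measurement from the paper.

```c
/* Equation (1), scaled to millions of cycles per second.
 * total_cycles: cycles consumed over n_frames frames (assumed reading of C). */
double mcps(double total_cycles, double fs_hz,
            double samples_per_frame, double n_frames)
{
    return (total_cycles * fs_hz) / (samples_per_frame * n_frames) / 1e6;
}

/* Relative reduction in MCPS, in percent. */
double improvement_percent(double initial_mcps, double final_mcps)
{
    return 100.0 * (initial_mcps - final_mcps) / initial_mcps;
}
```

With the Table 1 values, `improvement_percent(400.0, 28.0)` gives 93 for the encoder and `improvement_percent(240.0, 22.0)` gives about 90.8 for the decoder, which the table reports as 90. As a hypothetical input, a budget of 560000 cycles for one 320-sample frame at 16 kHz corresponds to `mcps(560000.0, 16000.0, 320.0, 1.0)`, that is 28 MCPS.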
ACKNOWLEDGMENT
I gratefully acknowledge the contributions of the Federal Institute of Science and Technology, Ankamaly, for the support and assistance towards the successful completion of this project.

REFERENCES
[1] Wai C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, Mobile Media Laboratory, DoCoMo USA Labs, San Jose, California.
[2] ITU-T Rec. G.729.1, "G.729 based Embedded Variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729," May 2006.
[3] ITU-T Rec. G.729, "Coding of Speech at 8 kbit/s using Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP)," March 1996.
[4] MSC8156 Reference Manual, Six Core Digital Signal Processor, MSC8156RM, Rev. 2, June 2011.
[5] C Code Optimization Examples for the StarCore SC3850 Core, Rev. 0, April 2010.