SlideShare ist ein Scribd-Unternehmen logo
1 von 5
Downloaden Sie, um offline zu lesen
Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering
Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.461-465
462 | P a g e
Verification of TD-PSOLA for Implementing Voice Modification
Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee
(Electrical Department, VJTI, Mumbai, India)
(Electrical Department, VJTI, Mumbai, India)
(Electrical Department, VJTI, Mumbai, India)
ABSTRACT
Voice modification is conversion of a model input
voice signal into desired output signal. This can
be achieved by modifying basic parameters of
voice viz. vocal tract, pitch and time scale. Four
models which are used for this purpose are
discussed here namely LPC model, H/S model,
TD-PSOLA model and MBR-PSOLA model. In
this paper, TD-PSOLA technique is studied and
implemented. Results of this implementation are
derived and reviewed. The voice modification
system thus developed can contribute greatly to
the Medical and Entertainment Industry where
desire voice modification is required.
Keywords – Vocal tract, formants, pitch, voice.
I. INTRODUCTION
It is noteworthy fact that for people the voice is not
only just a tool for communication, but also an
identifying feature that allows expression of
personality. Researchers observed changes in
formant tracks and pitch contours, during clinical
diagnosis of certain diseases. Diseases like
Laryngeal cancer can have a drastic impact on the
speech leading to disturbances of voice. Rehan A.
Kazi concludes that the formant frequencies of
laryngectomy patients are higher than the formant
frequencies of normal subjects [1]. Gregory K.
Sewall claims that patients with Parkinson‟s-related
dysphonia usually have reduced pitch ranges and
increased vocal tremor [2]. In such cases a need
arises for voice modification. In voice modification
a model input voice signal is modified by varying
vocal tract, pitch and time scale. Men have a larger
vocal tract, which essentially gives the resultant
voice a lower-sounding timbre. Whereas Female
have a smaller vocal tract, giving the resultant voice
a higher-sounding timbre. Adult male voices are
usually lower-pitched and adult female voices are
higher-pitched. Several works have been done for
achieving voice modification and many methods are
available for the same. Some of these are The
classical AutoRegressive (LPC) [3], The hybrid
Harmonic/Stochastic (H/S) [4],[5], Time-Domain
Pitch-Synchronous Overlap-Add (TD-PSOLA) [6]
and Multi-Band Re-synthesis Pitch-Synchronous
Overlap-Add (MBR-PSOLA) [7]. These methods
can process stored data as well as data in real time.
E.g. in karaoke, pitch shifting is used to tune the
user's singing so that it reaches the original melody
in real-time [8]. In music composition, composers
process music using pitch shifting [9]. Also it is
used in applications like public speech system,
entertainment Industry where specific voice style,
tone, expressions are required.
Voice modification process using TD-PSOLA is
explained in subsequent sections. Section 2
describes speech signal fundamentals and
importance parameters of voice. The algorithms
used for modification are compared in section 3.
Section 4 gives the details of TD-PSOLA method.
Implementation of method and results are
interpreted in section 5 and 6. Section 7 summarizes
and throws light on future scope.
II. SPEECH SIGNAL FUNDAMENTALS
The vocal cords periodically vibrate to generate
glottal flow. Human pronounces a vowel or a voiced
consonant which is composed of glottal pulses.
Pitch period is the period of a glottal pulse.
Fundamental frequency is reciprocal of the pitch
period. The vocal tract acts as a time-varying filter
to the glottal flow. The characteristics of the vocal
tract include the frequency response, which depends
on the position of organs, such as the pharynx and
tongue. The peak frequencies in the frequency
response of the vocal tract are formants, also known
as formant frequencies. A speech signal is a
convolution of a time-varying stimulus (Glottal
Flow) and a time-varying filter (Vocal Tract) as
shown in Fig.1.
Time-varying
Stimulus
(Glottal Flow) (Speech Signal)
Fig.1. Modeling of speech signal
II. COMPARISON OF ALGORITHMS
The four models covered are:
1. The linear-prediction voice model is best
classified as a parametric, spectral, source-filter
model, in which the short-time spectrum is
decomposed into a flat excitation spectrum
multiplied by a smooth spectral envelope capturing
primarily vocal formants [3].
2. The hybrid Harmonic/Stochastic (H/S) model
proposed in [4] and [5]. In this speech signals
expresses as the summation of slowly varying
harmonic and stochastic components. This is based
on Griffin's analysis algorithm, uses the OLA
synthesis approach, the prosody matching and
segment concatenation.
Time-varying Filter
(Vocal Tract)
Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering
Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.461-465
463 | P a g e
3. TD-PSOLA algorithms rely on a pitch-
synchronous overlap-add approach for modifying
the speech prosody and concatenating speech
waveforms [6]. The modifications of the speech
signal are performed in the time domain depending
on the length of the window used in the synthesis
process. The time domain approach provides very
efficient solutions for the real time implementation
of synthesis systems.
4. The Multi-Band Re-synthesis Pitch-Synchronous
Overlap-Add (MBR-PSOLA) model is based on re-
synthesis of the segments database of an original
and efficient hybrid H/S [7]. It supports spectral
interpolation between voiced parts of segments, with
virtually no increase in complexity.
Result of Comparison
TD-PSOLA virtually exhibits no segments
concatenation capabilities. LPC is slightly superior
in this. LPC, hybrid H/S and MBR-PSOLA are
superior to TD-PSOLA for automatic analysis
procedures. An efficient segment database
compression algorithm is ensured for LPC and
hybrid H/S synthesizers. It is currently being
developed for MBR-PSOLA. Fluidity is better of
MBR-PSOLA and hybrid H/S due to superior
concatenation capabilities. Prosody matching gives
comparable results with all four models. TD-
PSOLA is computationally simple compared to
complexity of their respective synthesizers. It is
much more intelligible than other synthesizers. It is
much less sensitive to analysis V/UV errors than
MBR-PSOLA. It is perceived as equally natural
because it does not make use of any speech model.
As seen from above, TD-PSOLA technique is better
than other.
III. TD-PSOLA (TIME DOMAIN PITCH
SYNCHRONOUS OVERLAP-ADD)
This approach is based on the decomposition of the
signal into overlapping frames synchronized with
the pitch period. After modifications of the speech
signal, consistency and accuracy of the pitch marks
must be preserved [10]. For this pitch detection is
performed to generate pitch marks through
overlapping windowed speech.
Input signal 𝑠 𝑛 and 𝑠𝑎 𝑛 centered at 𝑡 𝑎 time is
defined as:
𝑠𝑎 𝑛 = 𝑠 𝑡 𝑎+𝑛
Where, 𝑡 𝑎 is an analysis marks.
A short-time version of 𝑠𝑎 𝑛 by multiplying it by a
window 𝑤𝑎 𝑛 is defined as 𝑧 𝑎 𝑛 :
𝑧 𝑎 𝑛 = 𝑤𝑎 𝑛 × 𝑠𝑎 𝑛
The window length is two times of the local pitch
period. To synthesize speech at different pitch
periods, the Short Time signals (ST) are simply
overlapped and added with desired spacing. The
synthesized speech is defined as:
𝑧 𝑛 = 𝑧 𝑛 [𝑛 − 𝑡 𝑎 ]
∞
𝑎=−∞
A good choice for the time marks 𝑡 𝑎 is to coincide
with the instants of closing of the vocal folds which
indicates the periodicity of speech. For unvoiced
speech, these marks could be arbitrarily placed. This
estimation from speech waveforms is a very difficult
problem.
In speech analysis, a sequence of pitch-marks is
provided after filtering the speech signal.
Voiced/unvoiced decision is based on the zero-
crossing and the short time energy for each segment
between two consecutive pitch marks. A coefficient
of voicement (v/uv) can be computed in order to
quantize the periodicity of the signal [11]. To select
pitch marks among local extreme of the speech
signal, a set of mark candidates given with all
negative and positive peaks. The OLA synthesis is
based on the superposition-addition of elementary
signals in the new positions. These positions are
determined by the height and the length of the
synthesis signal. To increase the pitch, the
individual pitch-synchronous frames are extracted
and given to Hanning window. Then output frame
moved close together and added up, whereas output
frame moved further apart to decrease the pitch.
Increasing the pitch will result in a shorter signal, so
to keep constant duration duplicate frames need to
be added. A fast re-sampling method is used to shift
the frame precisely, where it will appear in the new
signal using the pitch mark and the synthesis mark
of a given frame.
IV. IMPLEMENTATION
Following steps were implemented for achieving
voice modification:
1. A wave file of mono sound quality, rate of
11025 with 8 bits per sample is given as
input and amplitude vs. time graph was
plotted.
2. Fast Fourier Transform of input file was
taken and then magnitude (dB) vs.
frequency graph of the same was plotted.
3. Input value of pitch scale, time scale and
vocal tract were stored in a variable „a‟, „b‟,
„c‟ respectively which ranges from -1 to 1.
4. Using a polyphase filter implementation,
resampling of the input at (P/Q) times of
the original sampling rate is done.
Resample applies an anti-aliasing
(lowpass) FIR filter to input.
Where,
P = round (GValue*10); GValue = 2^c;
Q=10.
5. Approximate pitch contour was calculated
based on energy peaks for finding pitch
marks and Path was found out using
rridden function. By differentiating pitch
Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering
Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.461-465
464 | P a g e
marks pitch period was found and first and
last pitch marks were removed.
6. Analysis segment center was found. Then
segment was stretched and new segment
was added and amount of overlap between
windows was defined.
7. Output signal so acquired, was plotted to
get graphs mentioned in first two steps.
8. Finally both input and output files were
played and result was concluded.
V. RESULTS
In this study, a voice quality modification algorithm
with TD-PSOLA modifier was implemented and
tested. Normal female voice was taken as the
reference and graphs plotted as shown in Fig.2 and
Fig.3.
Fig. 2 Time domain graph of input
Fig. 3 Frequency domain graph of input
Above signal is then modified in different voices by
varying parameters. After changing Vocal Tract
parameter from 0 to +1 and from 0 to -1, the change
in frequency domain graphs is shown in Fig. 4 and
5.
Fig. 4 FFT graph of VT= +1
Fig. 5 FFT graph of VT= -1
After changing Time Scale parameter from 0 to +1,
signal is expanded in time domain as shown in Fig.
6 and when change from 0 to -1 signal shrinks as
shown in Fig. 7.
Fig. 6 FFT graph of TS= +1
Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering
Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.461-465
465 | P a g e
Fig. 7 FFT graph of TS= -1
After changing Pitch Scale parameter from 0 to +1
and from 0 to -1, the change in frequency domain
graphs is shown in Fig. 8 and 9.
Fig. 8 FFT graph of PS= +1
Fig. 9 FFT graph of PS= -1
VI. CONCLUSION AND FUTURE SCOPE
As seen from above results, since male have larger
Vocal Tract, when VT parameter increased to +1,
husky male voice is heard and decreasing it to -1
result into small child voice. Since female have
higher pitch, when PS parameter increased to +1,
nasal female voice is heard and decreasing it to -1
result into male voice. Same is summarized in Table
below.
Table 1 Result of voice modification
It is seen from above that vocal tract controls timbre
of the voice signal whereas pitch varies tone of the
same. Above results show that the algorithm can
effectively modify input voice into the desired voice
quality.
Results of the simulation verify that the quality of
the synthesized signal by TD-PSOLA technique
depends on the precision of the analysis marking
and the synthesis marking. In future, Algorithm with
better analysis and synthesis marking can be
implemented for better voice quality and voice
modification technique can be used for voice
conversion applications.
REFERENCES
[1] Kazi, Rehan A., Vyas M.N. Prasad, Jeeve
Kanagalingam, Christopher M. Nutting,
Peter Clarke, Peter Rhys-Evans, and Kevin
J. Harrington, “Assessment of the Formant
Frequencies in Normal and Laryngectomy
Individuals Using Linear Predictive
Coding”, Journal of Voice 21, no. 6:661-
668.
[2] Sewall, Gregory K. MD; Jack Jiang, MD,
PhD; and Charles N. Ford, MD, “Clinical
Evaluation of Parkinson‟s-Related
Dysphonia”, The Laryngoscope, 2006,
116:1740-1744.
[3] J.D. MARKEL, A.H. GRAY Jr, ”Linear
Prediction of Speech”, Springer Verlag,
New York, pp. 10-42, 1976.
[4] D.W. GRIFFIN, J.S. LIM, “Multi-Band
Excitation Vocoder”, IEEE Trans. on
ASSP, vol. ASSP-36, pp. 1223-1235, august
1988.
[5] A. J. ABRANTES, J. S. MARQUES, I. M.
TRANSCOSO, "Hybrid Sinusoidal
Modeling of Speech without Voicing
Decision", EUROSPEECH 91, pp. 231-
234.
[6] E.MOULINES, F. CHARPENTIER, "Pitch
Synchronous waveform Processing
techniques for Text-To-Speech Synthesis
using diphones", Speech Communication,
Vol. 9, n°5-6. 1989.
[7] T. DUTOIT, H. LEICH, "Improving the
TD-PSOLA Text-To-Speech Synthesizer
with a Specially Designed MBE Re-
Synthesis of the Segments Database", Proc.
-1 0 1
Vocal Tract Small Child Normal Female Husky Male
Pitch Scale Normal Male Normal Female Nasal Female
Time Scale Fast Normal Female Slow
Parameter Values
Parameter
Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering
Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 3, May-Jun 2013, pp.461-465
466 | P a g e
EUSIPCO 92, 25-28 august 92, Brussels,
pp. 343-347.
[8] M. Ryynanen, T. Virtanen, J. Paulus, A.
Klapuri, “Accompaniment Separation and
Karaoke Application Based on Automatic
Melody Transcription”, IEEE International
Conference on Multimedia and Expo, 2008,
pp. 1147-1420.
[9] E. Gomez, G. Peterschmitt, X. Amatriain,
P. Herrera, “Content-Based Melodic
Transformations of Audio Material for a
Music Processing Application”, Proc of
6th International Conference on Digital
Audio Effects, London, UK, Sept 2003.
[10] Abdelkader Chabchoub and Adnan Cherif,
“Implementation of the Arabic Speech
Synthesis with TD-PSOLA Modifier”,
International Journal of Signal System
Control and Engineering Application,
2010, Volume: 3, Issue: 4, pp. 77-80.
[11] Cheveigne,A. and H. Ahara,”A
comparative evaluation of Fo estimation
algorithm”, Proceedings of the Euro
Speech Conference. (ESC’98), Norvege,
pp: 453-467.

Weitere ähnliche Inhalte

Was ist angesagt?

Modified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesizeModified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesizeIAEME Publication
 
High Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech SynthesisHigh Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech Synthesissipij
 
Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...
Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...
Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...IJERA Editor
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...sipij
 
An Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech featuresAn Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech featuresSivaranjan Goswami
 
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...ijsrd.com
 
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...IOSR Journals
 
129966864160453838[1]
129966864160453838[1]129966864160453838[1]
129966864160453838[1]威華 王
 
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...IRJET Journal
 
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...CSCJournals
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...
Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...
Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...iosrjce
 

Was ist angesagt? (18)

Modified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesizeModified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesize
 
F010334548
F010334548F010334548
F010334548
 
N010237982
N010237982N010237982
N010237982
 
High Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech SynthesisHigh Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech Synthesis
 
Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...
Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...
Implementation of Interleaving Methods on MELP 2.4 Coder to Reduce Packet Los...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
 
An Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech featuresAn Introduction to Various Features of Speech SignalSpeech features
An Introduction to Various Features of Speech SignalSpeech features
 
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
Cancellation of Noise from Speech Signal using Voice Activity Detection Metho...
 
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
 
Int journal 01
Int journal 01Int journal 01
Int journal 01
 
129966864160453838[1]
129966864160453838[1]129966864160453838[1]
129966864160453838[1]
 
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
 
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
E010333137
E010333137E010333137
E010333137
 
Matlab: Speech Signal Analysis
Matlab: Speech Signal AnalysisMatlab: Speech Signal Analysis
Matlab: Speech Signal Analysis
 
Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...
Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...
Multistage Implementation of Narrowband LPF by Decimator in Multirate DSP App...
 

Andere mochten auch (20)

Am33223232
Am33223232Am33223232
Am33223232
 
Bd33327330
Bd33327330Bd33327330
Bd33327330
 
Bf33335340
Bf33335340Bf33335340
Bf33335340
 
Au33270273
Au33270273Au33270273
Au33270273
 
Ai33194200
Ai33194200Ai33194200
Ai33194200
 
Af33174179
Af33174179Af33174179
Af33174179
 
Av33274282
Av33274282Av33274282
Av33274282
 
Ch33509513
Ch33509513Ch33509513
Ch33509513
 
Ci33514517
Ci33514517Ci33514517
Ci33514517
 
Cn33543547
Cn33543547Cn33543547
Cn33543547
 
Z35149151
Z35149151Z35149151
Z35149151
 
Pe3426652670
Pe3426652670Pe3426652670
Pe3426652670
 
Q358895
Q358895Q358895
Q358895
 
Pd3426592664
Pd3426592664Pd3426592664
Pd3426592664
 
Ot3425662590
Ot3425662590Ot3425662590
Ot3425662590
 
Oq3425482554
Oq3425482554Oq3425482554
Oq3425482554
 
Oc3424632467
Oc3424632467Oc3424632467
Oc3424632467
 
Oa3424532456
Oa3424532456Oa3424532456
Oa3424532456
 
W35124128
W35124128W35124128
W35124128
 
GEC_Alsthom_Project Deployment
GEC_Alsthom_Project DeploymentGEC_Alsthom_Project Deployment
GEC_Alsthom_Project Deployment
 

Ähnlich wie Bz33462466

Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
Voice Morphing
Voice MorphingVoice Morphing
Voice MorphingSayyed Z
 
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...CSCJournals
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Reviewijiert bestjournal
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemeskevig
 
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESEFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESkevig
 
3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...k srikanth
 
Iciiecs1461
Iciiecs1461Iciiecs1461
Iciiecs1461phyuhsan
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...CSCJournals
 
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionLine Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionIDES Editor
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech SynthesisCSCJournals
 
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...François Rigaud
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Priyanka Reddy
 

Ähnlich wie Bz33462466 (20)

Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing
 
Lr3620092012
Lr3620092012Lr3620092012
Lr3620092012
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
Voice Morphing
Voice MorphingVoice Morphing
Voice Morphing
 
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Review
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
 
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESEFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
 
3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...3. speech processing algorithms for perception improvement of hearing impaire...
3. speech processing algorithms for perception improvement of hearing impaire...
 
Iciiecs1461
Iciiecs1461Iciiecs1461
Iciiecs1461
 
50120140501002
5012014050100250120140501002
50120140501002
 
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F...
 
Pitch and time scale modifications
Pitch and time scale modificationsPitch and time scale modifications
Pitch and time scale modifications
 
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionLine Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
 
H0814247
H0814247H0814247
H0814247
 
Animal Voice Morphing System
Animal Voice Morphing SystemAnimal Voice Morphing System
Animal Voice Morphing System
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech Synthesis
 
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete...
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 

Kürzlich hochgeladen

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Kürzlich hochgeladen (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 

Bz33462466

  • 1. Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.461-465 462 | P a g e Verification of TD-PSOLA for Implementing Voice Modification Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee (Electrical Department, VJTI, Mumbai, India) (Electrical Department, VJTI, Mumbai, India) (Electrical Department, VJTI, Mumbai, India) ABSTRACT Voice modification is conversion of a model input voice signal into desired output signal. This can be achieved by modifying basic parameters of voice viz. vocal tract, pitch and time scale. Four models which are used for this purpose are discussed here namely LPC model, H/S model, TD-PSOLA model and MBR-PSOLA model. In this paper, TD-PSOLA technique is studied and implemented. Results of this implementation are derived and reviewed. The voice modification system thus developed can contribute greatly to the Medical and Entertainment Industry where desire voice modification is required. Keywords – Vocal tract, formants, pitch, voice. I. INTRODUCTION It is noteworthy fact that for people the voice is not only just a tool for communication, but also an identifying feature that allows expression of personality. Researchers observed changes in formant tracks and pitch contours, during clinical diagnosis of certain diseases. Diseases like Laryngeal cancer can have a drastic impact on the speech leading to disturbances of voice. Rehan A. Kazi concludes that the formant frequencies of laryngectomy patients are higher than the formant frequencies of normal subjects [1]. Gregory K. Sewall claims that patients with Parkinson‟s-related dysphonia usually have reduced pitch ranges and increased vocal tremor [2]. In such cases a need arises for voice modification. In voice modification a model input voice signal is modified by varying vocal tract, pitch and time scale. Men have a larger vocal tract, which essentially gives the resultant voice a lower-sounding timbre. Whereas Female have a smaller vocal tract, giving the resultant voice a higher-sounding timbre. Adult male voices are usually lower-pitched and adult female voices are higher-pitched. Several works have been done for achieving voice modification and many methods are available for the same. Some of these are The classical AutoRegressive (LPC) [3], The hybrid Harmonic/Stochastic (H/S) [4],[5], Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) [6] and Multi-Band Re-synthesis Pitch-Synchronous Overlap-Add (MBR-PSOLA) [7]. These methods can process stored data as well as data in real time. E.g. in karaoke, pitch shifting is used to tune the user's singing so that it reaches the original melody in real-time [8]. In music composition, composers process music using pitch shifting [9]. Also it is used in applications like public speech system, entertainment Industry where specific voice style, tone, expressions are required. Voice modification process using TD-PSOLA is explained in subsequent sections. Section 2 describes speech signal fundamentals and importance parameters of voice. The algorithms used for modification are compared in section 3. Section 4 gives the details of TD-PSOLA method. Implementation of method and results are interpreted in section 5 and 6. Section 7 summarizes and throws light on future scope. II. SPEECH SIGNAL FUNDAMENTALS The vocal cords periodically vibrate to generate glottal flow. Human pronounces a vowel or a voiced consonant which is composed of glottal pulses. Pitch period is the period of a glottal pulse. Fundamental frequency is reciprocal of the pitch period. The vocal tract acts as a time-varying filter to the glottal flow. The characteristics of the vocal tract include the frequency response, which depends on the position of organs, such as the pharynx and tongue. The peak frequencies in the frequency response of the vocal tract are formants, also known as formant frequencies. A speech signal is a convolution of a time-varying stimulus (Glottal Flow) and a time-varying filter (Vocal Tract) as shown in Fig.1. Time-varying Stimulus (Glottal Flow) (Speech Signal) Fig.1. Modeling of speech signal II. COMPARISON OF ALGORITHMS The four models covered are: 1. The linear-prediction voice model is best classified as a parametric, spectral, source-filter model, in which the short-time spectrum is decomposed into a flat excitation spectrum multiplied by a smooth spectral envelope capturing primarily vocal formants [3]. 2. The hybrid Harmonic/Stochastic (H/S) model proposed in [4] and [5]. In this speech signals expresses as the summation of slowly varying harmonic and stochastic components. This is based on Griffin's analysis algorithm, uses the OLA synthesis approach, the prosody matching and segment concatenation. Time-varying Filter (Vocal Tract)
  • 2. Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.461-465 463 | P a g e 3. TD-PSOLA algorithms rely on a pitch- synchronous overlap-add approach for modifying the speech prosody and concatenating speech waveforms [6]. The modifications of the speech signal are performed in the time domain depending on the length of the window used in the synthesis process. The time domain approach provides very efficient solutions for the real time implementation of synthesis systems. 4. The Multi-Band Re-synthesis Pitch-Synchronous Overlap-Add (MBR-PSOLA) model is based on re- synthesis of the segments database of an original and efficient hybrid H/S [7]. It supports spectral interpolation between voiced parts of segments, with virtually no increase in complexity. Result of Comparison TD-PSOLA virtually exhibits no segments concatenation capabilities. LPC is slightly superior in this. LPC, hybrid H/S and MBR-PSOLA are superior to TD-PSOLA for automatic analysis procedures. An efficient segment database compression algorithm is ensured for LPC and hybrid H/S synthesizers. It is currently being developed for MBR-PSOLA. Fluidity is better of MBR-PSOLA and hybrid H/S due to superior concatenation capabilities. Prosody matching gives comparable results with all four models. TD- PSOLA is computationally simple compared to complexity of their respective synthesizers. It is much more intelligible than other synthesizers. It is much less sensitive to analysis V/UV errors than MBR-PSOLA. It is perceived as equally natural because it does not make use of any speech model. As seen from above, TD-PSOLA technique is better than other. III. TD-PSOLA (TIME DOMAIN PITCH SYNCHRONOUS OVERLAP-ADD) This approach is based on the decomposition of the signal into overlapping frames synchronized with the pitch period. After modifications of the speech signal, consistency and accuracy of the pitch marks must be preserved [10]. For this pitch detection is performed to generate pitch marks through overlapping windowed speech. Input signal 𝑠 𝑛 and 𝑠𝑎 𝑛 centered at 𝑡 𝑎 time is defined as: 𝑠𝑎 𝑛 = 𝑠 𝑡 𝑎+𝑛 Where, 𝑡 𝑎 is an analysis marks. A short-time version of 𝑠𝑎 𝑛 by multiplying it by a window 𝑤𝑎 𝑛 is defined as 𝑧 𝑎 𝑛 : 𝑧 𝑎 𝑛 = 𝑤𝑎 𝑛 × 𝑠𝑎 𝑛 The window length is two times of the local pitch period. To synthesize speech at different pitch periods, the Short Time signals (ST) are simply overlapped and added with desired spacing. The synthesized speech is defined as: 𝑧 𝑛 = 𝑧 𝑛 [𝑛 − 𝑡 𝑎 ] ∞ 𝑎=−∞ A good choice for the time marks 𝑡 𝑎 is to coincide with the instants of closing of the vocal folds which indicates the periodicity of speech. For unvoiced speech, these marks could be arbitrarily placed. This estimation from speech waveforms is a very difficult problem. In speech analysis, a sequence of pitch-marks is provided after filtering the speech signal. Voiced/unvoiced decision is based on the zero- crossing and the short time energy for each segment between two consecutive pitch marks. A coefficient of voicement (v/uv) can be computed in order to quantize the periodicity of the signal [11]. To select pitch marks among local extreme of the speech signal, a set of mark candidates given with all negative and positive peaks. The OLA synthesis is based on the superposition-addition of elementary signals in the new positions. These positions are determined by the height and the length of the synthesis signal. To increase the pitch, the individual pitch-synchronous frames are extracted and given to Hanning window. Then output frame moved close together and added up, whereas output frame moved further apart to decrease the pitch. Increasing the pitch will result in a shorter signal, so to keep constant duration duplicate frames need to be added. A fast re-sampling method is used to shift the frame precisely, where it will appear in the new signal using the pitch mark and the synthesis mark of a given frame. IV. IMPLEMENTATION Following steps were implemented for achieving voice modification: 1. A wave file of mono sound quality, rate of 11025 with 8 bits per sample is given as input and amplitude vs. time graph was plotted. 2. Fast Fourier Transform of input file was taken and then magnitude (dB) vs. frequency graph of the same was plotted. 3. Input value of pitch scale, time scale and vocal tract were stored in a variable „a‟, „b‟, „c‟ respectively which ranges from -1 to 1. 4. Using a polyphase filter implementation, resampling of the input at (P/Q) times of the original sampling rate is done. Resample applies an anti-aliasing (lowpass) FIR filter to input. Where, P = round (GValue*10); GValue = 2^c; Q=10. 5. Approximate pitch contour was calculated based on energy peaks for finding pitch marks and Path was found out using rridden function. By differentiating pitch
  • 3. Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.461-465 464 | P a g e marks pitch period was found and first and last pitch marks were removed. 6. Analysis segment center was found. Then segment was stretched and new segment was added and amount of overlap between windows was defined. 7. Output signal so acquired, was plotted to get graphs mentioned in first two steps. 8. Finally both input and output files were played and result was concluded. V. RESULTS In this study, a voice quality modification algorithm with TD-PSOLA modifier was implemented and tested. Normal female voice was taken as the reference and graphs plotted as shown in Fig.2 and Fig.3. Fig. 2 Time domain graph of input Fig. 3 Frequency domain graph of input Above signal is then modified in different voices by varying parameters. After changing Vocal Tract parameter from 0 to +1 and from 0 to -1, the change in frequency domain graphs is shown in Fig. 4 and 5. Fig. 4 FFT graph of VT= +1 Fig. 5 FFT graph of VT= -1 After changing Time Scale parameter from 0 to +1, signal is expanded in time domain as shown in Fig. 6 and when change from 0 to -1 signal shrinks as shown in Fig. 7. Fig. 6 FFT graph of TS= +1
  • 4. Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.461-465 465 | P a g e Fig. 7 FFT graph of TS= -1 After changing Pitch Scale parameter from 0 to +1 and from 0 to -1, the change in frequency domain graphs is shown in Fig. 8 and 9. Fig. 8 FFT graph of PS= +1 Fig. 9 FFT graph of PS= -1 VI. CONCLUSION AND FUTURE SCOPE As seen from above results, since male have larger Vocal Tract, when VT parameter increased to +1, husky male voice is heard and decreasing it to -1 result into small child voice. Since female have higher pitch, when PS parameter increased to +1, nasal female voice is heard and decreasing it to -1 result into male voice. Same is summarized in Table below. Table 1 Result of voice modification It is seen from above that vocal tract controls timbre of the voice signal whereas pitch varies tone of the same. Above results show that the algorithm can effectively modify input voice into the desired voice quality. Results of the simulation verify that the quality of the synthesized signal by TD-PSOLA technique depends on the precision of the analysis marking and the synthesis marking. In future, Algorithm with better analysis and synthesis marking can be implemented for better voice quality and voice modification technique can be used for voice conversion applications. REFERENCES [1] Kazi, Rehan A., Vyas M.N. Prasad, Jeeve Kanagalingam, Christopher M. Nutting, Peter Clarke, Peter Rhys-Evans, and Kevin J. Harrington, “Assessment of the Formant Frequencies in Normal and Laryngectomy Individuals Using Linear Predictive Coding”, Journal of Voice 21, no. 6:661- 668. [2] Sewall, Gregory K. MD; Jack Jiang, MD, PhD; and Charles N. Ford, MD, “Clinical Evaluation of Parkinson‟s-Related Dysphonia”, The Laryngoscope, 2006, 116:1740-1744. [3] J.D. MARKEL, A.H. GRAY Jr, ”Linear Prediction of Speech”, Springer Verlag, New York, pp. 10-42, 1976. [4] D.W. GRIFFIN, J.S. LIM, “Multi-Band Excitation Vocoder”, IEEE Trans. on ASSP, vol. ASSP-36, pp. 1223-1235, august 1988. [5] A. J. ABRANTES, J. S. MARQUES, I. M. TRANSCOSO, "Hybrid Sinusoidal Modeling of Speech without Voicing Decision", EUROSPEECH 91, pp. 231- 234. [6] E.MOULINES, F. CHARPENTIER, "Pitch Synchronous waveform Processing techniques for Text-To-Speech Synthesis using diphones", Speech Communication, Vol. 9, n°5-6. 1989. [7] T. DUTOIT, H. LEICH, "Improving the TD-PSOLA Text-To-Speech Synthesizer with a Specially Designed MBE Re- Synthesis of the Segments Database", Proc. -1 0 1 Vocal Tract Small Child Normal Female Husky Male Pitch Scale Normal Male Normal Female Nasal Female Time Scale Fast Normal Female Slow Parameter Values Parameter
  • 5. Vivek Vijay Nar, Alice N. Cheeran, Souvik Banerjee / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 3, May-Jun 2013, pp.461-465 466 | P a g e EUSIPCO 92, 25-28 august 92, Brussels, pp. 343-347. [8] M. Ryynanen, T. Virtanen, J. Paulus, A. Klapuri, “Accompaniment Separation and Karaoke Application Based on Automatic Melody Transcription”, IEEE International Conference on Multimedia and Expo, 2008, pp. 1147-1420. [9] E. Gomez, G. Peterschmitt, X. Amatriain, P. Herrera, “Content-Based Melodic Transformations of Audio Material for a Music Processing Application”, Proc of 6th International Conference on Digital Audio Effects, London, UK, Sept 2003. [10] Abdelkader Chabchoub and Adnan Cherif, “Implementation of the Arabic Speech Synthesis with TD-PSOLA Modifier”, International Journal of Signal System Control and Engineering Application, 2010, Volume: 3, Issue: 4, pp. 77-80. [11] Cheveigne,A. and H. Ahara,”A comparative evaluation of Fo estimation algorithm”, Proceedings of the Euro Speech Conference. (ESC’98), Norvege, pp: 453-467.