1. A TOOL TO CONVERT TEXT TO
SPEECH WITH EMOTIONS
EmoSpeak
Submitted to: Ms. Shikha Jain
Submitted by:
Akriti Saini (10503902)
Stuti Shukla (10503870)
2. What is NLP?
Natural language processing (NLP) is a field
of computer science, artificial intelligence,
and linguistics concerned with the interactions
between computers and human (natural) languages. As
such, NLP is related to the area of human–computer
interaction. Many challenges in NLP involve natural
language understanding, that is, enabling computers to
derive meaning from human or natural language input,
and others involve natural language generation.
The area of NLP we are concerned with is:
Text-to-Speech with emotions.
3. What is Text-to-Speech?
A text-to-speech (TTS) system converts normal
language text into speech.
The quality of a speech synthesizer is judged by its
similarity to the human voice and by its ability to
be understood clearly. An intelligible text-to-
speech program allows people with visual
impairments or reading disabilities to listen to
written works on a home computer.
4. Our Tool: EmoSpeak
EmoSpeak converts text to speech in such a way
that the emotions expressed in the text are
carried over into the synthesized speech.
The tool first identifies the emotions in the
raw text and then modifies certain characteristics
of the voice in order to modulate it and
express those emotions.
5. The tool is composed of two parts: a front-end and
a back-end. The front-end is responsible for text
normalization (also called pre-processing or
tokenization) and produces a symbolic linguistic
representation of the text. The back-end, often
referred to as the synthesizer, then converts this
symbolic linguistic representation into sound.
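The front-end step described above can be sketched roughly as follows. This is only a minimal illustration, not EmoSpeak's actual code: the function names, the lowercasing step, and the small ABBREVIATIONS table are assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; a real front-end would need a far larger one.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "etc.": "et cetera"}

def normalize(text):
    """Lowercase the text, expand known abbreviations, collapse whitespace."""
    words = [ABBREVIATIONS.get(word, word) for word in text.lower().split()]
    return " ".join(words)

def tokenize(text):
    """Split normalized text into word tokens and basic punctuation marks."""
    return re.findall(r"[a-z0-9']+|[.!?,]", text)

def front_end(text):
    """Assumed front-end: normalization followed by tokenization."""
    return tokenize(normalize(text))

if __name__ == "__main__":
    print(front_end("Dr. Smith  is HAPPY!"))
```

The token list stands in for the symbolic linguistic representation that would be handed to the back-end synthesizer.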
6. Voice Modulation
One of the goals of text-to-speech (TTS) systems is to
produce natural-sounding synthesized speech.
Towards this end, various natural language
processing (NLP) tasks are performed to model the
prosodic aspects of the TTS.
One of the fundamental NLP tasks used is the
part-of-speech (POS) tagging of the words in the
text.
7. Voice modulation in this project means changing
certain characteristics of the voice based on a
particular emotion. The characteristics of the voice
that could be changed include the fundamental
frequency (f0), f0 contour, f0 range, jitter, nasal
duration, etc.
These characteristics are changed according to the
emotion, which is set by the user.
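One way to picture this is a lookup from emotion to a set of voice settings. The sketch below is an assumption-laden illustration: the Prosody fields, the emotion names other than those in the slides, and all numeric values are placeholders chosen for the example, not measured values or EmoSpeak's actual parameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prosody:
    pitch_scale: float  # multiplier applied to the baseline f0
    range_scale: float  # multiplier applied to the f0 range
    rate_scale: float   # multiplier applied to the speaking rate

NEUTRAL = Prosody(1.0, 1.0, 1.0)

# Placeholder values for illustration only; not taken from the project.
EMOTION_PROSODY = {
    "happy": Prosody(1.2, 1.3, 1.1),
    "sad": Prosody(0.9, 0.7, 0.85),
    "angry": Prosody(1.1, 1.4, 1.15),
}

def prosody_for(emotion):
    """Return the settings for an emotion, falling back to neutral."""
    return EMOTION_PROSODY.get(emotion.lower(), NEUTRAL)

def apply_pitch(base_f0_hz, emotion):
    """Scale a baseline f0 (in Hz) by the emotion's pitch multiplier."""
    return base_f0_hz * prosody_for(emotion).pitch_scale
```

Keeping the settings in a table like this would let the emotion chosen by the user select the voice characteristics without changing the synthesis code.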
8. Implementation
For implementation, the first task is to take a
PDF file as input and convert it to the corresponding
text file.
The text is then tokenized and a decision is made
regarding the class (emotional or neutral) to which it belongs.
If the text belongs to the emotional class, the next
step is to identify the emotional subcategory to which
it belongs, for example ‘happy’.
This classification can be done by using WordNet
and WordNet-Affect. Depending on the emotion, the
voice can then be modulated by varying the
intensity, the pause between words, and the pitch of the
voice.
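The two-stage decision described above (emotional vs. neutral, then subcategory) can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny AFFECT_LEXICON dictionary is a hypothetical stand-in for WordNet and WordNet-Affect lookups, and the majority-vote rule is one simple choice, not necessarily the one used in the project.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon standing in for WordNet-Affect lookups.
AFFECT_LEXICON = {
    "happy": "happy", "joy": "happy", "delighted": "happy",
    "sad": "sad", "cried": "sad", "lonely": "sad",
    "angry": "angry", "furious": "angry",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def classify(text):
    """Return ('neutral', None) or ('emotional', subcategory)."""
    counts = Counter(
        AFFECT_LEXICON[token] for token in tokenize(text) if token in AFFECT_LEXICON
    )
    if not counts:
        return ("neutral", None)
    # Pick the subcategory with the most matching words.
    return ("emotional", counts.most_common(1)[0][0])

if __name__ == "__main__":
    print(classify("She was delighted and full of joy."))
    print(classify("The table is made of wood."))
```

The returned subcategory could then be used to choose how intensity, pauses, and pitch are varied.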
10. Integrated Literature Survey
From the research papers we explored, we infer that
there are various approaches that can be
followed to implement our application. Our first task
should be to decide whether the text falls in the
emotional or the non-emotional (neutral) class.
An important finding was that using WordNet and
WordNet-Affect was the best way to identify the
emotions in a particular text, because it had the
highest precision among the procedures compared,
such as LSA (latent semantic analysis).
11. From the literature survey we also conclude that
there are various text-to-speech engines available
and our foremost task would be to choose an
appropriate engine according to the requirements.
We came across research in which emotional
text-to-speech engines have been implemented for the
Italian and Arabic languages.
12. Application and Significance of
the project
It can be used to build the habit of reading books in
children. From human psychology it can be inferred
that a task performed repeatedly, beyond a certain
limit, develops a liking for that task. So by listening
to various types of books, children may develop a habit
of reading books.
It can also be used to supplement children’s reading
classes. Children learn easily when things are pointed
out to them. They can listen to a voice reading the
contents of the book as they follow with their eyes. It can
be used as a tutor, replacing the need for a teacher to guide
children.
13. Expressive child-directed storytelling in a
text-to-speech application can be useful in the
therapeutic education of children with
communication disorders. It can do this by
helping them learn how to express their feelings
and try to communicate.
It can help visually impaired people, or people with
certain reading disabilities, to get the feel of reading a
book.