Voice Browser
Sipna College of Engineering and Technology
INTRODUCTION
A voice browser is a device that interprets a (voice) markup language and is capable
of generating voice output and/or interpreting voice input, and possibly other input/output
modalities. This definition is a broad one. That the system deals with speech is obvious from
the first word of the name, but what makes a software system that interacts with the user via
speech a "browser"? The answer is that the information the system uses (for either domain data
or dialog flow) is dynamic and comes from somewhere on the Internet. From an end-user's
perspective, the aim is to provide a service similar to what graphical browsers of HTML and
related technologies do today, but on devices that are not equipped with full browsers or even
the screens to support them.
Put simply, a voice browser is an appliance or device that interprets a voice markup
language and produces voice output. It is the web browser that provides users with an
interactive voice user interface: it accepts spoken input and answers with spoken output. It
deals with pages that specify voice dialogues, just as a visual web browser deals with HTML
pages. How, then, does such a software system respond to the user through speech? It obtains
its information from the Internet. From the user's point of view, the goal is to bring to devices
that have no full browser, or not even a screen to support one, a service similar to what visual
web browsers and related technologies offer today.[2]
Speech recognition is one of the fastest-growing engineering technologies. Nearly 20% of
the world's people live with some form of disability; many of them are blind or unable to use
their hands effectively. They can share information with others by operating a computer through
voice input. A voice browser can recognize speech and convert the input audio into text; it also
lets a user perform operations such as opening the calculator, WordPad or Notepad, or logging
off the computer.

LITERATURE REVIEW
HTML is designed to be a markup language. Many of the structures in a document, such
as hyperlinks, headings, tables and lists, are represented explicitly in the HTML file for the
document by 'tags'. It is the task of a web-browsing program to interpret the tags, format the
content and present the information to the user visually. There are several ways to re-represent
the content through the audio channel. One approach is to design an audio document
specifically for the relevant web page, which may involve the author making an explicit
recording of the document or parts of it. Though this seems the best strategy for ensuring that
the author's intent is accurately rendered, it means that authors must create two documents for
everything they write, which is impractical. A similar approach is the development of a markup
language for use with voice browsing applications. This is the long-term solution offered by the
W3C. All web documents are expected to be marked up according to a VoiceXML specification
(W3C, 2000), so that browsing products need only read and interpret these voice-specific tags to
produce an audio version of the document. However, this requires not only global acceptance of
the specification but also that all authors use it when designing their documents. Otherwise,
only certain web pages will be 'viewable' using compliant voice browsing applications.[5]
Standards for voice browsing come from the World Wide Web Consortium (W3C), which
develops interoperable technologies (specifications, guidelines, software, and tools) to lead the
Web to its full potential as a forum for information, commerce, communication, and collective
understanding. The relevant W3C work includes:
1. Voice Browser Working Group
2. Speech Interface Framework
1] Voice Browser Working Group: It was established on 26 March 1999 and re-chartered through
31 January 2009. The W3C Voice Browser Working Group made the Speech Interface Framework
possible. This framework allows developers to create speech-enabled applications that are based
on Web technologies.

The framework also provides developers with an environment that will be familiar to those who
already know Web development techniques. The aim of the W3C Working Group is to enable users
to speak and listen to Web applications by creating standard languages for developing Web-based
speech applications. This Working Group concentrates on languages for capturing and producing
speech and for managing the conversation between user and computer system, while a related
group, the Multimodal Interaction Working Group, works on additional input modes including
keyboard and mouse, ink and pen, etc. Its recommendations have been reviewed by W3C Members,
by software developers, and by other interested parties, and are endorsed by the Director as
Web Standards.
2] Speech Interface Framework:
This framework includes VoiceXML, a language for creating audio dialogs that feature
synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of
spoken input, telephony, and mixed initiative conversations. Applications are written using
parts of the Speech Interface Framework: speech applications are written in VoiceXML and
rendered through a voice browser, in much the same way as Web applications are written in HTML
and run on a Web browser.
By one estimate, over 85% of Interactive Voice Response (IVR) applications for
telephones (including mobile) use W3C's VoiceXML standard. The Voice Browser Working Group
is coordinating its efforts to make the Web available on more devices and in more situations.[1]
Some versions of VoiceXML are:
• VoiceXML 1.0: designed for creating audio dialogs.
• VoiceXML 2.0: uses the form interpretation algorithm (FIA).
• VoiceXML 2.1: adds 8 elements and features to VoiceXML 2.0.
• VoiceXML 3.0: addresses the relationship between semantics and syntax.
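To make the comparison with HTML concrete, the sketch below holds a small VoiceXML 2.0 document in a Python string and reads it with the standard library, roughly the first thing a voice browser has to do before it can speak a dialog. The dialog itself (the form `order`, the field `drink`, the grammar file name `drink.grxml` and the prompt wording) is invented for illustration; only the element names and the namespace are taken from VoiceXML 2.0.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"

# A hypothetical one-question dialog; the grammar file name is an assumption.
SAMPLE_DIALOG = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def summarize_dialog(vxml_text):
    """Return (field name, opening prompt) pairs for every field in the document."""
    root = ET.fromstring(vxml_text.strip())
    ns = {"v": VXML_NS}
    summary = []
    for field in root.findall(".//v:field", ns):
        # Only the prompt that is a direct child of the field is the question asked.
        prompt = field.find("v:prompt", ns)
        text = "".join(prompt.itertext()).strip() if prompt is not None else ""
        summary.append((field.get("name"), text))
    return summary


if __name__ == "__main__":
    for name, prompt in summarize_dialog(SAMPLE_DIALOG):
        print(f"field {name!r} asks: {prompt}")
```

A real voice browser would go on to speak the prompt, listen for an answer that matches the grammar, and fill the field; this sketch stops at reading the markup.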

WORKING
A voice-based web makes information accessible to users who may not be able to
read or write, or who do not have access to the Internet. Users can reach the voice-based web
by calling a toll-free number and interacting through a voice recognition system or a
touch-tone phone. Unlike a computer interface, a voice interface needs no keyboard, no mouse
and no screen, freeing users from these barriers to access and action. It requires no training
and is accessible to anyone with a telephone. Voice is mobile: information can be sent and
retrieved from anywhere. Since customers can have access at any time from anywhere, voice
makes it possible to use time more effectively. Fast and efficient, voice frees users not only
from the desktop but even from the laptop.
The user makes a request by voice or text, using a phone, a personal computer or
touch tone. The request goes to the voice browser. If the request is spoken, speech recognition
converts the voice into text and checks it against the grammars; speech synthesis or
pre-recorded audio is then used to turn the response text into audio. The recorded audio is
stored on the administrator's side, and the result is presented to the user.
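The request flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recognizer is a stub that reads a transcript already attached to the audio, and the touch-tone menu, the accepted phrases and the audio file names are all hypothetical.

```python
def recognize_speech(audio):
    """Stand-in for a real recognizer: the 'audio' dict already carries its transcript."""
    return audio["transcript"]


# All three tables below are hypothetical examples, not part of any real service.
DTMF_MENU = {"1": "weather", "2": "news"}                      # touch-tone key -> request
GRAMMAR = {"weather", "news"}                                  # phrases the service accepts
AUDIO_STORE = {"weather": "weather.wav", "news": "news.wav"}   # pre-recorded answers


def handle_request(kind, payload):
    """Resolve the request type, check it against the grammar, and pick the audio reply."""
    if kind == "voice":
        text = recognize_speech(payload)
    elif kind == "touch_tone":
        text = DTMF_MENU.get(payload, "")
    elif kind == "text":
        text = payload
    else:
        raise ValueError(f"unknown request type: {kind}")

    text = text.strip().lower()
    if text not in GRAMMAR:
        return {"text": "Sorry, I did not understand.", "audio": None}
    return {"text": text, "audio": AUDIO_STORE[text]}


if __name__ == "__main__":
    print(handle_request("voice", {"transcript": "Weather"}))
    print(handle_request("touch_tone", "2"))
    print(handle_request("text", "sports"))
```

Keeping the three input types behind one function mirrors the "resolve request type" step: once the request is plain text, the grammar check and the audio lookup are the same for every channel.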

Fig 1. Block Diagram of Voice Browser
[Figure: blocks labelled User, Request through voice, Request through text, Telephone calls,
Touch tone, Resolve request type, Speech recognition, Grammars, VoiceXML scripts, HTML scripts,
Audio files, Multimedia files, Voice Browser, Admin, Maintain database.]

Fig. 2. Uploading and downloading
[Figure: blocks labelled User, Request via phone, Request via touch tone, Request name,
Feedback, Download, Upload, Send to, Voice XML, Grammars, Audio files, Speech synthesis,
Voice Browser, Server, Receive request, Resolve request type; and for the Administrator:
Search, Permission grant, Delete members, Updation, Manage database, Maintain information.]

User Interaction via Browser
Fig 3. Sequence diagram
[Figure: participants User, Visual Browser, Voice Browser and Admin; messages labelled
request for home page, search content, generate html files, send html files, display,
voice request, grammar checking, pre-recorded audio, send voice xml files, text or voice output.]

Admin: The administrator has the authority to convert voice into text and text into voice, and
then to present the result to the user.
ASR: Automatic Speech Recognition converts speech into text.
Fig 4. Block diagram for conversion of voice into text.
In the diagram above, voice is the input that is to be converted into text. Voice is an
analog quantity, so an analog-to-digital converter is used first to make it suitable for digital
processing. The digitized voice is then split into two parts, which pass through the acoustic
model and the language model respectively. The outputs of these two parts are combined and
passed to the speech engine, which processes the signal further and finally converts the speech
into text.[1]
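As a rough illustration of the analog-to-digital step, the sketch below samples a continuous signal at fixed intervals and rounds each sample to an integer level. The 8000 Hz sample rate and 16-bit depth are assumptions chosen because they are common for telephone-quality audio, and the 440 Hz test tone simply stands in for a voice; none of these values come from the report.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=8000, bits=16):
    """Sample a continuous signal and quantize it to signed integers.

    `signal` is a function of time (seconds) returning an amplitude in [-1, 1].
    """
    max_level = 2 ** (bits - 1) - 1           # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling: pick discrete instants
        value = max(-1.0, min(1.0, signal(t)))
        samples.append(round(value * max_level))   # quantization: nearest integer level
    return samples


def tone(t):
    """A 440 Hz sine wave used here in place of a real voice signal."""
    return math.sin(2 * math.pi * 440 * t)


if __name__ == "__main__":
    pcm = digitize(tone, duration_s=0.01)
    print(len(pcm), "samples, first five:", pcm[:5])
```

The list of integers is the digital form that the acoustic and language models can then work on.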
Acoustic Model
An acoustic model is created by taking audio recordings of speech, and their text
transcriptions, and using software to create statistical representations of the sounds that make up
each word. It is used by a speech recognition engine to recognize speech.

Language Model
A language model is a file containing the probabilities of sequences of words. Language
models are used for dictation applications, whereas grammars are used in desktop command and
control or telephony interactive voice response (IVR) type applications.
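A toy example may help show why the two models are combined. In the sketch below, the acoustic scores and the word-pair probabilities are invented numbers, and adding log-probabilities is one common way of combining the two models; the report does not say how its speech engine does it, so treat the combination rule as an assumption.

```python
import math

# Hypothetical acoustic scores: how well each candidate matches the sound.
ACOUSTIC = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Hypothetical language model: probability of a word given the previous word.
# "<s>" marks the start of the sentence.
BIGRAMS = {
    ("<s>", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.10,
    ("a", "nice"): 0.05,
    ("nice", "beach"): 0.02,
}
UNSEEN = 1e-6  # small probability for word pairs the model has never seen


def language_log_prob(sentence):
    """Log-probability of the word sequence under the bigram table."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(words, words[1:]))


def best_transcript(acoustic_scores):
    """Pick the candidate with the highest combined acoustic + language score."""
    def total(sentence):
        return math.log(acoustic_scores[sentence]) + language_log_prob(sentence)
    return max(acoustic_scores, key=total)


if __name__ == "__main__":
    print("acoustic model alone:", max(ACOUSTIC, key=ACOUSTIC.get))
    print("with language model: ", best_transcript(ACOUSTIC))
```

On sound alone the second candidate scores slightly higher, but the language model makes the first word sequence far more likely, so the combined score selects it.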
Speech Engine
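A speech engine is the software that performs the conversion. In the recognition
direction it turns speech into text, as in Fig 4; in the synthesis direction it gives the
computer the ability to play back text in a spoken voice (referred to as text-to-speech or TTS).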
VoiceXML
VoiceXML is a dialog markup language designed for telephony applications, where
users are restricted to voice and DTMF (touch tone) input. Other voice markup languages
include VoXML and omniviewXML.
Speech grammars
In most cases, user prompts are carefully designed to encourage the user to answer
in a form that matches context-free grammar rules. Speech grammars allow authors to specify
rules covering the sequences of words that users are expected to say in particular contexts.
These contextual clues allow the recognition engine to focus on likely utterances, improving the
chances of a correct match.
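The idea of a speech grammar can be sketched with a small context-free grammar written as a Python dictionary. The rules and vocabulary below are invented for illustration, and the grammar is deliberately non-recursive so that every allowed utterance can be listed in advance; real grammar formats are richer than this.

```python
from itertools import product

# Hypothetical grammar: each rule maps a symbol to its alternative expansions.
GRAMMAR = {
    "<order>": [["<polite>", "<size>", "<drink>"], ["<size>", "<drink>"]],
    "<polite>": [["i", "would", "like", "a"], ["give", "me", "a"]],
    "<size>": [["small"], ["large"]],
    "<drink>": [["coffee"], ["tea"]],
}


def expand(symbol):
    """Return every word sequence a symbol can produce (grammar must be non-recursive)."""
    if symbol not in GRAMMAR:
        return [[symbol]]                      # a plain word expands to itself
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append([word for seq in combo for word in seq])
    return results


# Every utterance the recognizer needs to consider in this context.
ALLOWED = {" ".join(words) for words in expand("<order>")}


def matches(utterance):
    """True if the utterance is one of the word sequences the grammar allows."""
    return " ".join(utterance.lower().split()) in ALLOWED


if __name__ == "__main__":
    print(len(ALLOWED), "allowed utterances")
    print(matches("I would like a small coffee"))
    print(matches("medium coffee"))
```

Because the recognizer only has to choose among the listed sequences rather than all possible speech, a prompt that steers the user toward these phrases improves the chance of a correct match.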

Differences Between Graphical & Voice Browsing
The visual and aural channels are the two most important channels of information
processing. While most interaction with computers has been designed around the visual
channel, there are circumstances where voice-based man-machine interaction becomes preferable,
and in some cases necessary, given that voice-based interaction comes naturally to humans and
can be used easily by people who cannot read. Voice User Interfaces (VUIs), however, are linear
and non-persistent, and so place a heavy load on working memory. Compared with a visual
interface, a VUI (taking an Interactive Voice Response system as the example) is slow because
access is sequential rather than random. Moreover, however robust a Speech Recognition (SR)
platform may be, it can never achieve 100% accuracy, which makes the interaction error-prone.
In addition, speech interaction may require more of the user's attention and take longer to
complete tasks than a Graphical User Interface (GUI).
Graphical browsing is more passive because the visual information persists.
Voice browsing is more active since the user has to issue commands. Graphical browsers are
client-based, whereas voice browsers are server-based.[6]

ADVANTAGES
• Lower space requirements.
• Portable voice browsers can also be implemented.
• A practical interface for functionally blind users.
• Users can browse the web while keeping their hands and eyes free for other jobs.
• Voice interaction can escape the physical limitations of keypads and displays as mobile
devices become ever smaller.
APPLICATION

Speech technology is expected to grow rapidly, and the voice portal market is forecast to reach
billions of dollars within a few years. The Kelsey Group estimates that the voice browsing
market will reach 6.5 billion dollars, while OVUM estimates a world market of 26 billion dollars.
The gap between these figures shows how uncertain the actual growth of the voice technology
industry is. Navigating on a WAP device by scrolling through many lists is very difficult;
hands-free interaction allows easy communication between the user and the system.[3]
Voice browsing can be used to access three kinds of information:
(a) Business: automated telephone ordering services, support desks, order tracking, airline
arrival and departure information, cinema and theatre booking, home banking services, etc.
can all be reached very easily by voice browsing.
(b) Public: a voice browser can be used to access local, national and international news along
with community information such as weather forecasts, traffic conditions, school closures and
events. It can also be used to gather national and international stock market information and
to carry out business and e-commerce transactions.
(c) Personal: it is used to access personal information such as voice mail, personal
horoscopes, personal newsletters, calendars, and address and telephone lists.

FUTURE SCOPE
• Accuracy will keep improving as speech recognition improves.
• Dictation speech recognition will gradually become accepted.
• Greater use will be made of "intelligent systems" which will attempt to guess what the
speaker intended to say, rather than what was actually said, as people often misspeak and
make unintentional mistakes.
• Microphones and sound systems will be designed to adapt more quickly to changing
background noise levels and different environments, with better recognition of extraneous
material to be discarded.

CONCLUSION
To make technology more familiar to users, access to it should be made easier. Visual
internet access has various limitations: for example, people with physical disabilities
(especially blind users) cannot use keypads or touch screens to give instructions. Beyond these
limitations, today's generation wants to use the internet independently of PCs and with
hands-free access. Voice browsing is an intelligent answer to this. It allows the user to access
the web even in situations such as driving, where the user operates the web just by listening
and speaking rather than typing. We therefore conclude that voice browsing provides a natural
way of accessing the web. It is now up to developers to take inventive measures to bring this
technology to us in a more colorful way.[4]

REFERENCES
[1] L. D. Catledge and J. E. Pitkow. Characterizing browsing strategies in the World Wide Web.
Computer Networks and ISDN Systems, 27(6):1065–1073, 1995.
[2] M. Bruynooghe et al. From Interpretation: Towards the Global Optimization of Prolog
Programs. In Proc. 1987 Symposium on Logic Programming, San Francisco, CA.
[3] International Journal of Engineering and Computer Science, ISSN: 2319-7242, Volume 1,
Issue 2, Nov 2012.
[4] T. V. Raman. Emacspeak - Direct Speech Access. ASSETS '96: The Second Annual
ACM Conference on Assistive Technologies, pp. 32-36, New York, ACM SIGCAPH, 1996.
[5] R. Beasley et al. Voice Application Development with VoiceXML. USA: Sams Publishing,
August 2001. (ISBN 0-672-32138)
[6] Khushbu, Manika Kapoor, Ayesha Tafsir. The New Era of Browsing - Voice Browsing.
International Journal of Engineering and Computer Science (www.ijecs.in), ISSN: 2319-7242.
