Master Thesis
Model Driven Approach
In Telephony Voice Application
Development
by
Ilke Muhtaroglu
Matriculation Number: 248229
Aachen, April 10th, 2006
Media Informatics
The Fraunhofer Institute for Media Communication IMK
Prof. Dr. Dr. h.c. Martin Reiser
Supervisors:
Prof. Dr. Dr. h.c. Martin Reiser (Fraunhofer-Institute for Media Communication)
Dr.-Ing. Joachim Köhler (Fraunhofer-Institute for Media Communication)
Dipl.-Ing. Wolfgang Schiffer (Cycos AG)
I assure that this work has been done solely by me, without any assistance other than the
official support of the Fraunhofer Institute for Media Communication (IMK). All the
literature used is listed in the bibliography.
Aachen, April 10th, 2006
(Ilke Muhtaroglu)
Acknowledgments
As I write this last page of the report, I would like to thank all my friends and the
professors at the university, as well as my supervisors and colleagues at the company. I
would also like to thank my family for their support throughout this work, and my
second family in Germany. And someone special.
Abstract
Telephony applications are coming into widespread use, and improved technology now
makes it possible to access web content through telephony systems. Model driven
software development offers an alternative approach to programming in which modeling
becomes the central and most important activity. By applying the model driven software
development approach, application design and implementation are united, which enables
more rapid software development and abstraction from the source code. It reduces the
effort spent on programming and on learning new technologies, so that designers can
add more value to their applications by focusing on their business domains and
business requirements instead of dealing with the many implementation details and
requirements.
In this report, the application of model driven software development to the telephony
domain is examined, and the related telephony technologies are described. Finally,
the implementation of the concept is presented.
Table of Contents
1 Introduction................................................................................................................. 2
1.1 Motivation........................................................................................................... 2
1.2 Overview of Thesis............................................................................................. 3
1.3 Model Driven Approach ..................................................................................... 4
1.3.1 Assembling the component library........................................................... 10
1.3.2 Developing the domain-specific modeling language................................ 10
1.3.3 Developing the code generator ................................................................. 10
1.4 Telephony Voice Applications Deployment Architecture................................ 11
1.4.1 VoiceXML Platform................................................................................. 11
1.4.2 Speech Resources...................................................................................... 12
1.4.3 VoiceXML Application Side .................................................................... 15
1.5 Voice User Interface ......................................................................................... 18
1.5.1 Human – Machine Interaction through Speech ........................................ 18
1.5.2 Voice User Interface Principles at the Development Phases.................... 20
1.6 Modeling approach for VoiceXML .................................................................. 25
2 Model Driven Software Development...................................................................... 28
2.1 Concept ............................................................................................................. 28
2.2 Developing a MDSD Process ........................................................................... 30
2.3 UML Extension Mechanism............................................................................. 34
2.4 Functional Parts of a MDSD tool...................................................................... 38
3 Implementation ......................................................................................................... 41
3.1 Meta-Model....................................................................................................... 41
3.2 Model................................................................................................................ 43
3.3 Generated Code................................................................................................. 45
3.4 Development..................................................................................................... 46
3.4.1 Manually Written Code................................................................................. 46
3.4.2 XML Validation and XML Well-Formedness Check .................................. 47
3.4.3 Platform......................................................................................................... 48
4 Conclusion ................................................................................................................ 49
Table of Figures..................................................................................................................50
List of Tables......................................................................................................................51
Glossary..............................................................................................................................52
List of Abbreviations... ..................................................................................................... 53
Bibliography.......................................................................................................................54
Appendix............................................................................................................................ 57
1 Introduction
1.1 Motivation
Computers and their related disciplines improve dramatically each decade, even if the
pace is hard to see on the scale of a single year. The improving hardware resources of
computers accelerate and support the software that runs on them. Software development
itself is also improving, with new programming techniques and tools. If we compare the
level achieved now with the level of the 1970s, we can easily see the fast and
continuing improvement. These technologies change our lifestyle as well. Computer
usage in a company is by now a must in order to stay "up to date" in a competitive
business, and Internet usage has a place in everyone's life.
Many companies have already gone far in realizing e-business to stay alive and
competitive; every company uses email now. Since the advances in "mobile technologies",
companies want to provide access to their e-business infrastructure for their customers,
employees and business partners whenever and wherever they need it.
With recent advances in speech technologies, in both software and hardware,
voice access to computers has also become possible. Today the industry-standard telephony
voice application programming language, VoiceXML, powered by speech technologies,
enables a billion telephones to access computers for transactions and services. The old
proprietary "touchtone" systems are being replaced by a new breed of open-standards-based
speech technologies and protocols such as Voice over IP, speaker verification, speech
recognition, and text-to-speech (speech synthesis). Throughout this text, the term
VoiceXML will be used as a synonym for telephony voice applications.
Since spoken interaction is the most intuitive way people communicate with each other,
voice user interfaces appear to be an easier and preferable way to interact
with computers. Users of voice user interfaces do not need to learn
how to use a computer or how to deal with Internet browsers and other new tools. This way
of interaction is also vital for disabled people who cannot easily use conventional
computer interaction devices (mouse, keyboard, monitor). For some years,
dictation and command-and-control tools have been widely available as a supporting
interaction channel, which leads to multimodality in applications. The natural result of
such speech enablement is smarter computers. New users of
VoiceXML will therefore tend to use voice applications more easily, and each deployed
application will gain more dependent users [2].
VoiceXML technology is about to boom, and about to bring a new aspect to our daily life
and to companies' interaction with their customers. Although every company has the
potential to use VoiceXML applications in its business, at the moment it is not easy
to develop such applications without some understanding of the underlying technologies.
In any deployed VoiceXML application architecture, many technologies cooperate to turn
a telephone conversation into a transaction executed at the database. All of these
technologies (text-to-speech engines, automated speech recognition, VoiceXML browsers,
telephony cards, speech signal processing cards) require their own specialization and
understanding from the developers. Many technology vendors therefore cooperate with the
established standards to provide these technologies for the realization of VoiceXML
applications at enterprise scale. As an XML-based language, VoiceXML is similar to HTML,
but contrary to HTML it is executed in a voice browser at the server side. Both serve
the same purpose in terms of functionality: while VoiceXML enables access
to the Web by speech, HTML is the visualization of the Web.
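To make the analogy concrete, here is a minimal VoiceXML 2.0 document (a sketch; the greeting text is illustrative). Much as a minimal HTML page displays a single line of text, this document plays a single prompt to the caller:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- rendered by the TTS engine instead of on a screen -->
      <prompt>Welcome to the weather service.</prompt>
    </block>
  </form>
</vxml>
```

The voice browser fetches this document over HTTP, just as an HTML browser would fetch a page, and "renders" it through speech synthesis.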
The web sites of today are much more advanced and dynamic than the first static,
informative web pages. Many web technologies such as web browsers, scripting
languages and web servers have improved over time; VoiceXML 2.0, together with its
supporting technologies for telephony voice applications, is currently going through the
same evolution. While at the client side only a telephone is needed, at the server
side there are still many issues to be improved. One of them is to ease the
programming of VoiceXML applications and to shorten the implementation time in
the VoiceXML software lifecycle.
By applying a model driven development approach to VoiceXML applications, project
teams can minimize the effort spent on the underlying technologies and on source code
implementation. Through modeling they can focus on their business goals, their end users,
the application of voice user interface techniques, and more. Modeling also makes the
applications independent of the implementation platform and the implementation
language. The applications become more understandable, because the level
of abstraction rises from the implementation language to models that represent the
implementation concepts. With this approach, refining and modifying an application is
also easier than understanding and changing code.
1.2 Overview of Thesis
The first chapter of the thesis gives an introduction to Model Driven Software
Development and to telephony voice applications through a description of the problem.
The second chapter describes the Model Driven Software Development concept in more
detail.
The third chapter describes the implementation, and the evaluation of the tool by means
of two scenarios.
The fourth chapter provides a conclusion about the work.
1.3 Model Driven Approach
Software development is expensive: “Companies and governments in USA spend more
than $250 billion each year on information technology application development of
approximately 175,000 projects. Over 31% (54,250) of these projects will be cancelled
before they ever get completed in large companies, and only 9% of projects will come in
on time and on budget. While more than half of them will cost nearly twice their original
estimates.” [1]. On the other hand, the demand for software is rising in the economies of
developed countries, in scientific areas, and in our daily life. These facts create the
need to make software development more efficient in terms of both development cost and
usability. The software development process itself is therefore under research and
standardization, in order to decrease these cancellation risks and to make the
development phase and the product more valuable. As depicted in Figure 1, a software
development lifecycle includes many iterative activities, such as [30]:
• Project definition
• Requirements analysis
• Risk management plan
• General design
• Architectural Design
• Feasibility study
• System requirements
• Software requirements
• Detailed system design
• Construction
• System implementation
• Quality assurance and testing
• Deployment
• System maintenance
Figure 1 Software Life Cycle [30]
In a software lifecycle all steps are important, but the initial analysis and design steps
matter most for the success and the future of the product. Yet the steps before
implementation are often considered straightforward and self-evident, and are therefore
mistakenly given less time and effort than the implementation phase. This wrong approach
usually results in a waste of project resources in the long term. Moreover, the
implementation step takes relatively longer than the other steps, forcing the project
team to rush through the initial steps before implementation. This is especially the
case when there is a quick-deployment constraint due to the business goals of the
project. Figure 2 shows how the effort of the three main software activities
(requirements analysis, implementation, testing) changes over time [31]. In this figure
we can see how implementation takes half of the effort in a project, starting with
prototyping and finishing with a recursive testing-implementation cycle.
Figure 2 Software development: effort vs. activity [31]
At the moment the implementation process is supported by many software development
technologies, such as third-generation programming languages (3GL), software
development kits (SDK), software reuse, object-oriented programming (OOP),
components and frameworks. These technologies were not available in the 1950s; over
time, software practitioners, industry experts and academics have raised the
abstraction level of programming languages and developed new techniques to increase the
level of reuse in software construction. At the beginning there were only electrical
machines controlled with wires; then assemblers came to take care of generating the
sequences of ones and zeros for the improving electronic machines. From these
abstractions the first programming languages such as FORTRAN were born, with which
formula calculation became a reality. Then C enabled portability among hardware
platforms. This pioneering work brought new structuring techniques that made programs
easier to write, to understand and to maintain. Now we have Java and C++, with an
object-oriented approach that structures data and behavior together into classes.
In this progression from one level of abstraction to the next, developers at each level
invented new techniques to ease their tasks and developed new concepts that led to the
next abstraction level. For each abstraction, a new tool was developed to map it to the
previous level: assemblers were developed to map code to ones and zeros, then
preprocessors, and then compilers to support the improving techniques and
concepts, as depicted in Figure 3. In this pattern, while practitioners use a new
abstraction level, they also formalize the knowledge and the methods for its use. From
this formalization and practice, a new higher-level language is born, together with a
tool that maps it to the lower level. Currently, these formalization efforts for
third-generation languages have given birth to the "modeling approach" to software
development; in industry we can see this transition in the heavy use of UML and its
standardization [1, 7].
Figure 3 Raising the level of abstraction [1]
The development of the "reuse" technique followed a path similar to the abstraction of
programming languages. First, the function concept from mathematics was applied to
eliminate the effort of rewriting code for similar contexts. The use of functions also
helped against the limited memory available. Then grouping functions and data together
brought the "object" approach. With the increasing trend toward reuse, components and
frameworks appeared; through them, common services and necessary functionality such as
security and network connections are provided for today's applications. This increasing
granularity of reuse now enables more focus on designing and modeling the application
for the project requirements, as well as new opportunities to decrease the effort of
implementation [1].
Figure 4 Raising the level of Re-use [1]
Modeling raises the level of abstraction and hides today's programming languages, in the
same way that today's programming languages hide assembler. Symbols in a domain-
specific model map to the domain elements for which the application is designed; this
can be an abstraction higher than UML. The application is then automatically generated
from these designs by domain-specific code generators that use the existing component
code. It has long been recognized that there is a vital difference between an
application's domain and its code. UML alone, on the other hand, does not bridge the
gap between the domain solution and the code; it only brought standardization to
visualizing and representing code. It does what a blueprint does for an architect: it
helps design the architecture of a building and supports communication between the
architect and the builders of the house.
In the Model Driven Approach, or Model Driven Architecture (MDA), models are made from
elements that represent concepts from the domain world, not the code world. This allows
the modeler to work in the domain world with the domain's concepts, where he can focus
on solving the problem as he perceives it in his profession. The domain solution can
then be automatically transformed into a solution in the code world. This automation is
possible because both the design language and the generators fit the requirements
of one domain. One expert developer defines a domain-specific language containing the
chosen domain's concepts and rules, and specifies the mapping from that language to code
in a domain-specific code generator. The other developers can then build models with the
modeling language, and the code is generated automatically. Since experts have
specified the code generators, the generators produce products of better quality than
normal developers could achieve by hand. Figure 5 shows the traditional methods for
getting from a domain idea to finished code, and how MDA solutions map the
domain idea to finished code [5].
Figure 5 From Domain Idea to Finished Product [5]
According to Domain-Specific Modeling (DSM), three things are necessary for modeling
with automatic code generation: a domain-specific modeling language and editor, a
domain-specific code generator, and a domain-specific component library [5]. Achieving
this requires an expert who has understanding of and experience in the domain, together
with knowledge of the domain's architectures and component libraries.
Figure 6 Main Steps for Domain Specific Modeling [5]
1.3.1 Assembling the component library
A domain-specific component library is not always necessary, but it makes the task of the
developer, or the so-called “modeler”, significantly easier. Often, code components
already exist from earlier development cycles, at least in the form of reusable pieces of
code. Further development of these pieces of code into true components is a relatively
easy task for the expert, requiring only the normal developer programming tools. In
addition to domain-specific components developed in-house, the library can of course
contain generic third-party components. Component library is nothing new; it is what the
developers often use as a part of their development processes.
1.3.2 Developing the domain-specific modeling language
A harder task is the development of the domain-specific modeling language; the concepts
and notations that will be used by the “modeler” to build his model. The experience and
the intuition of the expert, combined with hints from the component library, domain rules
and architects are the real sources of clues. The expert should also consider the issue of
code generation, which will be his next task: a good mapping from the domain meta-
model to the combinations of components will make that future task much easier.
Already available meta-modeling languages can be used here to describe and to “meta-
modelize” both the domain and their rules.
In the creation of a domain specific modeling language, a toolset that utilize this process
is vital. A toolset that allows rapid prototyping is practically a necessity: the expert can
make a part of the meta-model as a prototype, and instantly test it by making an example
model. Similarly such toolsets can help and guide the expert in his tasks, and then the
expert can concentrate on the hard problems, while the toolset create the menus, toolbars,
icons and behavior automatically.
1.3.3 Developing the code generator
The code generation definition is the final task. In this phase the expert specifies how
code is automatically generated from the structures and the components in the
users' models. The modeling toolset should provide the necessary functionality for
creating such generation scripts, and should guide the expert wherever possible by
allowing him to reference and use the concepts of the meta-model.
Of all the phases, code generation probably varies the most between domains. In
some domains it is possible to produce a large fraction of the code with a simple code
generation scripting language, as already provided in most modeling toolsets. In
other domains, it may be necessary to use a more powerful scripting
language to operate on the model data exported from the modeling editor. The most
important goal is that the "end modeler" should be able to use the code generation in a
simple way.
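To make the generator idea concrete: when the target language is itself XML, as with VoiceXML, even a small XSLT stylesheet can serve as a domain-specific code generator. The following sketch assumes a hypothetical domain model in which a <menu> element with <choice> children describes a caller menu; it is an illustration of the principle, not the generator built in this thesis:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- each domain-level <menu> becomes a VoiceXML <menu> -->
  <xsl:template match="menu">
    <vxml version="2.0">
      <menu id="{@id}">
        <prompt><xsl:value-of select="@prompt"/></prompt>
        <xsl:apply-templates select="choice"/>
      </menu>
    </vxml>
  </xsl:template>

  <!-- each domain-level <choice> maps to a VoiceXML <choice> -->
  <xsl:template match="choice">
    <choice next="{@next}"><xsl:value-of select="@word"/></choice>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a model element such as <menu id="main" prompt="Say sales or support."> with two <choice> children, the stylesheet would emit a complete VoiceXML menu, so that the end modeler never touches the generated code.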
1.4 Telephony Voice Applications Deployment Architecture
The VoiceXML title encapsulates many hardware and software technologies. At the user
side a telephone is needed; it can be any phone with a microphone and a speaker, such as
a mobile phone, an old conventional telephone, or a "softphone" that runs on a computer
and works over VoIP. The telephony network that connects the telephone to the VoiceXML
browser platform involves many telephony routers, switches, and several signal transport
media and techniques: wireless links, VoIP, fiber optic lines, the conventional
telephone network (PSTN), "softswitches", and so on. These telephony elements can be
ignored for the sake of simplicity, unless they create unexpected effects such as
communication latency, which would affect the voice user interface and, recursively,
other decisions in a VoiceXML application. The left side of Figure 7 shows the various
kinds of telephone network elements and telephone devices that connect to the VoiceXML
platform.
1.4.1 VoiceXML Platform
The VoiceXML platform is one of the most important elements in a VoiceXML application
deployment. It handles the processing and the bridging between the application and the
telephone networks, and acts like an Internet browser for the end user's telephone. A
VoiceXML platform has three main units:
• VoiceXML interpreter (VoiceXML Browser)
• Telephony Gateway
• Configuration and Management Unit
Externally, Automated Speech Recognition (ASR) and Text-to-Speech (TTS) resources
serve the platform for speech recognition and speech synthesis tasks.
Figure 7 VoiceXML Deployment Architecture [11]
The telephony gateway unit connects to various telephony networks via telecom cards and
provides the necessary telephony software interfaces for controlling and using them. It
essentially handles the communication between the telephone network and the VoiceXML
browser. The VoiceXML browser is the core part of the VoiceXML platform: it fetches,
validates and then interprets the VoiceXML documents using the TTS and ASR resources,
and it is the triggering point for the actions on the platform. The VoiceXML interpreter
architecture of HP's OpenCall platform is presented in Figure 8. The units shown in this
figure work as follows. The Interpreter Context unit handles incoming calls and
associates them with the VoiceXML pages to be interpreted. The Network Fetcher fetches
VoiceXML, grammar and audio files over HTTP. The VoiceXML Processor executes the
VoiceXML tags, attributes and commands of the document itself. The Configuration and
Statistics Manager units provide the configuration and management utility for the
server; through these two units each VoiceXML application can be monitored and
configured, and the data for reporting is generated here. The Document Parser validates
and parses the VoiceXML documents syntactically. The Cache Manager is, as its name
implies, responsible for caching audio files, documents and other raw data. The Device
Controller has important functions such as call control and interaction with media
resources such as the ASR/TTS servers [11].
Figure 8 Architecture of a VoiceXML interpreter [11]
Configuration and management play an important role in testing and improving deployed
VoiceXML applications. Web-based configuration and management interfaces are widely
available, but Java-based "rich client" applications are also used for enhanced
functionality such as real-time call monitoring, further tuning, and statistical
reports. In general, these utilities are important throughout the VoiceXML application
lifecycle.
1.4.2 Speech Resources
Automated Speech Recognition (ASR) and Text-to-Speech (TTS) engines provide the
speech processing capability of the whole VoiceXML platform. ASR and TTS
technologies are the result of the cooperative work of many professions, and they are
still improving. The engines depend on the language characteristics of each country.
Therefore most VoiceXML platforms are not designed to include their own ASR and TTS
engines, but to cooperate with third-party speech resources through vendor-independent
interface standards. The Media Resource Control Protocol (MRCP) is such an
application-layer protocol; it provides the necessary vendor-independent interface
between speech media servers and speech application platforms. The major speech
processing resources are [36]:
• Basic synthesizer — A speech synthesizer resource with very limited capabilities
that plays out concatenated audio clips.
• Speech synthesizer — A full-capability speech synthesizer that produces human-
like speech using the Speech Synthesis Markup Language specification.
• Recorder — A resource with end-pointing capabilities for detecting the
beginning and end of spoken speech and saving it to a URI.
• DTMF recognizer — A DTMF-only recognizer that detects telephone touchtones
and reacts accordingly.
• Speech recognizer — A full speech recognizer that converts spoken speech to text
and interprets the results based on the semantic tags in the grammar.
• Speaker verification — Authenticates a voice as belonging to a person by matching
it to one or more saved voiceprints.
MRCP mainly provides independence from the voice resource vendor and from the location
where the resources reside. As a client-server protocol, MRCP uses the Session Initiation
Protocol (SIP) and the Real Time Streaming Protocol (RTSP) to make speech resource
servers independent nodes in distributed networks. By providing a standard
communication protocol between the client (the VoiceXML platform) and the speech
servers, MRCP abstracts and encapsulates the rapidly evolving underlying speech
technologies and their vendor-dependent architectures and implementations. MRCP is not
restricted to VoiceXML platforms; it can be used anywhere an MRCP client is present.
Figure 9 Media Resource Control Protocol Architecture [36]
Speech synthesis, or the text-to-speech process, creates speech in several steps, as
depicted in Figure 10. The Speech Synthesis Markup Language (SSML) enables non-
procedural control over each of these stages to create more natural speech. The first
step, structure analysis, parses the text to be synthesized syntactically, for example
detecting paragraphs, spaces and commas. Text normalization is the process in which
special symbols such as the dollar sign "$" or the abbreviation "Dr." are detected and
interpreted to determine how they should be spoken. After detecting the set of words to
be spoken, the TTS engine decides on the pronunciation of each word; this pronunciation
can be controlled by specifying it with the IPA (International Phonetic Alphabet). Then
prosody analysis is performed to decide the pitch, rhythm and rate of speaking; SSML
tags at this level can control, for example, the speed of the generated speech. Waveform
production is the final step, in which the audio waveform output is generated from the
phonemes and the prosodic information. As an alternative to machine-synthesized metallic
voices, more natural-sounding voices are now produced by concatenating very small pieces
of human speech, which together form the sound of a word. These small sound units
(triphones, diphones) are extracted in the laboratory from the recorded speech of
talented voice talents.
Figure 10 Speech Synthesizing Process [9]
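A small SSML fragment can illustrate this kind of control (a sketch; the element names follow the SSML 1.0 specification, while the spoken text is illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p>
    <!-- text normalization hint: read as a currency amount -->
    Your balance is <say-as interpret-as="currency">$42.50</say-as>.
    <!-- explicit pronunciation given in IPA -->
    I said <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
    <!-- prosody control: slow down the closing sentence -->
    <prosody rate="slow">Thank you for calling.</prosody>
  </p>
</speak>
```

Each element targets one of the stages above: say-as steers text normalization, phoneme fixes the pronunciation, and prosody influences the prosody analysis step.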
Speech recognition, contrary to text-to-speech, takes speech as input and produces the
recognized text. Figure 11 shows that the first step in speech recognition is detecting
the beginning and the end of speech: the recognizer listens for silence to extract the
user's spoken utterance. It then forwards this waveform to the feature extraction unit,
which transforms the end-pointed utterance into a sequence of feature vectors. A feature
vector represents the useful, measurable characteristics of the speech, and this
sequence is then one of the inputs of the recognizer.
The recognizer uses a dictionary, a grammar and an acoustic model to find the
best-matching result from the grammar. In this figure, the dictionary is a list of
particular words and their pronunciations, introduced to the ASR engine so that it
performs better, especially with foreign words. The grammar defines the words that the
caller is expected to say to the system and that are to be understood. Rule-based
grammars and statistical language model grammars are the two important kinds. A
rule-based grammar states, through explicit rules, what the recognizer should listen
for. Statistical grammars, in contrast, require collecting many sample responses from
speakers and transcribing them. While statistical grammars allow more flexibility and
neutrality for the speakers, rule-based grammars make it easier for programmers to
extract semantics from the recognition result. The acoustic model is the recognizer's
internal representation of the pronunciation of each possible phoneme or basic sound; it
is different for each language and can sometimes be adapted to the environmental noise
characteristics of the speakers.
Finally, the recognizer unit of the ASR uses the feature vectors and the recognition model
(composed of the dictionary, the acoustic models, and the grammar) to search for the best
match within the grammars. The result carries a measurable confidence, and most recognizers
return a confidence measure with their answer. If the recognition result is poor, with a low
confidence, the recognizer can report no recognition rather than return a wrong result.
Essentially, the recognizer looks for the best match between a given grammar and what was
spoken; when it cannot find any match, it simply returns no recognition.
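The confidence-based accept/reject decision described above can be sketched in a few lines. This is a minimal illustration only: the result structure and the threshold value of 0.5 are assumptions for the example, not values taken from any particular recognition engine.

```python
# Sketch of confidence-based result handling in an ASR front end.
# The RecognitionResult shape and the threshold value are illustrative
# assumptions, not taken from any specific recognition engine.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionResult:
    text: str          # best-matching phrase from the grammar
    confidence: float  # engine's confidence measure in [0.0, 1.0]

CONFIDENCE_THRESHOLD = 0.5  # illustrative value; platforms make this tunable

def interpret(result: Optional[RecognitionResult]) -> Optional[str]:
    """Return the recognized text, or None for a 'no recognition' outcome."""
    if result is None:
        return None                    # nothing matched the grammar at all
    if result.confidence < CONFIDENCE_THRESHOLD:
        return None                    # match too weak: reject rather than guess
    return result.text
```

For example, a high-confidence result such as `RecognitionResult("main menu", 0.82)` is accepted, while the same text with confidence 0.3 is rejected as no recognition.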
Figure 11 Speech Recognition Process [9]
1.4.3 VoiceXML Application Side
The VoiceXML platform communicates with the application side via HTTP. The web
server on the application side serves the platform in a request/reply manner. For
example, when the caller triggers an action, such as a request for a new VoiceXML
document, the VoiceXML browser forwards this request with the related parameters to the
web server and asks for the next VoiceXML page to present to the caller. The web server
evaluates the request in cooperation with the application server and back-end systems,
then returns the dynamically generated VoiceXML page to the VoiceXML platform. Grammars
and audio prompts are also provided to the VoiceXML browser for use as needed while the
caller browses through the VoiceXML document. From a functional point of view, we can
easily draw an analogy between an HTML page and a VoiceXML document, and between the
images of an HTML page and the grammars and audio prompts of a VoiceXML page.
A VoiceXML browser and an HTML browser therefore have similar functionality: one
serves visual interaction with the application side, while the other
enables speech interaction with the application side.
Figure 12 VoiceXML Architecture at Server Side
Due to this similar functionality, VoiceXML uses the techniques and architecture already
developed and refined for the visual web, as shown in Figure 12. Considering the advances
in web server technologies over the last ten years, we can see how fast VoiceXML has
progressed in a short time by reusing the techniques of the web. VoiceXML inherits the
multi-tiered architecture of web applications and is empowered by the
"Model-View-Controller" architecture, since it provides a new "view" for
applications. Looking at Java's multi-tiered enterprise architecture in the figure below,
we can see that the application logic, where the business requirements are formulated, is
separated from the view of the application. This decoupling enables the application to be
viewed and used in different client environments. In this architecture the user interface
can be a web page, a Java applet, a rich client running on the client side, and more. The
web tier enables potential web clients such as a web browser, a WAP browser, and also a
VoiceXML browser to access the application logic, in other words the business tier.
Alternatively, rich clients implemented in various programming languages can provide
different types of graphical user interfaces to the end user. The web tier can be
implemented with many technologies, such as Java Server Pages (JSP), JavaScript (JS),
Servlets, and Active Server Pages (ASP), to provide the
necessary HTML, WAP, or VoiceXML code to those browsers.
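The request/reply generation of a VoiceXML page described above can be sketched as a tiny server-side handler. This is a hedged illustration: the function name, form id, and prompt text are invented for the example, and a real web tier of the time would more likely be a JSP or Servlet as listed above.

```python
# Sketch of server-side generation of a dynamic VoiceXML page.
# The function name, form id, and prompt content are illustrative assumptions.

def render_balance_page(caller_name: str, balance: float) -> str:
    """Build a minimal VoiceXML document answering one browser request."""
    prompt = f"Hello {caller_name}. Your current balance is {balance:.2f} euros."
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form id="balance">\n'
        '    <block>\n'
        f'      <prompt>{prompt}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )

# The web tier would return this string as the HTTP response body
# to the VoiceXML browser, which then renders it to the caller as speech.
```

The point of the sketch is the division of labor: the back end supplies the data (here the balance), the web tier formats it as a VoiceXML document, and the VoiceXML browser on the platform presents it audibly.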
Figure 13 Multi-tiered Application [33]
The business tier, on the other hand, is the core part of the application, where the
business logic that solves the problems of the domain is implemented. The business tier
receives data from the user, processes it together with the data available in the back-end
system, and then presents the results back through the user interface. The interfaces and
communication needed to talk to both the client and the back-end systems are developed in
this tier. The final tier, the enterprise information tier, encapsulates the components
that provide the necessary data via databases, as well as services such as web services or
other enterprise applications.
As a result of decoupling the application segments with a multi-tiered architecture, the
"Model-View-Controller" programming pattern became more practical to
apply. In this approach the core business model functionality is separated from the
presentation and control logic. This separation allows multiple views, such as HTML,
VoiceXML, and WAP, to share the same enterprise model. In Figure 14, the model
represents the enterprise data and the business rules that govern access to that data. The
view renders the contents of a model and specifies how that data should be presented,
while the controller translates the interactions coming from the view into actions to be
performed by the model and also selects the view for the response to the interaction. The
main advantage of this approach is that multiple views and clients can be enabled on top
of the same enterprise model.
Figure 14 Model-View-Control Pattern [34]
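The pattern just described can be illustrated with a toy sketch in which one model is shared by two views, an HTML one and a VoiceXML-style one, with a controller selecting between them. All the names here are illustrative assumptions, not the API of any real framework.

```python
# Minimal Model-View-Controller sketch: one model, two "views"
# (an HTML rendering and a VoiceXML-style rendering) selected by a
# controller. All names are illustrative, not a real framework API.

class AccountModel:
    """Model: enterprise data plus the rules governing access to it."""
    def __init__(self, balance: float):
        self._balance = balance

    def balance(self) -> float:
        return self._balance

def html_view(model: AccountModel) -> str:
    """View for a visual web client."""
    return f"<p>Balance: {model.balance():.2f}</p>"

def voice_view(model: AccountModel) -> str:
    """View for a VoiceXML client."""
    return f"<prompt>Your balance is {model.balance():.2f} euros.</prompt>"

VIEWS = {"html": html_view, "voice": voice_view}

def controller(model: AccountModel, client_type: str) -> str:
    """Controller: map the incoming interaction to the model and
    select the view used for the response."""
    return VIEWS[client_type](model)
```

The same `AccountModel` instance serves both client types; only the selected view changes, which is exactly the advantage the pattern is meant to deliver.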
1.5 Voice User Interface
1.5.1 Human – Machine Interaction through Speech
From the end user's point of view, a VoiceXML application is a series of continuing
dialogs between the computer and the user about interrelated topics. The user listens to
the machine's speech and replies to it, again by talking. The dialog is thus the basis of
communication and interaction between the computer (machine) and the user. This kind of
interaction between a machine and its user is not yet widely available or practiced. A car
driver or a computer user will certainly do his best to make the machine understand what
he wants done. Here, however, only a speech channel is used for the interaction, over a
telephone that offers nothing but a microphone and a speaker. Unlike talking with a real
person, the user must be aware that he is talking with a computer, and that this machine
is not as capable as a human; speech technology still has far to go before it can mimic a
real person. What is available now is a machine trying to do its best to keep the
communication going, unlike the case of an internet browser, where it is the user who
tries his best. The user interacting through speech should be aware that he is talking
with a machine, so that he does not expect too much of his correspondent in terms of
cleverness and other human traits. This nature of the communication also places more
effort on the machine side: the machine should do its best to mimic a real interlocutor,
and should always consider the user's state of mind and other aspects while communicating
with him.
Users should be aware of the computer's limited intelligence and should help it
understand them. The auditory interaction between machine and human, the voice user
interface, therefore draws on skills from speech technology, user interface design,
cognitive psychology, linguistics, and software development. Since the voice user
interface is the only part of the product the caller interacts with, it deserves
professional focus for the deployed VoiceXML application to succeed. The design
environment for VoiceXML applications should therefore not constrain the application of
voice user interface techniques; on the contrary, it should ease it.
A voice user interface is roughly built up from prompts, grammars, and a call flow. The
prompts are recordings of a voice talent's speech, or synthesized speech, that are played
to the user. Grammars are the possible things the user can say in response to a prompt;
in other words, the things the computer should listen for and then understand. The
computer can say anything the technology allows, but it cannot understand what is not
somehow covered by a grammar and commanded to be listened for. The call flow, also called
the dialog logic, defines the actions taken by the system based on conditions, properties,
and inputs from the user. Unlike multimodal interfaces, where speech is one of several
interaction channels, auditory interfaces pose a unique design challenge because the
exchanged messages are non-persistent and transient, due to the characteristics of speech
communication. The user tends to create a persona (personality) in his mind and continues
the communication with this sound and feel. The designer is expected to stay consistent
with the character of the persona he wants to create, since the user forms a mental model
of the person-computer he is communicating with.
The following types of applications suit VoiceXML well, given the non-persistent nature
of the auditory interaction medium [3]:
• VoiceXML is the best alternative to DTMF applications, where the user deals
with a limiting, non-intuitive phone keypad. For such already-deployed touchtone
applications, VoiceXML can be an intuitive and efficient replacement that improves
the quality of interaction.
• Where voice is the preferred mode of device interaction, as in hands-free and
eyes-free activities such as driving a car.
• For telephone devices, which generally have very poor and limited keypad and
screen user interfaces.
• For ubiquitous access by telephone, enabling phone access to systems available
anywhere and thereby improving reach.
• For users with physical disabilities.
• In cases where users are motivated to use a VUI-enabled service because it
saves time and money.
Conversely, VUI applications are not suited to noisy environments or to situations in
which the user may be talking with people and the device at the same time; in these cases
the speech recognition engine cannot perform well. Applications that are visually complex
or carry large amounts of information are also not ideal for a voice user interface. Even
in these cases, however, the application may be worth developing if the user is mobile and
speech is the only way to access the service at that time.
1.5.2 Voice User Interface Principles at the Development Phases
A good voice user interface design follows five main principles. The first is end-user
input: during design, the designer should validate his decisions from the standpoint of
usability, and at any level of design he should use the user input appropriate to that
phase. This is the key concept of a user-centered design paradigm. As the second
principle, the designer should find a solution that meets both the business goals and the
user needs, deciding on trade-offs so that both sides are served. The third principle: it
is always more expensive to make a design decision late in the project, so requirements
analysis and high-level functional design must come before the details. The fourth
principle is to always see the voice application from a higher point of view, so that the
designer does not get lost in details; he should keep a bird's-eye view of the overall
flow of the conversation. Finally, all design decisions should be considered in a context
relevant to the application's needs, the user, the language use, and the persona.
With these voice user interface principles in mind, the designer must focus on and make
several decisions during the six phases of application development. These phases can be
summarized as requirements definition, high-level design, detailed design, development,
testing, and tuning. In the requirements phase the designer should achieve a detailed
understanding of the application. If not already familiar with the business, he should
understand the business model and the company's motivation for deploying such a service.
The designer should become familiar with the company's competitive environment and the
image the company is trying to create; he can speed up this understanding by making
contacts with company personnel. Understanding the user comes next: the designer should
examine the caller profiles. Do the callers use industry jargon or a particular
geographic dialect? Do they have technological experience, for example with the internet
or DTMF applications? What is the callers' general state of mind, self-image, and mental
model? Such questions should be answered as fully as possible with the help of interviews
or surveys with customers and company personnel. Then comes understanding the application
itself: which tasks and subtasks there will be, how complex these tasks are from the
caller's point of view, and which information the system supplies to the caller. The
usage characteristics of the application, such as one-time or repeated use, the calling
environment, and whether the user is willing to talk with an automated system, should
also be considered in the requirements definition.
The early work done before detailed design provides many advantages for the project.
High-level design is therefore important to guide the design and provide decision
elements. The first task is choosing the "key design criteria", which will guide the
design trade-offs and the major focus of the design; these should be one to three short
items kept in mind throughout the design. A decision on the grammar type and the dialog
strategy should then be made. The application persona should also have a consistent
character. Terminology should be used recurrently and consistently throughout the
application, in parallel with the company's web site. The metaphors used to create a
useful mental model for the user should likewise be consistent, both with each other and
with the character the application presents. People entering a conversation, even with a
machine, always make guesses about their correspondent's personality; the user takes many
things into consideration, such as voice characteristics, the words spoken, even the
accent. These facts make it necessary to design a persona much as one defines a character
in a film script, focusing on the personality aspects that can be conveyed in dialog.
This persona will then extend the company's specific image and its brand.
Detailed design is the stage in which the designer harvests all the details, aspects, and
requirements of the project that will help him during application development. This phase
plays a crucial role in the success and longevity of the deployment, so the designer
should produce a detailed design document in which he discusses his design trade-offs and
decisions. Detailed design has many aspects; from the application flow point of view, the
call flow helps the designer formulate the flow between tasks and subtasks using call
flow elements. A dialog state is considered the smallest unit in the call flow diagram.
Such a dialog state includes [2]:
• Initial Prompt: what is played first when the flow reaches this dialog state.
This prompt can be generated dynamically based on the recent history.
• Recognition Grammar: the specific information to be returned from the grammar,
such as a city name, should be described. A high-level grammar definition at this
phase should be sufficient for the grammar developer to create the grammar in the
development phase.
• Error Handling: the designer must specify how to handle errors that may occur
due to recognition rejections, recognition timeouts, no-speech input, and so on.
He should recover from these errors intelligently, with error prompts that
consider the context of the dialog state, in order to avoid caller hang-ups.
• Universals: commands that are always available in an application, such as
"Help", "Exit", and "Operator", should sometimes be overridden for better
functionality. For example, when a user says "operator" in the bill-checking
dialog state, the "Operator" command should be overridden to connect the user
to an accounting operator rather than a sales operator.
• Action Specification: the actions that may be triggered by a satisfied
condition, or by properties of the platform, should also be specified for each
dialog state. For example, a transition between two dialog states may depend on
logic and data transferred from a back end.
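The dialog-state elements listed above can be captured in a small data structure. This is a sketch only: the field names and the example values are assumptions chosen for illustration, not part of any standard.

```python
# Sketch of a dialog state as described above: initial prompt, grammar,
# error handlers, universal-command overrides, and actions.
# All field names and example values are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DialogState:
    name: str
    initial_prompt: str                      # possibly generated dynamically
    grammar: str                             # high-level grammar reference
    # prompts for recognition errors, keyed by error type
    error_prompts: Dict[str, str] = field(default_factory=dict)
    # overrides for universal commands such as "Help" or "Operator"
    universals: Dict[str, str] = field(default_factory=dict)
    # named transitions, e.g. {"filled": "next_state_name"}
    actions: Dict[str, str] = field(default_factory=dict)

# Hypothetical dialog state for a travel application:
ask_city = DialogState(
    name="ask_city",
    initial_prompt="Which city are you flying to?",
    grammar="city_names",
    error_prompts={"noinput": "Sorry, I did not hear you. Please say a city.",
                   "nomatch": "Please say the name of a city, like Berlin."},
    universals={"Operator": "transfer_to_travel_agent"},
    actions={"filled": "ask_date"},
)
```

Writing the state down this way makes the design document checkable: every state must name its grammar, its error recovery, and its outgoing transitions before development begins.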
The call flow design is the "bird's-eye view" of how the user navigates through the
system: where the back-end information systems are integrated, how the menu structure
looks, and in which ways information is collected. As stated previously, dialog states
are the building blocks of the call flow; together with other elements such as back-end
processes, entry/exit states, and actions, they define the call flow. Although there is
no standard for defining call flows and their elements, some guidelines make them more
understandable: the conventions and shapes should be chosen carefully and used
consistently, the elements should carry informative names, and the dialog state should
remain the lowest-level unit on the call flow, with the internal details of a dialog
state pictured separately.
Figure 15 Call Flow [2]
The prompts of a VoiceXML application create most of its "hear and feel", so sufficient
detailed design effort should go into deciding on them. The words should be chosen
intelligently to satisfy the voice user interface concepts and principles. While
capturing the application persona, other aspects, such as the experience of the end
user, should always be taken into consideration. The designer should carry out a
conversational design by writing the prompts and reading them out loud in their
conversational context. He should write sample dialogs between the system and the user
for the tasks under consideration, covering not only the successful dialogs but also
those that exercise the error and help prompts. In this way the designer can come closer
to realistic and natural prompts, but this is not enough: since written language and
spoken language differ in many respects, the designer should experience the prompts by
reading them aloud.
Prompts can be generated by the text-to-speech engine where the prompt text is highly
dynamic and changing, as with an email. A voice actor can record the static prompts to
provide more natural-sounding output. An important issue is maintaining consistency
between voices when TTS and a voice actor are used together; a voice-conversion facility
can be used to keep the TTS voice similar to the voice actor's.
At the detailed design phase, the designer should consider the following principles to
optimize his call flows, prompts, grammars, and other elements [2]:
• Minimize the cognitive load: Cognitive load can be summarized as the amount of
mental resources a task demands from the user. It includes listening, attention,
memory, language processing, decision making, problem solving, and speaking back.
The designer should carefully minimize the information that callers must hold in
their short-term memory and the number of new concepts they must learn, and
should provide ways to recover if the user gets lost, for example through a lapse
of attention.
• Accommodate conversational expectations: The end users of VoiceXML applications
unconsciously bring human-to-human conversation expectations. With the persona in
mind, the designer should meet these natural expectations; for example, the
system should not use jargon the end user will not understand, it should explain
new concepts, and it should not change the conversation topic to something
unrelated.
• Maximize efficiency: Callers, especially frequent ones, want speed and
efficiency; they do not like listening to, or being asked for, the same
information each time. The designer should provide shortcuts to frequent tasks
and should allow barge-in so that callers can skip prompts they have already
understood.
• Maximize clarity: The designer should always fight possible ambiguities. This
holds at every level of the application: at the prompt level the designer should
avoid synonymous words that could lead to misunderstanding, and at higher levels
the callers' mental model across dialogs should be taken into account so that
they do not get lost and frustrated.
• Ensure high accuracy and error recovery: Recognition errors, whether caused by
the quality of the recognition engine or by the caller's noisy environment, must
be avoided, since they seriously reduce caller confidence and usability. The
accuracy of a system can be calculated from the proportions of correct and false
accepts and rejects. Recovering intelligently from errors also requires detailed
design effort: rejects and timeouts should be handled by giving more detail about
the requested information and offering a help option, instead of forcing the user
to repeat his error. Confirmation is another technique for checking whether the
recognized values match what the user intended; "when to confirm" and "how to
confirm" are the two important confirmation decisions. The system should also be
intelligent enough to avoid repeating errors, especially those that have already
been corrected. Avoiding errors and ensuring high accuracy, which increase the
prestige of the system, are always preferable to dealing with error recovery and
its consequences.
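The accuracy figure mentioned in the last principle can be computed from accept/reject counts. The sketch below takes accuracy to be the share of correct decisions among all decisions; this is an assumption for illustration, since vendors define and weight these quantities differently.

```python
# Sketch: recognition accuracy as the proportion of correct decisions
# (correct accepts + correct rejects) over all decisions. Real platforms
# may define and weight these quantities differently.

def recognition_accuracy(correct_accepts: int, false_accepts: int,
                         correct_rejects: int, false_rejects: int) -> float:
    """Fraction of recognition decisions that were correct."""
    total = correct_accepts + false_accepts + correct_rejects + false_rejects
    if total == 0:
        return 0.0  # no decisions made yet
    return (correct_accepts + correct_rejects) / total

# Hypothetical tuning data: 90 correct accepts, 4 false accepts,
# 5 correct rejects, 1 false reject -> (90 + 5) / 100 = 0.95
```

Tracking this number across tuning iterations gives the designer a concrete way to judge whether grammar or threshold changes actually improved the system.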
Designers, being too close to their designs, are the worst judges of their usability.
End-user testing during detailed design is therefore a must, to obtain good feedback and
evaluate design decisions; the designer should refine his design through a number of
iterations of testing and design.
To get user feedback for improving the design even before a prototype exists, a "Wizard
of Oz" setup can be used. A Wizard of Oz simulates the behavior of a working system by
having a human act as the system. An example Wizard of Oz tool for voice user interface
design is SUEDE [6]: a human performs the speech recognition virtually and generates the
prompts by choosing the appropriate response from prerecorded speech files. Such a tool
offers many advantages, such as the chance to test early, before a prototype exists.
Using a Wizard of Oz also prevents usability testing from being blocked by software bugs
and yields early feedback on the design. Because a human performs the speech recognition,
a Wizard of Oz operates with high grammar coverage; this is preferable to using a poor,
unfinished grammar that does not cover enough user responses and would limit the testing.
The person simulating the system's behavior should nevertheless perform the grammar
recognition realistically.
The designer should choose the test participants carefully, and the test environment as
well. The tests should preferably be conducted by telephone, since the telephone is the
device the application will run on. This approach makes it easier to reach the right
sample of the caller (test user) population, and it is cheaper than bringing the test
participants to the office. Having the users in their real calling environment and state
of mind also yields realistic test results. The designer should give the test person
enough information by providing the task definitions to follow, describing only the
situation and the tasks without revealing the commands and strategies for fulfilling
them. Test sessions should be recorded for detailed analysis. The test person should
know that the test is carried out to improve the design of the system, not to evaluate
him. A questionnaire with open-ended questions should be given to the test participants
to learn more about their experience and to gather feedback. Finally, the designer should
analyze the test results thoroughly, detecting the problems behind the symptoms that
appear in the tests.
Before the application is opened to public use, a pilot deployment should test it and
find bugs at operation time. This includes application testing, recognition testing, and
usability testing. In application testing, all possible dialogs should be traversed and
as many answer combinations as possible tested. Testing at this level is run against the
system to uncover bugs, so every error should be noted; this is the quality assurance
(QA) test executed for all integrations and all conditions. After these functionality
tests, a load test should be performed using software that simulates calls to the system,
to check efficiency and run-time errors under heavy usage. In recognition testing, the
application's grammars should be tested with utterances that are supposed to be
"accepted" and "rejected", and the initial values of recognition parameters such as the
"no-speech timeout" and the "confidence threshold" should also be verified. Usability
testing should again be done with random users to gather their impressions of the
complete system; the bottlenecks from the usability point of view must be identified and
corrected.
After the critical errors are removed, the tested application can be opened to public
use. In this phase the application should be tuned for improvement: the designer must
listen to some recorded calls to see how the application behaves in real time, and should
review the call logs and the reports generated by the platform, such as task completion
analysis, hang-up analysis, and hotspot analysis. Such reporting and analysis utilities
provided by the VoiceXML platform make it easier for the designer to tune the application
for better performance.
Finally, the deployed application needs continuous monitoring for high call rates and
crashes. With the help of logs and reports, the application should also be tuned
periodically to adapt to changing caller profiles and usage.
1.6 Modeling approach for VoiceXML
Domain-specific modeling is a powerful way to share the abilities and knowledge of expert
domain developers with developers who are new to the domain. A powerful modeling toolset
can reduce the time and effort required to develop a domain-specific modeling
environment, along with its library and automatic code generation support. Industrial
applications of this approach show remarkable improvements in productivity and training
time; some report software development processes up to ten times faster. The industrial
benefits of such a productivity increase are clear: shorter product cycles, shortened
training time, and reduced implementation effort. Such results are exactly what the
expanding VoiceXML field needs at the moment. In the early years of HTML, web page design
tools were in strong demand because everyone wanted to create their own web pages.
VoiceXML is entering a similar high-demand phase as companies begin to understand the
value of voice-enabling their IT systems, both for existing services and for new
voice-enabled value-adding services.
A model-driven approach to generating VoiceXML applications does more than give the user
a tool for designing call flows and generating VoiceXML code automatically. The task is
to enable the so-called "modeler", an expert in his business problem and business domain
who has less knowledge of and time for VoiceXML technologies, to create new telephony
solutions and to create telephony access to his company's services. In such a modeling
tool, the modeled solutions should represent the concepts and notation of the voice user
interface techniques, the business requirements, the user requirements, and the back-end
systems. Another result of applying a model-driven approach is abstracting away the
technological details of the underlying components of the application architecture and
providing an interface for configuring the services and utilities of the application
server and the VoiceXML browser. The expectations and requirements for such a modeling
tool can be formulated in the following points:
• The end user of this modeling tool should be strongly enabled to apply voice
user interface techniques; the overall approach should not obstruct an expert in
applying these techniques but, on the contrary, should guide him in doing so.
• The tool should let the end user easily use the facilities provided by a
VoiceXML document server or a VoiceXML browser. More importantly, the domain
engineer should be able to design his solution more easily than before by
exploiting these server facilities.
• The items and rules of the graphical language should be abstracted and
classified from several business domains of telephony speech application
solutions. The language should enable visualization of the overall application
architecture, the dialogs, and the call flow, as well as representation of the
business domain's items.
• The tool in which the solution designer works should be as intuitive to use
as possible and should follow graphical user interface techniques.
• The domain modeling tool should provide best-practice dialog components,
grammars, and prompts in library form. This library should be highly extensible
and improvable; it will ease the work of inexperienced designers and provide a
way of sharing between designers.
• The tool should also enable clean, tidy code generation from the model to the
target code by mapping the meta-models. It should decrease the amount of manual
programming while still allowing the code to be viewed, modified, and extended
when necessary.
• The modeling environment should enable the modeler to debug, test, and deploy
his solution.
• The tool should provide utilities for configuring both the VoiceXML
application server and the VoiceXML browser.
• The tool itself should be based on open standards and should be extensible
with further capabilities. It should also ease the exchange of modeled solutions
and implementations between its users; this will improve the communication of
solutions among users and help the library concept become widely populated.
Such a telephony application modeling tool should enable faster, easier solution design.
It should lead to shorter training time, increased productivity, high-quality solutions,
and reuse of previous design solutions. In addition to speeding up and automating
development, the tool should foster understanding of the modeled design and should also
support communication and documentation of the solution.
2 Model Driven Software Development
This chapter describes model driven software development (MDSD) in more detail. The
first section presents the general idea and an overview of MDSD; the second section
explains the main steps of building an MDSD architecture for a specific domain. The
third section describes the extension mechanism of the Unified Modeling Language, giving
a brief introduction to UML, UML diagrams, and UML meta-models, and showing how to extend
the meta-models to supply the design elements missing for specific domains. The final
section describes the functional parts of an MDSD tool and how they work together.
2.1 Concept
In software development, modeling is widely used to abstract and present implementation
decisions. With the wide adoption of the Unified Modeling Language (UML), this process
became standardized and provided a common basis for sharing implementation ideas among
software developers. When UML is used to define implementation concepts such as classes
and functions, there is only a high-level conceptual connection between the model of the
application and its source code. With model driven software development, however, UML
models no longer serve only for documentation and communication of implementation ideas;
they establish a stronger connection between the application model and the application
source code.
Model Driven Software Development (MDSD) can be compared to the car design and
manufacturing process in the automobile industry, in which engineers design car
models with specialized Computer Aided Design (CAD) tools and let robots finish the
cars on the production line. This approach was introduced to enable production
automation and a high productivity rate. A similar automation process can be applied
to software development by analyzing a certain domain and then defining a domain
specific modeling language to be used in the design process. Once the domain has been
analyzed and the domain specific language is available, sample applications for that
domain can be modeled with a UML tool, and these models can then be transformed
into code by MDSD tools. This yields a higher automation potential for finishing the
applications and a comparatively higher productivity. With a domain modeling
language defined once, example application source code available and an MDSD tool
in use, the designed application models can be transformed automatically into
application source code, so that a kind of software production line can be built, which
may lead to higher reuse and software quality.
Analyzing a domain starts by examining the already available example applications and
their source code. As a rule of thumb, the more modular and structured the source code
is, the easier it is to identify and categorize the modeling elements for its domain. From
a bird's-eye view of the example application source code alone, however, identifying
the actual software structure is difficult, because the abstraction level of programming
languages is low. This can be overcome by reverse engineering (the process of
analyzing a subject system to create representations of it at a higher level of
abstraction), which is provided by most UML tools. Through this process the source
code can be visualized in UML syntax, which provides additional clues for designing
the domain's modeling language. The models designed in this modeling language will
not only serve as documentation but will also become an asset for the software and act
as an acceleration and quality factor for the software development.
Figure 16 shows the basic principle of developing software by means of MDSD. As
the very first step, the domain and the reference applications should be analyzed and
their code grouped into three categories: "general code", "schematically repeating
code" and "manually written code". The general code is the part of the reference
applications that does not change and is the same for all applications. It provides the
basis on which the applications run; this part of the application code is called the
framework or platform. The schematically repeating code is not exactly the same
across the reference applications but has a similar structure. Finally, there is the
manually written code, which is specific to each reference application and must be
implemented separately. After categorizing the reference applications into these three
categories, the second step is to construct a domain specific modeling language for the
schematically repeating code parts. This modeling language, an extension of UML,
will later serve for modeling the domain applications. The models designed in this
modeling language are then transformed into the source code that represents the
schematically repeating code of the reference applications.
Figure 16 Main idea of Model Driven Software Development [37]
2.2 Developing a MDSD Process
In order to apply Model Driven Software Development process at a specific domain for
improving software development, it is necessary to develop first a “Generative
Architecture”. In order to build this base the domain and its reference applications should
be examined and the code should be categorized into 3 categories like “Manually written
code”, “Schematically repeating code” and “General code”. The “General code” category
of the reference application is the platform where applications are built on. This can be a
web server, a J2EE platform or a .Net platform. The “Schematically repeating code”
category of the reference applications is the code that has similar structure and appears as
copy-paste code. This can be figured out by comparing the code parts that builds the main
structure through the applications. Defining the
“Schematically repeating code” carries a high importance for building successful and
productive “Generative Architecture”. It can be formulated as the more schematic code is
extracted the more flexible and effective the MDSD process. The “Manually written
code” category consists of the code part except the “General code” and “Schematically
repeating code” and is specific for each application. This code category is not aimed to be
generated through the transformation from the UML models, and it also serves as glue-
code between the generated code and platform code.
The schematically repeating code parts of the reference applications, the application
domain itself and already available UML models give hints for defining the domain
specific modeling language that will be used for modeling the applications. MDSD
tools support many design formats, both textual and graphical; among them the
Unified Modeling Language (UML) is the industry-wide accepted standard. UML is
intended to be used with all development methods and application domains, and it has
an extensibility mechanism, the UML Profile, that enables the addition of domain
specific design elements (meta-model elements). Through this it is possible to design
the modeling language for an application domain by extending UML with stereotypes,
tagged values and constraints. A meta-model defines the construction elements that
will be used in a concrete model; basically, a model is an instance of its meta-model.
The UML meta-model consists of elements like Class, Operation, Attribute,
Association, etc. The UML extension mechanism and the meta-model concept are
described in Section 2.3 in more detail. To sum up, the schematically repeating code
parts serve as the basis for defining the UML Profile that will be used for modeling the
domain applications. For example, in a website application, a frequently used user
registration code may look as follows:
Table 1 “Schematically repeating code” part of a website application [37]
Such a code section also serves as the code that may be used in the templates for the
generation of source code when UML models are transformed to code. These code
sections are represented by a UML meta-model element in the domain specific design
language, and they serve as "Lego bricks" in the design of applications in a UML tool.
The templates serve as cartridges that are used by the generator to transform the
meta-model elements used in the models into application source code. In the
Generative Architecture the platform on which the applications will run is already
defined, and this determines the implementation language of the templates. Due to the
higher abstraction from the source code to the UML meta-model, however, the
platform does not affect the UML meta-model elements that represent the
schematically repeating code sections. Therefore an independence is achieved between
the application model and the implementation language that will be used. This way it
is possible to write templates in any language, depending on the platform on which the
applications are planned to run. When the platform is to be changed, using the
corresponding templates of the new platform will transform the already designed
applications to that platform's source code. So the source code in Table 1 serves as the
Java code for the template that is represented by the UML meta-model in Table 2
below; this template code may be replaced with a C# implementation in order to be
used on a .NET platform. This independence between the UML representation of a
template, derived from the schematically repeating code, and the platform specific
source code implementation of that UML meta-model element is one of the important
gains of MDSD.
Table 2 UML meta-model representation of “Schematically repeating code” part [37]
Above, the connection between the reference applications and the building of the
Generative Architecture for the application domain was described. The importance of
analyzing the application domain and the reference code in order to define a UML
Profile, which is then used for modeling the applications, was emphasized. Finally, the
use of a schematically repeating code part as a template and its representation as a
UML element was illustrated with an example.
After defining the Generative Architecture it is possible to model the applications of
the domain using the specified UML Profile. At this point the applications are modeled
as usual in a UML tool, now with the newly added UML meta-model elements.
Developers today are exposed to very complex software infrastructures such as
application servers, databases, open-source frameworks, protocols and interface
technologies, and have to work wisely with these connected parts to form robust,
maintainable software. With this increasing complexity, software architecture has
gained more meaning and importance and is therefore worked on more. MDSD aims to
improve development efficiency as well as software quality and reusability. This is
achieved especially by freeing the developer from error-prone routine development
work. The more schematic the source code of the reference applications is, the easier it
is to build the generative architecture. The software architecture thus plays an
important role for the software developer, since it can be used as a blueprint
throughout the implementation. Also, the more the software architecture is elaborated
from the source code point of view, the more schematic the application programming
becomes. Schematic programming means a large share of copy-and-paste
programming with modifications for the context; this part of the work is mostly not
intellectually demanding. This error-prone copy/paste/modify work can be done by a
generator. All of this leads to a generative software architecture: a specific model of
the application is provided as input, and the source code of the application is produced
as output.
Transformation is an important concept of MDSD; it should be defined flexibly and
formally according to the specified profile. This is a precondition for automating the
transformation of the model into code by a generator. Although it is possible to
transform between two meta-models, most MDSD tools generate the source code from
the model by using templates. The generators enable this through transformation
languages, scripting languages for defining code generation templates.
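The idea of a transformation language can be illustrated with a minimal sketch. The placeholder syntax below is invented for illustration and is not xPand; it only shows the core mechanism of substituting model element properties into a code template:

```python
import re

# Minimal, hypothetical sketch of template-based code generation:
# placeholders of the form <<name>> are replaced with properties
# read from a model element.
TEMPLATE = """public class <<name>> {
    private String <<attribute>>;
}"""

def generate(template, element):
    # Replace each <<property>> placeholder with the element's value.
    return re.sub(r"<<(\w+)>>", lambda m: element[m.group(1)], template)

model_element = {"name": "Customer", "attribute": "email"}
code = generate(TEMPLATE, model_element)
```

A real transformation language adds control constructs (loops over attributes, conditionals on stereotypes) on top of this substitution principle.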
As a rule it is necessary to apply application domain specific UML Profiles; this also
enforces a formal application design. The model-to-code transformation is done
through generator templates and is typically processed by a generator framework, so
that the infrastructure code can be produced automatically from the architecture
oriented design model. It follows that the model must already contain all information
relevant for generating the infrastructure code.
Figure 17 depicts these concepts and how the overall MDSD process should be carried
out. First it is necessary to define the Generative Architecture for the application
domain. This step starts with analyzing the application domain and the reference code
in order to define a UML Profile that will be used to model the applications for the
specific domain. Then the templates are written that the generator will use to produce
the source code of the applications by transforming the meta-model elements used in
the models into code. This step is mutually related to defining the domain specific
language: the available schematic code parts can define modeling elements, and some
modeling elements may in turn require specific code templates. Since the application
code will run on a specific platform, the templates must also be suitable for that
platform; for example, for a J2EE platform the templates should be written in Java,
and for a web server in HTML. Once the Generative Architecture is defined, the
applications can be modeled with the domain specific UML Profile; these models are
then read by the generator and, according to the model, the necessary templates are
fetched from the available templates. After the code has been generated, the necessary
manual implementations are added for the specific application so that it runs on the
platform for which the application is designed.
Figure 17 Main Steps at developing MDSD process [37]
2.3 UML Extension Mechanism
The Unified Modeling Language (UML) is a general-purpose visual modeling language
that is used to specify, visualize, construct, and document the artifacts of a software
system. It captures decisions and understanding about systems that must be constructed.
It is used to understand, design, browse, configure, maintain and control information
about such systems. It is intended to be used with all development methods and
application domains. UML includes semantic concepts, notation and guidelines. It has
static, dynamic and organizational parts. It is also intended to be supported by interactive
visual modeling tools that have code generators.
The UML captures information about the static structure and dynamic behavior of a
system. The static structure defines the kinds of classes important to a system and to its
implementation, as well as the relationships among the classes. The dynamic behavior
defines the history of objects over time and the communications among objects to
accomplish goals. Modeling a system from several separate but related viewpoints
permits it to be understood for different purposes.
The UML also contains organizational constructs for arranging models into packages that
permit software teams to partition large systems into workable pieces, to understand and
control dependencies among the packages, and to manage the versioning of model units
in a complex development environment. It contains constructs for representing
implementation decisions and for organizing run-time elements into components. UML is
not a programming language but a design language. Tools can provide code generators
that transform UML into programming languages, as well as visualization of the existing
application source code in UML [39].
As a modeling language, UML focuses on the formulation of a software solution
through modeling. The model embodies knowledge about the software solution
regarding its subject, and the appropriate application of this knowledge constitutes
intelligence. Applied to specifying systems, UML can be used to communicate "what"
is required of a system and "how" a system may be realized or implemented. Applied
to visualizing systems, it can be used to visually depict a system before it is realized.
Applied to constructing systems, it can be used to guide the realization of a system,
similar to a "blueprint".
The UML is not:
A visual programming language, but a visual modeling language.
A tool or repository specification, but a modeling language specification.
A process, but enables processes.
As a general-purpose modeling language, it focuses on a set of concepts for acquiring,
sharing, and utilizing knowledge coupled with extensibility mechanisms.
To understand the architecture of the UML, consider how computer programs and
programming languages are related. There are many different programming languages
and each particular program is developed using a specific programming language. All of
these languages support various declarative constructs for declaring data, and procedural
constructs for defining the logic that manipulates data. Because a model is an abstraction,
each of these concepts may be captured in a set of related models. Programming language
concepts are defined in a model called a meta-model. Each particular programming
language is defined in a model that utilizes and specializes the concepts within the meta-
model. Each program implemented in a programming language may be defined in a
model called a user model that utilizes and instantiates the concepts within the model of
the appropriate language. This scheme of a meta-model representing computer
programming constructs, models representing computer programming languages, and
user models representing computer programs exemplifies the architecture of the UML.
The UML is defined within a conceptual framework for modeling that consists of the
following four distinct layers, or levels of abstraction, shown in Figure 18 [38]:
The meta-metamodel layer consists of the most basic elements on which the UML is
based. This level of abstraction is used to formalize the notion of a concept and define
a language for specifying meta-models.
The meta-model layer consists of those elements that constitute the UML, including
concepts from the object oriented and component oriented paradigms. Each concept
within this level is an instance (via stereotyping) of the meta-metamodel concept.
This level of abstraction is used to formalize paradigm concepts and define a
language for specifying models.
The model layer consists of UML models. This is the level at which modeling of
problems, solutions, or systems occur. Each concept within this level is an instance
(via stereotyping) of a concept within the meta-model layer. This level of abstraction
is used to formalize concepts and define a language for communicating expressions
regarding a certain subject. Models in this layer are often called class or type models.
The user model layer consists of those elements that exemplify UML models. Each
concept within this level is an instance (via classifying) of a concept within the model
layer and an instance (via stereotyping) of a concept within the meta-model layer.
This level of abstraction is used to formalize specific expressions regarding a given
subject. Models in this layer are often called object or instance models.
Figure 18 4 Meta Levels of UML
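As a loose analogy (not part of the UML specification), the four-layer architecture can be mirrored by Python's own instantiation chain, where `type` plays the role of the meta-metamodel, a metaclass the meta-model, a class the model, and an object the user-model layer:

```python
# Loose analogy of the UML meta levels using Python's instantiation chain.
class MetaModelElement(type):
    """Meta-model layer: defines what a 'Class' construct is."""
    pass

class Customer(metaclass=MetaModelElement):
    """Model layer: a concrete model element, an instance of the meta-model."""
    def __init__(self, name):
        self.name = name

# User-model layer: an instance of a model-layer concept.
alice = Customer("alice")

# Each layer is an instance of the layer above it:
# type(alice) is Customer, type(Customer) is MetaModelElement,
# and type(MetaModelElement) is type (the meta-metamodel).
```

The analogy is imperfect (UML additionally distinguishes stereotyping from classifying), but it conveys how each layer instantiates the concepts of the layer above.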
• Diagrams
Diagrams depict knowledge in a communicable form. The UML provides the
following diagrams, organized around architectural views, regarding models of
problems and solutions:
- Use case diagrams depict the functionality of a system.
- Class diagrams depict the static structure of a system.
- Sequence diagrams depict an interaction among elements of a system
organized in time sequence.
- Collaboration diagrams depict an interaction among elements of a system
and their relationships organized in time and space.
- State diagrams depict the status conditions and responses of elements of a
system.
- Activity diagrams depict the activities of elements of a system.
- Component diagrams depict the organization of elements realizing a system.
- Deployment diagrams depict the configuration of environment elements and
the mapping of elements realizing a system onto them.
• Extensibility Mechanism
Extension mechanisms allow you to customize and extend the UML for
particular application domains and result in a new UML dialect. The extensibility
mechanisms are constraints, tagged values and stereotypes:
Figure 19 UML Extension Mechanism
• Stereotypes
A stereotype is a kind of model element defined in the model itself. The
information content and form of a stereotype are the same as those of an existing
kind of base model element, but its meaning and usage are different. A stereotype
is based on an existing model element. By using stereotypes it is possible to
introduce new meta-model elements based on existing UML meta-model
elements. Stereotypes are shown in guillemets («…»). It is also possible to create
an icon for a particular stereotype to replace the base element symbol.
• Constraints
Constraints specify semantics or conditions that must hold true for model
elements. Each constraint expression has an implicit interpretation language,
which may be a formal mathematical notation, a computer-based constraint
language, or informal natural language. When the constraint language is
informal, its interpretation is also informal and must be done by a human.
Constraints are shown as expression strings enclosed in braces.
• Tagged values
A tagged value is a pair of strings, a tag string and a value string, that stores a
piece of information about an element. A tagged value may be attached to any
individual element in UML and can be used to store arbitrary information about
elements. Tagged values also provide a way to attach implementation dependent
information to elements. For example, a code generator needs additional
information about the kind of code to generate from a model; for such a purpose
certain tags can be used as flags to tell the code generator which implementation
to use. Tagged values are shown as strings with the tag name, an equals sign and
the value; they are normally placed in lists inside braces.
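How a tagged value can steer a code generator may be sketched as follows; the element representation, the tag name `persistence` and the emitted code are all hypothetical, chosen only to illustrate the flag mechanism described above:

```python
# Sketch: a tagged value used as a flag that tells the code generator which
# implementation to produce for a stereotyped model element.
element = {
    "name": "OrderStore",
    "stereotype": "Entity",
    "tagged_values": {"persistence": "jdbc"},  # {tag = value} pair
}

def generate_persistence(element):
    # The generator branches on the tagged value, not on the model structure.
    kind = element["tagged_values"].get("persistence", "memory")
    if kind == "jdbc":
        return f"class {element['name']} extends JdbcDao {{ }}"
    return f"class {element['name']} extends InMemoryDao {{ }}"

code = generate_persistence(element)
```

Changing only the tagged value in the model thus switches the generated implementation without touching the model's structure.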
2.4 Functional Parts of an MDSD Tool
Figure 20 below depicts the functional parts of an MDSD tool in more detail. This
MDSD generator framework, openArchitectureWare, is an open-source project; other
MDSD tools have a similar architecture. In the "Generative Architecture" part it is
necessary to design the UML Profile that will serve as the modelling language for the
application domain; this is done by using available UML meta-model elements and by
extending the relevant UML meta-model elements to represent the application domain
in the modelling phase. After designing the UML Profile that represents the domain,
the corresponding UML Profile meta-model elements must be implemented in a
programming language like Java in order to access the properties of the model
elements, such as classes, associations, attributes and the newly extended UML
meta-model elements. Within the openArchitectureWare tool, the meta-models of the
core UML diagrams, such as class diagrams and state diagrams, are already
implemented. The platform that will host the applications does not directly affect the
UML Profile and the meta-model implementations, since these are abstracted from the
reference code to define the modelling elements for the application domain. The
platform on which the application will run does, however, affect the code that is
generated from the designed models. At this point there is a separation between the
application source code and the application design, so that models can be designed for
a specific business domain while the generated code varies according to the platform
on which the application is to run. This is one of the important gains of applying model
driven software development, since it enables platform independent modelling, where
the models remain assets for any platform. For each platform corresponding source
code templates must be written; for example, to transform the model into an
application that will run on a J2EE platform, templates written in Java can be used,
while for a .NET platform templates written in C# can be used.
Templates enable the platform specific code generation. As described above, the main
idea of MDSD is to generate the schematically repeating code part of the reference
applications. These parts of the application code are copied into the templates and
combined with the properties read from the model elements (once they are represented
as instances of the meta-classes) to generate code from models. The
openArchitectureWare templates are written in a special language (called xPand) that is
easy to learn and very well suited for writing templates (and nothing else).
After the design language for the application domain has been defined in the
Generative Architecture part, a UML tool can be used to model the applications of the
specific domain. For this, the previously designed UML Profile is imported into the
UML tool. The application models designed in the UML tool are exported as an XMI
representation; XMI is a standard file format to save, load and exchange UML models
among UML tools.
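How a generator front end might read model elements from such an XMI export can be sketched as below. Real XMI is considerably more complex and varies between tools and versions; the fragment used here only mimics its general shape for illustration:

```python
import xml.etree.ElementTree as ET

# Simplified, XMI-like fragment (not real Poseidon output): two classes,
# each carrying a stereotype that names its meta-model element.
XMI = """<XMI>
  <Model>
    <Class name="welcome" stereotype="Dialog"/>
    <Class name="exit" stereotype="End"/>
  </Model>
</XMI>"""

def read_elements(xmi_text):
    root = ET.fromstring(xmi_text)
    # Map each class to the meta-model element named by its stereotype.
    return [(c.get("name"), c.get("stereotype")) for c in root.iter("Class")]

elements = read_elements(XMI)
```

The stereotype read here is what allows the Instantiator to pick the matching meta-model implementation for each model element.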
In the "Generator Framework", the models are first read from the XMI representation,
and each meta-model element read from the UML model is mapped to the
corresponding meta-model implementation defined in the Generative Architecture.
The "Instantiator" normally maps each standard UML meta-model element to the
already available meta-model implementations provided by the
openArchitectureWare tool. For the newly introduced meta-model elements, it is
possible to define mappings between their meta-model implementation and their XMI
representation. After the Instantiator has read the UML design, it instantiates the
corresponding meta-model implementations; then the "Generator Backend" fetches the
templates for each meta-model element and transforms them to source code.
On the "Application" side, the application logic is added to the generated code by
manual coding, and then the application is ready to run on the selected platform. The
UML design of the application is also stored as an XMI representation on the
application side.
Figure 20 Functional parts of openArchitectureWare generator framework [37]
A more detailed model-to-code transformation is depicted in Figure 21 below: the
generator first reads the XMI representation of the UML model, and this XML output
is parsed by the "Parser". Afterwards the "meta-model Instantiator" instantiates the
corresponding meta-model class implementations, and the code generator fetches the
templates for each meta-model element. Finally these meta-model elements are
transformed to code.
Figure 21 Functional parts of openArchitectureWare generator
3 Implementation
For the implementation, Poseidon for UML Community Edition is used as the UML
tool and openArchitectureWare as the model driven software development tool. To
validate and modify the transformed code, the XML editor oXygen is used, and the
Apache web server serves as the platform. As described in the second chapter, for
building the generative software architecture, the telephony application domain
meta-model is defined with the elements Start, Dialog, End and Service, and in the
templates the source code is written in VoiceXML.
3.1 Meta-Model
For the telephony application, four new meta-model elements are derived from the
standard UML Class meta-model: "Start", "Dialog", "End" and "Service". These
meta-model elements are designed using the Poseidon UML tool. Figure 22 shows the
structure of the class diagram.
The first meta-model element, "Start", is used for welcoming the user to the telephony
application. It contains the welcome prompt that is played to the user when the call is
answered, the error prompt that is played when an error occurs, the no-match prompt
that is played when a spoken input does not match the available grammar, and the
no-input prompt that is played if the user does not give any spoken input. It also has a
parameter holding the maintainer of the application.
The second design element is the "Dialog" element; it simply asks the user a question,
listens for the answer, processes the input and plays back the recognition result. It has
three parameters to be filled in by the modeler: "initialPrompt", "grammar" and
"universals". The first directs a question prompt to the caller, the second is the
grammar that the recognition engine will listen for, and the third lists the universal
commands the recognition engine will additionally listen for. These universals can be
Help, Exit or Operator; the modeler may fill these parameters by writing, for example,
"Help: please repeat the last sentence", "Exit: thanks for calling us." or "Operator:
wait, I am connecting you to the operator." There is also one parameter,
"recognitionResult", that returns the recognition result from the recognition engine.
Furthermore, there are five methods: prompt, recognition, error handler, universals and
action. These methods read the input parameters filled in by the modeler and generate
the corresponding code.
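How such a Dialog element could be transformed into VoiceXML may be sketched as follows. The tag names initialPrompt and grammar follow the meta-model described above, but the generator function and the exact VoiceXML shape are only illustrative, not the actual templates of the implementation:

```python
# Illustrative sketch: expanding a "Dialog" meta-model element's tagged
# values into a VoiceXML form (the VoiceXML structure here is simplified).
def generate_dialog(name, initial_prompt, grammar):
    items = " | ".join(grammar)
    return (
        f'<form id="{name}">\n'
        f'  <field name="{name}Result">\n'
        f'    <prompt>{initial_prompt}</prompt>\n'
        f'    <grammar>[{items}]</grammar>\n'
        f'  </field>\n'
        f'</form>'
    )

vxml = generate_dialog("pizzamenu",
                       "Which pizza would you like?",
                       ["cheese", "pepperoni", "all dressed"])
```

The modeler only fills in the tagged values; the schematic VoiceXML scaffolding around them is supplied entirely by the template.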
Figure 22 Class Diagram
The third design element, "Service", is used for representing a backend connection to
obtain a service from the system. It has two parameters, serviceArg and serviceName.
The serviceArg parameter takes the arguments that may be sent to the service, while
the serviceName parameter selects the service. At the moment only two services,
weather and time, are available on the web server platform.
The fourth design element is the "End" element; it represents the end of the application
and is connected to the other design elements for finishing. It has one parameter,
closingPrompt, whose value is played to the user as a closing prompt.
3.2 Model
Two example scenarios are implemented using the meta-model created for the
telephony application domain. The first one is a sample pizza ordering application: it
starts by welcoming the user to the telephony system and provides a menu where the
pizza type is asked for (cheese, pepperoni or all dressed); the customer is then asked
whether he would like to order another pizza, in which case he is sent back to the
welcome menu, otherwise he is sent to the goodbye dialog and then forwarded to the
exit. Figure 23 shows the call flow of this pizza shop scenario. The class diagram
consists of nine elements, namely Start, welcome, pizzamenu, pepperoni, cheese,
alldressed, neworder, goodbye and exit. The element Start is derived from the Start
meta-model element, exit from the End meta-model element, and the rest of the
elements from the Dialog meta-model element.
The first meta-model element is “Start”, which is used in the answering of the phone.
Meta-model element consists of 4 tags which are namely noinput, nomatch, error,
maintainer. The values of the tags get by the class as an input. The tag noinput is
representing the prompt that will be played to caller when there is no input. The value of
the nomatch tag is played when there exists no match between the caller’s input and the
available grammar. The value of the error tag is used for playing if an error occurs. The
value of maintainer tag consists of an email address, which is used for emailing the
information regarding the application.
The second element is "welcome", which is used for the salutation. It has only one tag,
called initialPrompt, whose value is played to welcome the user to the pizza shop.
The third element, "pizzamenu", is derived from the dialog meta-model element and
prompts for the pizza type. It consists of two tags: initialPrompt and grammar. The value
of the initialPrompt tag is played to the customer to announce the available pizza types.
The grammar tag defines the recognizable pizza types; its value therefore consists of the
three types available in the pizza shop.
The next elements of the class diagram represent the available pizza types; there are
three such elements, one for each kind of pizza. Depending on the value of the condition
tag on the transition between the "pizzamenu" element and the cheese, pepperoni, or all
dressed elements, the flow moves to the next dialog element; the next step is thus selected
according to the user's input. In short, after the pizzamenu element the flow passes to
whichever of the cheese, pepperoni, or all dressed elements satisfies the condition.
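In VoiceXML terms, the pizzamenu element and its conditional transitions could correspond to a menu of roughly this shape; the dialog ids and the prompt wording are assumed for illustration:

```xml
<menu id="pizzamenu">
  <!-- initialPrompt tag: announce the available pizza types -->
  <prompt>
    Which pizza would you like: cheese, pepperoni, or all dressed?
  </prompt>
  <!-- grammar tag: the three types available in the pizza shop; each
       choice plays the role of a condition on an outgoing transition -->
  <choice next="#cheese">cheese</choice>
  <choice next="#pepperoni">pepperoni</choice>
  <choice next="#alldressed">all dressed</choice>
  <noinput>
    <prompt>Please say cheese, pepperoni, or all dressed.</prompt>
  </noinput>
</menu>
```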
Then comes the "neworder" dialog element, where the user is asked whether he would
like to return to the welcome menu or to proceed to the goodbye dialog.
Then comes the "goodbye" dialog element, where the user is invited to call back soon.
Finally, the "exit" element is used at the end to close the telephone call.
Figure 23 Pizza Example - Class Diagram
The second scenario describes a time service and a weather service. The user is first
greeted by a welcome dialog; the flow then moves to the next dialog, which calls the
weather service and the time service, and finally proceeds to the end element.
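One plausible rendering of that service dialog, assuming the two services are exposed as subdialogs on the web server, is sketched below; the URL, the use of the serviceName parameter, and the name of the returned variable (report) are assumptions, not details of the actual implementation:

```xml
<form id="services">
  <!-- Call the weather service via the web server's service endpoint -->
  <subdialog name="weather"
             src="http://localhost:8080/services?serviceName=weather">
    <filled>
      <prompt><value expr="weather.report"/></prompt>
    </filled>
  </subdialog>
  <!-- Then call the time service and move on to the end element -->
  <subdialog name="time"
             src="http://localhost:8080/services?serviceName=time">
    <filled>
      <prompt><value expr="time.report"/></prompt>
      <goto next="#exit"/>
    </filled>
  </subdialog>
</form>
```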
3.3 Generated Code
The transformer transforms the UML models into VoiceXML source code. Figure 24
shows automatically generated VoiceXML sample code for the pizza shop example.
Figure 24 Sample Code
3.4 Development
3.4.1 Manually Written Code
As in almost every software development effort, some parts of the source code may need
to be adjusted by hand. In such a case it is also possible to modify the generated code
manually: after the transformer converts the meta-model into source code, the desired
modifications can be made using the Eclipse IDE.
3.4.2 XML Validation and XML Well-Formedness Check
A "well-formed" XML document is one with correct XML syntax. Using the editor's
well-formedness check function it is possible to verify that the document conforms to
the XML syntax rules [40].
A "valid" XML document is a well-formed XML document that also conforms to the
rules of a Document Type Definition (DTD), an XML Schema, or another type of
schema that defines the legal elements of an XML document.
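The distinction can be seen in a small example: the following document is well-formed (every tag is properly nested and closed) but not valid, because the schema does not allow an element named topping inside a form; the element name is of course invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <!-- Well-formed XML, but schema validation rejects this element -->
    <topping>extra cheese</topping>
  </form>
</vxml>
```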
The validation of the transformed code is also done automatically, using the Oxygen
XML editor. Errors can be introduced when creating an XML document, and when
working with large projects or many files the probability of errors is even greater.
Determining that a project is error-free can be time consuming and even frustrating. For
this reason &lt;oXygen/&gt; provides functions that enable easy error identification and
rapid error location [40].
In Figure 25 the highlighted (red) part shows the toolbar used for validation and the
well-formedness check.
Figure 25 Oxygen Editor
3.4.3 Platform
The platform can also be started automatically. The Tomcat button on the toolbar starts
the web server; once the web server is running, the telephony platform fetches the
generated code through it. In Figure 26 the red selection shows the Tomcat toolbar.
Figure 26 Platform toolbar
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis
ilke_Master_Thesis

Weitere ähnliche Inhalte

Ähnlich wie ilke_Master_Thesis

Communications... Unified or Not?
Communications... Unified or Not?Communications... Unified or Not?
Communications... Unified or Not?Maurice Duchesne
 
The Pros And Cons Of Long-Distance Communication
The Pros And Cons Of Long-Distance CommunicationThe Pros And Cons Of Long-Distance Communication
The Pros And Cons Of Long-Distance CommunicationSusan Kennedy
 
Communications... Unified or Not?
Communications... Unified or Not?Communications... Unified or Not?
Communications... Unified or Not?Maurice Duchesne
 
Wu8841 diffusion project mannis_k
Wu8841 diffusion project mannis_kWu8841 diffusion project mannis_k
Wu8841 diffusion project mannis_kkmannis
 
如何成为英雄.ppt
如何成为英雄.ppt如何成为英雄.ppt
如何成为英雄.pptwei mingyang
 
The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...
The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...
The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...Francois Pouilloux
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Sandro D'Elia
 
Standarts Battles and Design Dominance
Standarts Battles and Design DominanceStandarts Battles and Design Dominance
Standarts Battles and Design DominanceFadhlan Husaini
 
Mobile phone development ifi
Mobile phone development ifiMobile phone development ifi
Mobile phone development ifiDeepak Bijlwan
 
Building Construction Project Summary
Building Construction Project SummaryBuilding Construction Project Summary
Building Construction Project SummaryMichelle Madero
 
Mini-Project – EECE 365, Spring 2021 You are to read an
Mini-Project – EECE 365, Spring 2021   You are to read an Mini-Project – EECE 365, Spring 2021   You are to read an
Mini-Project – EECE 365, Spring 2021 You are to read an IlonaThornburg83
 
The Future of Broadcasting_ Voice-Activated Technology.pdf
The Future of Broadcasting_ Voice-Activated Technology.pdfThe Future of Broadcasting_ Voice-Activated Technology.pdf
The Future of Broadcasting_ Voice-Activated Technology.pdfOffice24by7
 
Effective Printing Text using Bluetooth Technology from Android Application
Effective Printing Text using Bluetooth Technology from Android ApplicationEffective Printing Text using Bluetooth Technology from Android Application
Effective Printing Text using Bluetooth Technology from Android Applicationijtsrd
 
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?Olivia Moran
 

Ähnlich wie ilke_Master_Thesis (20)

Communications... Unified or Not?
Communications... Unified or Not?Communications... Unified or Not?
Communications... Unified or Not?
 
Bmmp10
Bmmp10Bmmp10
Bmmp10
 
Bmmp10
Bmmp10Bmmp10
Bmmp10
 
The Pros And Cons Of Long-Distance Communication
The Pros And Cons Of Long-Distance CommunicationThe Pros And Cons Of Long-Distance Communication
The Pros And Cons Of Long-Distance Communication
 
Phonologies @ Cluecon
Phonologies @ ClueconPhonologies @ Cluecon
Phonologies @ Cluecon
 
Communications... Unified or Not?
Communications... Unified or Not?Communications... Unified or Not?
Communications... Unified or Not?
 
Wu8841 diffusion project mannis_k
Wu8841 diffusion project mannis_kWu8841 diffusion project mannis_k
Wu8841 diffusion project mannis_k
 
如何成为英雄.ppt
如何成为英雄.ppt如何成为英雄.ppt
如何成为英雄.ppt
 
[EN] "Multilingual Information and Retrieval Systems Technology and Applicati...
[EN] "Multilingual Information and Retrieval Systems Technology and Applicati...[EN] "Multilingual Information and Retrieval Systems Technology and Applicati...
[EN] "Multilingual Information and Retrieval Systems Technology and Applicati...
 
[EN] Multilingual Information and Retrieval Systems, Technology and Applicati...
[EN] Multilingual Information and Retrieval Systems, Technology and Applicati...[EN] Multilingual Information and Retrieval Systems, Technology and Applicati...
[EN] Multilingual Information and Retrieval Systems, Technology and Applicati...
 
The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...
The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...
The 2011 IEEE/WIC/ACM International Conference on Web Intelligence » industry...
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708
 
10.1.1.510.6198
10.1.1.510.619810.1.1.510.6198
10.1.1.510.6198
 
Standarts Battles and Design Dominance
Standarts Battles and Design DominanceStandarts Battles and Design Dominance
Standarts Battles and Design Dominance
 
Mobile phone development ifi
Mobile phone development ifiMobile phone development ifi
Mobile phone development ifi
 
Building Construction Project Summary
Building Construction Project SummaryBuilding Construction Project Summary
Building Construction Project Summary
 
Mini-Project – EECE 365, Spring 2021 You are to read an
Mini-Project – EECE 365, Spring 2021   You are to read an Mini-Project – EECE 365, Spring 2021   You are to read an
Mini-Project – EECE 365, Spring 2021 You are to read an
 
The Future of Broadcasting_ Voice-Activated Technology.pdf
The Future of Broadcasting_ Voice-Activated Technology.pdfThe Future of Broadcasting_ Voice-Activated Technology.pdf
The Future of Broadcasting_ Voice-Activated Technology.pdf
 
Effective Printing Text using Bluetooth Technology from Android Application
Effective Printing Text using Bluetooth Technology from Android ApplicationEffective Printing Text using Bluetooth Technology from Android Application
Effective Printing Text using Bluetooth Technology from Android Application
 
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
 

ilke_Master_Thesis

  • 1. Master Thesis Model Driven Approach In Telephony Voice Application Development by Ilke Muhtaroglu Matriculation Number: 248229 Aachen, April 10th 2006 Media Informatics The Fraunhofer Institute for Media Communication IMK Prof. Dr. Dr. h.c. Martin Reiser Supervisors: Prof. Dr. Dr. h.c. Martin Reiser (Fraunhofer-Institute for Media Communication) Dr.-Ing. Joachim Köhler (Fraunhofer-Institute for Media Communication) Dipl.-Ing. Wolfgang Schiffer (Cycos AG)
  • 2.
  • 3. I assure that this work has been done solely by me without any further assistance but the official support of the Fraunhofer-Institute for Media Communication (IMK). All the literature used is listed in the bibliography. Aachen, April 10th 2006 (Ilke Muhtaroglu)
  • 4.
  • 5. Acknowledgments As I wrote this last page of the report I would like to thank to all my friends and the professors at the university. As well as my supervisors and colleges at the company. I would like to thank my family also for their support through this work and also my second family in Germany. And for someone’s’ special.
  • 6. Abstract Telephony applications are started to be widely used, with the improved technology it is also possible to access the web content by using telephony systems. Model driven software development is offering an alternative approach to programming, where modeling gets more focused and important. By applying model driven software development approach, application design and implementation are united and this provides more rapid software development and abstraction from the source code. It decreases the programming and new technology learning efforts, so that designers can easily put more values to their applications by focusing on their business domains and business requirements; instead of dealing with many implementation details and requirements. In this report, model driven software development application for telephony domain is examined and related technologies regarding telephony application is described. Finally the implementation regarding the concept is presented.
  • 7.
  • 8. Table of Contents 1 Introduction................................................................................................................. 2 1.1 Motivation........................................................................................................... 2 1.2 Overview of Thesis............................................................................................. 3 1.3 Model Driven Approach ..................................................................................... 4 1.3.1 Assembling the component library........................................................... 10 1.3.2 Developing the domain-specific modeling language................................ 10 1.3.3 Developing the code generator ................................................................. 10 1.4 Telephony Voice Applications Deployment Architecture................................ 11 1.4.1 VoiceXML Platform................................................................................. 11 1.4.2 Speech Resources...................................................................................... 12 1.4.3 VoiceXML Application Side .................................................................... 15 1.5 Voice User Interface ......................................................................................... 18 1.5.1 Human – Machine Interaction through Speech ........................................ 18 1.5.2 Voice User Interface Principles at the Development Phases.................... 20 1.6 Modeling approach for VoiceXML .................................................................. 25 2 Model Driven Software Development...................................................................... 28 2.1 Concept ............................................................................................................. 28 2.2 Developing a MDSD Process ........................................................................... 
30 2.3 UML Extension Mechanism............................................................................. 34 2.4 Functional Parts of a MDSD tool...................................................................... 38 3 Implementation ......................................................................................................... 41 3.1 Meta-Model....................................................................................................... 41 3.2 Model................................................................................................................ 43 3.3 Generated Code................................................................................................. 45 3.4 Development..................................................................................................... 46 3.4.1 Manually Written Code................................................................................. 46 3.4.2 XML Validation and XML Well-Formedness Check .................................. 47 3.4.3 Platform......................................................................................................... 48 4 Conclusion ................................................................................................................ 49 Table of Figures..................................................................................................................50 List of Tables......................................................................................................................51 Glossary..............................................................................................................................52 List of Abbreviations... ..................................................................................................... 
53 Bibliography.......................................................................................................................54 Appendix............................................................................................................................ 57
  • 9.
  • 10. 1 Introduction 1.1 Motivation The computers and its related disciplines are improving each decade dramatically, while we can not see this fast paces in a year scale. The improving hardware resource of the computers are accelerating and supporting the software that run on them. Software development is also improving with new programming techniques and tools. If we compare the level achieved now with the level in the 70ties, we can easily see the fast and continuing improvement. These technologies also change our lifestyle as well. Computer usage in a company is a must by now, in order to stay “up to date” in the competitive business. Internet usage has a place in everyone’s life. Many companies have already done far in realizing e-business to stay alive and competitive in their business; every company use email now. Companies want to provide access to their e-business infrastructure for their costumers, employees and business partners when and where they need after the enhancements in “mobile technologies”. With the recent advances in speech technologies both in software and hardware aspects voice access to computers also became possible. Today the industry standard telephony voice application programming language VoiceXML, powered with speech technologies, enables 1 billion telephones to access to computers for transactions and services. The old proprietary “touchtone” systems are now replaced with the open standards based, new breed speech technologies and protocols like Voice over IP, Speaker Verification, Speech Recognition, and Text-to-Speech (Speech Synthesis). Through the text VoiceXML term will be used synonym for telephony voice applications. Since Spoken Interaction is the most intuitive way that people use for communication with each other, voice user interfaces appears to be easier and preferable way for interaction with computers. 
The users of these voice user interfaces don’t need to learn how to use a computer, how to deal with internet browsers and other new tools. This way of interaction is also vital for disabled people who can’t use conventional computer interaction devices (Mouse, Keyboard, Monitors…) in an easy way. For some years Dictation, Command and Control tools are widely available as a supporting interaction channel that leads to multimodality in applications. The natural result of such speech enablement of Computers is more clever computers. Therefore the new users of VoiceXML will tend to use voice applications more easily and each deployed application will have more dependent users [2]. VoiceXML technology is about to boom, and about to bring a new aspect to our daily life and to company’s interaction with their costumers. Although every company has the potential to use VoiceXML applications in their business, at the moment it is not so easy to develop such applications without an understanding of the underlying technologies to some extent. In any deployed VoiceXML application architecture there are so many technologies supporting this realization of turning a telephone talk to executing a transaction at the database. All these technologies (Text-to-Speech Engine, Automated
  • 11. 3 Speech Recognition, VoiceXML Browser, Telephony Cards, Speech Signal Processing Cards…) need their own specializations and understanding by the developers in order to be used. Therefore there are many technology vendors cooperating with the established standards to provide these technologies for the realization of the VoiceXML applications at an enterprise scale. VoiceXML as an XML based language is similar to HTML, but it is executed in a voice browser at the server side contrary to HTML. But they both serve for the same purpose in sense of functionality; while VoiceXML is enablement of access to Web by speech, HTML is the visualization of Web. The Web sites of today are much more advanced and dynamic then those years of first static informative web pages. Many web technologies like web browsers, scripting languages and web servers are improved by the time; VoiceXML 2.0 is also in such a way, together with his supporting technologies for telephony voice applications, at the time being. While at the client side there is only necessity for a telephone, at the server side there are still many issues that need to be improved. One of them is to ease the programming of the VoiceXML applications and to shorten the implementation time in the VoiceXML software lifecycle. By applying a modeling driven development approach for VoiceXML applications, the project teams can minimize their efforts for underlying technologies and source code implementation. By modeling they can focus on their business goals, end-users, application of voice user interface techniques and more. Modeling will also make the applications independent of the implementation platform and the implementation language. The applications will be more understandable, because of increasing the level of abstraction from implementation language to the models that represent the implementation concepts. 
By this approach, refining and modifying the application will also be easier than understanding and changing the code. 1.2 Overview of Thesis The first chapter of the thesis gives an introduction to the Model Driven Software Development and the Telephony Voice Applications through problem description. At the second chapter Model Driven Software Development concept is described in more detail. The third chapter describes the implementation and evaluation of this tool by means of two scenarios. The forth chapter provides a conclusion about the work.
  • 12. 4 1.3 Model Driven Approach Software development is expensive: “Companies and governments in USA spend more than $250 billion each year on information technology application development of approximately 175,000 projects. Over 31% (54,250) of these projects will be cancelled before they ever get completed in large companies, and only 9% of projects will come in on time and on budget. While more than half of them will cost nearly twice their original estimates.” [1]. But on the other hand demand for software is rising in developed countries’ economies, in scientific areas as well as in our daily life. These facts bring necessities and efforts on making software development more efficient in the sense of both development costs and usability. Therefore software development progress’ itself is under research and standardization in order to decrease these cancellation risks, in order to make the development phase and the product more valuable. As in depicted in Figure 1, a software development lifecycle includes many iterative activities like [30]: • Project definition • Requirements analysis • Risk management plan • General design • Architectural Design • Feasibility study • System requirements • Software requirements • Detailed system design • Construction • System implementation • Quality assurance and testing • Deployment • System maintenance
  • 13. 5 Figure 1 Software Life Cycle [30] In a Software Life Cycle all steps are important, but the initial Analysis and Design Steps have a higher importance for the success and the future of the product. But the initial steps before implementation are mostly considered as straightforward and self evident, therefore appointing less time and effort compared to implementation phase in a mistaken way. This wrong approach mostly results in a waste of project resources at the long term. On the other hand Software implementation step takes relatively longer time than the other steps, hence forcing the project team to pace through the initial steps before implementation in a fast and furious way. This is especially the case when there is a quick deployment constraint due to business goals of the project. Figure 2 shows how the three main software activity efforts (Requirement analysis, Implementation, Testing) change with time [31]. In this Figure we can see how implementation takes half of the efforts in a project starting with prototyping and finishing with a recursive testing-implementation cycle.
  • 14. 6 Figure 2 Software development: effort vs. activity [31] At the moment the implementation process is supported with many technologies of software development like third generation programming languages (3GL), software development kits (SDK), software reuse, object oriented programming (OOP), components and frameworks. But these technologies were not available at the 1950’s; by the time software practitioners, industrial experts and academicians have raised the abstraction of programming languages and developed new techniques to increase the level of reuse in software construction. At the beginning there were only electrical machines that were controlled with wires, and then assemblers came to take care of generating sequences of ones and zeroes for those improving electronic machines. From these abstractions then the first programming languages such as FORTRAN were born, by which the formula calculation became a reality. Then C enabled portability among hardware platforms. This pioneering brought the new techniques for structuring so that it was easy to write, to understand and to maintain. Now we have Java, C++ with an object oriented approach for structuring data and behavior together into classes. In this improvement from a lower level of abstraction to the next, at each level the developers invented new techniques to increase the easiness of their tasks and developed new concepts which lead to the existence of the next abstraction level. And for each abstraction a new tool was developed to map this abstraction to the previous level of abstraction. Assemblers were developed to map the code to “ones and zeros”, then the preprocessors, and then the compilers to support all these improving techniques and concepts as depicted in Figure 3. In this pattern while the practitioners use a new abstraction level, they also formalize the knowledge and set of methods for its use. 
Then from this formalization and practices a new higher level language is born and also a tool that maps it to the lower level. At the time being these formalization efforts for 3rd Generation Languages born the “Modeling Approach” for software development, in
  • 15. 7 industry we can see this transition with the high use of UML and its standardization [1, 7]. Figure 3 Software development: effort vs. activity [31] The development of the “Re-use” technique followed a similar way like abstraction of programming languages. First the function phenomena in mathematics applied to eliminate rewriting effort of code for similar contexts. Use of functions also provided the desired effect against the limited memory available. Then grouping the functions and the data brought the “Object” approach. With the increasing trend of reuse, the components and the frameworks appeared. By them the common services and necessary functionalities like security, network connections are provided for today’s applications. This increasing granularity of reuse now enabled focusing more at the designing and modeling the application for the project requirements, as well as new opportunities to decrease the efforts for implementation [1].
  • 16. 8 Figure 4 Raising the level of Re-use [1] Modeling raises the level of abstraction and hides today's programming languages, in the same way that today's programming languages hide assembler. Symbols in a domain- specific model map the domain elements, for which the application is designed. This can be an abstraction higher than UML. Then the application is automatically generated from these designs with domain-specific code generators that use the existing component’s codes. For many years it is recognized that there is a vital difference between the application’s domain and its code. On the other hand only UML alone does not solve the gap between the domain solution and code; it just brought standardization for visualizing and representing the code. It does what a blueprint does for an architect to design the architecture of a building, and for helping the communication between an architect and the builders of the house. In Model Driven Approach or Architecture (MDA) the models are made form the elements that represent the concepts from domain world, not the code world. This allows the modeler to work in the domain world with the domains concepts, where he can focus on solving the problem as he perceived it in his profession. Then this domain solution can be automatically transformed to the solution of the code world. This automation is possible because both the design language and the generators need to fit the requirements of one domain. One expert developer defines a domain-specific language containing the chosen domain's concepts and rules, and specifies the mapping from that to code in a domain-specific code generator. Then the other developers can make models with the modeling language and the code will be automatically generated. Since the experts have specified the code generators, they produce products with better quality than could be achieved by normal developers by hand. 
In Figure 5, we can see the traditional methods for achieving the finished code from domain idea and also how MDA solutions map domain idea to finished code [5].
  • 17. 9 Figure 5 From Domain Idea to Finished Product [5] According to Domain Specific Modeling (DSM) three things are necessary for modeling with automatic code generation: a domain-specific modeling language and editor, a domain-specific code generator, and a domain-specific component library [5]. In order to achieve this there is a necessity for an expert that has an understanding and experience in this domain with knowledge about the architectures and component libraries of the domain. Figure 6 Main Steps for Domain Specific Modeling [5]
  • 18. 10 1.3.1 Assembling the component library A domain-specific component library is not always necessary, but it makes the task of the developer, or the so-called “modeler”, significantly easier. Often, code components already exist from earlier development cycles, at least in the form of reusable pieces of code. Further development of these pieces of code into true components is a relatively easy task for the expert, requiring only the normal developer programming tools. In addition to domain-specific components developed in-house, the library can of course contain generic third-party components. Component library is nothing new; it is what the developers often use as a part of their development processes. 1.3.2 Developing the domain-specific modeling language A harder task is the development of the domain-specific modeling language; the concepts and notations that will be used by the “modeler” to build his model. The experience and the intuition of the expert, combined with hints from the component library, domain rules and architects are the real sources of clues. The expert should also consider the issue of code generation, which will be his next task: a good mapping from the domain meta- model to the combinations of components will make that future task much easier. Already available meta-modeling languages can be used here to describe and to “meta- modelize” both the domain and their rules. In the creation of a domain specific modeling language, a toolset that utilize this process is vital. A toolset that allows rapid prototyping is practically a necessity: the expert can make a part of the meta-model as a prototype, and instantly test it by making an example model. Similarly such toolsets can help and guide the expert in his tasks, and then the expert can concentrate on the hard problems, while the toolset create the menus, toolbars, icons and behavior automatically. 
1.3.3 Developing the code generator

The code generation definition forms the final task. In this phase the expert specifies how code can be generated automatically from the structures and components in the users' models. The modeling toolset should provide the functionality needed to create such generation scripts, and should guide the expert wherever possible by allowing him to reference and use the concepts of the meta-model. Of all the phases, code generation probably varies the most between domains. In some domains a large fraction of the code can be produced with a simple code generation scripting language, as already provided in most modeling toolsets. In other domains it may be necessary to use a more powerful scripting language operating on model data exported from the modeling editor. The most important goal is that the "end modeler" should be able to use the code generation in a simple way.
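The kind of simple generation script described above can be sketched as follows. The model representation (a list of state name/prompt pairs) and the VoiceXML template are illustrative assumptions, not the output format of any actual toolset.

```python
# Hypothetical generator: walks a list of (state name, prompt) pairs and emits
# one VoiceXML form per state. The template and names are illustrative only.
VXML_FORM = """  <form id="{name}">
    <block><prompt>{prompt}</prompt></block>
  </form>"""

def generate_vxml(states):
    """Produce a VoiceXML document from a minimal call-flow model."""
    forms = "\n".join(VXML_FORM.format(name=n, prompt=p) for n, p in states)
    return ('<?xml version="1.0"?>\n<vxml version="2.0">\n'
            + forms + "\n</vxml>")

doc = generate_vxml([("welcome", "Welcome."), ("goodbye", "Goodbye.")])
print(doc)
```

Even this toy generator shows the essential mapping: each model element corresponds to a fixed combination of target-language constructs, so the "end modeler" never touches VoiceXML directly.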
1.4 Telephony Voice Applications Deployment Architecture

Behind the term VoiceXML lies an encapsulation of many hardware and software technologies. On the user side a telephone is required; it can be any phone with a microphone and a speaker, such as a mobile phone, an old conventional telephone, or a "softphone" that runs on a computer and works over VoIP. The telephony network that connects the telephone to the VoiceXML browser platform involves many routers, switches and several signal transport media and techniques: wireless links, VoIP, fiber optic lines, the conventional telephone network (PSTN), "softswitches" and so on. These telephony elements can be ignored for the sake of simplicity, unless they create unexpected effects such as communication latency, which would affect the voice user interface and, in turn, other decisions in a VoiceXML application. The left side of Figure 7 represents these various telephone network elements and telephone devices that connect to the VoiceXML platform.

1.4.1 VoiceXML Platform

The VoiceXML platform is one of the most important elements in a VoiceXML application deployment. It performs the processing and bridging between the application and the telephone networks, acting like an internet browser for the end-user's telephone. A VoiceXML platform has three main units:
• VoiceXML interpreter (VoiceXML browser)
• Telephony gateway
• Configuration and management unit
Externally, Automated Speech Recognition (ASR) and Text-to-Speech (TTS) resources serve the platform for speech recognition and speech synthesis tasks.

Figure 7 VoiceXML Deployment Architecture [11]
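To make the role of the interpreter concrete, a minimal VoiceXML 2.0 page of the kind such a platform fetches and interprets might look like this (the field and grammar file names are invented for the example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pizza">
    <field name="topping">
      <prompt>Which topping would you like?</prompt>
      <grammar src="toppings.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>You chose <value expr="topping"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The browser plays the prompt via TTS (or an audio file), activates the referenced grammar on the ASR resource, and executes the filled block once a recognition result arrives.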
The telephony gateway unit connects the platform to various telephony networks via telecom cards and provides the necessary telephony software interfaces for controlling and using them. It essentially handles the communication between the telephone network and the VoiceXML browser. The VoiceXML browser is the core part of the VoiceXML platform: here VoiceXML documents are fetched, validated and then interpreted with the help of TTS and ASR resources. It is the triggering point for the actions in the platform. The VoiceXML interpreter architecture of HP's OpenCall platform is presented in Figure 8. The units shown in this figure do the following: the Interpreter Context unit handles incoming calls and associates them with the VoiceXML pages to be interpreted. The Network Fetcher fetches VoiceXML, grammar and audio files over HTTP connections. The VoiceXML Processor executes the VoiceXML tags, attributes and commands in the document itself. The Configuration and Statistics Manager units provide the configuration and management facilities of the server; through these two units each VoiceXML application can be monitored and configured, and the data for reporting is generated here. The Document Parser validates and parses the VoiceXML documents syntactically. The Cache Manager is, as its name implies, responsible for caching audio files, documents and other raw data. The Device Controller has important functions such as call control and interaction with media resources like ASR/TTS servers [11].

Figure 8 Architecture of a VoiceXML interpreter [11]

Configuration and management play an important role in testing and improving deployed VoiceXML applications. Web-based configuration and management interfaces are widely available, but Java-based "rich client" applications are also used for enhanced functionality such as real-time call monitoring, further tuning, and statistical reports.
In general these utilities are important throughout the VoiceXML application lifecycle.

1.4.2 Speech Resources

Automated Speech Recognition (ASR) and Text-to-Speech (TTS) engines provide the speech processing capability of the whole VoiceXML platform. ASR and TTS technologies are the result of cooperative work across many professions and are still improving. The engines also depend on the language characteristics of each country. Therefore most VoiceXML platforms are not designed to include their own ASR and TTS engines, but to cooperate with third-party speech resources through vendor-independent interface standards. The Media Resource Control Protocol (MRCP) is such an application-layer protocol, providing the necessary vendor-independent interface between speech media servers and speech application platforms. The major speech processing resources are [36]:
• Basic synthesizer — A speech synthesizer resource with very limited capabilities that plays out concatenated audio clips.
• Speech synthesizer — A full-capability speech synthesizer that produces human-like speech using the Speech Synthesis Markup Language specifications.
• Recorder — A resource with end-pointing capabilities for detecting the beginning and end of spoken speech and saving it to a URI.
• DTMF recognizer — A DTMF-only recognizer that detects telephone touchtones and reacts accordingly.
• Speech recognizer — A full speech recognizer that converts spoken speech to text and interprets the results based on semantic tags in the grammar.
• Speaker verification — Authenticates a voice as belonging to a person by matching it against one or more saved voiceprints.
MRCP mainly provides independence from the voice resource vendor and from the location where the resources reside. As a client-server protocol, MRCP uses the Session Initiation Protocol (SIP) or the Real Time Streaming Protocol (RTSP) to make speech resource servers independent nodes in distributed networks. By providing a standard communication protocol between the client (the VoiceXML platform) and the speech servers, MRCP abstracts and encapsulates the rapidly advancing speech technologies underneath, together with their vendor-dependent architectures and implementations.
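To give a flavor of the protocol, a synthesizer request might look roughly like the following. The overall message shape follows the MRCPv2 drafts; the request identifiers, channel identifier and header values here are purely illustrative, and the lengths are elided.

```text
C->S:  MRCP/2.0 ... SPEAK 543257
       Channel-Identifier: 32AECB23433801@speechsynth
       Content-Type: application/ssml+xml
       Content-Length: ...

       <?xml version="1.0"?>
       <speak version="1.0" xml:lang="en-US">
         Welcome to the service.
       </speak>

S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
       Channel-Identifier: 32AECB23433801@speechsynth
```

The client never sees which synthesis engine is behind the channel; it only addresses a named resource type, which is exactly the vendor independence described above.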
MRCP is not limited to VoiceXML platforms; it can be used by any system that implements an MRCP client.

Figure 9 Media Resource Control Protocol Architecture [36]
Speech synthesis, or the text-to-speech process, creates speech in several steps, as depicted in Figure 10. The Speech Synthesis Markup Language (SSML) enables non-procedural control over each of these stages to create more natural speech. The first step, structure analysis, parses the text to be synthesized syntactically, for example detecting paragraphs, spaces and commas. Text normalization is the process in which special symbols, such as the dollar sign "$" or the abbreviation "Dr." for Doctor, are detected and interpreted to find out how they should be spoken. After determining the set of words to be spoken, the TTS engine decides on a pronunciation for each word. The engine's pronunciation can be controlled by specifying the pronunciation of a word using the IPA (International Phonetic Alphabet). Then prosody analysis is performed to decide the pitch, rhythm and rate of speaking; SSML tags at this level can, for example, control the speed of the generated speech. Waveform production is the final stage, in which the audio waveform output is generated from the phonemes and prosodic information. As an alternative to metallic machine-synthesized voices, more natural-sounding voices are now produced by concatenating very small pieces of human speech, which together form the sound of a word. These small sound units (triphones, diphones) are extracted in the laboratory from the recorded speech of talented voice professionals.

Figure 10 Speech Synthesizing Process [9]

Speech recognition, in contrast to text-to-speech, takes speech as input and produces the recognized text. Figure 11 shows that the first step in speech recognition is detecting the beginning and the end of speech: the recognizer listens for silence to extract the user's spoken utterance. It then forwards this waveform to the feature extraction unit, which transforms the end-pointed utterance into a sequence of feature vectors.
A feature vector represents the useful measurable characteristics of the speech, and this sequence of vectors is one of the inputs to the recognizer. The recognizer uses a dictionary, a grammar and an acoustic model to find the best-matching result allowed by the grammar. The dictionary is a list of particular words and their pronunciations, introduced to the ASR engine so that it performs better, especially with foreign words. The grammar defines the words the caller is expected to say to the system and which are to be understood. Rule-based grammars and statistical language model grammars are the two important kinds. A rule-based grammar states with explicit rules what the recognizer should listen for. Statistical grammars, by contrast, require collecting a large number of sample responses from speakers and transcribing them. While statistical grammars allow speakers more flexibility and naturalness, rule-based grammars make extracting the semantics from the recognition result easier for the programmer. The acoustic model is the recognizer's internal representation of the pronunciation of each possible phoneme or basic sound. It is different for each language and sometimes adaptable to the environmental noise characteristics of the speakers. Finally, the recognizer unit of the ASR uses the feature vectors and the recognition model (made up of the dictionary, acoustic models and grammar) to search for the best match within the grammar. The result carries a confidence measure, which most recognizers return along with their answer. If the recognition result is poor, with low confidence, the recognizer can report no recognition rather than return a wrong result. Essentially, the recognizer looks for the best match between a given grammar and what was spoken; when it cannot find any match, it simply returns no recognition.

Figure 11 Speech Recognition Process [9]

1.4.3 VoiceXML Application Side

The VoiceXML platform communicates with the application side via HTTP. The web server on the application side serves the platform in a request/reply manner. For example, when the caller triggers an action such as a request for a new VoiceXML document, the VoiceXML browser forwards this request with the related parameters to the web server and asks for the next VoiceXML page to present to the caller. The web server takes this request, evaluates it in cooperation with the application server and back-end systems, and returns the dynamically generated VoiceXML page to the VoiceXML platform. Grammars and audio prompts are also provided to the VoiceXML browser, to be used as needed while the caller browses through the VoiceXML document.
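One of the rule-based grammar files served to the browser could, for instance, be written in the XML form of SRGS; the vocabulary here is invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="topping">
  <rule id="topping">
    <one-of>
      <item>cheese</item>
      <item>salami</item>
      <item>mushrooms</item>
    </one-of>
  </rule>
</grammar>
```

The explicit alternatives make clear why semantic extraction is easy with rule-based grammars: every recognizable utterance corresponds to a path through the rules the developer wrote.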
From a functional point of view, we can easily draw an analogy between an HTML page and a VoiceXML document, and likewise between the pictures of an HTML page and the grammars and audio prompts of a VoiceXML page.
A VoiceXML browser and an HTML browser therefore have similar functions; one serves visual interaction with the application side, while the other enables speech interaction with it.

Figure 12 VoiceXML Architecture at Server Side

Because of this similarity, VoiceXML uses the techniques and architecture already developed and refined for the visual web, as shown in Figure 12. Considering the advances in web server technologies over the last ten years, we can see how quickly VoiceXML has progressed in a short time by reusing the techniques of the web. VoiceXML inherits the multi-tiered architecture of web applications and is empowered by the "Model-View-Controller" architecture, since it provides a new kind of "view" for applications. Looking at Java's multi-tiered enterprise architecture in the figure below, we can see that the application logic, where the business requirements are formulated, is separated from the view of the application. This decoupling enables the application to be viewed and used in different client environments. In this architecture the user interface can be a web page, a Java applet, a rich client running on the client side, and more. The web tier gives potential web clients such as web browsers, WAP browsers and also VoiceXML access to the application logic, in other words to the business tier. Alternatively, rich clients implemented in different programming languages can provide other types of graphical user interface to the end-user. The web tier can be implemented with many technologies, such as Java Server Pages (JSP), JavaScript (JS), servlets and Active Server Pages (ASP), to provide the necessary HTML, WAP or VoiceXML code to those browsers.
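The request/reply generation performed by such a web tier can be sketched in a few lines; the handler, parameter names and back-end call below are hypothetical stand-ins for what a JSP page or servlet would do.

```python
# Hypothetical web-tier handler: evaluates a browser request against the
# back end and returns the next dynamically generated VoiceXML page.
def handle_request(params, backend):
    balance = backend.get_balance(params["account"])  # assumed back-end call
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0">\n'
        '  <form>\n'
        f'    <block><prompt>Your balance is {balance} euros.</prompt></block>\n'
        '  </form>\n'
        '</vxml>'
    )

class FakeBackend:                     # stands in for the enterprise tier
    def get_balance(self, account):
        return 42

page = handle_request({"account": "1234"}, FakeBackend())
print("balance is 42" in page)  # → True
```

Note that nothing in the handler is telephony-specific: the same pattern with an HTML template would serve a web browser, which is the point of the shared multi-tiered architecture.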
Figure 13 Multi-tiered Application [33]

The business tier, on the other hand, is the core of the application, where the business logic that solves the problems of the domain is implemented. The business tier receives data from the user, processes it together with the data available in the back-end systems, and then presents the results back through the user interface. The interfaces and communication needed to talk both to the client and to the back-end systems are developed in this tier. The final tier, the enterprise information tier, encapsulates the components that provide data via databases, as well as services such as web services and other enterprise applications. As a result of decoupling the application segments through the multi-tiered architecture, the "Model-View-Controller" programming pattern became practical to apply. In this approach the core business model functionality is separated from the presentation and control logic. This separation allows multiple views, such as HTML, VoiceXML and WAP, to share the same enterprise model. In Figure 14, the model represents the enterprise data and the business rules that govern access to that data. The view renders the contents of a model and specifies how the data should be presented, while the controller translates the interactions coming from the view into actions to be performed by the model and selects the view used to respond to the interaction. The main advantage of this approach is that multiple views and clients can use the same enterprise model.
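A minimal sketch of this separation, with one model serving both an HTML and a VoiceXML view (class and function names are invented for the example):

```python
# One model, two views: the controller picks the view for the client type.
class AccountModel:                 # Model: enterprise data and access rules
    def __init__(self, balance):
        self._balance = balance
    def balance(self):
        return self._balance

def html_view(model):               # View for a visual web browser
    return f"<p>Balance: {model.balance()}</p>"

def vxml_view(model):               # View for a VoiceXML browser
    return ('<vxml version="2.0"><form><block>'
            f'<prompt>Your balance is {model.balance()}.</prompt>'
            '</block></form></vxml>')

def controller(model, client):      # Controller: translates request to a view
    return vxml_view(model) if client == "voice" else html_view(model)

m = AccountModel(100)
print(controller(m, "voice"))
print(controller(m, "web"))
```

Adding a WAP view would mean writing one more rendering function; the model and controller stay untouched, which is exactly the advantage claimed for the pattern.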
Figure 14 Model-View-Control Pattern [34]

1.5 Voice User Interface

1.5.1 Human-Machine Interaction through Speech

From the end-user's point of view, a VoiceXML application is a series of continuing dialogs between the computer and the user about interrelated topics. The user listens to the machine's speech and replies to it, again by talking. The dialog is thus the basis of communication and interaction between the computer and the user. This kind of interaction between a machine and its user is not yet widely available or widely practiced. For a car driver or a computer user, it is certain that the user will do his best to make the machine understand what he wants done. But here only the speech channel is used for the interaction, through a telephone, where only a microphone and a speaker are available. Unlike talking with a real person, the user must be aware that he is talking with a computer and that this machine is not as capable as a human; speech technology still has far to go before it can mimic a real person. What is available now is a machine trying to do its best to keep the communication going, in contrast to the internet browser, where it is the user who does all the adapting. The user interacting through speech should be aware that he is talking with a machine, so that he does not expect too much cleverness or other human qualities from his correspondent. This nature of human communication also demands more effort on the machine side: the machine should do its best to behave like a real person, always considering the user's state of mind and other aspects while communicating with him. Users, in turn, should be aware of the machine's limited intelligence and should help the computer understand. The auditory interaction between the machine and the human, the voice user interface, therefore needs skills from speech technology, user interface design, cognitive psychology, linguistics and software development. Since the voice user interface is the only interactive part of the product, it deserves professional focus if the deployed VoiceXML application is to succeed. The design environment for VoiceXML applications should therefore not constrain the application of voice user interface techniques; on the contrary, it should ease it. A voice user interface is roughly built up from prompts, grammars and call flow. The prompts are recordings of the speech of a voice talent, or synthesized speech, played to the user. Grammars define the possible things the user can say in response to a prompt, in other words the things the computer should listen for and understand. The computer can say anything, any text within the limits of the technology, but cannot understand things that are not somehow in the grammar and that it was not told to listen for. The call flow, also called the dialog logic, defines the actions taken by the system based on conditions, properties and inputs from the user. Unlike multimodal interfaces, where speech is one of several interaction channels, purely auditory interfaces pose a unique design challenge because the messages exchanged are non-persistent and transient, a characteristic of speech communication. The user tends to create a persona (personality) in his mind and continues the communication with this sound and feel.
The designer is expected to stay consistent with the character of the persona he wants to create, since the user forms a mental model of the person-computer he is communicating with. The following types of application suit VoiceXML well, given the non-persistent character of the auditory interaction medium [3]:
• VoiceXML is the best alternative for DTMF applications, where the user deals with a limiting, non-intuitive phone keypad. For already deployed touchtone applications, VoiceXML can be an intuitive and efficient solution that improves the quality of interaction.
• Where voice is the preferred mode of device interaction, for hands-free and eyes-free activities such as driving a car.
• For telephone devices, which generally have a very poor and limited keypad and screen user interface.
• For ubiquitous access by telephone, giving phone access to many systems from anywhere and improving their reach.
• For users with physical disabilities.
• In cases where users are motivated to use a VUI-enabled service because it saves time and money.
Conversely, VUI applications are not suited to noisy environments, or to situations in which the user may be talking with people and the device at the same time; in these cases the speech recognition engine cannot perform well. Applications that are visually complex or carry large amounts of information are also not ideal for a voice user interface. Even then, an application may still be worth developing if the user is mobile and speech is the only way to access the service at that time.

1.5.2 Voice User Interface Principles at the Development Phases

A good voice user interface design follows five main principles. The first is end-user input: during the design the designer should validate his decisions from the point of view of usability, and at every level of design use the kind of user input appropriate for that phase. This is the key concept of a user-centered design paradigm. As the second principle, the designer should find a solution that meets both the business goals and the user needs, deciding on trade-offs so that both sides are served. The third principle: it is always more expensive to make a design decision late in the project, so requirements analysis and high-level functional design must come before the details. The fourth principle is to always see the voice application from a higher point of view, so that the designer does not get lost in details; he should keep a bird's-eye view of the overall flow of the conversation. Finally, all design decisions should be considered in the context of the application's needs, the user, the language use, and the persona. With these voice user interface principles in mind, the designer must make a number of decisions during the six phases of application development.
These phases can be summarized as requirements definition, high-level design, detailed design, development, testing and tuning. In the requirements phase the designer should achieve a detailed understanding of the application. If the designer is not already familiar with the business, he should understand the business model and the company's motivation for deploying such a service, and become familiar with the company's competitive environment and the image it is trying to create; contact with company personnel can speed this up considerably. Understanding the user comes next: the designer should examine the caller profiles. Do the callers use industry jargon or a particular regional dialect? Do they have technological experience, for instance with the internet or DTMF applications? What is their general state of mind, self-image and mental model? Questions like these should be answered as far as possible with the help of interviews or surveys of customers and company personnel. Then comes understanding the application itself: which tasks and subtasks there will be, how complex these tasks are from the caller's point of view, and which information the system supplies to the caller. The usage characteristics of the application, such as one-time or repeated use, the state of the environment, and whether the user is willing to talk with an automated system, should also be considered in the requirements definition. Work done early, before going into detailed design, brings many advantages to the project; high-level design is therefore important to guide the design and provide decision criteria. The first task is deciding on the "key design criteria", which will steer the design trade-offs and the major focus of the design; they should be one to three short items kept in mind throughout. Then decisions about the grammar type and the dialog strategy should be made. The application persona should have a consistent character. Terminology should be used recurrently and consistently throughout the application, in parallel with the company's web site. The metaphors that create a useful mental model for the user should likewise be consistent, both with each other and with the character the application presents. People entering a conversation, even with a machine, always make guesses about their correspondent's personality, taking many things into consideration: voice characteristics, the words spoken, even the accent. This makes it necessary to design a persona, much as one defines a character in a film script, by focusing on the personality aspects that can be conveyed in dialog. This persona extends the company's specific image and its brand. The detailed design is the stage where the designer harvests all the details, aspects and requirements of the project that will help him during application development. For the success and longevity of the deployment, this phase plays a crucial role; the designer should therefore produce a detailed design document in which he discusses his design trade-offs and design decisions.
The detailed design has many aspects. From the application flow point of view, the call flow helps the designer formulate the flow between tasks and subtasks using call flow elements. A dialog state is considered the smallest unit in the call flow diagram. Such a dialog state includes [2]:
• Initial prompt: what is played first when the flow reaches this dialog state. This prompt can be generated dynamically based on the recent history.
• Recognition grammar: the specific information to be returned from the grammar, such as a city name, should be described. The high-level grammar definition at this phase should be enough for the grammar developer to create the grammar in the development phase.
• Error handling: the designer must specify the handling of errors that may occur due to recognition rejections, recognition timeouts, no-speech inputs and so on. He should recover from these errors intelligently, with error prompts that take the context of the dialog state into account, in order to avoid call hang-ups.
• Universals: the commands that are always available in an application, such as "Help", "Exit" and "Operator", sometimes need to be overridden for better behavior. For example, when a user says "operator" in the bill-checking dialog state, the "Operator" command should be cleverly overridden to connect the user to an accounting operator rather than a sales operator.
• Action specification: the actions triggered by a satisfied condition or by properties of the platform should also be specified for each dialog state. For example, a transition between two dialog states may involve the transfer of logic and data from a back end.
The call flow design is the bird's-eye view of how the user navigates through the system: where the back-end information systems are integrated, what the menu structure looks like, and how information is collected. As stated previously, dialog states are the building blocks of the call flow; together with the other elements, such as back-end processes, entry/exit states and actions, they define it. Although there is no standard for defining call flows and their elements, some guidelines make them more understandable. Conventions and shapes should be chosen carefully and used consistently, and informative names should be given to the elements. The dialog state should be the lowest-level unit in the call flow; the internal details of a dialog state should be pictured separately.

Figure 15 Call Flow [2]

The prompts of a VoiceXML application create most of its "hear and feel", so sufficient design effort should go into deciding on them. The words should be chosen intelligently to satisfy the voice user interface concepts and principles. While capturing the application persona, other aspects, such as the experience of the end-user, should always be taken into account. The designer should carry out a
conversational design by writing the prompts and reading them aloud in their conversational context. He should write sample dialogs between the system and the user for the tasks under consideration; not only the successful dialogs but also the dialogs that exercise the error and help prompts should be worked out. In this way the designer comes closer to realistic and natural prompts, but this is not enough: since written and spoken language differ in many respects, the designer should experience the prompts by reading them aloud. Prompts can be generated by the text-to-speech engine where the prompt text is highly dynamic and changing, such as an email, but a voice actor can be used to record the static prompts, giving them a more natural sound. An important issue is consistency between the voices when TTS and a voice actor are used together; a voice conversion function can be used here to keep the TTS voice similar to the voice actor's. At the detailed design phase the designer should consider the following principles to optimize his call flows, prompts, grammars and other elements [2]:
• Minimize the cognitive load: cognitive load can be summarized as the amount of mental resource a task demands of the user. It includes listening, attention, memory, language processing, decision making, problem solving and speaking back. Designers should carefully minimize the information callers must hold in short-term memory and the number of new concepts they must learn, and should provide ways to recover if the user gets lost, for instance through a lapse of attention.
• Accommodate conversational expectations: the end-users of VoiceXML applications unconsciously carry human-to-human conversational expectations. With the persona in mind, the designer should respond to these natural expectations.
For example; the system shouldn’t use jargons that end-user won’t understand, it should be explanatory for new concepts or it shouldn’t change conversation topic to something unrelated. • Maximize efficiency: The callers, especially the frequent ones, want speed and efficiency. They won’t like to listen and to be asked the same information each time. The designer should provide shortcuts to frequent tasks and should allow barge-in functionality to skip the already understood prompts. • Maximize clarity: The designer should always fight again possible ambiguities. This is valid for every level in the application; at the prompt level designer should avoid using synonym words that will lead to misunderstanding. At higher levels callers’ mental model between dialogs should be taken into consideration so that they don’t get lost and frustrated. • Ensure high accuracy and error recovery: Recognition errors, no matter if it happens because of the quality of the used recognition engine or due to the noisy
  • 32. 24 environment of the caller, must be avoided since they seriously reduce the caller confidence and usability. Accuracy of a system can be calculated by proportioning the number of correct and false accepts, as well as rejects. Recovering intelligently from errors also needs detailed design effort. Recovering from Rejects and Timeouts should be done by increasing the detail about the requested information and by providing a help option, instead of forcing the user to repeating his error. Confirmation is also another technique to check if the recognized values are same with what intended by the user. “When to confirm” and “How to confirm” are two important confirmation decision cases. The System should also be intelligent enough to avoid repeating the errors, especially the errors that are corrected. Avoiding the errors and ensuring high accuracy, which increase the prestige of the system, must be always preferable than dealing with recoveries and results of the errors. Designers, due to being too near to their designs, are the worst people to judge the usability of their designs. During the detailed design to get a good feedback and to evaluate their design decisions end-user testing is “must”. Designer should refine his design with a number of iterations of testing and design. In order to get feedbacks from the user to improve the design, even before having the prototype, a “Wizard of Oz” can be used. The “Wizard of Oz” simulates the behavior of a working system by having a human that acts as a system. An example “Wizard of Oz” tool in voice user interface design is “SUEDE” [6]. In this tool human performs virtual speech recognition and generates the prompts by choosing the appropriate response from prerecorded speech files. Using such a tool provides many advantages like chance of early testing even before having a prototype. 
Using a “Wizard of Oz” also prevents usability testing from being blocked by software bugs and yields early design feedback. Because a human performs the speech recognition, a Wizard of Oz operates with high grammar coverage, which is preferable to testing with a poor, unfinished grammar that does not cover enough of the users' responses and would limit the test; nevertheless, the person simulating the system should keep the recognition behavior realistic. The designer should choose the test participants and the test environment carefully. The tests should preferably be done by telephone, since the telephone is the device through which the application will be used. This makes it easier to reach the right sample of the caller (test-user) population, is cheaper than bringing the participants to the office, and, by keeping the users in their real calling environment and state of mind, yields realistic results. The designer should give the test person enough information by providing task definitions to follow, describing only the situation and the tasks without revealing the commands and strategies for fulfilling them. Test sessions should be recorded on tape for detailed analysis. The test person should know that the test is carried out to improve the design of the system, not to evaluate him. A questionnaire with open-ended questions should be given to the participants to learn more about their experience and to collect feedback. Finally
the designer should analyze the test results thoroughly and detect the problems from the symptoms that appear in the tests. Before the application is opened to public use, there should be a pilot deployment for testing and for finding bugs at operation time. This includes application testing, recognition testing and usability testing. In application testing, all possible dialogs should be traversed and as many answer combinations as possible should be tested. Testing at this level is done against the system to find bugs, so every error should be noted; it is the Quality Assurance (QA) test, executed for all integrations and all conditions. After these functionality tests, a load test should be performed with software that simulates calls to the system, in order to check efficiency and run-time errors under heavy usage. In recognition testing, the application's grammar should be tested with utterances that are supposed to be accepted and with utterances that are supposed to be rejected, and the initial values of recognition parameters such as the no-speech timeout and the confidence threshold should also be tested. Usability testing should again be done with random users to get their impression of the complete system; the bottlenecks from a usability point of view must be identified and corrected. The application can be opened to public use once these critical errors have been removed. At this phase the application should be tuned: the designer must listen to some recorded calls to see how the application behaves in real time, and should also review the call logs and the reports generated by the platform, such as task-completion analysis, hang-up analysis and hotspot analysis. Such reporting and analysis utilities provided by the VoiceXML platform make it easier for the designer to tune the application for better performance.
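The recognition-testing step described above can be sketched as a small check of accept/reject decisions against a confidence threshold. All utterance texts, confidence scores and the threshold value below are illustrative assumptions, not values from a real recognizer:

```java
import java.util.List;

// Sketch of recognition testing: each test utterance carries the confidence
// score the recognizer is assumed to return, and the check is whether the
// accept/reject decision under a given confidence threshold matches the
// expected outcome. All utterances and scores are illustrative assumptions.
public class RecognitionTest {

    static class Utterance {
        final String text;
        final double confidence;    // score assumed to come from the recognizer
        final boolean shouldAccept; // expected outcome for this utterance

        Utterance(String text, double confidence, boolean shouldAccept) {
            this.text = text;
            this.confidence = confidence;
            this.shouldAccept = shouldAccept;
        }
    }

    static boolean accepted(Utterance u, double confidenceThreshold) {
        return u.confidence >= confidenceThreshold;
    }

    // Counts the utterances whose decision differs from the expectation.
    static long mismatches(List<Utterance> suite, double threshold) {
        return suite.stream()
                .filter(u -> accepted(u, threshold) != u.shouldAccept)
                .count();
    }

    public static void main(String[] args) {
        List<Utterance> suite = List.of(
                new Utterance("check my voicemail", 0.91, true),
                new Utterance("czech my voice male", 0.32, false),
                new Utterance("transfer me to sales", 0.78, true));
        // With a threshold of 0.5, all three decisions match the expectation.
        System.out.println(mismatches(suite, 0.5)); // 0
    }
}
```

Running such a suite at several threshold values shows the trade-off directly: a higher threshold turns borderline false accepts into rejects, but also starts rejecting valid utterances.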
Finally, the deployed application needs to be continuously monitored for high call rates and crashes. With the help of the logs and reports, the application should also be tuned periodically to adapt to changing caller profiles and usage.
1.6 Modeling approach for VoiceXML
Domain-specific modeling is a powerful way to share the abilities and knowledge of expert domain developers with developers who are new to the domain. A powerful modeling toolset can reduce the time and effort required to develop a domain-specific modeling environment, along with its library and automatic code-generation support. Industrial applications of this approach show remarkable improvements in productivity and training time; some report software development sped up by as much as a factor of ten. The industrial benefits of such a productivity increase are clear: a shorter product life cycle, shortened training time and reduced implementation effort. Such results are exactly what the expanding VoiceXML field needs. In the initial years of HTML, web-page design tools were in strong demand because everyone wanted to create their own web pages. VoiceXML is entering a similar high-demand phase, as companies begin to understand the value of voice-enabling their IT systems, both for existing services and for new voice-enabled, value-adding services.
A model-driven approach to generating VoiceXML applications does not merely give the user a tool to design call flows and then generate VoiceXML code automatically. The task is to enable the so-called “modeler”, who is an expert in his business problem and business domain but has little knowledge of and little time for VoiceXML technologies, to create new telephony solutions and telephony access to his company's services. In this modeling tool the modeled solutions should represent the concepts and notations of voice-user-interface techniques, business requirements, user requirements and backend systems. A further result of applying a model-driven approach is that it abstracts away the technological details of the underlying components of the application architecture and provides an interface for configuring the services and utilities of the application server and the VoiceXML browser. The expectations and requirements for such a modeling tool can be formulated in the following points:
• The end user of the modeling tool should be strongly supported in applying voice-user-interface techniques; the overall approach should not obstruct an expert in applying these techniques but should, on the contrary, guide him.
• The tool should make it easy for the end user to use the facilities provided by a VoiceXML document server or a VoiceXML browser. More importantly, the domain engineer should be able to design his solution more easily than before by using these server facilities.
• The items and rules of the graphical language should be abstracted and classified from several business domains for telephony speech-application solutions. The language should enable the visualization of the overall application architecture, the dialogs and the call flow, as well as the representation of the business domain's items.
• The tool in which the solution designer works should be as intuitive to use as possible and should follow graphical-user-interface conventions.
• The domain modeling tool should provide best-practice dialog components, grammars and prompts in a library format. This library should be highly extensible and improvable; it will reduce the effort for inexperienced designers and provide a way of sharing between designers.
• The tool should also generate clean, tidy code from the model by mapping the meta-models to the target code. It should decrease the amount of manual programming, while still allowing the code to be viewed, modified and extended when necessary.
• The modeling environment should enable the modeler to debug, test and deploy his solution.
• The tool should provide utilities for configuring both the VoiceXML application server and the VoiceXML browser.
• The tool itself should be based on open standards and should be extensible with further capabilities. It should also ease the exchange of modeled solutions and implementations between its users; this will improve the communication of solutions and help the library concept gain adoption.
This telephony-application modeling tool should enable faster and easier solution design. It should result in shorter training time, increased productivity, higher-quality solutions and the reuse of previous design solutions. In addition to speeding up and automating development, the tool should foster an understanding of the modeled design and support the communication and documentation of the solution.
2 Model Driven Software Development
In this chapter model driven software development (MDSD) is described in more detail. The first section presents the general idea and an overview of MDSD; the second section explains the main steps of building an MDSD architecture for a specific domain. The third section describes the extension mechanism of the Unified Modeling Language, giving a brief introduction to UML, UML diagrams and UML meta-models, and showing how to extend the meta-models to supply the design elements missing for specific domains. The final section describes the functional parts of an MDSD tool and how they work together.
2.1 Concept
In software development, modeling is widely used to abstract and present implementation decisions. The wide use of the Unified Modeling Language (UML) standardized this process and provided a common basis for sharing implementation ideas between software developers. When UML is used to define implementation concepts such as classes and functions, there is only a high-level conceptual connection between the model of the application and its source code. In Model Driven Software Development, however, UML models serve not only for the documentation and communication of implementation ideas; they establish a stronger connection between the application model and the application source code. Model Driven Software Development (MDSD) can be compared to car design and manufacturing in the automobile industry, where engineers design car models with Computer Aided Design (CAD) tools and let robots finish the cars on the production line. That approach was introduced to enable production automation and a high productivity rate. A similar automation can be applied to software development by analyzing a certain domain and then defining a domain-specific modeling language to be used in the design process.
After the domain has been analyzed and the domain-specific language defined, sample applications for the domain can be modeled with a UML tool and these models can then be transformed into code by MDSD tools. This yields a higher automation potential for finishing applications and a correspondingly higher productivity. With a domain modeling language and example application source code defined once, and with an MDSD tool, designed application models can be transformed automatically into application source code; a kind of software production line can thus be built, which may lead to higher reuse and software quality. Analyzing a domain starts with examining the already available example applications and their source code. As a rule of thumb, the more modular and structured the source code, the easier it is to identify and categorize the modeling elements of its domain. But from a bird's-eye view of the example application
source code, identifying the actual software structure is difficult, because the abstraction level of programming languages is low. This can be overcome by reverse engineering (the process of analyzing a subject system to create representations of it at a higher level of abstraction), which most UML tools provide. Through this process the source code can be visualized in UML syntax, which gives additional clues for designing the domain's modeling language. The models designed in this modeling language will not only serve as documentation but will also become an asset of the software, acting as an acceleration and quality factor for the software development. Figure 16 shows the basic principles of developing software by means of MDSD. As the very first step, the domain and the reference applications are analyzed and the code is grouped into three categories: “general code”, “schematically repeating code” and “manually written code”. The general code is the part of the reference applications that does not change and is the same for all applications; it provides the basis the applications work on and is called the framework or platform. The schematically repeating code is not exactly the same across the reference applications but has a similar structure. Finally, the manually written code is specific to each reference application and must be implemented separately. After categorizing the reference applications into these three parts, the second step is to construct a domain-specific modeling language for the schematically repeating code. This modeling language, an extension of the existing UML, will later serve for modeling the domain's applications.
Then the models designed in the domain's modeling language are transformed into the source code that corresponds to the schematically repeating code of the reference applications.
Figure 16 Main idea of Model Driven Software Development [37]
2.2 Developing a MDSD Process
To apply the Model Driven Software Development process in a specific domain and improve software development there, a “generative architecture” must be developed first. To build this basis, the domain and its reference applications are examined and the code is categorized into “manually written code”, “schematically repeating code” and “general code”. The general code category of the reference applications is the platform the applications are built on; this can be a web server, a J2EE platform or a .NET platform. The schematically repeating code category comprises code that has a similar structure and appears as copy-paste code; it can be found by comparing the code parts that build the main structure across the applications. Identifying the schematically repeating code is highly important for building a successful and productive generative architecture: the more schematic code is extracted, the more flexible and effective the MDSD process becomes. The manually written code category consists of the code outside the other two categories and is specific to each application. This code is not meant to be generated by the transformation from the UML models; it also serves as glue code between the generated code and the platform code. These schematically repeating code parts of the reference applications, the application domain and already available UML models give hints for defining the domain-specific
modeling language that will be used for modeling the applications. MDSD tools support many design formats, both textual and graphical; among them, the Unified Modeling Language (UML) is the one accepted industry-wide. UML is intended to be used with all development methods and application domains, and its extension mechanism, the UML profile, enables the addition of domain-specific design elements (meta-model elements). Through this it is possible to design the modeling language for an application domain by extending UML with stereotypes, tagged values and constraints. A meta-model defines the construction elements that are used in a concrete model; basically, a model is an instance of meta-models. The UML meta-model consists of elements such as Class, Operation, Attribute, Association, etc. The UML extension mechanism and the meta-model concept are described in more detail in section 2.3. To sum up, the schematically repeating code parts serve as the basis for defining the UML profile that will be used for modeling the domain's applications. For example, in a website application, frequently used user-registration code may look like the code below:
Table 1 “Schematically repeating code” part of a website application [37]
Such a code section of the applications also serves as code that may be used in the templates for the generation of source code when UML models are transformed to code. These code sections are represented by a UML meta-model element in the domain-specific design language and serve as “Lego bricks” in the design of applications in a UML tool. The templates serve as cartridges used by the generator to transform the meta-model elements used in the models into application source code. In the generative architecture, the platform on which the applications will run is already defined, and this determines the implementation language of the templates that are
going to be written. But because the UML meta-model is a higher abstraction than the source code, the platform does not affect the UML meta-model elements that represent these schematically repeating code sections. An independence is therefore achieved between the application model and the implementation language that will be used. This makes it possible to write templates in any language, depending on the platform the applications are planned to run on; when the platform is to be changed, using the corresponding templates of the new platform transforms the already designed applications into that platform's source code. So the source code in Table 1 serves as the Java code of the template that is represented by the UML meta-model element in Table 2 below; this template code may be replaced by a C# implementation for use on a .NET platform. This independence between the UML representation of a template, derived from the schematically repeating code, and the platform-dependent source-code implementation of that UML meta-model element is one of the important gains of MDSD.
Table 2 UML meta-model representation of “Schematically repeating code” part [37]
The above described the connection between the reference applications and the building of the generative architecture for the application domain, emphasized the importance of analyzing the application domain and the reference code to define a UML profile for modeling the applications, and gave an example of using a schematically repeating code part as a template represented by a UML element. Once the generative architecture is defined, the applications of the domain can be modeled with the specified UML profile; the applications are modeled as usual in a UML tool, using the newly added UML meta-model elements.
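The template idea described above, a schematically repeating code fragment with placeholders filled from properties read off a model element, can be sketched with plain string substitution. This is only a stand-in for a real template language such as xPand; the template text, placeholder names and model properties are all illustrative assumptions:

```java
import java.util.Map;

// Minimal stand-in for a generator template: a schematically repeating
// code fragment with placeholders, filled from properties read from a
// model element. Real MDSD template languages are far richer; this only
// illustrates the model-to-code mapping idea.
public class TemplateSketch {
    static final String ENTITY_TEMPLATE =
            "public class %NAME% {\n" +
            "    private %TYPE% %FIELD%;\n" +
            "    public %TYPE% get%PROP%() { return %FIELD%; }\n" +
            "}\n";

    // Replaces every %KEY% placeholder with the model element's property value.
    static String generate(Map<String, String> modelProperties) {
        String code = ENTITY_TEMPLATE;
        for (Map.Entry<String, String> e : modelProperties.entrySet()) {
            code = code.replace("%" + e.getKey() + "%", e.getValue());
        }
        return code;
    }

    public static void main(String[] args) {
        // Properties as they might be read from a stereotyped class in the model.
        String src = generate(Map.of(
                "NAME", "User", "TYPE", "String",
                "FIELD", "email", "PROP", "Email"));
        System.out.print(src);
    }
}
```

Swapping ENTITY_TEMPLATE for a C# fragment while keeping the same placeholder names illustrates exactly the platform independence argued above: the model properties stay the same, only the template text changes.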
Developers today are exposed to very complex software infrastructures, such as application servers, databases, open-source frameworks, protocols and interface technologies, and must work wisely through these connected parts to form robust, maintainable software. With this increasing complexity, software architecture has gained meaning and importance and is therefore worked on more. MDSD aims to improve development efficiency as well as software quality and reusability, especially by freeing the developer from error-prone routine development work. The more schematic the source code of the reference applications, the easier it is to build the generative architecture. The software architecture thus plays an important role for the
software developer, since it can be used as a blueprint for the implementation. Also, the more the software architecture is worked out from the source-code point of view, the more schematic the application programming becomes. Schematic programming means a larger share of copy-and-paste programming with modification of the context; this part of the work is mostly not intellectual work, and the error-prone copy/paste/modify work can be done by a generator. All of this leads to the generative software architecture: a specific model of the application is provided as input, and the source code of the application is produced as output. Transformation is an important concept of MDSD; it should be defined flexibly and formally according to the specified profile. This is a precondition for automating the transformation of the model into code by a generator. Although a transformation between two meta-models is possible, most MDSD tools generate the source code from the meta-model by using templates. The generators enable this through transformation languages, scripting languages for defining code-generation templates. As a rule it is necessary to apply application-domain-specific UML profiles; this also enforces a formal application design.
The model-to-code transformation is done through generator templates and is typically processed by a generator framework, so that the infrastructure code can be produced automatically from the architecture-oriented design model. The model must, of course, already contain the information relevant for generating the infrastructure code. Figure 17 depicts these concepts and how the overall MDSD process is carried out. First the generative architecture for the application domain is defined. This step begins with analyzing the application domain and the reference code to define a UML profile that will be used to model the applications of the domain. Then the templates are written that the generator uses to produce the source code of the applications by transforming the meta-model elements used in the models into code. This step is mutually related to defining the domain-specific language: the available schematic code parts can define modeling elements, and some modeling elements may in turn call for specific code templates. Since the application code will run on a specific platform, the templates must also suit that platform; for a J2EE platform, for example, the templates are written in Java, and for a web server in HTML. Once the generative architecture is defined, the applications can be modeled with
the domain-specific UML profile; these models are then read by the generator, and according to the model the necessary templates are fetched from the available ones. After the generator has produced the code, the necessary manual implementations are added for the specific application, so that it runs on the platform for which it was designed.
Figure 17 Main Steps at developing MDSD process [37]
2.3 UML Extension Mechanism
The Unified Modeling Language (UML) is a general-purpose visual modeling language used to specify, visualize, construct and document the artifacts of a software system. It captures decisions and understanding about systems that must be constructed, and it is used to understand, design, browse, configure, maintain and control information about such systems. It is intended to be used with all development methods and application domains. UML includes semantic concepts, notation and guidelines; it has static, dynamic and organizational parts, and it is intended to be supported by interactive visual modeling tools with code generators. UML captures information about the static structure and dynamic behavior of a system. The static structure defines the kinds of classes important to a system and its implementation, as well as the relationships among those classes. The dynamic behavior defines the history of objects over time and the communications among classes for accomplishing goals. Modeling a system from several separate but related viewpoints
permits it to be understood for different purposes. UML also contains organizational constructs for arranging models into packages, which let software teams partition large systems into workable pieces, understand and control the dependencies among the packages, and manage the versioning of model units in a complex development environment. It contains constructs for representing implementation decisions and for organizing run-time elements into components. UML is not a programming language but a design language; tools can provide code generators that transform UML into programming languages, as well as visualizations of existing application source code in UML [39]. As a modeling language, it focuses on the formulation of a software solution by modeling: the model embodies the knowledge about the software solution, and the appropriate application of this knowledge constitutes intelligence. As it applies to specifying systems, UML can be used to communicate what is required of a system and how a system may be realized or implemented. As it applies to visualizing systems, it can be used to depict a system visually before it is realized. As it applies to constructing systems, it can be used to guide the realization of a system, similar to a blueprint. UML is not a visual programming language but a visual modeling language; not a tool or repository specification but a modeling-language specification; not a process, but something that enables processes. As a general-purpose modeling language, it focuses on a set of concepts for acquiring, sharing and utilizing knowledge, coupled with extensibility mechanisms. To understand the architecture of UML, consider how computer programs and programming languages are related: there are many different programming languages, and each particular program is developed using a specific one.
All of these languages support various declarative constructs for declaring data and procedural constructs for defining the logic that manipulates data. Because a model is an abstraction, each of these concepts may be captured in a set of related models. Programming-language concepts are defined in a model called a meta-model. Each particular programming language is defined in a model that utilizes and specializes the concepts of the meta-model. Each program implemented in a programming language may be defined in a model, called a user model, that utilizes and instantiates the concepts within the model of the appropriate language. This scheme, with a meta-model representing programming constructs, models representing programming languages, and user models representing programs, exemplifies the architecture of UML.
UML is defined within a conceptual framework for modeling that consists of the following four distinct layers or levels of abstraction, shown in Figure 18. The meta-metamodel layer consists of the most basic elements on which UML is based. This level of abstraction is used to formalize the notion of a concept and to define a language for specifying meta-models. The meta-model layer consists of the elements that constitute UML, including concepts from the object-oriented and component-oriented paradigms. Each concept at this level is an instance (via stereotyping) of a meta-metamodel concept. This level of abstraction is used to formalize paradigm concepts and to define a language for specifying models. The model layer consists of UML models. This is the level at which problems, solutions or systems are modeled. Each concept at this level is an instance (via stereotyping) of a concept in the meta-model layer. This level of abstraction is used to formalize concepts and to define a language for communicating expressions about a certain subject; models in this layer are often called class or type models. The user-model layer consists of the elements that exemplify UML models. Each concept at this level is an instance (via classifying) of a concept in the model layer and an instance (via stereotyping) of a concept in the meta-model layer. This level of abstraction is used to formalize specific expressions about a given subject; models in this layer are often called object or instance models.
Figure 18 4 Meta Levels of UML [38]
• Diagrams
Diagrams depict knowledge in a communicable form. UML provides the following diagrams, organized around architectural views, regarding models of problems and solutions:
- Use case diagrams depict the functionality of a system.
- Class diagrams depict the static structure of a system.
- Sequence diagrams depict an interaction among elements of a system, organized in time sequence.
- Collaboration diagrams depict an interaction among elements of a system and their relationships, organized in time and space.
- State diagrams depict the status conditions and responses of elements of a system.
- Activity diagrams depict the activities of elements of a system.
- Component diagrams depict the organization of the elements realizing a system.
- Deployment diagrams depict the configuration of environment elements and the mapping of the elements realizing a system onto them.
• Extensibility Mechanism
Extension mechanisms allow UML to be customized and extended for particular application domains, resulting in a new UML dialect. The extensibility mechanisms are constraints, tagged values and stereotypes:
Figure 19 UML Extension Mechanism
• Stereotypes
A stereotype is a kind of model element defined in the model itself. The information content and form of a stereotype are the same as those of an existing kind of base model element, but its meaning and usage are different. A stereotype is based on an existing model element, and by using stereotypes it is possible to introduce new meta-model elements on top of existing UML meta-model elements. Stereotypes are shown in guillemets (<<, >>). It is also possible to create an icon for a particular stereotype to replace the base element's symbol.
• Constraints
Constraints specify semantics or conditions that must be maintained as true for model elements. Each constraint expression has an implicit interpretation language, which may be a formal mathematical notation, a computer-based constraint language or informal natural language. When the constraint language is informal, its interpretation is also informal and must be done by a human. Constraints are shown as expression strings enclosed in braces.
• Tagged values
A tagged value is a pair of strings, a tag string and a value string, that stores a piece of information about an element. A tagged value may be attached to any individual element in UML and can be used to store arbitrary information about that element. Tagged values also provide a way to attach implementation-dependent information to elements. For example, a code generator needs additional information about the kind of code to generate from a model; for this purpose, certain tags can be used as flags telling the code generator which implementation to use. Tagged values are shown as strings with the tag name, an equals sign and the value, normally placed in lists inside braces.
2.4 Functional Parts of a MDSD tool
Figure 20 below depicts the functional parts of an MDSD tool in more detail. This MDSD generator framework is an open-source project, and other MDSD tools have a similar architecture. In the “generative architecture” part, the UML profile that will serve as the modeling language for the application domain is designed; this is done by using the available UML meta-model elements and by extending the relevant ones to represent the application domain in the modeling phase.
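The use of tagged values as flags for the code generator, described above, can be sketched as a simple dispatch on a tag read from a model element. The tag name "persistence" and the generated snippets are illustrative assumptions, not part of any standard profile:

```java
import java.util.Map;

// Sketch: a tagged value attached to a model element acting as a flag
// that tells the code generator which implementation to produce. The
// tag name "persistence" and the returned snippets are illustrative.
public class TaggedValueDispatch {
    static String generateStorageCode(Map<String, String> taggedValues) {
        String persistence = taggedValues.getOrDefault("persistence", "memory");
        switch (persistence) {
            case "jdbc": return "// generate JDBC-backed DAO";
            case "file": return "// generate file-backed store";
            default:     return "// generate in-memory store";
        }
    }

    public static void main(String[] args) {
        // {persistence = jdbc} as it would appear in braces next to the element
        System.out.println(generateStorageCode(Map.of("persistence", "jdbc")));
    }
}
```

The model stays free of implementation detail; only this small generator-side mapping knows what each tag value means for the target platform.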
After designing the UML Profile that represents the domain, it is necessary to implement the corresponding UML Profile meta-models in a programming language like Java in order to access the properties of model elements such as Classes, Associations, Attributes, and the newly extended UML meta-model elements. Within the openArchitectureWare tool, the meta-models of the core UML diagrams, like Class Diagrams and State Diagrams, are already implemented. The platform that will host the applications does not directly affect the UML Profile and the meta-model implementations, since these only define the modelling elements that are going to be used for the application domain and are abstracted from the reference code. The platform where the application will run does, however, affect the code that is generated from the designed models. At this point there is a separation between the application source code and the application design, so that models can be designed for a specific business domain while the generated code varies according to the platform where the application is to run. This is one of the important gains of applying model driven software development, since this enables the platform
independent modelling, where the models remain assets for any platform. For each platform, corresponding source code templates have to be written: for example, to transform a model into an application that will run on a J2EE platform, templates that generate Java code can be used; for a .NET platform, templates that generate C# can be used. Templates enable platform-specific code generation. As described above, the main idea of MDSD is to generate the schematically repeating parts of the code of reference applications. These parts of the application code are copied into the templates and combined with the properties read from the meta-models to generate code from the models (once they are represented as instances of the meta-classes). openArchitectureWare templates are written in a special language called xPand, which is easy to learn and very well suited to writing templates (and nothing else). After the design language for the application domain has been defined in the "Generative Architecture" part, a UML tool can be used to model the applications of the specific domain. For this, the previously designed UML Profile is imported into the UML tool. The application models designed with the UML tool are exported as an XMI representation; XMI is a standard file format for saving, loading, and exchanging UML models among UML tools. In the "Generator Framework", the models are first read from the XMI representation, and each meta-model element read from the UML model is then mapped to the corresponding meta-model implementation defined in the "Generative Architecture". The "Instantiator" normally maps each standard UML meta-model element to the meta-model implementations already provided by the openArchitectureWare tool.
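The role of the templates can be sketched as follows; real xPand templates use their own dedicated syntax, and Python's `str.format` merely stands in for it here.

```python
# A simplified, hypothetical illustration of template-based code generation:
# a template mixes fixed "schematic" code (copied from the reference
# implementation) with properties read from the meta-model instance.

TEMPLATE = (
    "public class {name} {{\n"
    "    // schematic code copied from the reference implementation\n"
    "    private String id;\n"
    "}}\n"
)

class MetaClass:
    """Minimal stand-in for a meta-model implementation class."""
    def __init__(self, name):
        self.name = name

def expand(template, element):
    # Combine the static template text with the element's properties.
    return template.format(name=element.name)

print(expand(TEMPLATE, MetaClass("PizzaMenu")))
```

The fixed text is the schematically repeating part; only the placeholders vary per model element, which is exactly what makes this part of the code worth generating.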
For newly introduced meta-model elements, it is possible to define the mapping between their meta-model implementation and their XMI representation. After the "Instantiator" has read the UML design, it instantiates the corresponding meta-model implementations; the "Generator Backend" then fetches the templates for each meta-model element and transforms them to source code. On the "Application" side, the application logic is added to the generated code by manual coding, and the application is then ready to run on the selected platform. The UML design of the application is also stored as an XMI representation on the application side.
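The Instantiator's job, reduced to a toy example, looks roughly like this; the XML shown is a heavily simplified, invented stand-in for real XMI, which is considerably more complex.

```python
# A toy "Instantiator": read a simplified XMI-like representation of a
# model and instantiate a meta-model implementation class for each element.
import xml.etree.ElementTree as ET

XMI = """
<model>
  <class name="Welcome" stereotype="Dialog"/>
  <class name="Exit" stereotype="End"/>
</model>
"""

class Dialog: pass
class End: pass

# Mapping from stereotype name to meta-model implementation class.
META_MODEL = {"Dialog": Dialog, "End": End}

def instantiate(xmi_text):
    instances = []
    for node in ET.fromstring(xmi_text):
        cls = META_MODEL[node.get("stereotype")]   # pick the implementation
        obj = cls()
        obj.name = node.get("name")                # copy model properties
        instances.append(obj)
    return instances

for inst in instantiate(XMI):
    print(type(inst).__name__, inst.name)
```

The generator backend would then iterate over these instances and expand the matching template for each one.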
Figure 20 Functional parts of the openArchitectureWare generator framework [37]

A more detailed model-to-code transformation is depicted in the figure below: the generator first reads the XMI representation of the UML model, and this XML output is parsed by the "Parser". Afterwards, the "meta-model Instantiator" instantiates the corresponding meta-model class implementations, and the code generator fetches the templates for each meta-model element. Finally, these meta-model elements are transformed to code.

Figure 21 Functional parts of the openArchitectureWare generator
3 Implementation

For the implementation, Poseidon for UML Community Edition is used as the UML tool and openArchitectureWare as the model driven software development tool. To validate and modify the transformed code, the oXygen XML editor is used, and the Apache Tomcat web server is used as the platform. As described in the second chapter, for building the generative software architecture the telephony application domain meta-model consists of the elements start, dialog, end and service, and in the templates the source code is written in VoiceXML.

3.1 Meta-Model

For the telephony application, 4 new meta-model elements are derived from the standard UML class meta-model: "Start", "Dialog", "End" and "Service". These meta-model elements are designed using the Poseidon UML tool; Figure 22 shows the structure of the class diagram. The first element, "Start", is used for welcoming the user to the telephony application: it contains the welcome prompt that is played to the user when the call is answered, the error prompt that is played when an error occurs, the no-match prompt that is played when a spoken input does not match the available grammar, and the no-input prompt that is played if the user does not give any spoken input. It also has a parameter naming the maintainer of the application. The second design element is the "Dialog" element; it simply asks the user a question and listens for the answer, then processes the input and plays back the recognition result. It has 3 parameters to be filled in by the modeler, namely "initialPrompt", "grammar" and "universals": the first directs a question prompt to the caller, the second is the grammar the recognition engine listens for, and the third holds the universal commands the recognition engine also listens for. These universals can be Help, Exit or Operator.
The modeler may fill these parameters by writing, for example, "Help: please repeat the last sentence", "Exit: thanks for calling us." or "Operator: please wait, I am connecting you to the operator." There is also one parameter, "recognitionResult", that returns the recognition result from the recognition engine. In addition, there are 5 methods: prompt, recognition, error handler, universals and action. These methods read the input parameters filled in by the modeler and generate the corresponding code.
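A hedged sketch of what the code generated from a Dialog element's parameters might look like: the tag names (initialPrompt, grammar) and the recognitionResult field follow the text above, but the exact VoiceXML emitted by the real templates may differ.

```python
# Sketch: turn a Dialog element's tagged values into a VoiceXML form.
# The markup structure shown is an illustrative assumption.

def dialog_to_voicexml(name, initial_prompt, grammar_src):
    return (
        f'<form id="{name}">\n'
        f'  <field name="recognitionResult">\n'
        f'    <prompt>{initial_prompt}</prompt>\n'
        f'    <grammar src="{grammar_src}"/>\n'
        f'  </field>\n'
        f'</form>'
    )

print(dialog_to_voicexml("pizzamenu",
                         "Which pizza would you like?",
                         "pizzatypes.grxml"))
```

The error-handler and universals methods would contribute further elements (catch handlers, links) to the same form in a similar way.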
Figure 22 Class Diagram

The third design element is "Service"; it represents a back-end connection to obtain a service from the system. It has 2 parameters: serviceArg and serviceName. The serviceArg parameter takes the parameters that may be sent to the service, and the serviceName parameter selects the service to use. Only two services are available on the web server platform at the moment: a weather service and a time service. The fourth design element is the "End" element; it represents the end of the application and is connected to the other design elements to finish the call. It has one parameter, closingPrompt, whose value is played to the user as the closing prompt.
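One plausible way the "Service" element's two parameters could be turned into a back-end request is sketched below; the URL scheme, host, and port are invented placeholders, not the actual platform configuration.

```python
# Sketch: map serviceName/serviceArg to a hypothetical web-server request.
from urllib.parse import urlencode

SERVICES = {"weather", "time"}   # the two services available per the text

def service_url(service_name, service_arg):
    if service_name not in SERVICES:
        raise ValueError(f"unknown service: {service_name}")
    query = urlencode(service_arg)   # serviceArg -> query parameters
    return f"http://localhost:8080/services/{service_name}?{query}"

print(service_url("weather", {"city": "Aachen"}))
```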
3.2 Model

Two example scenarios are implemented using the meta-model created for the telephony application domain. The first is a sample pizza ordering application: it starts by welcoming the user to the telephony system and provides a menu where the pizza type is asked for (cheese, pepperoni or all dressed); the customer is then asked whether he would like to order another pizza, in which case he is sent back to the welcome menu; otherwise he is sent to the goodbye dialog and then forwarded to the exit. Figure 23 shows the call flow of this pizza shop scenario. The class diagram consists of nine elements, namely Start, welcome, pizzamenu, pepperoni, cheese, alldressed, neworder, goodbye and exit. The Start element is derived from the start meta-model element, exit is derived from the end meta-model element, and the remaining elements are derived from the dialog meta-model element.

The first element, "Start", is used when answering the phone. It consists of 4 tags, namely noinput, nomatch, error and maintainer, whose values the class gets as input. The noinput tag holds the prompt that is played to the caller when there is no input. The value of the nomatch tag is played when the caller's input does not match the available grammar. The value of the error tag is played if an error occurs. The value of the maintainer tag is an email address, which is used for emailing information regarding the application. The second element is "welcome", which is used for the salutation; it has only one tag, called initialPrompt, whose value is played to welcome the user to the pizza shop. The third element, "pizzamenu", which is derived from the dialog meta-model, prompts for the pizza types. It consists of two tags: initialPrompt and grammar.
The value of the initialPrompt tag is played to the customer to give information about the available pizza types. The grammar tag covers the types of pizza; its value therefore consists of the three types available in the pizza shop. The next elements of the class diagram are the available pizza types: there are three elements representing the kinds of pizza. Depending on the value of the condition tag between the "pizzamenu" element and the cheese, pepperoni or alldressed elements, the flow moves to the next dialog element; that is, the next step is selected according to the user input. Briefly, after the pizzamenu element the flow passes to whichever of the cheese, pepperoni or alldressed elements satisfies the condition. Then comes the "neworder" dialog element, where the user is asked whether he would like to go to the welcome menu again or to the goodbye dialog. Then comes the "goodbye" dialog element, where the user is asked to call back soon. Finally, there is the "exit" element, which is used at the end to close the telephone call.
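The condition mechanism between pizzamenu and the pizza-type dialogs can be sketched as a simple lookup; the dispatch logic shown is an illustrative assumption, not the generated code itself.

```python
# Sketch: the next dialog element is the one whose condition tag value
# matches the recognition result. Element names mirror the pizza example;
# the fallback behaviour (re-prompting on no match) is assumed.

transitions = {           # condition value -> next dialog element
    "cheese": "cheese",
    "pepperoni": "pepperoni",
    "all dressed": "alldressed",
}

def next_dialog(recognition_result):
    return transitions.get(recognition_result, "pizzamenu")

print(next_dialog("pepperoni"))   # → pepperoni
```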
Figure 23 Pizza Example - Class Diagram
The second scenario describes a time service and a weather service: the user is first welcomed with a welcome dialog; the flow then moves to the next dialog, which calls a weather service and a time service, and finally goes to the end dialog element.

3.3 Generated Code

The transformer transforms the UML models into VoiceXML source code. Figure 24 shows an automatically generated VoiceXML sample for the pizza shop example.
Figure 24 Sample Code

3.4 Development

Manually Written Code

As in every software development effort, some parts of the source code may need manual adjustment. In such cases it is possible to modify the generated code by hand: after the transformer converts the model into source code, the desired modifications can be made manually using the Eclipse IDE.
XML Validation and XML Well-Formedness Check

A "well formed" XML document has correct XML syntax; using the Check XML Form function it is possible to check whether a document conforms to the XML syntax rules [40]. A "valid" XML document is a well formed XML document that additionally conforms to the rules of a Document Type Definition (DTD), an XML Schema or another type of schema, which defines the legal elements of an XML document. The validation of the transformed code is also done automatically, using the oXygen XML editor. When creating an XML document, errors can be introduced; when working with large projects or many files, the probability that errors will occur is even greater. Determining that a project is error free can be time consuming and even frustrating. For this reason <oXygen/> provides functions that enable easy error identification and rapid error location [40]. In Figure 25, the selected red part shows the toolbar that is used for validation and well-formedness checking.

Figure 25 Oxygen Editor
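A well-formedness check of the kind oXygen performs can also be sketched programmatically; this uses Python's standard ElementTree parser (full DTD/schema *validation*, as oXygen also performs, requires an external library and is not shown here).

```python
# Sketch: check XML well-formedness by attempting to parse the document.
# ElementTree raises ParseError on any XML syntax violation.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<vxml><form/></vxml>"))   # well formed
print(is_well_formed("<vxml><form></vxml>"))    # mismatched tags
```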
Platform

The platform can be started automatically as well: the Tomcat button on the toolbar is used to start the web server. Once the web server is running, the telephony platform fetches the generated code through it. In Figure 26, the red selection shows the Tomcat toolbar.

Figure 26 Platform toolbar