KasaDaka: a sustainable voice-service platform - Master thesis presentation André Baart

KasaDaka: a sustainable voice-service platform
Developing a Voice Service Development Kit
André Baart, Dec. 2017

Connecting the unconnected
How to provide the ‘unconnected’ with the benefits of ICTs?
ICT for Development (ICT4D)
Sub-Saharan Africa, Sahel (Mali, Burkina Faso, Ghana):
● Low income (2 EUR/day)
● Low literacy
● No internet, intermittent electricity
● Under-resourced languages (Bambara, Mooré, etc.)
Internet adoption is very low; however, mobile phones have a high adoption rate and GSM coverage is good!

KasaDaka – ‘Talking Box’
Raspberry Pi
GSM connection

KasaDaka platform
● Low-resource, low-cost (60 EUR)
● Almost completely open-source (and thus free)
● Uses the existing infrastructure: GSM network, simple mobile phones
● Telco-independent
Components and technologies:
● Asterisk
● VoiceXML
● VXI (proprietary VoiceXML browser)
● Apache
● MySQL
● Django (Python)
Some use-cases of voice-services:
● Citizen Journalism
● Market information
● Weather Information
● Animal health
● Dairy value chain

Key to sustainable voice-services:
Local development
● Low cost is essential (2 EUR/day avg. income)
● Foreign developers are expensive!
● A lack of local knowledge causes dependence on foreign labor
● Development is the biggest expense
In order to allow local communities to afford voice-services, local development is necessary.
● Greatly reduces development costs
● Reduces distance between developer and user
● Local businesses, economic growth
However, local developers are hard to find!
Only 12K African GitHub accounts in 2015 (see: https://blog.ona.io/general/2015/01/01/github-africa-2015.html)
Number of voice-service developers in Mali and Burkina Faso: extremely low (thus not cheap!)
Solution: make it easier to develop voice-services

Simplifying voice-service development
Hypothesis:
● Voice-services are composed of a combination of interactions.
● These interactions can be generalized into a small set, e.g.
○ Menu with choices
○ Message/information playback
○ User voice input
○ User digit input
○ Language selection
● By providing building-blocks for these interactions, inexperienced users can build simple voice-applications by deploying and customizing these building-blocks.
Goal: voice-service development in a graphical (web-)interface, no programming skills required.
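The building-block idea can be illustrated with a minimal Python sketch. It is not the VSDK's actual data model: the class names `Message` and `Menu`, their fields, the `render_service` helper, and all URLs are hypothetical, chosen only to show how two of the interaction types listed above could be configured as data and rendered to VoiceXML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List
from xml.sax.saxutils import quoteattr


@dataclass
class Message:
    """Playback block: plays one recorded audio file, then moves on."""
    name: str
    audio_url: str
    next_url: str  # hypothetical reference to the next block

    def to_vxml(self) -> str:
        return (
            f"<form id={quoteattr(self.name)}><block>"
            f"<prompt><audio src={quoteattr(self.audio_url)}/></prompt>"
            f"<goto next={quoteattr(self.next_url)}/>"
            "</block></form>"
        )


@dataclass
class Menu:
    """Choice block: plays a prompt and maps DTMF digits 1..n to next blocks."""
    name: str
    prompt_url: str
    choices: List[str] = field(default_factory=list)  # next-block references

    def to_vxml(self) -> str:
        options = "".join(
            f'<choice dtmf="{digit}" next={quoteattr(url)}/>'
            for digit, url in enumerate(self.choices, start=1)
        )
        return (
            f"<menu id={quoteattr(self.name)}>"
            f"<prompt><audio src={quoteattr(self.prompt_url)}/></prompt>"
            f"{options}</menu>"
        )


def render_service(blocks) -> str:
    """Concatenate the rendered blocks into one VoiceXML document."""
    body = "".join(block.to_vxml() for block in blocks)
    return f'<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">{body}</vxml>'


# A two-block service: a menu that leads to a weather message (made-up names).
service = render_service([
    Menu("main_menu", "/uploads/main_menu_en.wav", ["#weather", "#main_menu"]),
    Message("weather", "/uploads/weather_en.wav", "#main_menu"),
])
root = ET.fromstring(service)  # raises ParseError if the output is not well-formed
```

In this sketch, building a service means filling in fields rather than writing VoiceXML by hand, which is the property the hypothesis relies on.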

Voice Service Development Kit
Development of voice-services from a locally hosted web-interface
● Based on the Django framework (MVC)
● Development through admin interface (screenshot)
● Voice-service structure stored in database
● VoiceXML generation including dynamic data from database
● Slot-and-filler TTS, support for all languages
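Slot-and-filler TTS can be sketched as follows: a fixed template of pre-recorded fragments has its slots filled with other recordings in the caller's language. The file-naming pattern (`/uploads/<fragment>_<language>.wav`) is taken from the VoiceXML example in this presentation; the function names and the template itself are assumptions for illustration, not the VSDK's actual code.

```python
from typing import List
from xml.sax.saxutils import quoteattr


def audio_file(fragment: str, language: str, base: str = "/uploads") -> str:
    """Path of the recording of one fragment in one language (assumed naming)."""
    return f"{base}/{fragment}_{language}.wav"


def choice_option(label: str, digit: int, language: str) -> List[str]:
    """Fill the template '<pre> LABEL <post> DIGIT' with recorded fragments.

    'pre_choice_option' and 'post_choice_option' are the fixed parts of the
    sentence; the label and the digit are the slots.
    """
    template = ["pre_choice_option", label, "post_choice_option", str(digit)]
    return [audio_file(fragment, language) for fragment in template]


def to_prompt(files: List[str]) -> str:
    """Render a sequence of recordings as one VoiceXML prompt."""
    audio = "".join(f"<audio src={quoteattr(path)}/>" for path in files)
    return f"<prompt>{audio}</prompt>"


# Each language is announced in its own language, as in a language-selection menu.
files = choice_option("dutch", 1, "nl") + choice_option("english", 2, "en")
prompt = to_prompt(files)
```

Because only recordings are concatenated, no speech synthesizer is needed, so any language for which someone can record the fragments is supported.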

Evaluation: ICT4D course @ VU
2017 ICT4D course:
● 31 students from varying backgrounds, most with little or no programming experience
● 10 applications developed
● 9/10 used the VSDK to develop their application
● 80% of applications functioned correctly
● 78% extended the VSDK with custom data models
● 67% extended the VSDK with custom types of interactions
Note: the set of provided interactions was minimal: only menu/choice structures and playback of messages
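The slide does not state which group each percentage refers to. A small check, under the assumption that each figure is a whole number of applications rounded to the nearest percent, shows which denominators are consistent with the reported values; the function name is hypothetical.

```python
def matching_counts(reported_pct: int, denominator: int) -> list:
    """Counts n for which n/denominator rounds to the reported percentage."""
    return [
        n for n in range(denominator + 1)
        if round(100 * n / denominator) == reported_pct
    ]


ALL_APPS = 10   # applications developed
VSDK_APPS = 9   # applications built with the VSDK

# For each reported figure, list the counts that fit each candidate denominator.
for pct in (80, 78, 67):
    print(pct, matching_counts(pct, ALL_APPS), matching_counts(pct, VSDK_APPS))
```

Under this assumption, 80% fits only 8 of all 10 applications, while 78% and 67% fit only 7 and 6 of the 9 VSDK-based applications. This is an inference from the numbers, not something the presentation states.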
Survey key findings:
● Interaction building-blocks work well for voice-service development, but the included set is limited for complex use-cases
● Simple voice-services can be developed quickly and easily, compared to writing VXML
● Expanding the functionalities of the VSDK has a steep learning curve (requires VXML, Django, Python)
● Debugging voice-services is difficult, testing takes a long time
● Setting up a local development environment is difficult (students did not have access to their own RPi)

Related work
Same principle, but:
● Not open-source, thus foreign dependency
● Expensive
● Require internet connectivity
● Require enterprise telephone connectivity (not available, expensive)
● Rely on the use of TTS/ASR, no support for under-resourced languages
● Do not support voice, only SMS

Conclusions
Building-block approach to voice-service development:
● Works for development of simple voice-services
● Does not require programming skills
● Less work than writing static VoiceXML
● Even less compared to writing VoiceXML generators for specific applications
So, the VSDK enables fast development of voice-service prototypes for users without programming skills.

Example VoiceXML: language selection form

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE vxml SYSTEM "http://www.w3.org/TR/voicexml21/vxml.dtd">
<vxml xmlns="http://www.w3.org/2001/vxml"
      version="2.1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
      http://www.w3.org/TR/2007/REC-voicexml21-20070619/vxml.xsd">
  <property name="inputmodes" value="dtmf" />

  <form id="language_form">
    <field name="language_field">
      <prompt>
        <audio src="/uploads/pre_choice_option_nl.wav"/>
        <audio src="/uploads/dutch_nl.wav"/>
        <audio src="/uploads/post_choice_option_nl.wav"/>
        <audio src="/uploads/1_nl.wav"/>
        <audio src="/uploads/pre_choice_option_en.wav"/>
        <audio src="/uploads/english_en.wav"/>
        <audio src="/uploads/post_choice_option_en.wav"/>
        <audio src="/uploads/2_en.wav"/>
        <audio src="/uploads/pre_choice_option_fr.wav"/>
        <audio src="/uploads/french_fr.wav"/>
        <audio src="/uploads/post_choice_option_fr.wav"/>
        <audio src="/uploads/3_fr.wav"/>
      </prompt>
      <grammar xml:lang="en-US" root="MYRULE" mode="dtmf">
        <rule id="MYRULE" scope="public">
          <one-of>
            <item>1</item>
            <item>2</item>
            <item>3</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <if cond="language_field == '1'">
          <assign name="language_id" expr="'1'"/>
        <elseif cond="language_field == '2'" />
        <elseif cond="language_field == '3'" />
        <else/>
        </if>
        <goto next="#submit_form"/>
      </filled>
    </field>
  </form>
  <form id="submit_form">
    <block>
      <assign name="session_id" expr="'7'"/>
      <assign name="caller_id" expr="'123'"/>
      <submit next="/vxml/user/register/" method="post"
              namelist="language_id session_id caller_id "/>
    </block>
  </form>
</vxml>
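The `<submit>` in the example posts the variables in its `namelist` to `/vxml/user/register/`. VoiceXML sends such a POST as URL-encoded form data by default, so the receiving side could read it roughly as below. This is a framework-free sketch with a hypothetical function name, not the VSDK's Django view; only the three field names come from the example.

```python
from urllib.parse import parse_qs

# Field names taken from the namelist of the <submit> element.
REQUIRED_FIELDS = ("language_id", "session_id", "caller_id")


def parse_registration(body: str) -> dict:
    """Extract the submitted variables from a URL-encoded POST body.

    Raises ValueError when a variable from the namelist is missing, for
    example when the dialogue never assigned it.
    """
    fields = parse_qs(body, keep_blank_values=True)
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # parse_qs returns a list per key; keep the first value of each.
    return {name: fields[name][0] for name in REQUIRED_FIELDS}


registration = parse_registration("language_id=1&session_id=7&caller_id=123")
```

Checking for missing fields on the server side is a design choice that makes problems in the dialogue visible early, which matters given how slow voice-service testing is.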

Limitations & future work
VSDK works well for simple prototypes, but not (yet) for more complex applications
● Provide more types of interactions
● Solve problem of dynamic data model generation
○ Linked data?
○ Data2Documents for VXML? (Ockeloen et al., 2016)
● Implement more sophisticated TTS (Justyna Kleczar MSc thesis, 2017)
● Develop a better testing/debugging workflow
Are the conclusions also valid in the true ICT4D context?
● Pilot in Burkina Faso (Rainfall use-case)
● Train the first local voice-service developer
Other improvements:
● KasaDaka stack in Docker
● Alternative for proprietary VXML browser
● Fix limitations of Raspberry Pi in ICT4D context
○ Power issues
○ Availability of GSM dongles
Later, maybe:
● Do micropayments with mobile money (large in Africa)
● Create a ‘bip’ voting system
● Data exchange between offline KasaDakas
