20161014IROS_WS
1. Cloud Robotics for Building
Conversational Robots
Komei Sugiura
National Institute of Information and Communications Tech., Japan
2. Beyond the Language Barrier:
NICT’s free software and cloud services
1. Speech to speech translation system: VoiceTra (2010)
>1M downloads.
High performance in translation to/from Asian languages
2. MCML Speech interaction SDK (2013)
The SDK enables users to build WFST-based multilingual dialogue systems.
3. Smartphone dialogue apps (2011)
Spoken dialogues and recommendation in tourist
guidance domains
4. Cloud robotics platform rospeex (2013)
40K unique users. Top-level quality as dialogue-oriented TTS
in Japanese.
3. [New] Automatic captioning SDK for developers
http://www2.nict.go.jp/astrec-ast/mcml-sdk/index_en.html
Free of charge, but authentication required
Video
4. Motivation:
How can we build communicative robots to help people?
Smartphones and other consumer devices
Speech interfaces benefit consumers
cf. Market size of speech recognition: ¥88B (2013) → ¥170B (2018) (€1.5B)*
* Estimation by NEDO, TSC Foresight Vol.8, 2015
Example queries: “Show me today’s schedule”, “Sushi restaurants around here”
Context info. that benefits QA/search: GPS, contacts, other
Current communication with robots
Insufficient benefit to consumers
Example requests (robot response: “??”): “Throw them away.”, “Is there any milk in the fridge?”
• Bad recognition accuracy
• User needs to specify [what,
where, how] as well as start/end
conditions
6. Background: Speech recognition/synthesis is a bottleneck
for reducing cost in human-robot interactions
• Synthesized speech sounds
monotonous and unfriendly
• Speech recognition does not work
as well as expected
XIMERA 3
(Text-reading)
Voice
talent
Target = Interactions with service robots
7. Rospeex:
A cloud robotics platform for multilingual spoken dialogues
• >40,000 unique users have used rospeex
• WER = 7.9% (accuracy = 92.1%) on IWSLT tst2011 (1st place
in IWSLT 2012, 2013, and 2014)
• Top-level quality dialogue-oriented TTS
Python & C++ samples
are available
rospeex Search
* Free of charge for research
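The WER figure quoted above is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words; the slide reports accuracy as 100% minus WER. A minimal sketch of that computation follows. The function name and the example sentences are illustrative assumptions, not part of rospeex or the IWSLT scoring tools.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (not a rospeex or IWSLT API). Tokenizes on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling one-row Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("show me today's schedule", "show me the schedule")
print(f"WER = {wer:.1%}, accuracy = {1 - wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% on very poor output, so "accuracy = 100% minus WER" is only meaningful in the low-error regime reported here.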
8. Rospeex’s positioning in robot dialogue quadrants
Cloud APIs
(Google, Microsoft, IBM,
NTT docomo, Wit.ai,…)
Free software
Commercial software
OpenHRI,
PocketSphinx, Festival
Cloud-based
Stand-alone
Robot middleware-compatible
Incompatible
Does not work with
very low-spec PCs
Robotics-specific
logs are lost
Authentication
Low quality
Expensive
Distribution of rospeex users
rospeex applications (40k unique users)
Conversational agents in elderly care
facilities, service robots, humanoid,
dialogue agents, speech interfaces for car
navigation systems or smart home devices,
…
9. Analysis: TTS requests depend heavily on individuals
• Question: Do developers send the same sentences to TTS repeatedly? If so, we can
speed up responses by introducing a local cache.
Cache hit
Cache miss
• Analysis on top 88 users
– New requests = 50.4% on average
– An individual uses max. 200 unique sentences
Without a cloud platform, we
cannot conduct large-scale
analysis of robot developers
Introducing cache will
reduce comm. time
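The local cache discussed above can be sketched as a thin wrapper around the cloud TTS call that also tracks the share of new (cache-miss) requests. This is a minimal sketch under assumptions: `CachedTTS`, `fake_cloud_tts`, and the `(text, language)` signature are hypothetical and are not the rospeex API.

```python
import hashlib


class CachedTTS:
    """Wrap a TTS backend with a local cache keyed by (language, text)."""

    def __init__(self, synthesize):
        # `synthesize` is a hypothetical callable: (text, language) -> audio bytes.
        self._synthesize = synthesize
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def say(self, text, language="ja"):
        key = hashlib.sha256(f"{language}\n{text}".encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1          # served locally, no network round trip
        else:
            self.misses += 1        # new sentence: call the cloud backend once
            self._cache[key] = self._synthesize(text, language)
        return self._cache[key]

    @property
    def new_request_ratio(self):
        """Fraction of requests that had to go to the backend."""
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def fake_cloud_tts(text, language):
    # Stand-in for the network call; returns placeholder bytes, not real audio.
    return f"<audio:{language}:{text}>".encode("utf-8")


tts = CachedTTS(fake_cloud_tts)
for sentence in ["Hello", "Where is the milk?", "Hello", "Hello"]:
    tts.say(sentence, language="en")
print(tts.hits, tts.misses, tts.new_request_ratio)
```

With about half of all requests being new on average and at most 200 unique sentences per individual, an unbounded in-memory dictionary like this one stays small; a bounded or on-disk cache would be a design choice for longer deployments.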
11. Multimodal language understanding
Kollar+ 2010 (HRI 2010 Best Paper)
• Input: Text, LRF, Image
• Output: path planning
• E.g. “Go down the hallway”
Iwahashi & Sugiura+ 2010
• Input: Image and speech
• Output: object manipulation
• E.g. “Place-on Elmo”
Visual QA [2015-]
• Input: Image and question
• Output: Answer
• E.g. “How many elephants are there?” -> “2”
Video
12. LCore: Multimodal Robot Language Acquisition
[Iwahashi, Sugiura, et al 2010]
Key features
• Fully grounded vocabulary
• Imitation learning
• Incremental & interactive learning
• Language independent
• Learning when to ask questions
13. Imitation learning for spoken language understanding:
Re-ranking hypotheses using planned trajectories’ likelihood
• Transformation of reference-point-dependent HMMs*
– Input: verb ID, object ID(s)
e.g. <place-on, Object 1, Object 3>
– Transforms HMM from intrinsic coordinate system into world
coordinate system
[Figure: the “Place-on” HMM (“Place X on Y”) mapped into the world coordinate system (World CS) for a given situation]
* Sugiura et al, IROS 2011 RoboCup Best Paper
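The coordinate transformation in the bullet above can be illustrated for a single Gaussian output distribution. Under a rigid motion x_world = R x_intrinsic + p, the mean maps to R μ + p and the covariance to R Σ Rᵀ. The sketch below assumes a 2-D case with the reference object's position as p; the function name and the numbers are illustrative, and the transformation in the cited paper may differ in detail.

```python
import math


def transform_gaussian(mean, cov, ref_point, theta=0.0):
    """Map a 2-D Gaussian from an intrinsic frame to the world frame.

    Assumes x_world = R(theta) @ x_intrinsic + ref_point, where ref_point is
    the world position of the reference object (illustrative setup).
    """
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]

    # Mean: rotate, then translate by the reference point.
    world_mean = [
        R[0][0] * mean[0] + R[0][1] * mean[1] + ref_point[0],
        R[1][0] * mean[0] + R[1][1] * mean[1] + ref_point[1],
    ]

    # Covariance: R @ cov @ R^T (translation does not affect it).
    RS = [[sum(R[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    world_cov = [[sum(RS[i][k] * R[j][k] for k in range(2)) for j in range(2)]
                 for i in range(2)]
    return world_mean, world_cov


# Final "Place-on" state: 0.1 m above the landmark, landmark at (0.5, 0.2).
mean_w, cov_w = transform_gaussian(
    mean=[0.0, 0.1], cov=[[0.01, 0.0], [0.0, 0.02]], ref_point=[0.5, 0.2]
)
print(mean_w, cov_w)
```

Applying this to every state of the verb's HMM gives a world-frame model for each candidate <verb, object(s)> triple, which is what makes the planned trajectory's likelihood comparable across hypotheses.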
14. HMM-based trajectory generation using dynamic features*
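$$\hat{\xi} = \arg\max_{\xi} P(\xi \mid q, \lambda)$$
$q$: state sequence
$\lambda$: HMM parameters
$\xi$: time series of (position, velocity, acceleration)
Maximum likelihood trajectory, with $\xi = W x$:
$$\hat{x} = \left(W^{\top} \Sigma^{-1} W\right)^{-1} W^{\top} \Sigma^{-1} \mu$$
$\mu$: vector of mean vectors
$\Sigma$: matrix of covariance matrices of each OPDF
$W$: matrix of coefficients in difference approximation
$x$: time series of position
*Tokuda, K. et al, “Speech parameter generation algorithms for HMM-based speech synthesis”, 2000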
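A small numerical sketch of maximum likelihood trajectory generation with dynamic features: if the feature sequence is W x (position plus a finite-difference velocity) and the state sequence fixes a Gaussian mean μ and diagonal covariance Σ for every feature, the most likely position sequence solves (WᵀΣ⁻¹W) x = WᵀΣ⁻¹μ. The 1-D setup, the delta window 0.5·(x[t+1] − x[t−1]) with clamped ends, and all function names are assumptions made for illustration; acceleration features are omitted for brevity.

```python
def build_window_matrix(T):
    """Rows 2t and 2t+1 produce the static and delta feature of frame t."""
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                          # static: x[t]
        W[2 * t + 1][max(t - 1, 0)] -= 0.5         # delta: 0.5 * (x[t+1] - x[t-1]),
        W[2 * t + 1][min(t + 1, T - 1)] += 0.5     # indices clamped at the ends
    return W


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def ml_trajectory(means, variances):
    """means/variances: length 2T, interleaved [static_0, delta_0, static_1, ...]."""
    T = len(means) // 2
    W = build_window_matrix(T)
    rows = range(2 * T)
    # Normal equations: A = W^T Sigma^-1 W, b = W^T Sigma^-1 mu (Sigma diagonal).
    A = [[sum(W[r][i] * W[r][j] / variances[r] for r in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * means[r] / variances[r] for r in rows) for i in range(T)]
    return solve(A, b)


# Two states with static means 0 and 1 (three frames each), delta mean 0.
static_means = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
means = [v for m in static_means for v in (m, 0.0)]
variances = [0.1, 0.01] * len(static_means)
print([round(v, 3) for v in ml_trajectory(means, variances)])
```

Without the delta rows the solution would simply copy the piecewise-constant static means; the velocity constraints are what produce a smooth transition between states.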
16. RoboCup@Home: Benchmark tests for domestic robots
• RoboCup@Home: The largest competition for domestic robots
– One of the major RoboCup leagues
– Focuses on human-robot interaction and mobile manipulation
– Robots are evaluated on 8 standardized tasks and 3 demonstration tasks
• Scientific challenges
– Navigation in unknown environments (e.g. real shop), handling
everyday objects, spoken dialogues in very noisy environments, …
17. RoboCup@Home Standard Platform Leagues start in 2017
• Many teams need low-cost standardized platforms
• Companies have seen NAO’s success after it was selected as the soccer
Standard Platform (Softbank bought Aldebaran for 100M USD)
Toyota HSR
• Main use case = partner robot for those who need care
• Lease-based
Softbank Pepper
• Already deployed in restaurants and shops
• Very low price
Both compatible with ROS
CFPs for HSR/Pepper users will be open soon