The paper proposes a multimodal language understanding method called MMC-GAN for carry-and-place tasks. MMC-GAN uses a GAN to augment training data in the latent space, improving over single-modality and dialog-based approaches. It trains an extractor network on multimodal inputs (language instructions, context sentences, and depth images) to classify likely target areas. Evaluation shows that MMC-GAN outperforms baselines and that the multimodal approach is needed to understand ambiguous instructions in context.
A Multimodal Classifier Generative Adversarial Network for Carry and Place Tasks from Ambiguous Language Instructions
1. Multimodal Language Understanding for Carry and Place Tasks
Aly Magassouba, Komei Sugiura and Hisashi Kawai
National Institute of Information and Communications Tech., Japan
2. Our target: service robots that understand ambiguous speech
Social Background
• Shortage of caregivers who can physically
support people with disabilities
Challenge
• Understanding ambiguous instructions
from the linguistic and visual context in an
end-to-end approach
Ambiguity
• “Put away the sugar and milk bottle”
• Meaning: “Put the sugar on the kitchen
shelf and the milk in the fridge”
3. Our approach differs from the literature in its Generative
Adversarial Network (GAN) data augmentation in latent space
Related work:
• Dialog-based approach [Kollar10]
– Time consuming
• End-to-end approach [Hatori18]
– Grasping task/Large dataset
• LAC-GAN [Sugiura17]
– Single modality
Novelty:
– Multimodal spoken language understanding with GAN data augmentation
• Key technology
– GAN data augmentation in latent space
– Different from the classic GAN [Goodfellow14] used for sample generation
(Figure: classic GAN architecture, in which a discriminator distinguishes real samples from generated (fake) ones [Bousmalis17, Zhang17])
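The key idea above, augmenting the classifier's training data in latent space rather than in input space, can be illustrated with a minimal sketch. This is not the authors' implementation: the extractor and generator are toy linear stand-ins with hypothetical dimensions, used only to show how real and generated latent features are combined into one training batch.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 16  # hypothetical latent feature size

def extract(batch):
    """Stand-in extractor: project raw multimodal inputs to latent space."""
    W = rng.standard_normal((batch.shape[1], LATENT_DIM))
    return batch @ W

def generate(noise):
    """Stand-in generator: map noise vectors to synthetic latent features."""
    W = rng.standard_normal((noise.shape[1], LATENT_DIM))
    return noise @ W

# Latent features extracted from 32 real multimodal samples (64-dim raw inputs).
real_latents = extract(rng.standard_normal((32, 64)))
# Synthetic latent features generated from 8-dim noise.
fake_latents = generate(rng.standard_normal((32, 8)))

# The discriminator/classifier trains on the augmented latent batch;
# here we only mark real vs. generated (0 = real, 1 = fake).
augmented = np.concatenate([real_latents, fake_latents], axis=0)
labels = np.concatenate([np.zeros(32), np.ones(32)])

print(augmented.shape)  # (64, 16)
```

Because generation happens in the compact latent space, the generator never has to synthesize raw images or sentences, which is what makes this augmentation scheme tractable for multimodal inputs.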
4. Theoretical background of the MultiModal Classifier GAN (MMC-GAN)
• Cost function of the extractor
• Cost function of the generator, based on the Wasserstein method
• Cost function of the discriminator
• Data augmentation in latent space makes training more data-efficient [Sugiura17]
• The extractor in [Sugiura17] was fully connected and not adapted to visual and multimodal inputs
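The slide names the cost functions without showing them. As a sketch, the generator and discriminator objectives based on the Wasserstein method typically take the standard WGAN form below (the paper's exact formulation, including the extractor's classification loss and any additional terms, may differ). Here $D$ is the discriminator (critic), $G$ the generator, $z$ noise, and $x$ a real latent feature produced by the extractor.

```latex
\mathcal{L}_D = \mathbb{E}_{z}\!\left[D(G(z))\right] - \mathbb{E}_{x}\!\left[D(x)\right],
\qquad
\mathcal{L}_G = -\,\mathbb{E}_{z}\!\left[D(G(z))\right]
```

Minimizing $\mathcal{L}_D$ drives the critic to separate real from generated latent features, while minimizing $\mathcal{L}_G$ drives the generator to produce features the critic scores as real.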
6. Building a Carry-and-Place Multimodal Dataset for validating our method
Input (a)
• Instruction: “Put the coke bottle on the table”
• Context: “the bottle has been grasped”
• Depth image
Output label
• A1 = Very likely target area
Input (b)
• Instruction: “Bring this towel to the kitchen shelf”
• Context: “the robot is holding the towel”
• Depth image
Output label
• A4 = Unlikely target area
Dataset distribution:
Area   Samples
A1     212
A2     432
A3     398
A4     240
Total  1282
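The two examples above share one record layout: a language instruction, a context sentence, a depth image, and a likelihood label for the candidate target area. A minimal sketch of such a record follows; the field and class names are illustrative, not the authors' schema.

```python
from dataclasses import dataclass

# A1 (very likely) ... A4 (unlikely) target-area labels, as in the dataset.
LABELS = ["A1", "A2", "A3", "A4"]

@dataclass
class CarryPlaceSample:
    instruction: str    # spoken language instruction
    context: str        # sentence describing the robot's current state
    depth_image: bytes  # raw depth image of the candidate target area
    label: str          # one of LABELS

sample = CarryPlaceSample(
    instruction="Put the coke bottle on the table",
    context="the bottle has been grasped",
    depth_image=b"",    # placeholder; the real dataset stores depth images
    label="A1",
)
print(sample.label in LABELS)  # True
```

The instruction and context are text, so only the depth image requires a convolutional branch in the extractor; the three inputs are fused before classification.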
7. MMC-GAN is more accurate thanks to the data augmentation property
Test-set accuracy (%) by input modality:
Method          GAN type  Instruction  Instruction+Context  Image  Instruction+Context+Image
CNN (baseline)  -         59.4         60.2                 61.1   82.2
MMC-GAN         GAN       57.5*        59.5*                58.1   85.3
MMC-GAN         CGAN      56.4*        56.7*                58.2   86.2
MMC-GAN         WGAN      61.8         62.7                 59.7   84.4
*Not all trials converge
Key observations:
• MMC-GAN outperforms the classic DNN
• The multimodal approach is required to solve the carry-and-place task
• WGAN training is more stable
11. Sample results: MMC-GAN emphasizes the relationship between linguistic and visual features
(Figure: sample correct and incorrect predictions, with confusion matrix)
12. Summary
• Contribution
– Multimodal spoken language understanding with GAN data augmentation
• Method
– A GAN operating on latent-space features that classifies target areas
from ambiguous instructions
• Results
– Our method outperforms a conventional DNN baseline
– Multimodal inputs are required to solve carry-and-place tasks