Alan Nichol, co-founder & CTO of Rasa, talks about the role that transformer-based architectures play in state-of-the-art models for dialogue and language understanding. Alan covers the dialogue transformer (aka the TED policy) as well as a new state-of-the-art, lightweight, multitask transformer architecture for NLU: the Dual Intent and Entity Transformer (DIET), designed by the Rasa research team.
Research Updates from Rasa: Transformers in NLU and Dialogue
1. Research Updates from Rasa:
Transformers in NLU and Dialogue
Alan Nichol
Co-Founder & CTO, Rasa
2. We’ll cover two recent research projects from Rasa
● Why we do research at Rasa
● DIET: new NLU architecture
● TED: new dialogue policy
● Q&A
● More resources
5. To do that, we’re building the standard infrastructure for conversational AI
@alanmnichol
● Open Source
● Community
● Applied Research
6. Our community is friendly, global, and growing fast
RASA COMMUNITY
● 2M+ downloads*
● 8,000+ forum members
● 300+ contributors
● Rasa X: downloaded in 135 countries
*Cumulative PyPI and GitHub downloads of Rasa open source tools
11. DIET is our new neural network architecture for NLU
💡 To understand how DIET works, check out our YouTube channel
What is DIET?
● New state-of-the-art neural network architecture for NLU
● Predicts intents and entities together
● Plug and play pretrained language models
12. How to use DIET in your Rasa project
Here’s an example config.yml
Before the DIET model, you can specify any
featurizer.
In our experiments, we use:
● Sparse features (aka no pre-trained model)
● GloVe (word vectors)
● BERT (large language model)
● ConveRT (pre-trained encoder for
conversations)
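The example config.yml from the slide did not survive in this transcript. As a rough sketch of the idea, a pipeline that places featurizers before DIET could look like the following. It assumes the component names from Rasa Open Source 1.8 (`WhitespaceTokenizer`, `CountVectorsFeaturizer`, `DIETClassifier`); the hyperparameter values are illustrative assumptions, not the settings used in the experiments.

```yaml
# Sketch of a config.yml, assuming Rasa Open Source 1.8 component names.
language: en
pipeline:
  - name: WhitespaceTokenizer
  # Sparse features only (no pre-trained model)
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb    # character n-grams within word boundaries
    min_ngram: 1
    max_ngram: 4
  # DIET predicts intents and entities together
  - name: DIETClassifier
    epochs: 100          # assumed value, tune for your data
```

Any featurizer listed above `DIETClassifier` feeds its features into the model, which is what makes pretrained language models plug and play.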
13. Experiments on the NLU-benchmark dataset
● Repo is on GitHub
● Domain: human-robot interaction (smart home setting)
● 64 different intents
● 54 different entity types
● ~26k labelled examples
Previous state of the art:
● HERMIT NLU (Vanzo, Bastianelli, and Lemon @ SIGdial 2019)
● uses ELMo embeddings
14. Result 1: DIET outperforms SotA even without any pretrained embeddings
Previous state of the art: intent: 87.55 entities: 84.74
18. Which featurizer is best depends on your dataset, so try different ones!
At Rasa, we don’t believe in “one size fits all”
machine learning
● We aim to provide sensible defaults and
suggestions
● But it is even more important that Rasa models
are easy to customize
Share your results and compare notes with 8000+
Rasa developers at forum.rasa.com
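Trying a different featurizer usually means swapping only the lines above the DIET component. A hedged sketch using ConveRT, again assuming the Rasa Open Source 1.8 component names (`ConveRTTokenizer`, `ConveRTFeaturizer`) and an illustrative epoch count:

```yaml
# Sketch: same DIET model, different featurizer (assumes Rasa Open Source 1.8 names).
language: en
pipeline:
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer   # pre-trained encoder for conversations
  - name: DIETClassifier
    epochs: 100               # assumed value, tune for your data
```

Training both variants on the same data and comparing the evaluation results is one way to find out which featurizer suits a given dataset.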
30. We found out that the Transformer Embedding Dialogue policy can untangle
sub-dialogues
31. TED is available in Rasa 1.3 and up
The embedding policy (TED)
● better at handling unseen edge cases
● less likely to get confused when users
behave in highly unexpected ways
● used in combination with other policies
● Becoming the new default ML policy
(replacing KerasPolicy)
With all contextual assistants, please write tests!
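To illustrate "used in combination with other policies", here is a minimal sketch of the policies section of config.yml. It assumes Rasa Open Source 1.8, where the policy is named `TEDPolicy` (in releases 1.3 to 1.7 it is configured as `EmbeddingPolicy`); the `max_history` and `epochs` values are assumptions for illustration.

```yaml
# Sketch of the policies section, assuming Rasa Open Source 1.8 policy names.
policies:
  - name: MemoizationPolicy   # replays conversations seen in the training stories
  - name: TEDPolicy           # generalizes to conversations not seen in training
    max_history: 5            # assumed value
    epochs: 100               # assumed value
  - name: MappingPolicy       # maps selected intents directly to actions
```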
32. So we now have the algorithms to handle this
33. But you also need training data!
● Collect conversations between users and your assistant
● Review conversations and improve your assistant based on what you learn
● Ship updates using continuous integration & deployment
34. Deploy your minimum viable assistant on a server and improve it using Rasa X
[Figure: quality of assistant increases across the stages below]
● Build minimum viable assistant
● Improve by talking to the assistant
● Improve using conversations with test users
● Improve using conversations with real users
Rasa Open Source (Local): Rasa Open Source is an open source framework for natural language understanding, dialogue management, and integrations.
Rasa X (Server): Rasa X is a toolset used to improve a contextual assistant built using Rasa Open Source.
37. How can the transitions be effectively tested in a large
dialogue tree, to ensure that the policy works as expected?
38. Will Rasa provide a way to select the best policy based on my
use case and training data?
39. Does Rasa support multi-label classification for intents and
entities?
40. Is there a way to do cross domain transfer learning using
Rasa? (For instance, a healthcare assistant trained on
healthcare terminology to an IT help desk assistant)
43. Further Reading
● Unpacking the TED Policy in Rasa Open Source (Rasa Blog)
● Introducing DIET: state-of-the-art architecture that outperforms fine-tuning BERT and is 6X faster to train (Rasa Blog)
● Rasa Algorithm Whiteboard - Diet Architecture 1: How it Works (YouTube)
● Rasa Algorithm Whiteboard - Diet Architecture 2: Design Decisions (YouTube)