Managing Dialog Strategy In Multiskill AI Assistant.pdf
1. DIALOG STRATEGY MANAGEMENT IN
MULTISKILL AI ASSISTANTS WITH
DEEPPAVLOV
Daniel Kornev, CPO & Dilyara Baymurzina, Researcher @ DeepPavlov.ai
14. Grand challenge: create a socialbot that can engage in a fun, high-quality conversation on popular societal topics for 20 minutes and achieve an average rating of at least 4.0/5.0.
Alexa Prize 3 Winners:
1. Emora - $500K, 3.8/5.0, 7 min 32 sec, Emory University
2. Chirpy Cardinal - $100K, 3.17/5.0, Stanford University
3. Alquist - $50K (2nd in ’17, 3rd in ’19, ’20), Czech Technical University
17. • Open repository of NLP models and pipelines
  • Easy to find and reuse NLP components for developing new skills or extending existing ones
• Open repository of conversational skills
  • Alternative implementations of the most popular skills
• Open hub for AI Assistant distributions
  • General and domain/industry-specific distributions of skill sets
DeepPavlov.ai +
22. DeepPavlov.ai
What is (all) harvesters’ status?
Intents
What is harvester status?
Prepare rover for a trip
domain.yml
intents:
- all_statuses_request
- status_request
[..]
- trip_request
responses:
utter_status_request:
- text: "The harvester {harv_id} is {harv_status}."
[..]
nlu.md
## intent:all_statuses_request
- What is the harvesters status?
- What is the combines status?
[..]
stories.md
## harv_status + prepare_trip
* status_request
- utter_status_request
stories.md – training for dialogs
RASA Configs
nlu.md – training for intents & slots
domain.yml – basic ontology for skill
Simple and easy to use
23. DeepPavlov.ai
Works with GoBot
GoBotWrapper
Obtains data from DB
Generates NLG
stories.md – training for dialogs
Tutorial in Google Colab
nlu.md – training for intents & slots
domain.yml – basic ontology for skill
Full sample: use it to train your GoBot
and save its output to your Skill
GoBotWrapper
[..]
@app.route("/respond", methods=["POST"])
def respond():
    [..]
    dialogs = request.json["dialogs"]
    for dialog in dialogs:
        sentence = dialog['human_utterances'][-1]['annotations'].get("spelling_preprocessing")
        [..]
        uttr_resp, conf = gobot(sentence)
        response = gobot.getNlg(uttr_resp)
        responses.append(response)
        confidences.append(conf)
    return jsonify(list(zip(responses, confidences)))
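The wrapper logic above can be tried without a web server. The following is a minimal sketch, not the actual skill: `StubGoBot` is a hypothetical stand-in for a trained GoBot whose call signature and `getNlg` method simply mirror the slide, the response texts are invented, and the fallback to a `text` field is an assumption.

```python
from typing import Dict, List, Tuple


class StubGoBot:
    """Hypothetical stand-in for a trained GoBot (not a DeepPavlov class)."""

    def __call__(self, sentence: str) -> Tuple[str, float]:
        # Return an action name and a confidence, like gobot(sentence) on the slide.
        if "status" in sentence.lower():
            return "utter_status_request", 0.9
        return "utter_fallback", 0.2

    def getNlg(self, action: str) -> str:
        # Map the predicted action to a response text (invented templates).
        templates = {
            "utter_status_request": "The harvester 1 is working.",
            "utter_fallback": "Sorry, I did not get that.",
        }
        return templates[action]


def respond(dialogs: List[Dict], gobot: StubGoBot) -> List[Tuple[str, float]]:
    """Produce one (response, confidence) pair per dialog in the batch."""
    responses, confidences = [], []
    for dialog in dialogs:
        last = dialog["human_utterances"][-1]
        # Prefer the spell-checked text; fall back to the raw text (assumed field).
        sentence = last["annotations"].get("spelling_preprocessing") or last.get("text", "")
        uttr_resp, conf = gobot(sentence)
        responses.append(gobot.getNlg(uttr_resp))
        confidences.append(conf)
    return list(zip(responses, confidences))
```

Keeping the batch logic in a plain function makes it easy to unit-test before wiring it into the `/respond` route.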
24. DeepPavlov.ai
<?xml version="1.0" encoding="UTF-8"?>
<aiml version="2.0">
  <category>
    <pattern>I AM ^ TIRED</pattern>
    <template>
      🙁
      <random>
        <li>Get some sleep, <get name="name"/>. You're very tired.</li>
        <li>Have a rest and be happy! How can I help you?</li>
      </random>
    </template>
  </category>
  [..]
</aiml>
Assistant Profile (Name, Place, etc.)
Patterns
Greeting scenario
Topics
Looks up patterns
Dialog Processing
Picks random pre-defined response
If not sure, confidence is low (0.2)
Returns response + confidence
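The dialog processing described above (look up a pattern, pick a random predefined response, return it with a confidence) can be sketched in plain Python. This is an illustration only, not an AIML interpreter: the wildcard handling is simplified, the 0.2 value comes from the slide, and `HIGH_CONFIDENCE` and `FALLBACK` are assumed values.

```python
import random
import re
from typing import List, Tuple

LOW_CONFIDENCE = 0.2   # from the slide: used when the skill is not sure
HIGH_CONFIDENCE = 0.9  # assumed value for a matched pattern
FALLBACK = "Sorry, I am not sure what you mean."  # assumed fallback text

# (pattern, candidate responses), modeled on the category shown above.
CATEGORIES: List[Tuple[str, List[str]]] = [
    ("I AM ^ TIRED", ["Get some sleep.", "Have a rest and be happy! How can I help you?"]),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified AIML pattern into a regex.

    '^' matches zero or more words and '*' matches one or more words.
    """
    parts = []
    for token in pattern.split():
        if token == "^":
            parts.append(r"(?:\s+\S+)*")
        elif token == "*":
            parts.append(r"(?:\s+\S+)+")
        else:
            parts.append(r"\s+" + re.escape(token))
    return re.compile("".join(parts) + r"\s*$")


def respond(utterance: str, rng: random.Random) -> Tuple[str, float]:
    """Return (response, confidence) for one user utterance."""
    # Normalize: drop punctuation, upper-case, and add a leading space for the regex.
    normalized = " " + re.sub(r"[^\w\s]", "", utterance).upper()
    for pattern, templates in CATEGORIES:
        if pattern_to_regex(pattern).match(normalized):
            return rng.choice(templates), HIGH_CONFIDENCE
    return FALLBACK, LOW_CONFIDENCE
```

Passing the random generator in explicitly keeps the skill deterministic under test.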
25. DEMO TIME
Deepy as a Multiskill AI Assistant
> docker-compose up --build
> curl --location --request POST 'localhost:4242' --header 'Content-Type: application/json' --data-raw '{"user_id": "name", "payload": "what do I do here?"}'
27. DeepPavlov.ai
Dream AI Assistant Demo
• Own scenarios + Wikidata & ODQA
Amazon Alexa
• Own scenarios (w/ 3rd Party integrations) + Evi
+ 3rd Party Skills
Yandex Alice
• Own scenarios (w/ 3rd Party integrations) +
Yandex Search + 3rd Party Skills
28. DeepPavlov.ai
Dialog Management is complex
Step 1 Correctly identify the goal of the user’s utterance
Step 2 Correctly generate response(s)
Step 3 Correctly pick the best response
Alice: what’s happened in city hall?
Bot: [Which city hall [Entity Disambiguation]? Where (NYC | Local)? When (Today | Yesterday | at some time)?]
Domain:News: Occupy City Hall happened in July 2020 | Confidence: 0.85
Domain:Factoid: City Hall is the seat of New York City government | Confidence: 0.95
Retrieval: By 1969 City Hall was described as badly crowded because of Bellevue growing … | Confidence: 0.75
Alice: What’s happened in city hall?
Bot: City Hall is the seat of New York City government
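The example above can be reproduced with a few lines of code. This is a minimal sketch, assuming that Step 3 picks the hypothesis with the highest confidence; the `Hypothesis` class and `pick_best` function are hypothetical names, and the texts and confidences are the ones from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    skill: str
    text: str
    confidence: float


def pick_best(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Pick the hypothesis with the highest confidence (assumed selection rule)."""
    return max(hypotheses, key=lambda h: h.confidence)


hypotheses = [
    Hypothesis("Domain:News", "Occupy City Hall happened in July 2020", 0.85),
    Hypothesis("Domain:Factoid", "City Hall is the seat of New York City government", 0.95),
    Hypothesis("Retrieval", "By 1969 City Hall was described as badly crowded", 0.75),
]
best = pick_best(hypotheses)
print(f"Bot: {best.text}")
```

Selecting on confidence alone returns the factoid answer, even though a clarifying question might have served the user better.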
29. DeepPavlov.ai
At Skill Selector Level you may have more than one skill to choose from
Inside each Skill you may have more than one context to choose from:
• Story/Scenario (in goal-oriented systems)
• Scenario/Dialog Tree (in chat-based systems)
At Response Selector Level you may have more than one hypothesis to choose from
Challenge: How to make sure that the response we’ve chosen best addresses the user’s goal?
31. DeepPavlov.ai
* DREAM technical report for the Alexa Prize 2019, Yuri Kuratov, et al. http://bit.ly/DP_Dream_TR_2020
Skill Selector
● Based on the annotated human utterance and the dialog state (in particular, topics and dialog acts), picks a list of skills that will try to generate response hypotheses
Skills
● Different types of skills: template-based, AIML, retrieval skills
● Each selected skill may generate zero, one, or several hypotheses
Response Selector
● Filters out inappropriate hypotheses based on toxicity and blacklisted-word annotations
● Picks the best hypothesis based on the dialog state, the annotated hypotheses, their confidences, and evaluation scores
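The three stages above can be sketched as a small pipeline. This is an illustration under stated assumptions, not DeepPavlov Agent code: the skill functions, the `TOPIC_TO_SKILLS` routing table, and the `BLACKLIST` set are all hypothetical, and the selector here looks only at the topic annotation and the confidence.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hypothesis = Tuple[str, float]  # (response text, confidence)
Skill = Callable[[str], List[Hypothesis]]

BLACKLIST = {"darn"}  # hypothetical blacklisted words


def factoid_skill(utterance: str) -> List[Hypothesis]:
    return [("City Hall is the seat of New York City government", 0.95)]


def news_skill(utterance: str) -> List[Hypothesis]:
    return [("Occupy City Hall happened in July 2020", 0.85)]


def chitchat_skill(utterance: str) -> List[Hypothesis]:
    return []  # a skill may produce zero hypotheses


SKILLS: Dict[str, Skill] = {
    "factoid": factoid_skill,
    "news": news_skill,
    "chitchat": chitchat_skill,
}
# Assumed routing table from topic annotation to skill names.
TOPIC_TO_SKILLS: Dict[str, List[str]] = {
    "politics": ["factoid", "news"],
    "smalltalk": ["chitchat"],
}


def skill_selector(annotations: Dict[str, str]) -> List[str]:
    """Pick skills from the annotations; try every skill if the topic is unknown."""
    return TOPIC_TO_SKILLS.get(annotations.get("topic", ""), list(SKILLS))


def response_selector(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Drop hypotheses with blacklisted words, then take the most confident one."""
    clean = [h for h in hypotheses if not BLACKLIST & set(h[0].lower().split())]
    return max(clean, key=lambda h: h[1]) if clean else None


def run_turn(utterance: str, annotations: Dict[str, str]) -> Optional[Hypothesis]:
    """Run one dialog turn: select skills, collect hypotheses, select a response."""
    hypotheses: List[Hypothesis] = []
    for name in skill_selector(annotations):
        hypotheses.extend(SKILLS[name](utterance))
    return response_selector(hypotheses)
```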
32. DeepPavlov.ai
Filtration
● Filters out inappropriate hypotheses based on toxicity and blacklisted-word annotations
Evaluation
● Calculates a final single-value score for each hypothesis from its confidence and conversation evaluation scores
Hand-written Heuristics
● Give priority to special cases, such as high-priority intents, by significantly increasing the final score
Penalties
● Decrease final scores for repetitions
Prompts
● Add a link-to question to the final response when it is a short reply with no request and comes from certain skills
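The first four stages above can be sketched as a scoring function. The weights, the boost, and the penalty below are assumptions chosen for illustration; the slides say only that the real formula was created empirically. The prompts stage is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Hypothesis:
    text: str
    confidence: float            # assigned by the skill
    eval_score: float            # conversation evaluation score, assumed in [0, 1]
    toxic: bool = False          # toxicity / blacklist annotation
    high_priority: bool = False  # e.g. produced for a high-priority intent


def final_score(h: Hypothesis, previous_bot_utterances: Set[str]) -> float:
    """Combine confidence and evaluation score, then apply heuristics and penalties."""
    score = 0.5 * h.confidence + 0.5 * h.eval_score  # assumed equal weights
    if h.high_priority:
        score += 1.0  # assumed heuristic boost
    if h.text in previous_bot_utterances:
        score -= 0.5  # assumed repetition penalty
    return score


def select_response(
    hypotheses: List[Hypothesis], previous_bot_utterances: Set[str]
) -> Optional[Hypothesis]:
    """Filter out toxic hypotheses and return the one with the highest final score."""
    candidates = [h for h in hypotheses if not h.toxic]
    if not candidates:
        return None
    return max(candidates, key=lambda h: final_score(h, previous_bot_utterances))
```

Because the boost is larger than any score the base formula can produce, a high-priority hypothesis always wins unless it is filtered out.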
33. DeepPavlov.ai
● Final score depends on confidence, which is assigned by hand or by rules in template-based skills
● Final score formula was created empirically
● No dependency on dialog acts
● Almost exclusively single-turn dialog management
● Latency *
* Latency is partially addressed by timeout management and software optimization, but in Conversational AI it ultimately requires powerful AI-optimized hardware such as NVIDIA GPU clusters
34. DeepPavlov.ai
● Final score should not depend on confidence
● Final score should be calculated by one ranking model
● Some dialog acts require responses with particular dialog acts
● Priority to multi-turn scripted skills
35. DeepPavlov.ai
There are at least 2 ways* to look at a conversation:
* S. Eggins & D. Slade, Analysing Casual Conversation. London: Cassell, 1997
Pragmatic Conversation
• Motivated by a clear pragmatic purpose. Aka task-oriented. Usually very short. Formal.
Casual Conversation
• NOT motivated by a clear pragmatic purpose. Can be, and often is, lengthy. Informal, can have humor. Aka chit-chat.
36. There are at least 4 different approaches to classifying utterances & sentences:
Speech Acts*
• Work at the utterance level. The hearer interprets the speaker’s intentions and tries to infer the actions the speaker desires from the hearer.
Dialog Acts
• Work at the sentence level. Ascribe each sentence’s dialog function to the entire utterance.
Speech Functions
• Work at the utterance level. Similar to Speech Acts, but they characterize an utterance through its role in Discourse.
Utterance Acts**
• Work at the utterance level but include body movements.
*Original authors were not concerned with Discourse
**Not applicable for us as we can’t see the person
38. DeepPavlov.ai
Eggins and Martin (1997)
Casual Conversation is about people, not facts
Discourse Strategy Advice:
In a conversation w/ user: explore their interpersonal relations through confronting moves
39. Eggins and Slade (1997)
Speech Functions control Discourse:
Give information
Demand information
Speech Acts
Discourse Moves
Speech Function Example: open:initiate:give_opinion
40. [Architecture diagram. Components: Text or Voice Input, NeMo ASR, Spell Checking, Emotion, Speech Function Classifier, Built-in Skill Selector, skills (Harvesters Status, Chit-Chat (AIML)), Discourse Management with Speech Function Predictor, Rule-based Response Selector, TTS (NeMo)]
41. DeepPavlov.ai
Part I: Use Speech Function to understand user’s goal
Step 1 Classify the Speech Function of the user’s utterance
Step 2 Predict the Speech Function for the best response
Step 3 Identify whether you understand user’s goal
Alice: what’s happened in city hall?
Speech Function: React.Rejoinder.Support.Track.Clarify
Speech Function Predictor: React.Rejoinder.Support.Track.Clarify | Confidence: 0.85
Speech Function Predictor: React.Respond.Support.Reply.Answer | Confidence: 0.79
Speech Function Predictor: React.Rejoinder.Confront.Response.Rechallenge | Confidence: 0.75
Speech Function Predictor: Predicts that it is a good idea to React.Rejoinder.Support.Track.Clarify
Bot: [Which city hall [Entity Disambiguation]? Where (NYC | Local)? When (Today | Yesterday | at some time)?]
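Steps 2 and 3 can be sketched as follows. The labels and confidences are the ones from the slide; the helper names and the rule that a top prediction ending in `Clarify` means the goal is not yet understood are assumptions made for illustration.

```python
from typing import List, Tuple

Prediction = Tuple[str, float]  # (speech function label, confidence)

predictions: List[Prediction] = [
    ("React.Rejoinder.Support.Track.Clarify", 0.85),
    ("React.Respond.Support.Reply.Answer", 0.79),
    ("React.Rejoinder.Confront.Response.Rechallenge", 0.75),
]


def best_speech_function(preds: List[Prediction]) -> Prediction:
    """Step 2: take the most confident predicted speech function for the response."""
    return max(preds, key=lambda p: p[1])


def should_clarify(preds: List[Prediction]) -> bool:
    """Step 3 (assumed rule): ask a clarifying question if the top label ends in Clarify."""
    label, _ = best_speech_function(preds)
    return label.split(".")[-1] == "Clarify"


label, confidence = best_speech_function(predictions)
print(label, confidence, should_clarify(predictions))
```

Splitting the label on dots also gives access to the coarser levels of the hierarchy, such as `React` or `Rejoinder`, if a skill needs them.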
42. DeepPavlov.ai
Part II: Use Speech Function Predictor to predict user’s response
Step 4 In each Skill, generate relevant Speech Function response
Step 5 For each hypothesis, predict Speech Function for user response
Domain:Factoid: Which City Hall? | React.Rejoinder.Support.Track.Clarify | Confidence: 0.85
Domain:News: When? | React.Rejoinder.Support.Track.Clarify | Confidence: 0.95
…
Domain:Factoid: Which City Hall?
Speech Function Predictor: React.Rejoinder.Support.Track.Clarify | Confidence: 0.83
Speech Function Predictor: React.Respond.Support.Reply.Answer | Confidence: 0.73
Speech Function Predictor: React.Respond.Confront.Reply.Disavow | Confidence: 0.65
Domain:News: When?
Speech Function Predictor: React.Rejoinder.Support.Track.Clarify | Confidence: 0.81
Speech Function Predictor: React.Respond.Support.Reply.Answer | Confidence: 0.75
Speech Function Predictor: React.Respond.Confront.Reply.Disavow | Confidence: 0.64
43. DeepPavlov.ai
Part III: Use Speech Function to understand user’s goal
Step 6 In Response Selector, ignore irrelevant hypotheses
Step 7 In Response Selector, identify what Conversation path you’re in
Step 8 In Response Selector, give the green light to the hypothesis that is best for the recognized Conversation type
Domain:Factoid: Which City Hall? | React.Rejoinder.Support.Track.Clarify | Confidence: 0.85
Domain:News: When? | React.Rejoinder.Support.Track.Clarify | Confidence: 0.95
User’s Utterance: React.Rejoinder.Support.Track.Clarify | Conversation Type: Casual
Domain:Factoid: Which City Hall? | React.Rejoinder.Support.Track.Clarify | Confidence: 0.85
Domain:News: When? | React.Rejoinder.Support.Track.Clarify | Confidence: 0.95
C. Type: Pragmatic
C. Type: Casual
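Steps 6 to 8 can be sketched as a two-stage filter. This is a hypothetical illustration: the slides do not say how a hypothesis is tied to a conversation type, so the `conversation_type` tag on each hypothesis (Factoid as pragmatic, News as casual) is an assumption, as is the extra retrieval hypothesis used to show the filtering.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

CLARIFY = "React.Rejoinder.Support.Track.Clarify"


@dataclass
class Hypothesis:
    skill: str
    text: str
    speech_function: str
    confidence: float
    conversation_type: str  # assumed tag: "pragmatic" or "casual"


def select(
    hypotheses: List[Hypothesis],
    expected_functions: Set[str],
    conversation_type: str,
) -> Optional[Hypothesis]:
    # Step 6: ignore hypotheses whose speech function was not predicted as suitable.
    relevant = [h for h in hypotheses if h.speech_function in expected_functions]
    # Steps 7-8: prefer hypotheses that fit the recognized conversation type,
    # falling back to all relevant ones if none fit.
    matching = [h for h in relevant if h.conversation_type == conversation_type]
    pool = matching or relevant
    return max(pool, key=lambda h: h.confidence) if pool else None


hypotheses = [
    Hypothesis("Domain:Factoid", "Which City Hall?", CLARIFY, 0.85, "pragmatic"),
    Hypothesis("Domain:News", "When?", CLARIFY, 0.95, "casual"),
    # Invented hypothesis with a non-matching speech function, dropped in step 6.
    Hypothesis("Retrieval", "By 1969 City Hall was badly crowded",
               "React.Respond.Support.Reply.Answer", 0.99, "casual"),
]
chosen = select(hypotheses, {CLARIFY}, "casual")
print(chosen.skill if chosen else None)
```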
45. Multiskill orchestration
Conversational skills
NLP frameworks
ML platforms
Proprietary | Open Source
▪ Multiskill orchestration
• DeepPavlov Agent is an engine for
conversational skill deployment and
orchestration
▪ Conversational skills
• DeepPavlov Dream is a collection of prebuilt conversational skills and a default distribution package for Dream AI Assistant
▪ NLP frameworks
• DeepPavlov Library provides pretrained models and a simple declarative approach to building NLP processing pipelines
▪ ML platforms
• TensorFlow and PyTorch as backends
46. Web Demo: demo.deeppavlov.ai, select [Deepy]
TG Bot: @deeppavlov_dream_ai_bot
Build Your AI Assistant: github.com/deepmipt/deepy (clone and build your own!)
Read us: medium.com/deeppavlov
Talk to us: forum.deeppavlov.ai
TG: @DeepPavlovDreamDiscussions
Twitter/TG: @DeepPavlov