This document provides an overview of Frankbot, an ML framework for auto-responding to customer support queries in Freshdesk. It summarizes the key aspects of Frankbot, including its use of historical customer support data to train models that intercept and resolve common customer queries, its offline training and online processing methodology, and how it is periodically refreshed and explicitly taught by customer support agents. Metrics are presented showing how Frankbot has helped increase customer satisfaction scores while reducing average first response times.
1. Frankbot - ML framework for auto-responding to customer support queries
2. Outline of the talk
● Introduction to Freshdesk
● Motivation and Objectives
● Datasets for model training
● Modeling Methodology
○ Offline training
○ Online processing
○ Onboarding a customer account
○ Periodic model refresh
○ Teach the bot
● Metrics and business impact
○ Understanding the metrics
○ Challenges and learnings
3. Introduction to Freshdesk
Freshdesk is a multi-channel, cloud-based customer support product which enables businesses to
● Streamline all customer conversations in one place - these are conversations between the business and its end customers
● Automate repetitive work and make support agents more efficient
● Enable support agents to collaborate with other teams to resolve issues faster
● Freshdesk tickets are a record of customer conversations across channels (phone, chat, e-mail, social, etc.)
○ A typical conversation includes customer queries and agent responses
○ Frequently recurring customer queries are called T1 tickets
● Freshdesk currently has ~150,000 customers from across the world
Some statistics from companies using Freshdesk
● Average proportion of T1 tickets - 80%
● Average proportion of tickets with answers in the knowledge base - 60%
● Average proportion of tickets with answers in the ticket conversation - 70%
4. Motivation and Objectives
● To build a machine-learning-based bot which can do the following
○ Intercept and auto-resolve T1 tickets which recur frequently in the support helpdesk
○ Leverage content from the business’ Knowledge base to answer T1 queries
○ Reduce time spent by support agents on T1 tickets, thereby enhancing their overall productivity
○ Identify historical tickets which are similar to a new ticket - agents can resolve tickets faster by looking up information contained in the similar tickets
● Enable support agents to understand the different types of questions raised by customers
● Help support agents create FAQs which can in turn enhance the bot’s self-service potential
● Enable support agents to train the bot further by mapping customer queries to expected responses
6. Datasets for model training
● Source - Freshdesk data pertaining to customer (business) accounts
○ Includes tickets, Knowledge base articles and FAQs
○ Includes tickets from different channels such as e-mail, portal (raised on the website), chat, social and phone
● Data of different accounts - all active, paid accounts with at least 100 tickets in the last 3 months
● Training strategy
○ One model per account, trained end-to-end
○ Embeddings trained at the industry level, models at the account level
Note: Tickets from the email, portal-direct, chat and phone channels account for close to 95% of the ticket volume
7. Modeling Methodology
FAQ Answerbot
● Data
○ Train - historical ticket data + Knowledge base, excluding the test tickets
○ Test - tickets from the last 10 days (no overlap with train)
○ Candidate responses - articles/FAQs from the Knowledge base
● Preprocessing
○ Email cleaning - signature cleaning, cleaning forwarded emails, removal of code constructs, non-ASCII characters, salutations and text below the signature
○ Primary preprocessing - unicode normalization, lower-casing, punctuation removal, stop-word removal & stemming
○ Secondary preprocessing - bigram processing
● L1 Layer - ensemble of LSA and W2V vector space embeddings
● L1 Similarity metric - cosine similarity
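The deck gives no code for this layer, so here is a minimal sketch of the L1 ensemble, assuming scikit-learn for LSA (tf-idf + TruncatedSVD) and gensim for W2V. The sample texts, the SVD dimensionality and the ensemble weight alpha are illustrative; in production the W2V half would be the pre-trained industry embedding described on the onboarding slide, not a model trained on the candidates themselves.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from gensim.models import Word2Vec

# Candidate responses = preprocessed Knowledge base articles/FAQs.
candidate_texts = [
    "reset your password from the profile settings page",
    "invoices can be downloaded from the billing section",
    "agents can be added from the admin console",
]

# LSA half of the ensemble: tf-idf followed by truncated SVD.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(candidate_texts)
lsa = TruncatedSVD(n_components=2).fit(X)   # dimensionality is illustrative
cand_lsa = lsa.transform(X)

# W2V half: a toy model is trained here so the sketch runs end to end;
# in production this would be the pre-trained industry embedding.
w2v = Word2Vec([t.split() for t in candidate_texts], vector_size=50, min_count=1)

def avg_w2v(tokens):
    """Average the word vectors of in-vocabulary tokens."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

cand_w2v = np.vstack([avg_w2v(t.split()) for t in candidate_texts])

def l1_scores(query_text, alpha=0.5):
    """Ensemble L1 score: weighted cosine similarity in both spaces."""
    q_lsa = lsa.transform(tfidf.transform([query_text]))
    q_w2v = avg_w2v(query_text.split()).reshape(1, -1)
    return (alpha * cosine_similarity(q_lsa, cand_lsa)[0]
            + (1 - alpha) * cosine_similarity(q_w2v, cand_w2v)[0])

print(l1_scores("how do i reset my password"))
```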
8. Modeling Methodology
FAQ Answerbot
● L2 Features
1. % word match between the query and candidate responses
2. % word match between words with similar part-of-speech tags
3. Word Mover’s Distance
4. Ordered bigram and trigram counts
● L2 Model - RandomForest / XGBoost
● Thresholds - based on L1 and L2 scores (with override levels)
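A sketch of these L2 features and the classifier, assuming NLTK for POS tagging and the xgboost package. The feature formulas are reasonable interpretations of the one-line descriptions above, not the confirmed implementations; Word Mover’s Distance (feature 3) is left to a comment since gensim’s wmdistance on the L1 word vectors covers it, and the toy training rows are placeholders.

```python
from collections import Counter
import nltk
import numpy as np
import xgboost as xgb

nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger for feature 2

def word_match(q, r):
    """Feature 1: fraction of query words that appear in the response."""
    return len(set(q) & set(r)) / max(len(set(q)), 1)

def pos_word_match(q, r):
    """Feature 2: word match restricted to (word, POS-tag) pairs."""
    qt, rt = set(nltk.pos_tag(q)), set(nltk.pos_tag(r))
    return len(qt & rt) / max(len(qt), 1)

def ngram_overlap(q, r, n):
    """Feature 4: count of ordered n-grams shared by query and response."""
    def grams(toks):
        return Counter(zip(*[toks[i:] for i in range(n)]))
    return sum((grams(q) & grams(r)).values())

def l2_features(q, r):
    # Feature 3 (Word Mover's Distance) would use w2v.wv.wmdistance(q, r)
    # with the gensim model from the L1 sketch; it is omitted here to
    # keep this block self-contained.
    return [word_match(q, r), pos_word_match(q, r),
            ngram_overlap(q, r, 2), ngram_overlap(q, r, 3)]

# Binary relevance labels (1/0) come from the manual labeling described
# on the "Periodic model refresh" slide; these rows are toy examples.
pairs = [("how reset password".split(), "reset your password".split(), 1),
         ("how reset password".split(), "add agents admin".split(), 0)]
X_train = np.array([l2_features(q, r) for q, r, _ in pairs])
y_train = np.array([label for _, _, label in pairs])

clf = xgb.XGBClassifier(n_estimators=50)   # hyperparameters are assumed
clf.fit(X_train, y_train)
print(clf.predict_proba(X_train)[:, 1])    # relevance probabilities
```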
9. Offline Model Training
The original slide is a flow diagram; its steps in order:
1. Inputs: train data (historical tickets + Knowledge base), the n candidate responses, and m test tickets.
2. Preprocessing: email cleaning, primary & secondary preprocessing.
3. L1 (Embedding) Layer training, yielding n candidate response vectors and m test vectors.
4. For each test query, pick the top k responses based on L1 scores (m*k query-response pairs).
5. Feature creation, then feature preprocessing: missing value imputation, outlier treatment, scaling.
6. Split the m queries (each with its k candidate responses) into L2 train data (t) and L2 test data (m-t).
7. L2 (Classification) Layer training, producing a relevance probability vector for the held-out pairs ((m-t)*k).
8. Pick the top 3 responses per query based on probability ((m-t)*3) and evaluate.
9. Write artifacts out to Redis and S3: the lookup/word vector/idf data, the classification model object, and the L1 & L2 thresholds for gating and ranking.
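Step 9 can be sketched with redis-py and boto3, reusing the tfidf, lsa and clf objects from the earlier sketches. The slide does not spell out which artifact lands in which store, so the split below (fast lookups in Redis, the larger model object in S3) is an assumption; key names, the bucket name, the account identifier and the threshold values are all placeholders.

```python
import json
import pickle
import boto3
import redis

r = redis.Redis(host="localhost", port=6379)   # connection details are placeholders
s3 = boto3.client("s3")

account_id = "acc_123"                          # hypothetical account identifier

# Assumed split: fast per-account lookups (idf values, SVD components)
# go to Redis ...
r.set(f"{account_id}:idf", pickle.dumps(tfidf.idf_))
r.set(f"{account_id}:lsa_components", pickle.dumps(lsa.components_))

# ... while the trained L2 model object goes to S3.
s3.put_object(Bucket="frankbot-models",         # bucket name is illustrative
              Key=f"{account_id}/l2_model.pkl",
              Body=pickle.dumps(clf))

# L1 & L2 thresholds used for gating and ranking at serving time.
thresholds = {"l1_min": 0.35, "l2_min": 0.6}    # placeholder values
r.set(f"{account_id}:thresholds", json.dumps(thresholds))
```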
10. Online Processing
The original slide mirrors the training flow as a diagram; at serving time the steps are:
1. An input query arrives.
2. Preprocessing: email cleaning, primary & secondary preprocessing.
3. L1 (Embedding) Layer transformation produces the query vector, which is compared against the n candidate response vectors.
4. Pick the top k candidates based on similarity (1*k).
5. Feature creation, then feature preprocessing: missing value imputation, outlier treatment, scaling.
6. L2 (Classification) Layer prediction produces a relevance probability vector (1*k).
7. Pick the top 3 responses based on probability (1*3).
The artifacts written during offline training - the lookup/word vector/idf data, the classification model object, and the L1 & L2 thresholds for gating and ranking - are read back from Redis and S3 here.
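Tying the layers together, here is a sketch of the serving path, reusing candidate_texts, l1_scores, l2_features, clf and the thresholds dict from the earlier sketches. preprocess() is a stand-in for the full email-cleaning and primary/secondary pipeline, and k and the gating values are assumed.

```python
import numpy as np

candidate_tokens = [t.split() for t in candidate_texts]

def preprocess(text):
    """Stand-in for email cleaning + primary/secondary preprocessing."""
    return text.lower()

def answer_query(raw_query, k=10):
    query = preprocess(raw_query)
    scores = l1_scores(query)                   # similarity to all n candidates
    top_k = np.argsort(scores)[::-1][:k]        # top k by L1 score (1*k)

    feats = np.array([l2_features(query.split(), candidate_tokens[i])
                      for i in top_k])
    probs = clf.predict_proba(feats)[:, 1]      # relevance probabilities (1*k)

    # Gate on both L1 and L2 scores: staying silent beats a wrong answer.
    if scores[top_k[0]] < thresholds["l1_min"] or probs.max() < thresholds["l2_min"]:
        return None
    best3 = top_k[np.argsort(probs)[::-1][:3]]  # top 3 by probability (1*3)
    return [candidate_texts[i] for i in best3]

print(answer_query("how do i reset my password"))
```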
11. Onboarding a customer account
● Onboarding a new customer account involves extracting tickets and articles from the data lake and training the L1 model (LSA)
● Onboarding also involves choosing the right pre-trained word embedding corresponding to the account’s industry
○ Example industries: Retail, Financial services, SaaS, Healthcare, Education
● An ensemble of LSA and W2V embeddings is used to generate L1 scores for each (query, response) pair
● A downstream classification (L2) model is trained to generate model confidence scores for each (query, {response}) tuple
○ If enough data is not available for the concerned account, an industry-level L2 model is used instead
● Thresholding, i.e. deciding whether or not to answer a given query, is based on both L1 and L2 scores
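A sketch of this onboarding logic. The industry-to-embedding map and its paths are illustrative, train_lsa, train_l2 and load_industry_l2 are hypothetical helpers standing in for the training routines sketched earlier, and the 100-example cut-off borrows the dataset slide’s 100-ticket minimum (the actual fallback threshold is not stated in the deck).

```python
from gensim.models import Word2Vec

# Hypothetical mapping from industry to pre-trained embedding path.
INDUSTRY_EMBEDDINGS = {
    "retail": "embeddings/retail.w2v",
    "financial_services": "embeddings/finserv.w2v",
    "saas": "embeddings/saas.w2v",
    "healthcare": "embeddings/healthcare.w2v",
    "education": "embeddings/education.w2v",
}

def onboard_account(industry, tickets, articles, labeled_pairs,
                    min_labeled=100):
    """Set up the L1 and L2 models for a newly onboarded account."""
    # Pre-trained industry word embedding (W2V half of the L1 ensemble).
    w2v = Word2Vec.load(INDUSTRY_EMBEDDINGS[industry])
    # Per-account LSA model, trained on the account's own tickets and
    # Knowledge base articles (hypothetical helper).
    lsa = train_lsa(tickets + articles)
    # Account-level L2 model if there is enough labeled data, otherwise
    # fall back to the shared industry-level model, as on this slide.
    if len(labeled_pairs) >= min_labeled:
        l2 = train_l2(labeled_pairs)
    else:
        l2 = load_industry_l2(industry)
    return w2v, lsa, l2
```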
12. Periodic model refresh
● Model refresh is key to ensuring that the models are up to date and stay relevant over time
● This is done once a week, or as soon as an account accumulates a sizeable number of new queries or Knowledge base updates
● It involves the following steps
○ Retraining the LSA model after including the newly accumulated data
○ Incremental training of word vectors with the new data (sketched below)
○ Retraining the L2 (classification) model on recent data
■ The L2 model is trained on manual labels indicating whether each response from the L1 layer is relevant or not (1/0)
■ A third-party company is engaged to label these responses
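The incremental word-vector step can be done with gensim’s vocabulary-update mechanism, roughly as follows; the model path and the sample sentences are illustrative.

```python
from gensim.models import Word2Vec

# Existing per-account model; the path is illustrative.
model = Word2Vec.load("models/acc_123.w2v")

# Tokenized queries and article updates accumulated since the last refresh.
new_sentences = [["refund", "status", "for", "order"],
                 ["cancel", "my", "subscription"]]

model.build_vocab(new_sentences, update=True)   # extend the vocabulary in place
model.train(new_sentences,
            total_examples=len(new_sentences),
            epochs=model.epochs)
model.save("models/acc_123.w2v")
```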
13. Teach the bot
● Teach the bot is a feature that allows customer support agents to explicitly train the bot by ingesting Q → A mappings
● When the Answerbot fails to respond to a query (Q), the agent can point the bot to the expected response (A) which should have been returned
● If a suitable response (A) does not exist in the Knowledge base, it can be created on the fly
● This expected response (A) is consumed and mapped to be close to the query vector (Q) in the L1 vector space
○ This ensures that article A would show up for future queries that are similar to Q
○ The same feature is re-purposed to resolve incorrect bot responses as well
○ This feature also helps to improve the overall coverage of the Answerbot
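One plausible reading of “mapped to be close to the query vector (Q) in the L1 vector space” is to index article A under the taught query’s own L1 vector and check taught vectors first at serving time. The sketch below takes that reading; it is an interpretation, not the confirmed mechanism. l1_vector() realizes the ensemble vector by concatenating the LSA and averaged-W2V representations from the L1 sketch (itself an assumption), and the 0.8 similarity cut-off is a placeholder.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def l1_vector(text):
    """Concatenated LSA + averaged-W2V representation (an assumed
    realization of the ensemble vector from the L1 sketch)."""
    return np.concatenate([lsa.transform(tfidf.transform([text]))[0],
                           avg_w2v(text.split())])

taught_vectors = []    # L1 vectors of taught queries (Q)
taught_articles = []   # the article (A) each taught query should return

def teach(query_text, article_id):
    """Agent maps query Q to the expected response A."""
    taught_vectors.append(l1_vector(query_text))
    taught_articles.append(article_id)

def taught_match(query_text, min_sim=0.8):   # similarity cut-off is assumed
    """Return a taught article if the query is close to a taught query."""
    if not taught_vectors:
        return None
    q = l1_vector(query_text).reshape(1, -1)
    sims = cosine_similarity(q, np.vstack(taught_vectors))[0]
    best = int(np.argmax(sims))
    return taught_articles[best] if sims[best] >= min_sim else None
```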
14. Metrics and business impact
Month | # Active Clients | # Requests | # Responded | # Helpful | # No Feedback | % Deflection
May’18 | 97 | 10,805 | 6,075 | 1,657 | 1,868 | 15.34%
Jun’18 | 151 | 22,195 | 12,969 | 2,550 | 5,981 | 11.49%
July’18 | 182 | 30,376 | 19,330 | 3,792 | 5,669 | 12.48%
Aug’18 | 242 | 50,049 | 29,948 | 5,940 | 7,839 | 11.87%
Sep’18 | 347 | 63,587 | 38,064 | 8,308 | 10,112 | 13.07%
Oct’18 | 457 | 101,493 | 56,390 | 16,589 | 33,360 | 16.34%
Nov’18 | 478 | 130,687 | 78,902 | 25,680 | 46,555 | 19.65%
Dec’18 | 480 | 137,517 | 82,366 | 23,713 | 52,772 | 17.24%
● CSAT* - 79% with bots and 72% without bots
● Average First Response Time (overall) - 13 hrs with bots and 19 hrs without bots
*CSAT - Customer Satisfaction Score
15. Understanding the Metrics
● # Active clients - number of customers who are exposing the bot to their end customers in their support portal
● # Requests - number of requests that the bot receives
● # Responded - number of requests responded to/answered by the bot
● # Helpful - number of requests where the bot responses were helpful
○ Alongside every bot response, a “Was this helpful?” message is shown and the user’s feedback is solicited; this helps in tracking helpful responses
● # No Feedback - number of bot responses for which there was no feedback from users
● % Deflection - ratio of # Helpful to # Requests
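As a worked check against the table on the previous slide: in May’18 the bot had 1,657 helpful responses out of 10,805 requests, and 1,657 / 10,805 ≈ 15.34%, matching that month’s % Deflection.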
16. Challenges and learnings
Challenges:
● Developing a preprocessing mechanism that can extract only the salient components from messy emails
● Handling the complexity of storing and retrieving vectors of floats (idfs, SVD components, word vectors) for every account
● Serving predictions at low latency
● Handling Kafka streams for updating content in real time, using Spark Streaming
● Using the right tools to monitor the codebase and find bugs proactively
Lessons Learnt:
● Start with a simple model and add incremental improvements over time
● Involve data engineers at the very beginning to create data pipelines, and front-end engineers to make changes to the UI
● Define success metrics and inform stakeholders about what a reasonable target is
18. Appendix
Why are some suggestions not helpful to the user?
● The query could relate to a new topic for which there may not be enough FAQs or articles
● The query could relate to an existing topic but contain keywords that are not in the vocabulary - this may result in low L1 and L2 confidence scores that do not satisfy the thresholds
● The query may be related to a particular action - example: “Can you connect me to an agent?”, which is a question for a task-completion bot with intent-detection capabilities
● The query may not contain a question or issue - example: “I have an open ticket 3335924”
● The query may be ambiguous or unclear - example: “discussion”