Frankbot - ML framework for auto-responding
to customer support queries
Outline of the talk
● Introduction to Freshdesk
● Motivation and Objectives
● Datasets for model training
● Modeling Methodology
○ Offline training
○ Online processing
○ Onboarding a customer account
○ Periodic model refresh
○ Teach the bot
● Metrics and business impact
○ Understanding the metrics
○ Challenges and learnings
Introduction to Freshdesk
Freshdesk is a multi-channel, cloud-based customer support product that enables businesses to
● Streamline all customer conversations in one place - these are conversations between the business and its end
customers
● Automate repetitive work and make support agents more efficient
● Enable support agents to collaborate with other teams to resolve issues faster
● Freshdesk tickets are a record of customer conversations across channels (e.g., phone, chat, e-mail, social)
○ A typical conversation includes customer queries and agent responses
○ Frequently recurring customer queries are called T1 tickets
● Freshdesk currently has ~150,000 customers from across the world
Some statistics from companies using Freshdesk
● Average proportion of T1 tickets - 80%
● Average proportion of tickets with answers in the knowledge base - 60%
● Average proportion of tickets with answers in the ticket conversation - 70%
Motivation and Objectives
● Build a machine-learning-based bot that can
○ Intercept and auto-resolve T1 tickets, which recur frequently in the support helpdesk
○ Leverage content from the business’ Knowledge base to answer T1 queries
○ Reduce the time support agents spend on T1 tickets, thereby increasing their overall productivity
○ Identify historical tickets similar to a new ticket, so agents can resolve tickets faster by looking up the information in the similar ticket
● Enable support agents to understand the different types of questions raised by customers
● Help support agents create FAQs, which in turn extend the bot’s self-service potential
● Enable support agents to train the bot further by mapping customer queries to expected responses
Frankbot in production
Datasets for model training
● Source - Freshdesk data pertaining to customer (business) accounts
○ Includes tickets and Knowledge base articles, FAQs
○ Includes tickets from different channels such as e-mail, portal (raised on website),
chat, social and phone
● Accounts included - all active, paid accounts with at least 100 tickets in the last 3 months
● Training strategy
○ One model per account, trained end-to-end
○ Embeddings trained at the industry level, models at the account level
Note: Tickets from email, portal-direct, chat and phone channels account for close to 95% of
the ticket volume
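The account-selection rule above can be sketched as a simple filter. This is a minimal illustration, not Freshdesk's code: the `Account` record and its field names are hypothetical, and "last 3 months" is approximated as 90 days. It also groups eligible accounts by industry, mirroring the "embeddings per industry, models per account" strategy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

MIN_TICKETS = 100             # from the slide: "at least 100 tickets"
WINDOW = timedelta(days=90)   # "last 3 months", approximated here as 90 days

@dataclass
class Account:
    # Hypothetical account record; field names are illustrative only.
    account_id: str
    industry: str
    is_active: bool
    is_paid: bool
    ticket_created_at: List[datetime] = field(default_factory=list)

def is_eligible(acc: Account, now: datetime) -> bool:
    """Active, paid, and at least MIN_TICKETS tickets created inside the window."""
    if not (acc.is_active and acc.is_paid):
        return False
    recent = sum(1 for ts in acc.ticket_created_at if now - ts <= WINDOW)
    return recent >= MIN_TICKETS

def group_by_industry(accounts: List[Account], now: datetime) -> Dict[str, List[str]]:
    """Eligible account ids grouped by industry (one shared embedding per group)."""
    groups: Dict[str, List[str]] = {}
    for acc in accounts:
        if is_eligible(acc, now):
            groups.setdefault(acc.industry, []).append(acc.account_id)
    return groups
```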
Modeling Methodology
FAQ Answerbot
● Data
○ Train - historical ticket data + Knowledge base, excluding the test tickets
○ Test - tickets from the last 10 days (no overlap with train)
○ Candidate responses - articles/FAQs from the Knowledge base
● Preprocessing
○ Email cleaning - removal of signatures, forwarded emails, code constructs, non-ASCII characters, salutations, and text below the signature
○ Primary preprocessing - Unicode normalization, lowercasing, punctuation removal, stop-word removal, and stemming
○ Secondary preprocessing - bigram processing
● L1 layer - ensemble of LSA and W2V vector-space embeddings
● L1 similarity metric - cosine similarity
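The primary and secondary preprocessing steps above can be sketched with the standard library alone. This is an illustrative approximation: the stop-word list is a tiny sample, `naive_stem` is a crude stand-in for a real stemmer such as Porter, email cleaning is omitted, and "bigram processing" is assumed here to mean appending adjacent-token bigrams (a phrase-detection model would instead merge only frequent collocations).

```python
import string
import unicodedata
from typing import List

# Small illustrative stop-word list; a production system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "or", "i", "my",
              "me", "in", "on", "for", "it", "how", "do", "can", "you", "what"}

# Maps every punctuation character to a space so "reset-password" splits in two.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def naive_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def primary_preprocess(text: str) -> List[str]:
    # Unicode normalization (NFKD), then drop what cannot be encoded as ASCII (accents).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower().translate(PUNCT_TO_SPACE)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

def add_bigrams(tokens: List[str]) -> List[str]:
    """Secondary preprocessing (assumed form): append adjacent-token bigrams."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
```

For example, `add_bigrams(primary_preprocess("How do I reset my Password?"))` yields `["reset", "password", "reset_password"]`.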
Modeling Methodology
FAQ Answerbot
● L2 features
1. % word match between the query and candidate responses
2. % word match between words with similar part-of-speech tags
3. Word Mover's Distance
4. Ordered bigram and trigram counts
● L2 model - Random Forest / XGBoost
● Thresholds - based on L1 and L2 scores (with override levels)
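Features 1 and 4 in the list above can be computed from token lists with plain Python. This is a sketch under stated assumptions: "% word match" is taken as the share of distinct query tokens found in the candidate, and "ordered n-gram count" as the number of query n-grams that appear in the candidate in the same word order. Features 2 (POS-based match) and 3 (Word Mover's Distance) need a POS tagger and word embeddings, so they are left out; `l2_features` is a hypothetical helper name.

```python
from typing import List, Tuple

def word_match_pct(query: List[str], candidate: List[str]) -> float:
    """Percentage of distinct query tokens that also appear in the candidate."""
    q = set(query)
    if not q:
        return 0.0
    return 100.0 * len(q & set(candidate)) / len(q)

def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ordered_ngram_count(query: List[str], candidate: List[str], n: int) -> int:
    """Number of query n-grams occurring, in the same word order, in the candidate."""
    cand = set(ngrams(candidate, n))
    return sum(1 for g in ngrams(query, n) if g in cand)

def l2_features(query: List[str], candidate: List[str]) -> list:
    # Partial feature vector; POS-based match and Word Mover's Distance omitted.
    return [word_match_pct(query, candidate),
            ordered_ngram_count(query, candidate, 2),
            ordered_ngram_count(query, candidate, 3)]
```

These rows, one per (query, candidate) pair, would then go through imputation and scaling before the Random Forest / XGBoost classifier.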
Offline Model Training
1. Inputs - train data, candidate responses (n), test data (m)
2. Preprocessing - email cleaning, primary & secondary preprocessing
3. L1 (embedding) layer - training; yields candidate response vectors (n) and test vectors (m)
4. Pick the top k responses for each test query based on L1 scores (m*k)
5. Split the m scored queries into L2 train data (t) and L2 test data (m-t), each with its k candidate responses
6. Feature creation
7. Preprocessing - missing value imputation, outlier treatment, scaling
8. L2 (classification) layer - training; yields a relevance probability vector ((m-t)*k)
9. Pick the top 3 based on probability ((m-t)*3) + evaluation
Storage (Redis, S3)
● Write lookup/word vectors/idf
● Write classification model object
● Write L1 & L2 thresholds for gating and ranking
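Steps 5 and 9 above (splitting the m L1-scored queries into t and m-t, then evaluating the top 3 by L2 probability) might look like the sketch below. The slide does not name the evaluation metric or how the split is drawn; a seeded random split and hit rate at 3 (share of queries whose top 3 contains a relevant response) are assumptions, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

def split_for_l2(query_ids: List[str], t: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split the m L1-scored queries into t for L2 training and m - t for evaluation."""
    shuffled = list(query_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:t], shuffled[t:]

def top3(probs: Dict[str, float]) -> List[str]:
    """The three candidates with the highest L2 relevance probability."""
    return sorted(probs, key=probs.get, reverse=True)[:3]

def hit_rate_at_3(results: List[Tuple[Dict[str, float], Set[str]]]) -> float:
    """Fraction of evaluation queries whose top 3 contains at least one relevant response.

    results: one (probability-by-candidate, set-of-relevant-candidate-ids) pair per query.
    """
    if not results:
        return 0.0
    hits = sum(1 for probs, relevant in results if set(top3(probs)) & relevant)
    return hits / len(results)
```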
Online Processing
1. Input query
2. Preprocessing - email cleaning, primary & secondary preprocessing
3. L1 (embedding) layer - transformation into a query vector, compared against the candidate response vectors (n)
4. Pick the top k based on similarity (1*k)
5. Feature creation
6. Preprocessing - missing value imputation, outlier treatment, scaling
7. L2 (classification) layer - prediction; yields a relevance probability vector (1*k)
8. Pick the top 3 based on probability (1*3)
Storage (Redis, S3)
● Read lookup/word vectors/idf
● Read classification model object
● Read L1 & L2 thresholds for gating and ranking
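The online path above, with its two gates, can be sketched as follows. This is a minimal illustration, not the production service: `l2_prob` is a hypothetical stand-in for feature creation plus the L2 classifier, and `k`, `l1_min`, and `l2_min` are made-up values in place of the per-account thresholds read from storage. An empty result means the bot stays silent.

```python
import math
from typing import Callable, Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def answer(query_vec: Sequence[float],
           candidate_vecs: Dict[str, Sequence[float]],
           l2_prob: Callable[[str, float], float],
           k: int = 10, l1_min: float = 0.3, l2_min: float = 0.5) -> List[str]:
    """Return up to three article ids, or [] when no candidate clears both gates."""
    # L1: score all n candidates by cosine similarity and keep the top k (1*k).
    top_k = sorted(((cosine(query_vec, v), cid) for cid, v in candidate_vecs.items()),
                   reverse=True)[:k]
    # Gate 1 on the L1 score, then get an L2 relevance probability per survivor.
    probs = [(l2_prob(cid, s), cid) for s, cid in top_k if s >= l1_min]
    # Gate 2 on the L2 probability, then return the top 3 (1*3).
    passed = sorted(((p, cid) for p, cid in probs if p >= l2_min), reverse=True)
    return [cid for _, cid in passed[:3]]
```

Gating on L1 before calling the classifier keeps the more expensive L2 feature computation off clearly irrelevant candidates, which fits the low-latency requirement mentioned later in the deck.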
Onboarding a customer account
● Onboarding a new customer account involves extracting its tickets and articles from the data lake and training the L1 model (LSA)
● Onboarding also involves choosing the pre-trained word embedding that matches the account’s industry
○ Example industries: Retail, Financial services, SaaS, Healthcare, Education
● An ensemble of LSA and W2V embeddings generates an L1 score for each (query, response) pair
● A downstream classification (L2) model is trained to generate confidence scores for each (query, {response}) tuple
○ If not enough data is available for the account, an industry-level L2 model is used instead
● Thresholding, i.e. deciding whether or not to answer a given query, is based on both L1 and L2 scores
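The onboarding choices above (an LSA + W2V ensemble for L1, and an industry-level L2 fallback) can be sketched like this. The slide does not say how the two similarities are combined or how "enough data" is measured, so the weighted average, the averaged word vectors for W2V, the `weight` value, and the `min_labeled` cutoff are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def average_word_vector(tokens: List[str], word_vectors: Dict[str, List[float]],
                        dim: int) -> List[float]:
    """Mean of the in-vocabulary word vectors; all zeros if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def l1_score(lsa_q, lsa_r, w2v_q, w2v_r, weight: float = 0.5) -> float:
    """Assumed ensemble: weighted average of the LSA and W2V cosine similarities."""
    return weight * cosine(lsa_q, lsa_r) + (1 - weight) * cosine(w2v_q, w2v_r)

def pick_l2_model(account_model, industry_model, n_labeled: int, min_labeled: int = 500):
    """Use the account's own L2 model only if it exists and had enough labeled data."""
    if account_model is not None and n_labeled >= min_labeled:
        return account_model
    return industry_model
```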
Periodic model refresh
● Model refresh keeps the models up to date and relevant over time
● Refresh runs once a week, or as soon as an account accumulates a sizeable number of new queries or Knowledge base updates
● It involves the following steps
○ Retraining the LSA model with the newly accumulated data included
○ Incremental training of the word vectors on the new data
○ Retraining the L2 (classification) model on recent data
■ Labels for the L2 model come from manually judging whether each response from the L1 layer is relevant (1) or not (0)
■ A third-party company is engaged to label these responses
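The refresh trigger described above (weekly, or earlier once enough new data accumulates) reduces to a small predicate. The weekly interval comes from the slide; the two count thresholds standing in for "a sizeable number" are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta

REFRESH_INTERVAL = timedelta(weeks=1)   # "once a week"
MIN_NEW_QUERIES = 500                   # hypothetical value for "a sizeable number"
MIN_KB_UPDATES = 20                     # hypothetical value

def needs_refresh(last_refresh: datetime, now: datetime,
                  new_queries: int, kb_updates: int) -> bool:
    """Refresh weekly, or earlier when enough new queries or KB edits accumulate."""
    return (now - last_refresh >= REFRESH_INTERVAL
            or new_queries >= MIN_NEW_QUERIES
            or kb_updates >= MIN_KB_UPDATES)
```

When the predicate fires, the three steps listed on the slide would run in order. For the incremental word-vector step, gensim's `Word2Vec` supports extending a trained model by calling `build_vocab(..., update=True)` followed by `train(...)`, though the deck does not say which library was used.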
Teach the bot
● Teach the bot is a feature that allows customer support agents to explicitly train the bot by
ingesting Q → A mappings
● When the Answerbot fails to respond to a query (Q), the agent can point the bot to the expected
response (A) which should have been returned
● If a suitable response (A) does not exist in the Knowledge base, it can be created on-the-fly
● This expected response (A) is mapped close to the query vector (Q) in the L1 vector space
○ This ensures that article A shows up for future queries similar to Q
○ The same feature is also used to correct wrong bot responses
○ It also improves the overall coverage of the Answerbot
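The slide says A is mapped close to Q in the L1 space but not how. One plausible way, shown here as an assumption rather than the deck's method, is to store Q's vector as an extra "alias" key for article A, so retrieval scores each article by its best-matching key. An alternative would be nudging A's own vector toward Q. `AliasIndex` is a hypothetical class.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class AliasIndex:
    """L1 index in which an article can be reached through several vectors."""

    def __init__(self) -> None:
        self.entries: List[Tuple[List[float], str]] = []   # (vector, article_id)

    def add(self, vector: Sequence[float], article_id: str) -> None:
        self.entries.append((list(vector), article_id))

    def teach(self, query_vector: Sequence[float], article_id: str) -> None:
        # Q -> A mapping: the query vector becomes an extra key for article A,
        # so future queries close to Q retrieve A.
        self.add(query_vector, article_id)

    def best(self, query_vector: Sequence[float]) -> Optional[str]:
        """Article whose best-matching key has the highest cosine similarity."""
        scores: Dict[str, float] = {}
        for vec, aid in self.entries:
            scores[aid] = max(cosine(query_vector, vec), scores.get(aid, -1.0))
        return max(scores, key=scores.get) if scores else None
```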
Metrics and business impact
Month    # Active clients  # Requests  # Responded  # Helpful  # No Feedback  % Deflection
May’18   97                10,805      6,075        1,657      1,868          15.34%
Jun’18   151               22,195      12,969       2,550      5,981          11.49%
Jul’18   182               30,376      19,330       3,792      5,669          12.48%
Aug’18   242               50,049      29,948       5,940      7,839          11.87%
Sep’18   347               63,587      38,064       8,308      10,112         13.07%
Oct’18   457               101,493     56,390       16,589     33,360         16.34%
Nov’18   478               130,687     78,902       25,680     46,555         19.65%
Dec’18   480               137,517     82,366       23,713     52,772         17.24%
● CSAT* - 79% with bots and 72% without bots
● Average first response time (overall) - 13 hrs with bots and 19 hrs without bots
*CSAT - Customer Satisfaction Score
Understanding the Metrics
● # Active clients - number of Freshdesk customers exposing the bot to their own customers on their support portal
● # Requests - number of requests the bot receives
● # Responded - number of requests the bot responded to (answered)
● # Helpful - number of requests for which the bot's response was rated helpful
○ Every bot response is accompanied by a “Was this helpful?” prompt that solicits the user's feedback; this is how helpful responses are tracked
● # No Feedback - number of bot responses that received no feedback from users
● % Deflection - ratio of # Helpful to # Requests
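As a check on the definition above, computing # Helpful / # Requests from the monthly table reproduces the % Deflection column. The response rate printed alongside is not a metric on the slide; it is added only to show how much of the traffic the bot answered at all.

```python
# (month, # Requests, # Responded, # Helpful), copied from the metrics table.
MONTHLY = [
    ("May'18", 10_805, 6_075, 1_657),
    ("Jun'18", 22_195, 12_969, 2_550),
    ("Jul'18", 30_376, 19_330, 3_792),
    ("Aug'18", 50_049, 29_948, 5_940),
    ("Sep'18", 63_587, 38_064, 8_308),
    ("Oct'18", 101_493, 56_390, 16_589),
    ("Nov'18", 130_687, 78_902, 25_680),
    ("Dec'18", 137_517, 82_366, 23_713),
]

def deflection_pct(helpful: int, requests: int) -> float:
    """% Deflection = # Helpful / # Requests, as a percentage."""
    return 100.0 * helpful / requests

def response_rate_pct(responded: int, requests: int) -> float:
    """Share of requests the bot answered (not reported on the slide)."""
    return 100.0 * responded / requests

for month, requests, responded, helpful in MONTHLY:
    print(f"{month}: deflection {deflection_pct(helpful, requests):.2f}%, "
          f"responded {response_rate_pct(responded, requests):.1f}%")
```

Note that deflection is measured against all requests, so unanswered requests and responses with no feedback both pull it down; with no-feedback counts as high as 52,772 in December, the true share of helpful answers may well be higher than the reported figure.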
Challenges and learnings
Challenges:
● Developing a preprocessing mechanism that extracts only the salient parts of messy emails
● Handling the complexity of storing and retrieving vectors of floats (idfs, SVD components, word vectors) for every account
● Serving predictions at low latency
● Handling Kafka streams with Spark Streaming to update content in real time
● Using the right tools to monitor the system and find bugs in the codebase proactively
Lessons Learnt:
● Start with a simple model and add incremental improvements over time
● Involve data engineers from the very beginning to build data pipelines, and front-end engineers to make UI changes
● Define success metrics and inform stakeholders about what a reasonable target is
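The vector-storage challenge above is often handled by packing each float vector into a compact byte string under a per-account key. The sketch below assumes float32 precision is acceptable and uses a hypothetical key scheme; a plain dict stands in for Redis. With redis-py, the same bytes could be stored with `set(key, blob)` and read back with `get(key)`, which returns raw bytes by default.

```python
import struct
from typing import Dict, List

def pack_floats(values: List[float]) -> bytes:
    """Serialize a vector as little-endian float32 (4 bytes per value)."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_floats(blob: bytes) -> List[float]:
    """Inverse of pack_floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def vector_key(account_id: str, kind: str, token: str) -> str:
    """Hypothetical per-account key scheme, e.g. 'acct:42:w2v:refund'."""
    return f"acct:{account_id}:{kind}:{token}"

# A plain dict stands in for a Redis connection here.
store: Dict[str, bytes] = {}
store[vector_key("42", "w2v", "refund")] = pack_floats([0.5, -1.25, 2.0])
store[vector_key("42", "idf", "refund")] = pack_floats([3.0])
```

A 300-dimensional word vector packed this way takes 1,200 bytes, versus several times that as JSON text, which matters when every account carries its own vocabulary.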
Thank You
Appendix
Why are some suggestions not helpful to the user?
● The query may relate to a new topic for which there are not enough FAQs or articles
● The query may relate to an existing topic but contain keywords that are not in the vocabulary
○ This can result in low L1 and L2 scores that do not clear the thresholds
● The query may be about a specific action - Example: “Can you connect me to an agent?”, which is a request for a task-completion bot with intent-detection capabilities
● The query may not contain a question or issue - Example: “I have an open ticket 3335924”
● The query may be ambiguous or unclear - Example: “discussion”
