Paul Lo
Data Analytics Manager @ Uber, Asia-Pacific Community Operation Central team
paullo0106@gmail.com | paul.lo@uber.com | http://paullo.myvnc.com/blog/
Transforming the Call Center with Text Mining
and Deep Learning for Better User Experience
PythonPH Sep. 2018 (https://www.meetup.com/pythonph/events/254444065/)
Project #1
Text mining tool to unlock user insights
Python lib: natural language processing,
topic modeling
Self-introduction
Who am I?
What does our analytics team do for
Asia-Pacific?
Project #2
Artificial Intelligence revolution in call
centers: deep learning-based bot
Python lib: machine learning related
such as tensorflow, keras, sklearn,
numpy, etc.
Transforming the Call Center with Text Mining and Deep Learning for Better User Experience
Table of contents
Self-introduction
Skills: Full stack software engineer (Java/ Python) → Data Analyst (R/ Python, databases, machine learning)
Journey: Taipei → Shanghai → Manila
Self-introduction
Uber Shanghai → Uber Manila (APAC Community Operation Central Analytics team)
Scope of Community Operation in Uber APAC
Scope
10+ languages in ~20 locations
Central Team
In
Manila
India
Singapore (South East and North Asia)
Australia
APAC A&I
2017 Year-end
Analytics & Insights is the team responsible for building the analyses,
models, and tools to aid operational and strategic decision making for the
APAC Region. We are also dedicated to furthering Uber’s collective
analytical capability.
Self-introduction
Uber still has awesome teams (Analytics, S&P, PM, etc.) based in Manila!
Improving user experience is one of our core missions
Improve user experience
Drive down defect rate
Optimize operational efficiency
Manage the cost of business operation
Project #1:
Text mining and NLP for user experience
enhancement
Acknowledgement: Troy James Palanca, Lorenzo Ampil
Value proposition
Speed up the workflow on user experience enhancement
Defect rate and issue type
Leaderboard
Community
Operation
Product,
Engineering,
etc.
User
feedback
database
Root cause analysis
and recommended
feature or policy
changes
Review
customer
feedback in
tickets
User experience
enhancement
Making this process more efficient
Issue type dashboard as a high-level data source
Mockup
Dashboard
Problem
How can we quickly get the insights from users’ feedback?
Problem
Reviewing tickets
manually to diagnose
the root cause is
neither scalable nor
systematic
Ticket dataset
Driver > Trips > Fare … > … > Technical issue
(Figure: a pile of individual tickets under this issue category)
Problem
How can we quickly get the insights from users’ feedback?
Solution
Use topic modeling
techniques to
efficiently group tickets
and assign them to
reasonably named
topics.
Ticket dataset
Driver > Trips > Fare … > … > Technical issue
App stuck/ crash
(35%)
Fare calculation
Dispute
(15%)
GPS issue
(55%)
Key features of our solution
Using a topic-modeling-based tool to learn pain points from our users
Ticket snippet with user profile: respective ticket
samples are displayed when clicking on a keyword
Word cloud view: user can switch to
this view to see most relevant (tf-idf
score) keywords in each topic
>>DEMO
Sample results
“Fare Disputes” in one of the cities where we operate are
mainly about payments, airport issues, and wrong
riders:
● Credit cards and other modes of payment
(18%)
● Overcharging (28.8%)
● Wrong profiles being billed (12.8%)
● Airport terminal issues (12.9%)
● Someone else taking the trip (12.5%)
Sample results
Lots of “rude”, “loud music”, “drunk”, and “slam door” keywords
were detected as the pain points of our NY driver partners
Sample results
More than 10% of driver cancellation
tickets in Singapore are related to car
seat rules for child safety: many
sample tickets show that drivers want
their cancellation fee reimbursed because
their riders brought children without prior
notice.
Tool architecture
Computing node
(any Uber servers)
Data collection
Data preparation
LDA model training
Web server
(AWS node)
HTML and JSON
files from
training results
User Interface
(d3js)
Train the model for each country with top issues
monthly
Web 1.0 design with the focus on computing node
Workflow overview
Data Preparation
(text processing)
Extract useful information and
transform corpus to a sparse
matrix
Data Modeling
(Latent Dirichlet
Allocation)
Main computation to perform
topic modeling
Data input: ticket text as raw
data
Output: topic model clusters
Unlocking support insights from textual content
Sample ~50,000 tickets for
each training run in each issue
category
Text processing library: nltk, BeautifulSoup, re, TextBlob
LDA library: gensim.ldamodel.LdaModel and pyLDAvis
Remove invalid words:
● Numbers
re.sub(r'\d+', '', text)
● Html tags
BeautifulSoup(document, 'html.parser').get_text()
BeautifulSoup(document, 'html.parser').find_all('b')
● Custom dictionary
Stemming and lemmatization
Tokenization
TFIDF (Term Frequency Inverse Document
Frequency)
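The cleaning steps listed above can be sketched end to end as follows. This is a minimal illustration, not the team's actual code: it swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra installs, and the names `clean_ticket` and `CUSTOM_STOPWORDS` (plus the stopword list itself) are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (tags are dropped)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Hypothetical custom dictionary of words that carry no signal in tickets.
CUSTOM_STOPWORDS = {"hi", "thanks", "please", "uber"}


def clean_ticket(raw_html):
    """Strip HTML tags and numbers, tokenize, and drop custom stopwords."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.parts)
    text = re.sub(r"\d+", "", text)              # remove numbers
    tokens = re.findall(r"[a-z]+", text.lower())  # naive word tokenization
    return [t for t in tokens if t not in CUSTOM_STOPWORDS]


tokens = clean_ticket("<p>Hi, my driver <b>cancelled</b> trip 2 times. Thanks!</p>")
print(tokens)
```

Stemming or lemmatization would follow as a separate step on the returned tokens (for example with nltk, as the slides mention).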
Workflow overview
Remove invalid words
Stemming and lemmatization: Reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance:
○ cancel, cancels, cancelled -> cancel
○ riders, rider -> rider
Tokenization: Part-of-speech based word detection
TFIDF (Term Frequency Inverse Document Frequency): common practice to score each term by its frequency, weighted by its relevance
Data Preparation (Natural Language Processing)
Using TFIDF to filter the most important keywords
Machine Learning
Model
Term frequency
Inverse Document
Frequency
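To make the two factors concrete, here is a small sketch of the plain textbook variant, tf × log(N / df), in pure Python. Library implementations (gensim, scikit-learn) apply their own smoothing and normalization, so their numbers will differ; the function name `tf_idf` and the toy tickets are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: score} dict per document using tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy tickets (already tokenized); the content is invented.
docs = [
    ["driver", "cancel", "trip", "fee"],
    ["driver", "gps", "wrong", "route"],
    ["driver", "app", "crash", "gps"],
]
scores = tf_idf(docs)
print(scores[0])
```

A word such as "driver" that appears in every ticket scores zero, while a word that appears in only one ticket scores highest, which is why TFIDF is useful for filtering keywords.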
Workflow overview
Data preparation for text processing can be very time-consuming
Remove invalid words:
Stemming and lemmatization
Tokenization
TFIDF (Term Frequency Inverse Document
Frequency)
Speed up data processing
Pandas runs on a single thread by default
A pandas DataFrame with 50k+ rows
Data Preparation
text_processing() is a heavy function
that does many things:
● Tokenization
● Removal of numbers, html tags, and
other invalid words
● Stemming and lemmatization
● TFIDF
df['content'].apply(text_processing)
→ single thread by default
Split the rows across Worker 1, Worker 2, … Worker N; each worker returns keywords
Data processing speedup trick in Pandas
Pandas runs on a single thread by default
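The code on the original slide was an image and is not recoverable, so the following is only a sketch of one common way to get past the single-thread limit: hand the rows to a pool of worker processes. The helper name `parallel_map`, the `executor_cls` parameter, and the body of `text_processing` are assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def text_processing(text):
    """Stand-in for the heavy per-ticket function (hypothetical body)."""
    return [word for word in text.lower().split() if word.isalpha()]


def parallel_map(func, items, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Apply func to every item using a pool of workers, preserving order.

    executor_cls defaults to processes, which suits CPU-bound text
    processing; func must be defined at module level so it can be pickled.
    """
    items = list(items)
    # Send items in batches to reduce inter-process communication overhead.
    chunksize = max(1, len(items) // (n_workers * 4))
    with executor_cls(max_workers=n_workers) as executor:
        return list(executor.map(func, items, chunksize=chunksize))
```

With a DataFrame, usage would look like `df['keywords'] = parallel_map(text_processing, df['content'].tolist())`, placed under an `if __name__ == "__main__":` guard so worker processes can import the module safely.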
Many handy text processing libraries
TextBlob and spaCy
Tokenization
Sentence correction
.correct()
Part of speech
.tags
Sentiment analysis
.sentiment.polarity
NLP Library
(TextBlob)
(spaCy)
Workflow overview
Unlocking support insights from textual content - but how?
LDA:
- Unsupervised learning
- Bag of words
- “topic distribution”
Usage:
lda = LdaModel(corpus=corpus,
id2word=dictionary,
num_topics=4,
random_state=some_number)
lda.show_topics()
Latent Dirichlet Allocation model
General concept of this model
Unsupervised learning method - does not
require any class labels; similar to clustering
‘Bag of words’ model - uses word counts in
messages without regard for word order
(“Peter owes Alice money” = “Alice owes Peter
money”)
Estimated iteratively - Starts with random
initialization then adjusts probabilities to
reduce perplexity / increase fit
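The bag-of-words point can be checked in a few lines. This is a small illustrative sketch using only the standard library; the helper name `bag_of_words` is made up.

```python
from collections import Counter


def bag_of_words(text):
    """Count words, ignoring their order (and letter case)."""
    return Counter(text.lower().split())


a = bag_of_words("Peter owes Alice money")
b = bag_of_words("Alice owes Peter money")

# The two sentences mean different things, but their word counts are
# identical, so a bag-of-words model cannot tell them apart.
print(a == b)
```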
Doc 1, Doc 2, Doc 3, ... Doc n
Example document-topic probabilities for a document about fruits:
30% health (topic 1)
60% fruits (topic 2)
10% disease (topic 3)
Latent Dirichlet Allocation model
Model implementation and visualization
Usage:
from gensim.models import LdaModel
from pyLDAvis.gensim import prepare, save_html
lda = LdaModel(corpus=corpus,
               id2word=dictionary,
               num_topics=4,
               random_state=some_number)
lda.show_topics()
Future work and learnings
Customization is needed
● Not suited for
specific issue
category
● Build own
dictionary for the
removal of
irrelevant words
How to make the results more “actionable”?
● Number of topics for convergence
● Time and performance
tradeoff
● Other “deep NLP” models?
Bad result
examples
Product owner: Huaixiu Zheng and Yichia Wang in Uber’s Applied Machine Learning team
Project #2:
Artificial Intelligence revolution in call centers
CSR’s sample workflow for user in a call center
How do our users submit an issue?
Online support via in-app-help
User: Select Issue Category → Write Message → Contact Ticket
CSR: Confirm Issue Category → Lookup info. & Knowledge Base → Select Action → Write response using a Reply Template → Response
The issue for call center operation: scalability and cost
The growth comes at a price again….
Solution? Let’s start from a basic sample
“I want to change my rating for a rider”
API-less solution to the basic sample
We can ‘program’ the pre-defined logic for certain tickets with Selenium or Chrome Script
element mapping
element mapping
End-to-end solution
Web interaction
Read and Write (click and input text)
Knowledge base
● Keyword recognition
● Web element id dictionary
● (Natural Language Processing)
Policy engine
Program the flow aligning with policy/ SOP
Monitoring and logging
● Real-time gsheet API logging
● Monitoring and alert trigger
Ticket
answering
bot
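The keyword recognition and policy engine parts can be sketched as a small rule table. Everything here is hypothetical (the rules, the action names, and `decide_action`); in the setup described above, the chosen action would then be carried out in the browser with Selenium via the web element id dictionary, which is not shown.

```python
import re

# Hypothetical policy rules: all keywords in the set must appear in the ticket.
RULES = [
    ({"change", "rating"}, "open_rating_form"),
    ({"lost", "item"}, "send_lost_item_template"),
]


def decide_action(message):
    """Return the first action whose keywords all occur in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for keywords, action in RULES:
        if keywords <= words:  # subset test: every keyword is present
            return action
    return "escalate_to_csr"   # no rule matched, leave it to a human


print(decide_action("I want to change my rating for a rider"))
```

Every new ticket type needs a new hand-written rule, which is the scalability limit of this approach.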
The business impact of a simple bot-solving solution
3k+ weekly solves
A team of
18 CSR
28k USD
monthly
What’s the problem with this solution?
“Scalability”
The difference between Programming and Machine Learning
Outputs =
Agents’
responses
Inputs =
Contact
Ticket
Our machine learning solution design
Why go with “Semi-automated” assistance rather than a real robot?
Pros:
- Scalable solution to all (+ new) ticket types
- Flexible and safer application, as a human
can still evaluate it and make the final call
Cons: Not fully automated, so it does not replace the agent
workforce completely.
Product designed by Hugh Williams, Huaixiu Zheng, Yi-Chia Wang in Applied Machine Learning team
Our machine learning solution design
‘Assistant to CSR’ - Provide suggestions for reply and actions
Issue category suggestion
Action suggestion
10M+ tickets
Correct response from
agents to these 10M+
tickets
Technical model training
Product design
Typical Machine Learning process
Note: picture from Mark Peng’s “General Tips for participating Kaggle Competitions” on SlideShare
Typical Machine Learning process
Model selection
ML 101:
Start with a simple model first
Data source: https://eng.uber.com/cota-v2/
Deep Learning Architecture
Reference: Uber AML Lab: http://eng.uber.com/cota
Sample code with Keras for a simple CNN
Deep Learning Architecture
Reference: Uber AML Lab: http://eng.uber.com/cota
Paper: COTA: Improving the Speed and Accuracy of Customer Support through Ranking and Deep Networks
Development environment for Deep learning model training
What does model training look like?
>> DEMO
Main codebase + data set
Feature engineering and feature importance
Trade off between capacity and interpretability
“Capacity” “Interpretability”
Feature engineering and feature importance
What are the important features? Very easy to get an explanation in simpler models
Feature engineering and feature importance
What are the important features? An NN model is like our brain’s intuition … a black box
Feature engineering and feature importance
What are the important features?
Sklearn: Recursive feature elimination
(sklearn.feature_selection.RFE)
Mockup
dataset
Feature engineering and feature importance
What are the important features?
Time on model training >>> prediction
Shuffle each feature to create noise…. on the testing set
Mockup
dataset
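The shuffling idea above is usually called permutation importance: shuffle one feature column of the testing set, re-run prediction, and see how much the error grows. Because it only needs predictions, it avoids retraining the model. Below is a minimal sketch with NumPy; the toy data, the least-squares model standing in for the neural network, and the function name `permutation_importance` are all assumptions for illustration.

```python
import numpy as np


def permutation_importance(predict, X_test, y_test, n_repeats=5, seed=0):
    """Mean increase in squared error when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X_test) - y_test) ** 2)
    importances = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_noisy = X_test.copy()  # never shuffle the original test set
            X_noisy[:, j] = rng.permutation(X_noisy[:, j])
            error = np.mean((predict(X_noisy) - y_test) ** 2)
            increases.append(error - baseline)
        importances[j] = np.mean(increases)
    return importances


# Toy data: feature 0 matters a lot, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# A simple least-squares model stands in for any trained model.
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
scores = permutation_importance(lambda data: data @ weights, X_test, y_test)
print(scores)
```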
Python tips: be cautious about the underlying “copy implementation”
np.random.shuffle
What’s the value of
my_list2?
A. [1, 2, 3, 4, 5]
B. [2, 5, 1, 4, 3]
np.random.permutation
from copy import deepcopy
my_list2 = deepcopy(my_list)
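The quiz code on the slide was an image, so the snippet below is a reconstruction under the assumption that `my_list2` was created by plain assignment. It shows the three behaviours side by side: `np.random.shuffle` works in place (and therefore also changes any alias), `np.random.permutation` returns a shuffled copy, and `deepcopy` gives an independent list that is safe to shuffle.

```python
from copy import deepcopy

import numpy as np

my_list = [1, 2, 3, 4, 5]
alias = my_list                        # plain assignment: same object, no copy
result = np.random.shuffle(alias)      # shuffles in place and returns None
print(alias is my_list, result)        # my_list was shuffled too

original = [1, 2, 3, 4, 5]
shuffled = np.random.permutation(original)  # new array, original untouched

snapshot = deepcopy(original)          # independent copy
np.random.shuffle(snapshot)            # only the copy is shuffled
print(original)
```

When shuffling a feature column for importance checks, the same trap applies: shuffle a copy, or the testing set itself is silently modified.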
Issue category suggestion
Action suggestion
Product design
Last stop: making business impact
Ensure KPI measurement is well-planned in the beginning
User: Select Issue Category → Write Message → Contact Ticket
CSR: Confirm Issue Category → Lookup info. & Knowledge Base → Select Action → Write response using a Reply Template → Response
Last stop: making business impact
Identify key business metrics, and cautiously conduct and monitor A/A and A/B testing
Source: https://eng.uber.com/cota-v2/
Look forward to collaborating! http://careers.uber.com
Paul Lo
Data Analytics Manager @ Uber
paul.lo@uber.com | paullo0106@gmail.com | http://paullo.myvnc.com/blog/
Q&A

HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

[PythonPH] Transforming the call center with Text mining and Deep learning (Case study@Uber)

  • 1. Paul Lo Data Analytics Manager @ Uber, Asia-Pacific Community Operation Central team paullo0106@gmail.com | paul.lo@uber.com | http://paullo.myvnc.com/blog/ Transforming the Call Center with Text Mining and Deep Learning for Better User Experience PythonPH Sep. 2018 (https://www.meetup.com/pythonph/events/254444065/)
  • 2. Table of contents Self-introduction Who am I? What does our analytics team do for Asia-Pacific? Project #1 Text mining tool to unlock user insights Python lib: natural language processing, topic modeling Project #2 Artificial Intelligence revolution in call centers: deep learning-based bot Python lib: machine learning related, such as tensorflow, keras, sklearn, numpy, etc.
  • 3. Self-introduction Skills: Full stack software engineer (Java/ Python) → Data Analyst (R/ Python, databases, machine learning) Journey: Taipei → Shanghai → Manila
  • 4. Self-introduction Uber Shanghai → Uber Manila (APAC Community Operation Central Analytics team)
  • 5. Scope of Community Operation in Uber APAC Scope 10+ languages in ~20 locations Central Team In Manila India Singapore (South East and North Asia) Australia
  • 6. APAC A&I 2017 Year-end Analytics & Insights is the team responsible for building the analyses, models, and tools to aid operational and strategic decision making for the APAC Region. We are also dedicated to furthering Uber’s collective analytical capability.
  • 7. Self-introduction Uber still has awesome teams (Analytics, S&P, PM, etc.) based in Manila!!
  • 8. Improving user experience is one of our core missions Improve user experience Drive down defect rate Optimize operational efficiency Manage the cost of business operation
  • 9. Project #1: Text mining and NLP for user experience enhancement Acknowledgement: Troy James Palanca, Lorenzo Ampil
  • 10. Value proposition Speed up the workflow on user experience enhancement Defect rate and issue type Leaderboard Community Operation Product, Engineering, etc. User feedback database Root cause analysis and recommended feature or policy changes Review customer feedback in tickets User experience enhancement
  • 11. Value proposition Speed up the workflow on user experience enhancement Defect rate and issue type Leaderboard Community Operation Product, Engineering, etc. User feedback database Root cause analysis and recommended feature or policy changes Review customer feedback in tickets User experience enhancement Making this process more efficient
  • 12. Issue type dashboard as a high-level data source Mockup Dashboard
  • 13. Problem How can we quickly get the insights from users’ feedback? Problem Reviewing tickets manually to diagnose the root cause is unsystematic and does not scale Ticket dataset Driver > Trips > Fare … > … > Technical issue (figure: a pile of individual tickets)
  • 14. Problem How can we quickly get the insights from users’ feedback? Solution Use topic modeling techniques to efficiently group tickets and assign them to reasonably named topics. Ticket dataset Driver > Trips > Fare … > … > Technical issue App stuck/ crash (35%) Fare calculation Dispute (15%) GPS issue (55%)
  • 15. Key features of our solution Using Topic modeling based tool to learn pain points from our users Ticket snippet with user profile: respective ticket samples are displayed when clicking on a keyword Word cloud view: user can switch to this view to see most relevant (tf-idf score) keywords in each topic >>DEMO
  • 16. Sample results “Fare Disputes” in one of the cities we operate in are mainly about payments, airport issues, and wrong riders: ● Credit cards and other modes of payment (18%) ● Overcharging (28.8%) ● Wrong profiles being billed (12.8%) ● Airport terminal issues (12.9%) ● Someone else taking the trip (12.5%)
  • 17. Sample results Lots of “rude”, “loud music”, “drunk”, and “slam door” keywords were detected as the pain points of our NY driver partners
  • 18. Sample results More than 10% of driver cancellation tickets in Singapore are related to car seat rules for child safety: many sample tickets show that drivers want their cancellation fee reimbursed because their riders brought children without prior notice.
  • 19. Tool architecture Computing node (any Uber servers) Data collection Data preparation LDA model training Web server (AWS node) Html and json files from training results User Interface (d3js) Train the model for each country with top issues monthly Web 1.0 design with the focus on computing node
  • 20. Workflow overview Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category
  • 21. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category Text processing library: nltk, BeautifulSoup, re, TextBlob LDA library: gensim.ldamodel.LdaModel and pyLDAvis
  • 22. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category Remove invalid words: ● Numbers ● Html tags ● Custom dictionary Stemming and lemmatization Tokenization TFIDF (Term Frequency Inverse Document Frequency)
  • 23. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category Remove invalid words: ● Numbers re.sub(r'\d+', '', text) ● Html tags BeautifulSoup(document).get_text() BeautifulSoup(document).find_all('b') ● Custom dictionary Stemming and lemmatization Tokenization TFIDF (Term Frequency Inverse Document Frequency)
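The "remove invalid words" step can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: it swaps BeautifulSoup for the standard library's `html.parser` to stay dependency-free, and the `CUSTOM_STOPWORDS` set and `clean_ticket` name are hypothetical, since the real custom dictionary is not shown in the slides.

```python
import re
from html.parser import HTMLParser

# Hypothetical custom dictionary of words to drop; the real list is not in the slides.
CUSTOM_STOPWORDS = {"uber", "hi", "thanks"}


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(document):
    """Return the text content of an HTML fragment, tags removed."""
    parser = _TextExtractor()
    parser.feed(document)
    parser.close()
    return " ".join(parser.parts)


def clean_ticket(document):
    """Strip tags, drop numbers, lowercase, and filter the custom dictionary."""
    text = strip_html(document)
    text = re.sub(r"\d+", "", text)  # remove numbers
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in CUSTOM_STOPWORDS]


print(clean_ticket("<p>Hi Uber, I was charged <b>twice</b> on 12 May</p>"))
```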
  • 24. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category Remove invalid words Stemming and lemmatization: Reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance: ○ cancel, cancels, cancelled -> cancel ○ riders, rider -> rider Tokenization TFIDF (Term Frequency Inverse Document Frequency)
  • 25. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category Remove invalid words Stemming and lemmatization Tokenization: Part-of-speech based word detection TFIDF (Term Frequency Inverse Document Frequency)
  • 26. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content Sample ~50,000 tickets for each training in each issue category Remove invalid words Stemming and lemmatization Tokenization: Part-of-speech based word detection TFIDF (Term Frequency Inverse Document Frequency) Common practice to score each term with weighted frequency and relevance
  • 27. Data Preparation (Natural Language Processing) Using TFIDF to filter the most important keywords Machine Learning Model
  • 28. Data Preparation (Natural Language Processing) Using TFIDF to filter the most important keywords Machine Learning Model Term frequency Inverse Document Frequency
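To make the scoring concrete, here is a small sketch of one common TF-IDF variant: term frequency within a document times log(N / document frequency). The slides do not say which variant the tool used, and libraries often apply smoothed versions, so treat the formula and the `tfidf` helper as assumptions for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Mock-up tickets, already tokenized.
tickets = [
    ["driver", "cancel", "trip"],
    ["driver", "fare", "overcharge"],
    ["driver", "gps", "trip"],
]
for row in tfidf(tickets):
    print(row)
```

A word such as "driver" that occurs in every ticket scores zero, which is why TF-IDF is useful for filtering out terms that do not distinguish one topic from another.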
  • 29. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Data preparation for text processing can be very time-consuming Sample ~50,000 tickets for each training in each issue category Remove invalid words: Stemming and lemmatization Tokenization TFIDF (Term Frequency Inverse Document Frequency)
  • 30. Speed up data processing Pandas runs on a single thread by default A pandas DataFrame with 50k+ rows Data Preparation text_processing() is a heavy function that does many things: ● Tokenization ● Removal of numbers, html tags, and other invalid words ● Stemming and lemmatization ● TFIDF df['content'].apply(text_processing) → single thread by default
  • 31. Speed up data processing Pandas runs on a single thread by default Worker 1 Worker 2 Worker N keywords
  • 32. Data processing speedup trick in Pandas Pandas runs on a single thread by default (code screenshot)
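The code on this slide is an image and is not in the transcript. One way to get the worker layout from the previous slide, sketched here with the standard library's `multiprocessing` on a plain list of ticket texts (for example `df['content'].tolist()`), is to split the rows into chunks and process each chunk in its own process. The helper names `split_chunks` and `parallel_map` are hypothetical.

```python
from multiprocessing import Pool


def split_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal pieces."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def _apply_chunk(args):
    """Run func over one chunk; executed inside a worker process."""
    func, chunk = args
    return [func(x) for x in chunk]


def parallel_map(func, items, workers=4):
    """Apply func to every item, spreading chunks across worker processes."""
    if workers <= 1:
        return [func(x) for x in items]  # serial fallback, no processes spawned
    chunks = split_chunks(items, workers)
    with Pool(processes=workers) as pool:
        results = pool.map(_apply_chunk, [(func, c) for c in chunks])
    # Flatten the per-chunk results back into one list, preserving order.
    return [y for part in results for y in part]
```

With a module-level `text_processing` function, a call such as `parallel_map(text_processing, df['content'].tolist(), workers=4)` placed under an `if __name__ == "__main__":` guard would replace the single-threaded `apply`. The function passed in must be picklable, so lambdas will not work.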
  • 33. Many handy text processing libraries TextBlob and spaCy Tokenization Sentence correction .correct() Part of speech .tags Sentiment analysis .sentiment.polarity NLP Library (TextBlob) (spaCy)
  • 34. Workflow overview Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Unlocking support insights from textual content - but how? Sample ~50,000 tickets for each training in each issue category LDA: - Unsupervised learning - Bag of words - “topic distribution” Usage: lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=4, random_state=some_number) lda.show_topics()
  • 35. Latent Dirichlet Allocation model General concept of this model Unsupervised learning method - does not require any class labels; similar to clustering ‘Bag of words’ model - uses word counts in messages without regard for their order (Peter owes Alice money = Alice owes Peter money) Estimated iteratively - Starts with random initialization then adjusts probabilities to reduce perplexity / increase fit Doc 1 Doc 2 Doc 3 Doc n... (topic) Fruits document-topic probabilities 30% health (topic 1) 60% fruits (topic 2) 10% disease (topic 3)
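The bag-of-words point can be checked in a few lines. This is only an illustration of the representation, using whitespace tokenization as a simplifying assumption:

```python
from collections import Counter


def bag_of_words(sentence):
    """Count words, ignoring their order."""
    return Counter(sentence.lower().split())


a = bag_of_words("Peter owes Alice money")
b = bag_of_words("Alice owes Peter money")
print(a == b)  # True: word order is discarded
```

Because both sentences map to the same counts, a model built on this representation cannot tell who owes whom, which is the limitation the slide points out.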
  • 36. Latent Dirichlet Allocation model Model implementation and visualization Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Data input: ticket text as raw data Output: topic model clusters Sample ~50,000 tickets for each training in each issue category Usage: lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=4, random_state=some_number) lda.show_topics() from pyLDAvis.gensim import prepare, save_html from gensim.models import LdaModel
  • 37. Future work and learnings Data Preparation (text processing) Extract useful information and transform corpus to a sparse matrix Data Modeling (Latent Dirichlet Allocation) Main computation to perform topic modeling Customization is needed ● Not suited for specific issue category ● Build own dictionary for the removal of irrelevant words Data input: ticket text as raw data Output: topic model clusters How to make the results more “actionable”? ● # of topic for convergence ● Time and performance tradeoff ● Other ”Deep NLP” model ? Bad result examples
  • 38. Table of contents Self-introduction Who am I? What does our analytics team do for Asia-Pacific? Project #1 Text mining tool to unlock user insights Python lib: natural language processing, topic modeling Project #2 Artificial Intelligence revolution in call centers: deep learning-based bot Python lib: machine learning related, such as tensorflow, keras, sklearn, numpy, etc.
  • 39. Product owner: Huaixiu Zheng and Yichia Wang in Uber’s Applied Machine Learning team Project #2: Artificial Intelligence revolution in call centers
  • 40. CSR’s sample workflow for user in a call center How do our users submit an issue?
  • 41. CSR’s sample workflow for user in a call center Online support via in-app-help User CSR Contact Ticket Response Select Issue Category Write Message Confirm Issue Category Lookup info. & Knowledge Base Select Action Write response using a Reply Template
  • 42. The issue for call center operation: scalability and cost The growth comes at a price again….
  • 43. Solution? Let’s start from a basic sample “I want to change my rating for a rider”
  • 44. API-less solution to the basic sample We can ‘program’ the pre-defined logic for certain tickets with Selenium or Chrome Script element mapping element mapping
  • 45. End-to-end solution Web interaction Read and Write (click and input text) Knowledge base ● Keyword recognition ● Web element id dictionary ● (Natural Language Processing) Policy engine Program the flow aligning with policy/ SOP Monitoring and logging ● Real-time gsheet API logging ● Monitoring and alert trigger Ticket answering bot
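The keyword-recognition and policy-engine pieces of such a bot could look roughly like the sketch below. Everything in it is hypothetical: the rules, the keywords, and the action names are invented for illustration, and the web interaction (Selenium) and logging parts are left out.

```python
import re

# Hypothetical rules: (keywords that must all appear, action name).
# Neither the keywords nor the action names come from the slides.
RULES = [
    ({"change", "rating"}, "open_rating_adjustment_form"),
    ({"lost", "item"}, "send_lost_item_template"),
]


def choose_action(ticket_text):
    """Return the first action whose keywords all occur in the ticket, else None."""
    words = set(re.findall(r"[a-z]+", ticket_text.lower()))
    for keywords, action in RULES:
        if keywords <= words:
            return action
    return None  # no rule matched: leave the ticket for a human agent


print(choose_action("I want to change my rating for a rider"))
```

Every new ticket type needs another hand-written rule, which is the scalability problem raised on the following slides.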
  • 46. The business impact of a simple bot-solving solution 3k+ weekly solves A team of 18 CSR 28k USD monthly
  • 47. What’s the problem with this solution?
  • 48. What’s the problem with this solution? “Scalability”
  • 49. The difference between Programming and Machine Learning Outputs = Agents’ responses Inputs = Contact Ticket
  • 50. Our machine learning solution design Why go with “Semi-automated” assistance rather than real robot? Pros: - Scalable solution to all (+ new) ticket-types - Flexible and safer application as human can still evaluate it and make the final call Cons: Not fully automated to replace the agent workforce completely. Product designed by Hugh Williams, Huaixiu Zheng, Yi-Chia Wang in Applied Machine Learning team
  • 51. Our machine learning solution design ‘Assistant to CSR’ - Provide suggestions for reply and actions Issue category suggestion Action suggestion 10M+ tickets Correct response from agents to these 10M+ tickets Technical model training Product design
  • 52. Typical Machine Learning process Note: picture from Mark Peng’s “General Tips for participating Kaggle Competitions” on Slideshare
  • 53. Typical Machine Learning process Model selection ML 101: Start with simple model first Data source: https://eng.uber.com/cota-v2/
  • 54. Deep Learning Architecture Reference: Uber AML Lab: http://eng.uber.com/cota Sample code with Keras for a simple CNN
  • 55. Deep Learning Architecture Reference: Uber AML Lab: http://eng.uber.com/cota Essay: COTA: Improving the Speed and Accuracy of Customer Support through Ranking and Deep Networks
  • 56. Development environment for Deep learning model training What does model training look like? >> DEMO Main codebase + data set
  • 57. Feature engineering and feature importance Trade off between capacity and interpretability “Capacity” “Interpretability”
  • 58. Feature engineering and feature importance What are the important features? Very easy to learn that in simpler models
  • 59. Feature engineering and feature importance What are the important features? Very easy to get explanation in simpler models
  • 60. Feature engineering and feature importance What are the important features? NN model is like our brain’s intuition … blackbox
  • 61. Feature engineering and feature importance What are the important features? Sklearn: Recursive feature elimination (sklearn.feature_selection.RFE) Mockup dataset
  • 62. Feature engineering and feature importance What are the important features? Time on model training >>> prediction Shuffle each feature to create noise…. on the testing set Mockup dataset
  • 63. Python tips: be cautious about the underlying “copy implementation” np.random.shuffle What’s the value of my_list2? A. [1, 2, 3, 4, 5] B. [2, 5, 1, 4, 3]
  • 64. np.random.shuffle What’s the value of my_list2? A. [1, 2, 3, 4, 5] B. [2, 5, 1, 4, 3] Python tips: be cautious about the underlying “copy implementation”
  • 65. np.random.permutation Python tips: be cautious about the underlying “copy implementation”
  • 66. np.random.permutation from copy import deepcopy my_list2 = deepcopy(my_list) Python tips: be cautious about the underlying “copy implementation”
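The quiz code on these slides is an image and is not in the transcript, so the sketch below only illustrates the general pitfall. `np.random.shuffle` reorders its argument in place and returns `None`, while `np.random.permutation` returns a shuffled copy. The standard library's `random.shuffle` and `random.sample` behave the same way and are used here to keep the example dependency-free.

```python
import random

my_list = [1, 2, 3, 4, 5]

alias = my_list          # same object, not a copy
random.shuffle(alias)    # shuffles in place and returns None
print(alias is my_list)  # True: my_list was reordered too

original = [1, 2, 3, 4, 5]
# random.sample builds a new list and leaves the original untouched.
shuffled = random.sample(original, len(original))
print(original)          # [1, 2, 3, 4, 5]
```

Plain assignment never copies a list; use `list(my_list)`, `my_list.copy()`, or `deepcopy` (as on the slide) when the original order must be preserved.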
  • 67. Feature engineering and feature importance What are the important features? Shuffle each feature to create noise…. on the testing set Mockup example
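The shuffle-based idea can be sketched without any ML library: score a trained model on the test set, shuffle one feature column at a time, and record how much the score drops. The mock-up data, the stand-in `predict` function, and the helper names below are assumptions for illustration only.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # shuffles the extracted copy, not the test set
        noisy = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(predict, noisy, labels))
    return drops


# Mock-up test set: the label depends only on feature 0.
rows = [[i % 2, (i * 7) % 5] for i in range(40)]
labels = [row[0] for row in rows]


def predict(row):
    """Stand-in for an already trained model."""
    return row[0]


print(permutation_importance(predict, rows, labels))
```

Shuffling the unused feature leaves accuracy unchanged, while shuffling the feature the model relies on lowers it. No retraining is needed, which fits the slide's note that training takes far longer than prediction.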
  • 68. Issue category suggestion Action suggestion Product design Last stop: making business Impact Ensure KPI measurement is well-planned in the beginning User CSR Contact Ticket Response Select Issue Category Write Message Confirm Issue Category Lookup info. & Knowledge Base Select Action Write response using a Reply Template
  • 69. Last stop: making business Impact Identify key business metrics, and cautiously conduct and monitor A/A and A/B testing Source: https://eng.uber.com/cota-v2/
  • 70. Look forward to collaborating! http://careers.uber.com
  • 71. Paul Lo Data Analytics Manager @ Uber paul.lo@uber.com | paullo0106@gmail.com | http://paullo.myvnc.com/blog/ Q&A