HYDERABAD
OCTOBER 13–14, 2015
Classifying issues from SR text
descriptions in Azure ML
George Simov
Data Scientist
Agenda
• Text Analytics concepts and terms
• Azure ML capabilities for text classification
• Implementation Details
• Spam Detection Model – binary classification
• Model for classifying issues from SR text descriptions – multi-class classification
• Operationalization of the model
Text Analytics
Def: The term text analytics describes a set of linguistic, statistical, and machine learning
techniques that model and structure the information content of textual sources for
business intelligence, exploratory data analysis, research, or investigation.
Text Classification
• Binary Classification (for example: Spam Detection)
• Multiclass Classification (for example: Product classification by text description)
Text Clustering
• Grouping same or similar text documents based on distance/similarity function (usually cos
similarity in vector-space model)
Sentiment Analysis
• Identify and extract subjective information in source materials
• Positive, Negative, Neutral
Named Entity Recognition
• Subtask of information extraction that seeks to locate and classify elements in text into
pre-defined categories such as the names of persons, organizations, locations,
expressions of times, quantities, monetary values, percentages, etc.
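As a minimal illustration of the similarity function mentioned under Text Clustering, here is a short R sketch of cosine similarity between two term-frequency vectors; the documents and term counts are made up for the example.

# Cosine similarity between two term-frequency vectors, the similarity
# measure typically used for text clustering in the vector-space model.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

doc1 <- c(text = 2, classification = 1, azure = 0)   # made-up term counts
doc2 <- c(text = 1, classification = 0, azure = 3)
cosine_sim(doc1, doc2)   # about 0.28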
Text Representation – transform text into numerical vectors
Bag Of Words Model (Vector Space Model)
• Each dimension (axis) corresponds to a document feature.
• Features: words or phrases (bag of words model)
• TF (term frequency): number of occurrences of each word in a document
• TF-IDF (term frequency, inverse document frequency) matrix: weight assigned to each term describing a document
(a short R sketch of TF-IDF weighting and 2-grams follows the example below):
W_ij = TF * IDF = tf_ij * log(N / df_i)
TF – Term Frequency
IDF – Inverse Document Frequency
W_ij – Weight of the i-th term in the j-th document
tf_ij – Frequency of the i-th term in document j
N – The total number of documents in the collection
df_i – The number of documents containing the i-th term
• N-grams – representing text features
Example: Text classification is an important area in text analytics
2-grams:
Text classification | classification is | is an | an important | important area | area in | in text | text analytics
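The TF-IDF weighting and the 2-gram example can be reproduced with the tm and NLP packages. The sketch below uses made-up documents; note that tm's weightTfIdf uses a length-normalized TF and a log2-based IDF, so its numbers differ slightly from the plain W_ij formula above.

library(tm)    # also used by the preprocessing script further below
library(NLP)   # provides the ngrams() helper

# Made-up documents for illustration
docs <- c("text classification is an important area in text analytics",
          "spam detection is a binary text classification problem",
          "azure ml supports multiclass text classification")

# TF-IDF weighted document-term matrix
corpus <- Corpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
inspect(dtm)

# 2-grams of the first document
tokens <- strsplit(docs[1], " ")[[1]]
vapply(ngrams(tokens, 2), paste, character(1), collapse = " ")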
Example
Azure ML Text Classification Workflow
Step 1. Data Preparation
SQL queries, Excel, R, …
Result – a labeled dataset with one label column and one free-text column:
Label  Text
1      This is a spam
0      This is a text that is not a spam
1      This is another spam
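Whatever tool produces it, the output of this step has the shape sketched below in R, using the toy rows above.

# Prepared dataset: one numeric label column and one free-text column
training.data <- data.frame(
  Label = c(1, 0, 1),
  Text  = c("This is a spam",
            "This is a text that is not a spam",
            "This is another spam"),
  stringsAsFactors = FALSE
)
str(training.data)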
Step 2. Text Preprocessing
- Lower case, Remove stop words, Remove numbers, Stemming, Synonyms…anything that might be helpful
Implementation in AML – R script module
process.text <- function(textVector, b_tolower, b_removeWords, b_stemDocument, b_removeNumbers)
{
  library("tm")   # text mining framework (stemming additionally needs the SnowballC package)

  # Replace special characters with spaces, keeping letters and digits
  print("replace special characters with space")
  textVector <- gsub("[^0-9a-z]", " ", textVector, ignore.case = TRUE)
  if (b_removeNumbers == TRUE) {
    # Keep letters only
    textVector <- gsub("[^a-z]", " ", textVector, ignore.case = TRUE)
  }

  # Collapse repeated whitespace and trim leading/trailing spaces
  textVector <- gsub("\\s+", " ", textVector)
  textVector <- gsub("^\\s", "", textVector)
  textVector <- gsub("\\s$", "", textVector)
  # ... (further cleanup steps elided on the slide)

  theCorpus <- Corpus(VectorSource(textVector))
  if (b_tolower == TRUE) {
    print("tolower ....")
    theCorpus <- tm_map(theCorpus, content_transformer(tolower))
  }
  if (b_removeWords == TRUE) {
    print("remove stopwords ....")
    theCorpus <- tm_map(theCorpus, removeWords, stopwords("english"))
  }
  if (b_stemDocument == TRUE) {
    print("word stemming ....")
    theCorpus <- tm_map(theCorpus, stemDocument, "english")
  }
  # ... (conversion of the corpus back to a data frame for the module output elided on the slide)
Step 3. Feature Representation and Extraction – 2 AML modules
- Feature Hashing
  Parameters: hashing bit size, N-grams
  (a toy illustration of the hashing trick follows this step)
- Filter Based Feature Selection
  (module properties shown on the slide)
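The Feature Hashing module itself is configured in the AML designer rather than in code; the following hand-rolled R sketch only illustrates the underlying hashing trick, and the hash function is a toy one, not the hash AML actually uses.

# Hashing trick: map each token into one of 2^bits buckets and use the
# bucket counts as the numerical feature vector.
hash_features <- function(tokens, bits = 10) {
  n_buckets <- 2^bits
  vec <- numeric(n_buckets)
  for (tok in tokens) {
    codes <- utf8ToInt(tok)                            # character code points
    h <- sum(codes * seq_along(codes)) %% n_buckets    # toy hash, for illustration only
    vec[h + 1] <- vec[h + 1] + 1
  }
  vec
}

hash_features(c("text", "classification", "text", "analytics"), bits = 4)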
Step 4. Train Model
- Many binary and multiclass learners: linear regression, logistic regression, boosted decision tree, SVM, decision forest, ...
Step 5. Evaluate model
- Cross Validate Model
- Score Model
- Evaluate Model
Step 6. Visualization of the results and numerical metrics
- Binary classifiers – precision, recall, F1 score, AUC, ROC curves
- Multiclass – confusion matrix, custom script for precision/recall calculations (sketch below)
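For the multiclass case, the per-class precision/recall script can be as simple as the R sketch below; the confusion matrix values are made up for illustration (rows are actual classes, columns are predicted classes).

# Per-class precision, recall and F1 from a confusion matrix
per.class.metrics <- function(conf) {
  precision <- diag(conf) / colSums(conf)
  recall    <- diag(conf) / rowSums(conf)
  f1        <- 2 * precision * recall / (precision + recall)
  data.frame(class = rownames(conf), precision, recall, f1, row.names = NULL)
}

# Made-up 3-class confusion matrix
conf <- matrix(c(50,  5,  3,
                  4, 40,  6,
                  2,  8, 45),
               nrow = 3, byrow = TRUE,
               dimnames = list(actual    = c("A", "B", "C"),
                               predicted = c("A", "B", "C")))
per.class.metrics(conf)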
Spam Detection in answers.microsoft.com forums
Business Scenario:
Automatic spam detection in answers.microsoft.com threads. Today, many volunteers and MS FTEs
spend a lot of time and effort cleaning spam messages out of the forums. The proposed solution is
automatic spam detection.
Example POC: spam detection in AML, based on the message content.
Spam Detection Model
Spam Detection Data
Spam Detection Experiment Results – test data
Predicting Products/Issues by SR problem description
Business Scenario:
The Azure support portal (Ibiza) wants to get rid of the user selections for the product and the
problem/issue, because users make mistakes or select “Other” when they are confused about what to select.
This leads to SR mis-routing and thus slows down issue resolution. (We have seen up to 9
SR transfers during an SR life cycle.)
Azure Support Portal
Customer ‘accuracy’ compared
to SE selection:
• ~ 75% - Service (level 0)
• ~ 50% - Feature (level 1)
• ~ 25% - Issue (level 2)
Why?
• Too many topics, customers
cannot discriminate
• Poorly defined topics
• Customers seldom traverse up
the tree to find more relevant
topics
• Customers don’t know how to
classify their symptoms
• Customers enter anything just to
reach assisted support
Consequence
• Less self-help, more support
volume
• Poor routing, more MPI
Support Topic Taxonomy
Note: current MOP experience. The POR is for this UX to be
replaced with a text-input-only ‘Maven’ UI
Current Office 365 online support experience
Note: current MOP experience. The POR
is for this UX to be replaced with a
text-input-only ‘Maven’ UI
Predicting O365 Products by Problem Description
Predicting O365 Products by Problem Description - RESULTS
Predicting O365 Products by Problem Description - RESULTS
Predicting O365 Issues by Problem Description - Model
Predicting O365 Issues by Problem Description - RESULTS
Category/Issue frequencies – result after executing the preprocessing R script
Predicting O365 Issues by Problem Description:
Cross-Validation RESULTS
Predicting O365 Issues by Problem Description – Analysis
1. Accuracy is not high enough to rely on the problem description alone and completely remove the manual product/issue selections
2. Idea for the functionality, based on the results from the ML model:
- Sort/rank the products and the problems/issues in the selection list boxes by the probability returned from the
ML model (a small R sketch follows below).
Expected result: fewer wrong selections, based on the assumption that the user will find the correct
option near the beginning of the list.
This is an example of how ML models can be useful even when they cannot solve a problem completely.
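A minimal R sketch of the ranking idea; the product names and probabilities are illustrative, not real model output.

# Reorder the selection list by the class probabilities returned from the
# model, so the most likely products/issues appear first.
rank.by.probability <- function(options, probabilities) {
  options[order(probabilities, decreasing = TRUE)]
}

products <- c("Exchange Online", "SharePoint Online", "Skype for Business", "Other")
scores   <- c(0.12, 0.55, 0.28, 0.05)   # illustrative scores only
rank.by.probability(products, scores)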
Operationalization
1. Azure ML automatically creates a REST web service (a hedged call sketch follows this list).
2. Azure ML provides an easy way to deploy the production version of the model to a production environment.
3. Performance – slower than TLC
4. Poor debugging capabilities
5. Poor code instrumentation/troubleshooting capabilities
6. Scalability – deployment on a limited set of machines (16)
Consider all of the above pros/cons when deciding whether to run a production model in AML.
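As a rough sketch of point 1, a published AML (classic) web service can be called over HTTPS from R with httr and jsonlite. The endpoint URL, API key, input column name, and example text below are placeholders/assumptions; the exact request shape should be taken from the service's own API help page.

library(httr)
library(jsonlite)

# Placeholders: copy the real values from the web service's API help page
endpoint <- "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<svc-id>/execute?api-version=2.0&details=true"
api.key  <- "<API-KEY>"

# Request body following the classic AML request-response pattern; the column
# name "Text" is an assumption and must match the published model's input schema
body <- list(
  Inputs = list(
    input1 = list(
      ColumnNames = list("Text"),
      Values = list(list("Cannot send email from Outlook after tenant migration"))
    )
  ),
  GlobalParameters = setNames(list(), character(0))   # serializes to an empty JSON object
)

response <- POST(endpoint,
                 add_headers(Authorization = paste("Bearer", api.key),
                             `Content-Type` = "application/json"),
                 body = toJSON(body, auto_unbox = TRUE))
content(response)   # scored labels / class probabilities returned by the service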
Thank you
Speaker notes
1. But the reality is, we ask the customer to select from too many topics, many of which are confused with others. Customers' ability to reliably select the right symptom falls to 25% when compared with what the Support Engineer would choose. (NOTE: we are moving to PFAs for a better 'ground truth'.) While the symptom tree (called support topics) is only 3 levels deep, it is very broad and growing as new products and features are introduced to O365. As you can see, the top 'Service' level has 19 classes. Each service has between 3 and 23 issue groups (which we call features), and each feature bucket contains anywhere from 4 to 38 issues. Overall, there are 1,300 possible topics to choose from! The dilemma is how to surface a reduced but relevant taxonomy to a customer. What's the cost? When customers do not properly self-classify, they can't be provided the best self-help. The customer consequently submits a service request or calls assisted support, where the cost per service request is high. Incorrect symptom self-classification also increases the chance of mis-routes (the wrong team getting the request). Even if one argues that customers self-select at 80% accuracy rather than 25%, the cost, at a volume of 100,000 cases per month, is in the millions per year.
2. Here is a screenshot of the existing customer experience in Office 365, where the customer first selects the top service level, then the feature and symptom level, and then describes their issue.
  3. # of apps support big data / data lake solutions (COSMOS/HDI etc) # of apps enabled for near real time services # of apps supporting data insights # of applications supporting self service capabilities