AN INTRODUCTION TO TOPIC MODELING
Turning text into insight
  
Handling Raw, Unlabeled Text
• Common Datasets:
  ◦ Product/Customer Reviews
  ◦ Call Center Transcripts
  ◦ Newspaper Articles
  ◦ Legal Documents
• Common Tasks:
  ◦ Find documents we’re interested in?
  ◦ Categorize documents?
  ◦ Retrieve information?
• The Challenge:
  ◦ Standard quantitative methods don’t work directly on raw text.
  ◦ Datasets are large, complicated, sparse, and unwieldy.
  ◦ Data is often unlabeled.
  
Example: Understanding Customer Reviews
• Mon Ami Gabi is a restaurant in the Paris Las Vegas Hotel and Casino.
• Thousands of customer reviews for the restaurant over the last 8 years.
What are customers saying?
  "Excellent breakfast menu. They just need to hire more staff to have a better service."
  "Great place for brunch! Highly recommend the steak and fries and sitting outside."
  "Had a great meal with a great atmosphere"
  "Food was ok… What it has going for it is the view from the outside terrace."
  
Topic Modeling: Framework
• Documents: "Excellent breakfast menu. They just need to hire more staff to have a better service"
• Topics: Breakfast, Quality of Service
• Words and Phrases: breakfast, better, service, staff
  
Topic Modeling: Preprocessing
• Tokenize: Extract meaningful units from sentences
  ◦ I ordered a french toast → {I, ordered, a, french toast}
  ◦ Regular expression cleanup, end-of-line hyphenation, contraction, and sentence-initial capitalization rules.
• Stemming Algorithm: Consolidate feature space into word stems or lemmas
  ◦ {I, ordered, a, french toast} → {I, order, a, french toast}
  ◦ Suffix stripping, part-of-speech tagging
• Matrix Factorization: Convert text into a data structure for learning algorithms.
  ◦ Word-document matrices often have 1,000,000,000,000+ values. Special compression algorithms are needed to make the data manageable.
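The preprocessing steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline behind the slides: the phrase list, the three-suffix stemmer, and the function names are all assumptions made for the example, and a per-document `Counter` stands in for a compressed (sparse) word-document matrix. Tokens are lowercased here as a simplification, whereas the slide keeps "I" capitalised.

```python
import re
from collections import Counter

# Hypothetical phrase list: multiword units to keep as single tokens.
PHRASES = ("french toast",)
# Toy suffix list; a real stemmer (e.g. Porter) applies many more rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text, phrases=PHRASES):
    """Lowercase the text and split it into tokens, keeping known phrases whole."""
    text = text.lower()
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return [tok.replace("_", " ") for tok in re.findall(r"[a-z_']+", text)]


def stem(token, suffixes=SUFFIXES, min_stem=3):
    """Strip the first matching suffix if the remaining stem is long enough."""
    if " " in token:  # leave multiword phrases untouched
        return token
    for suffix in suffixes:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def doc_term_counts(docs):
    """Sparse word-document matrix: one Counter of stem counts per document.

    Only non-zero entries are stored, which is the basic idea behind keeping
    very large word-document matrices manageable.
    """
    return [Counter(stem(t) for t in tokenize(d)) for d in docs]


tokens = tokenize("I ordered a french toast")
stems = [stem(t) for t in tokens]
matrix = doc_term_counts(["I ordered a french toast", "The french toast was great"])
```

With these toy rules, `tokens` keeps "french toast" as one unit and `stems` maps "ordered" to "order", mirroring the example on the slide.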
  
Topic Modeling: Estimation with Gibbs Sampler
• Use Markov Chain Monte Carlo methods to simulate our document-topic and topic-word probability distributions.
• Results:

Topic-Word
Breakfast          Service
Breakfast: 0.31    Service: 0.28
Eggs: 0.27         Staff: 0.24
Coffee: 0.24       Friendly: 0.21

Document-Topic
"The french toast was great"    "The staff was great, but the outdoor patio was a bit noisy."
French Toast: 0.71              Service: 0.51
Breakfast: 0.25                 Environment: 0.44
Service: 0.03                   Breakfast: 0.02
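The slides do not name the underlying model. The sketch below assumes latent Dirichlet allocation (LDA) with symmetric priors and shows one common way to estimate it, a collapsed Gibbs sampler. The function name, the hyperparameter values (`alpha`, `beta`), the iteration count, and the toy corpus are all assumptions for illustration; they are not taken from the deck, and the numbers it produces will not match the tables above.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised docs (lists of strings)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [  # document-topic distributions
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [  # topic-word distributions
        {w: (row[word_id[w]] + beta) / (topic_total[t] + n_words * beta) for w in vocab}
        for t, row in enumerate(topic_word)
    ]
    return theta, phi


# Toy corpus, already tokenised; invented for this example.
docs = [
    ["breakfast", "eggs", "coffee", "breakfast"],
    ["service", "staff", "friendly", "service"],
    ["breakfast", "coffee", "staff", "service"],
]
theta, phi = lda_gibbs(docs, n_topics=2)
```

Here `theta[d]` plays the role of a Document-Topic column and `phi[t]` the role of a Topic-Word column. Topics come back as unlabeled indices, so names such as "Breakfast" or "Service" would be assigned by a person after inspecting the top words.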
  
Harnessing the Model: Topic Frequency
What are my customers talking about?
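One simple way to answer this question, sketched here under assumptions, is to average each topic's weight over all documents. The two documents and their weights are copied from the Document-Topic results on the Gibbs sampler slide; the function name and the choice of a plain mean (rather than, say, counting each document's dominant topic) are illustrative only.

```python
from collections import defaultdict

# Document-topic weights copied from the Gibbs sampler results slide.
doc_topics = [
    {"French Toast": 0.71, "Breakfast": 0.25, "Service": 0.03},
    {"Service": 0.51, "Environment": 0.44, "Breakfast": 0.02},
]


def topic_frequency(doc_topics):
    """Average each topic's weight over all documents (missing topics count as 0)."""
    totals = defaultdict(float)
    for weights in doc_topics:
        for topic, weight in weights.items():
            totals[topic] += weight
    n_docs = len(doc_topics)
    return {topic: total / n_docs for topic, total in totals.items()}


freq = topic_frequency(doc_topics)
# Topics ordered from most to least discussed.
ranked = sorted(freq, key=freq.get, reverse=True)
```

On a full review corpus, `ranked` would give the order of bars in a topic frequency chart.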
  
Harnessing the Model: Evaluate Products and Verticals
How do customers feel about my products?
  
Harnessing the Model: Temporal Insights
How has customer sentiment evolved among my product lines over time?
  
Harnessing the Model: Deep Product Insights
Which properties of French Toast drive satisfaction (or dissatisfaction)?
  
Thank you.
  
