A lecture on text analytics - 3 types of opportunities, 3 use cases, 3 dos and 3 don'ts.
Get the hang of how to go about solving a text-related business problem using text analytics.
2. What I am going to talk about.
Text Analytics
1. Examine 3 kinds of opportunities
2. Discuss 3 text analytics problems
3. Touch upon 3 things to watch out for
and 3 things to embrace.
5. What if we can master “text”?
What do we get from it?
There are opportunities in every vertical:
1. Aerospace / Defense / Automotive: Filing of various routine documents /
Technical specification standardization / Competitive intelligence and
customer feedback management
2. Healthcare / Life sciences: Reporting / Storing relevant patents and
publications / Analysis of research and competitive intelligence
3. Legal and Government: Legal and administrative filings / Case document
and administrative record management / Analysis of legal and
administrative documents (land records, case files)
6. What if we can master “text”?
What do we get from it?
Do you observe a pattern?
In every vertical …
Output Text / Store and Transform Text / Ingest and Analyze Text
7. How do we unlock
the value in “text”?
Output Text / Store and Transform Text / Ingest and Analyze Text
Natural Language Generation / Natural Language Understanding
Natural Language Processing (aka Text Analytics)
8. Use Case 1:
Customer Service
Let’s say you have some text …
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
… and a database or spreadsheet with columns:
Reporter / Location (of Reporter) / Product
… and you have to fill in the database fields
from the information in the text …
10. Use Case 1:
Land Records
Let’s say you have some text …
“Property K45L234
(lot 23-24) in Wake County
of 3000 sq ft
was sold to James Fischer
on 3-30-1997 …”
… and a database or spreadsheet with columns:
Title Number / Lot / County
… and you have to fill in the database fields
from the information in the text …
12. Use Case 1:
M&A Transactions
Let’s say you have some text …
“Acme Financials, a subsidiary
of Lehman Sisters, was acquired
by John Doe Corp on 5/26/2001.”
… and a database or spreadsheet with columns:
Acquirer / Acquired / Date
… and you have to fill in the database fields
from the information in the text …
13. Use Case 1: Customer Service
[ Information Extraction ]
Let’s say you have some text …
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
… and a database or spreadsheet with columns:
Reporter / Location (of Reporter) / Product
Entities are pieces of text that could go into the fields in the database.
Identifying entities and the relations between them
14. Use Case 1: Customer Service
[ Information Extraction ]
Let’s say you have some text …
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
… and a database or spreadsheet with columns:
Reporter / Location / Product
John Chambers / Springfield, MA / Ford Ranger
Entities are pieces of text that could go into the fields in the database.
Identifying entities and the relations between them
15. Use Case 1: Customer Service
[ Information Extraction ]
Relations tell you about the connections between entities.
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Relations connect the entities that belong in a row.
Identifying entities and the relations between them
Reporter / Location / Product
John Chambers / Springfield, MA / Ford Ranger
Relation: Location of Reporter
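The "Location of Reporter" relation can be sketched with a single pattern. This is only an illustration: the regular expression and the name LOCATION_OF_REPORTER are assumptions made for this example sentence, not part of any framework mentioned in the deck.

```python
import re

# Hypothetical pattern: "<First Last> of <City, ST>" links a reporter to a location.
LOCATION_OF_REPORTER = re.compile(
    r"(?P<reporter>[A-Z][a-z]+ [A-Z][a-z]+) of (?P<location>[A-Z][a-z]+, [A-Z]{2})"
)

text = ("John Chambers of Springfield, MA reported a problem with the clutch "
        "on his Ford Ranger purchased in Boston, MA in 2005.")

match = LOCATION_OF_REPORTER.search(text)
# One database row: the entities that the relation says belong together.
row = match.groupdict() if match else {}
print(row)
```

Note that "Boston, MA" is also a location, but the pattern does not attach it to the reporter. That is the job of the relation: deciding which entities belong in the same row.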
16. Use Case 1: Customer Service
[ Information Extraction ]
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
Information extraction converts
unstructured information into structured information.
Identifying entities and the relations between them
Reporter / Location / Product
John Chambers / Springfield, MA / Ford Ranger
17. Use Case 1: Customer Service
[ Information Extraction ]
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
Information extraction can improve efficiencies
in processes where humans read text and copy fields into databases.
Identifying entities and the relations between them
Reporter / Location / Product
John Chambers / Springfield, MA / Ford Ranger
18. Use Case 1: Customer Service
[ Information Extraction ]
How can text analytics methods be used
to automate entity and relation extraction?
Rule-based methods / Machine learning methods
Aiaioo Labs aiaioo.com
19. Use Case 1: Customer Service
[ Information Extraction ]
Rule-based frameworks for entity and relation extraction?
http://services.gate.ac.uk/annie/
20. Use Case 1: Customer Service
[ Information Extraction ]
21. Use Case 1: Customer Service
[ Information Extraction ]
How does GATE/ANNIE identify entities and the relations?
It uses lists of first names and last names of persons, and names of
places … and matches them in the text …
“John Chambers of Springfield, MA reported a problem with the clutch
on his Ford Ranger purchased in Boston, MA in 2005.”
First names: “Jack” / “Jill” / “John”
Last names: “Chambers” / “Miller” / “Farnsworth”
Places: “Springfield” / “Boston” / “Cambridge” / “MA” / “CA” / “MD”
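The list-matching idea can be sketched in a few lines of Python. This is a simplified illustration and not how GATE/ANNIE is actually implemented; the function name find_entities and the pairing rules (first name + last name, place + state) are assumptions for this example, and the lists are just the words shown on the slide.

```python
import re

# Toy lists from the slide; real lists would be far larger.
FIRST_NAMES = {"Jack", "Jill", "John"}
LAST_NAMES = {"Chambers", "Miller", "Farnsworth"}
PLACES = {"Springfield", "Boston", "Cambridge"}
STATES = {"MA", "CA", "MD"}

def find_entities(text):
    """Return (entity_type, matched_text) pairs found by list lookup."""
    tokens = re.findall(r"[A-Za-z]+", text)  # punctuation and digits are dropped
    entities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in FIRST_NAMES and nxt in LAST_NAMES:
            entities.append(("person", f"{tok} {nxt}"))
            i += 2
        elif tok in PLACES and nxt in STATES:
            # The comma is re-inserted for display; it was removed by tokenization.
            entities.append(("location", f"{tok}, {nxt}"))
            i += 2
        else:
            i += 1
    return entities

text = ("John Chambers of Springfield, MA reported a problem with the clutch "
        "on his Ford Ranger purchased in Boston, MA in 2005.")
entities = find_entities(text)
print(entities)
```

The sketch finds the person and both locations but misses "Ford Ranger", because there is no product list. List-based rules only find what the lists contain.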
22. Use Case 1: Customer Service
[ Information Extraction ]
Machine learning frameworks for entity and relation extraction?
https://opennlp.apache.org/
Apache OpenNLP
23. Use Case 1: Customer Service
[ Information Extraction ]
Machine learning frameworks need training data.
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html
24. Use Case 1: Customer Service
[ Information Extraction ]
How does OpenNLP identify entities and the relations?
From examples such as:
“<START:reporter>John Archer<END> of <START:location>Maryland<END>
reported a problem with his <START:product>Figo<END>.”
“<START:reporter>Vince Chambers<END> of <START:location>Denver,
CO<END> had trouble with his <START:product>Focus<END>.”
It learns to recognize:
“John Chambers of Springfield, MA reported a problem with the clutch
on his Ford Ranger purchased in Boston, MA in 2005.”
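To see what the annotated training examples on this slide contain, here is a small sketch that reads the <START:type> … <END> markup and lists the labeled spans. It does not train a model and does not use the OpenNLP API; the function name read_annotations is an assumption for illustration only.

```python
import re

# Matches one annotated span, tolerating optional whitespace inside the tags.
TAG = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>", re.DOTALL)

def read_annotations(sentence):
    """Return (plain_text, [(label, span_text), ...]) for one annotated sentence."""
    spans = [(label, " ".join(body.split())) for label, body in TAG.findall(sentence)]
    # Replace each annotated span with its bare text, then normalize whitespace.
    plain = " ".join(TAG.sub(lambda m: m.group(2), sentence).split())
    return plain, spans

sample = ("<START:reporter>John Archer<END> of <START:location>Maryland<END> "
          "reported a problem with his <START:product>Figo<END>.")
plain, spans = read_annotations(sample)
print(plain)
print(spans)
```

Each training sentence therefore supplies both the raw text and the answer key (which words are a reporter, a location, a product). A machine learning framework learns from many such pairs.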
25. Use Case 1: Customer Service
[ Information Extraction ]
How to choose between text analytics methods
for entity and relation extraction?
Rule-based methods:
3 months to a reasonably performing model
Typically higher precision
Typically less flexibility
Typically lower recall
Machine learning methods:
1+ years to a reasonably performing model
Typically lower precision
Typically more flexibility
Typically higher recall + overall performance
26. Can you classify these door heights as: Short / Tall ?
5’11” / 5’8” / 5’8” / 5’11” / 6’2” / 6’6” / 5’2” / 6’8” / 6’9” / 6’10”
27. In analytics, an analyst comes up with a rule.
5’11” / 5’8” / 5’8” / 5’11” / 6’2” / 6’6” / 5’2” / 6’8” / 6’9” / 6’10”
If door_height < 6’ then Short else Tall
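The analyst's rule translates directly into code. A minimal sketch, assuming heights are converted to inches; the helper names to_inches and classify_door are made up for this example.

```python
def to_inches(feet, inches):
    """Convert a height given as feet and inches to inches."""
    return feet * 12 + inches

def classify_door(height_in):
    """The analyst's hand-written rule: under 6 feet is Short, otherwise Tall."""
    return "Short" if height_in < to_inches(6, 0) else "Tall"

# The ten door heights from the slide, as (feet, inches).
doors = [(5, 11), (5, 8), (5, 8), (5, 11), (6, 2),
         (6, 6), (5, 2), (6, 8), (6, 9), (6, 10)]
labels = [classify_door(to_inches(f, i)) for f, i in doors]
print(labels)
```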
28. In machine learning, the computer comes up with a rule from examples.
5’11” / 5’8” / 5’8” / 5’11” / 6’2” / 6’6” / 5’2” / 6’8” / 6’9” / 6’10”
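A sketch of what "coming up with a rule from examples" can look like: the program tries candidate cut-off heights and keeps the one that makes the fewest mistakes on labeled examples. The Short/Tall labels below are assumed for illustration (the slide shows only the heights), and learn_threshold is a hypothetical name; real learners are more elaborate.

```python
def learn_threshold(examples):
    """examples: list of (height_in_inches, label) with labels 'Short' or 'Tall'."""
    heights = sorted({h for h, _ in examples})
    # Candidate cut points: midpoints between neighbouring distinct heights.
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

    def errors(cut):
        # Count examples where "below the cut" disagrees with the label "Short".
        return sum((h < cut) != (label == "Short") for h, label in examples)

    return min(candidates, key=errors)

# Heights from the slide in inches, with assumed labels.
examples = [(62, "Short"), (68, "Short"), (71, "Short"), (74, "Tall"),
            (78, "Tall"), (80, "Tall"), (81, "Tall"), (82, "Tall")]
cut = learn_threshold(examples)
print(cut)
```

The learned cut-off falls between 5’11” and 6’2”, close to the analyst's 6’ rule, but nobody had to write it by hand.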
29. How do we unlock
the value in “text”?
The first use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Information Extraction
Identifying entities and the relations between them
30. How do we unlock
the value in “text”?
The second use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Text Categorization
Labeling text with one or more category labels
31. Use Case 2:
Organizing Text for Storage
Let’s say you have some text …
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
… and you want to mark it as one of:
Report / Inquiry
32. Use Case 2: Organizing Text
[ Text Categorization ]
Start by collecting some samples of documents
of each of your categories
Report: I have a problem / This complaint is about
Inquiry: Where can I buy a / Do you sell furniture
33. Use Case 2: Organizing Text
[ Text Categorization ]
Train a classifier with them.
Report: I have a problem / This complaint is about
Inquiry: Where can I buy a / Do you sell furniture
34. Use Case 2: Organizing Text
[ Text Categorization ]
Start by collecting some samples of documents
of each of your categories
Politics: The United Nations / The United States and
Sports: Manchester United / Manchester and Barca
35. Use Case 2: Organizing Text
[ Text Categorization ]
Train a classifier with them.
Politics: The United Nations / The United States and
Sports: Manchester United / Manchester and Barca
36. Use Case 2: Organizing Text
[ Text Categorization ]
Run the classifier on a new piece of text.
The classifier will return a label.
Text: Nations and States
Label: Politics
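The collect / train / run steps can be sketched with a tiny Naive Bayes classifier written from scratch. The deck does not say which classifier is used, so Naive Bayes is an assumption here, as are the function names train and classify; the four training samples are the ones from the slides.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(samples):
    """samples: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

samples = [
    ("The United Nations", "Politics"),
    ("The United States and", "Politics"),
    ("Manchester United", "Sports"),
    ("Manchester and Barca", "Sports"),
]
model = train(samples)
label = classify("Nations and States", *model)
print(label)
```

With only four samples this is a toy, but the workflow is the same at any scale: more and better samples per category generally give a better classifier.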
37. Use Case 2: Organizing Text
[ Text Categorization ]
How can text analytics methods be used
to automate organization/categorization?
Rule-based methods / Machine learning methods
38. Use Case 2: Organizing Text
[ Text Categorization ]
But rule-based methods work for classification too.
Rule-based text categorization is often used in:
Social media sentiment classification
39. Use Case 2: Organizing Text
[ Text Categorization ]
How do we use rules to identify sentiment?
We use lists of negative and positive words (usually adjectives)
(available in the AFINN word list) … and match them in the text …
“I am sad that Steve Jobs died.”
Negative: “sad” / “bad” / “evil” / “distraught” / “dead” / “died”
Positive: “thrilled” / “excited” / “amazed” / “happy” / “love” / “joy”
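A minimal sketch of word-list sentiment matching, using only the twelve words on the slide. It counts each word as +1 or -1, which is an assumption for illustration; it does not use AFINN's actual scores, and the function name sentiment is made up.

```python
import re

NEGATIVE = {"sad", "bad", "evil", "distraught", "dead", "died"}
POSITIVE = {"thrilled", "excited", "amazed", "happy", "love", "joy"}

def sentiment(text):
    """Label text by counting matches against the positive and negative lists."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = sentiment("I am sad that Steve Jobs died.")
print(label)
```

The sentence comes out as negative because "sad" and "died" both match. Word matching alone cannot tell who or what the sentiment is about.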
40. Use Case 2: Organizing Text
[ Text Categorization ]
Can we use entity and relation extraction to do better?
“I am sad that [Steve Jobs died].”
The negative entity ‘sad’ is related to the negative event ‘Steve Jobs died’.
Analysis: This person holds a positive opinion of Steve Jobs.
41. Use Case 2: Organizing Text
[ Text Categorization ]
How to choose between text analytics methods
for text categorization?
Rule-based methods:
Typically higher precision
Typically less flexibility
Typically lower recall
Machine learning methods:
Typically lower precision
Typically more flexibility
Typically higher recall + overall performance
42. How do we unlock
the value in “text”?
The first use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Information Extraction
Identifying entities and the relations between them
43. How do we unlock
the value in “text”?
The second use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Text Categorization
Labeling text with one or more category labels
44. How do we unlock
the value in “text”?
The third use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Question Answering
Generating a response to an inquiry
45. Use Case 3:
Answering Questions
Let’s say you get a question …
“Do you ship your cars to Boston, MA?”
… and you want the answer to be one of:
Yes / No
46. Use Case 3:
Answering Questions
First you classify the question into one of 3 types:
Yes/No questions: “Do you ship your cars to Boston, MA?”
Factoid questions: “Who is the CEO of Apple?”
Non-factoid questions: “Why is the sky blue?”
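One rough way to sketch the question-typing step is to look at the first word of the question. This first-word heuristic and the two word lists are assumptions for illustration only; it will mislabel many real questions (for example, "How many …" is usually a factoid question), and a trained classifier could be used instead.

```python
YES_NO_STARTERS = {"do", "does", "did", "is", "are", "was", "were",
                   "can", "could", "will", "would", "should", "has", "have"}
FACTOID_STARTERS = {"who", "what", "when", "where", "which"}

def question_type(question):
    """Guess the question type from its first word (a crude heuristic)."""
    words = question.strip().split()
    first = words[0].lower() if words else ""
    if first in YES_NO_STARTERS:
        return "Yes/No"
    if first in FACTOID_STARTERS:
        return "Factoid"
    return "Non-factoid"

for q in ["Do you ship your cars to Boston, MA?",
          "Who is the CEO of Apple?",
          "Why is the sky blue?"]:
    print(q, "->", question_type(q))
```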
47. Use Case 3:
Answering Questions
Look for answers in databases that you created using entity and relation extraction.
“Do you ship your cars to Boston, MA?”
Product / Ships To
Cars / USA
“Who is the CEO of Apple?”
CEO / Firm
Tim Cook / Apple
“Why is the sky blue?”
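The lookup step can be sketched as follows, assuming the product, destination, and firm have already been extracted from the question. The two tables hold the rows from the slide; STATE_TO_COUNTRY is a hypothetical helper table that is not on the slide, and the function names are made up for this example.

```python
# Rows from the slide's two tables.
SHIPPING = [{"product": "cars", "ships_to": "USA"}]
CEOS = [{"ceo": "Tim Cook", "firm": "Apple"}]

# Hypothetical helper table mapping a state to its country; not on the slide.
STATE_TO_COUNTRY = {"MA": "USA"}

def ships_to(product, state):
    """Answer a Yes/No shipping question from the Product / Ships To table."""
    country = STATE_TO_COUNTRY.get(state)
    found = any(r["product"] == product and r["ships_to"] == country for r in SHIPPING)
    return "Yes" if found else "No"

def ceo_of(firm):
    """Answer a factoid question from the CEO / Firm table; None if unknown."""
    return next((r["ceo"] for r in CEOS if r["firm"] == firm), None)

print(ships_to("cars", "MA"))   # "Do you ship your cars to Boston, MA?"
print(ceo_of("Apple"))          # "Who is the CEO of Apple?"
```

A non-factoid question such as "Why is the sky blue?" has no row to look up, so it cannot be answered this way.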
48. To watch out for:
Text Analytics Traps
1. Testing on Training Data
2. Using US Training Data for India
3. Treating all Data Sources as One
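To illustrate how the first trap is usually avoided, here is a sketch that sets aside part of the labeled data for testing before any training happens. The placeholder documents, the 25% test fraction, and the function name split are all assumptions for this example.

```python
import random

def split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction that is never used for training."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled documents.
samples = [(f"document {i}", "Report" if i % 2 else "Inquiry") for i in range(8)]
train_set, test_set = split(samples)
print(len(train_set), len(test_set))
```

Accuracy measured on train_set tends to look better than it really is, because the model has already seen those documents. Measure on test_set only.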
49. To embrace:
Text Analytics Tricks
1. UI Compensation for AI Inaccuracy
2. Raising Precision at the Cost of Recall
3. Domain Specific Rules
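The second trick can be illustrated with a confidence threshold: accept only the predictions the system is most sure about. The prediction scores below are invented for illustration, and precision_recall is a hypothetical helper name.

```python
def precision_recall(predictions, threshold):
    """predictions: list of (confidence, is_actually_positive)."""
    accepted = [truth for conf, truth in predictions if conf >= threshold]
    true_pos = sum(accepted)
    total_pos = sum(truth for _, truth in predictions)
    precision = true_pos / len(accepted) if accepted else 1.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall

# Hypothetical classifier outputs: (confidence score, whether the item was truly positive).
predictions = [(0.95, True), (0.90, True), (0.80, False),
               (0.70, True), (0.60, False), (0.40, True)]

low = precision_recall(predictions, 0.5)
high = precision_recall(predictions, 0.85)
print(low, high)
```

Raising the threshold from 0.5 to 0.85 lifts precision from 0.6 to 1.0 while recall drops from 0.75 to 0.5: fewer answers, but the ones given are right.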
50. About Aiaioo Labs
AI Research Lab
1. http://aiaioo.com
2. http://aiaioo.com/publications
3. http://aiaioo.wordpress.com