Cohan Sujay Carlos
CEO, Aiaioo Labs
Fun with Text
Managing Text Analytics
What I am going to talk about.
Text Analytics
1. Examine 3 kinds of opportunities
2. Discuss 3 text analytics problems
3. Touch upon 3 things to watch out for
and 3 things to embrace.
What if we can master “text”?
What do we get from it?
There are opportunities in every vertical:
1. Aerospace / Defense / Automotive: Filing of various routine documents /
Technical specification standardization / Competitive intelligence and
customer feedback management
2. Healthcare / Life sciences: Reporting / Storing relevant patents and
publications / Analysis of research and competitive intelligence
3. Legal and Government: Legal and administrative filings / Case document
and administrative record management / Analysis of legal and
administrative documents (land records, case files)
Do you observe a pattern?
In every vertical …
Output Text / Store and Transform Text / Ingest and Analyze Text
How do we unlock
the value in “text”?
Output Text / Store and Transform Text / Ingest and Analyze Text
Natural Language Generation / Natural Language Understanding
Natural Language Processing (aka Text Analytics)
Use Case 1:
Customer Service
Let’s say you have some text … … and a database or spreadsheet with columns
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
… and you have to fill in the database fields
from the information in the text …
Reporter | Location (of Reporter) | Product
Use Case 1:
Land Records
Let’s say you have some text … … and a database or spreadsheet with columns
“Property K45L234
(lot 23-24) in Wake County
of 3000 sq ft
was sold to James Fischer
on 3-30-1997 …”
… and you have to fill in the database fields
from the information in the text …
Title Number | Lot | County
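As a rough illustration of filling those fields, here is a minimal Python sketch that uses one regular expression. The pattern, the field names `title_number`, `lot` and `county`, and the function `extract_land_record` are assumptions made for this single sentence shape; they are not taken from the talk, and real land records would need many more patterns.

```python
import re

# Hypothetical pattern written for this one sentence shape; real records vary.
LAND_RECORD = re.compile(
    r"Property\s+(?P<title_number>\w+)\s+"
    r"\(lot\s+(?P<lot>[\d-]+)\)\s+"
    r"in\s+(?P<county>[A-Z][\w ]*? County)"
)

def extract_land_record(text):
    """Return a dict of database fields, or None if the pattern does not match."""
    # Collapse line breaks so the pattern can span wrapped lines.
    flat = " ".join(text.split())
    match = LAND_RECORD.search(flat)
    return match.groupdict() if match else None

text = """Property K45L234
(lot 23-24) in Wake County
of 3000 sq ft
was sold to James Fischer
on 3-30-1997"""

row = extract_land_record(text)
print(row)
```

Each named group becomes one column of the row, which is the "text in, database fields out" step the slide describes.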
Use Case 1:
M&A Transactions
Let’s say you have some text … … and a database or spreadsheet with columns
“Acme Financials, a subsidiary
of Lehman Sisters, was acquired
by John Doe Corp on 5/26/2001.”
… and you have to fill in the database fields
from the information in the text …
Acquirer | Acquired | Date
Use Case 1: Customer Service
[ Information Extraction ]
Let’s say you have some text … … and a database or spreadsheet with columns
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Identifying entities and the relations between them
Reporter | Location | Product
John Chambers | Springfield, MA | Ford Ranger
Relations tell you about the connections between entities.
Relations connect the entities that belong in a row.
Relation: Location of Reporter (Springfield, MA is the location of John Chambers)
Information extraction converts:
unstructured information into structured information.
Information extraction can improve efficiencies
in processes where humans read text and copy fields into databases.
Use Case 1: Customer Service
[ Information Extraction ]
How can text analytics methods be used
to automate entity and relation extraction?
Rule-based methods / Machine learning methods
Aiaioo Labs aiaioo.com
Use Case 1: Customer Service
[ Information Extraction ]
Rule-based frameworks for entity and relation extraction?
http://services.gate.ac.uk/annie/
Use Case 1: Customer Service
[ Information Extraction ]
How does GATE/Annie identify entities and the relations?
“John Chambers of Springfield, MA reported a problem with the clutch
on his Ford Ranger purchased in Boston, MA in 2005.”
It uses lists of first names and last names of persons, and names of
places, and matches them in the text:
First names: “Jack”, “Jill”, “John”
Last names: “Chambers”, “Miller”, “Farnsworth”
Cities: “Springfield”, “Boston”, “Cambridge”
States: “MA”, “CA”, “MD”
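The list-matching idea can be sketched in a few lines of Python. This is not GATE's API: the dictionary `GAZETTEERS`, the functions `lookup`, `tag_tokens` and `find_pairs`, and the "adjacent labels form an entity" rule are all assumptions used only to illustrate the technique with the word lists from the slide.

```python
import re

# Toy gazetteers copied from the slide; real ones hold thousands of entries.
GAZETTEERS = {
    "first_name": {"Jack", "Jill", "John"},
    "last_name": {"Chambers", "Miller", "Farnsworth"},
    "city": {"Springfield", "Boston", "Cambridge"},
    "state": {"MA", "CA", "MD"},
}

def lookup(token):
    """Return the gazetteer label for a token, or None if no list contains it."""
    for label, entries in GAZETTEERS.items():
        if token in entries:
            return label
    return None

def tag_tokens(text):
    """Split text into alphabetic tokens and attach a gazetteer label to each."""
    return [(tok, lookup(tok)) for tok in re.findall(r"[A-Za-z]+", text)]

def find_pairs(tagged, first_label, second_label):
    """Hypothetical rule: two adjacent tokens with these labels form one entity."""
    return [
        (tok_a, tok_b)
        for (tok_a, lab_a), (tok_b, lab_b) in zip(tagged, tagged[1:])
        if lab_a == first_label and lab_b == second_label
    ]

sentence = ("John Chambers of Springfield, MA reported a problem with the clutch "
            "on his Ford Ranger purchased in Boston, MA in 2005.")
tagged = tag_tokens(sentence)
persons = find_pairs(tagged, "first_name", "last_name")
places = find_pairs(tagged, "city", "state")
print(persons, places)
```

Note that "Ford Ranger" is not found, because no product list was supplied. A list-based matcher only recognizes what its lists contain.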
Use Case 1: Customer Service
[ Information Extraction ]
Machine learning frameworks for entity and relation extraction?
https://opennlp.apache.org/
Apache OpenNLP
Use Case 1: Customer Service
[ Information Extraction ]
Machine learning frameworks need training data.
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html
Use Case 1: Customer Service
[ Information Extraction ]
How does OpenNLP identify entities and the relations?
From examples such as:
“<START:reporter>John Archer<END> of <START:location>Maryland<END>
reported a problem with his <START:product>Figo<END>.”
“<START:reporter>Vince Chambers<END> of <START:location>Denver,
CO<END> had trouble with his <START:product>Focus<END>.”
It learns to recognize:
“John Chambers of Springfield, MA reported a problem with the clutch
on his Ford Ranger purchased in Boston, MA in 2005.”
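To make the shape of such training data concrete, the following Python sketch reads the `<START:type>…<END>` markup from the examples above and turns each sentence into plain text plus a list of labeled entities. It is not OpenNLP code and does no learning; the names `TAG` and `parse_annotated` are assumptions made for illustration.

```python
import re

# Matches one annotated span, e.g. <START:location>Maryland<END>.
TAG = re.compile(r"<START:(?P<label>\w+)>\s*(?P<text>.*?)\s*<END>", re.DOTALL)

def parse_annotated(sentence):
    """Split an annotated sentence into plain text and (label, entity) pairs."""
    entities = [
        (m.group("label"), " ".join(m.group("text").split()))
        for m in TAG.finditer(sentence)
    ]
    # Replace every annotated span by its inner text, then normalize whitespace.
    plain = " ".join(TAG.sub(lambda m: m.group("text"), sentence).split())
    return plain, entities

example = ("<START:reporter>John Archer<END> of <START:location>Maryland<END> "
           "reported a problem with his <START:product>Figo<END>.")
plain, entities = parse_annotated(example)
print(plain)
print(entities)
```

A learner sees many such (sentence, entities) pairs and generalizes from the contexts around the marked spans, rather than from fixed word lists.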
Use Case 1: Customer Service
[ Information Extraction ]
How to choose between text analytics methods
for entity and relation extraction?
Rule-based methods | Machine learning methods
3 months to a reasonably performing model | 1+ years to a reasonably performing model
Typically higher precision | Typically lower precision
Typically less flexibility | Typically more flexibility
Typically less recall | Typically higher recall + overall performance
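Precision is the share of extracted items that are correct, and recall is the share of correct items that were extracted. The Python sketch below computes both for two made-up system outputs; the sets `rule_based` and `learned` are hypothetical examples, not results from the talk.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for a set of predictions against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Entities a human would mark in the customer service sentence.
gold = {"John Chambers", "Springfield, MA", "Ford Ranger", "Boston, MA"}

# Hypothetical outputs, chosen only to illustrate the trade-off.
rule_based = {"John Chambers", "Springfield, MA"}
learned = {"John Chambers", "Springfield, MA", "Ford Ranger", "clutch"}

print(precision_recall(rule_based, gold))  # few items, all of them correct
print(precision_recall(learned, gold))     # more items, one of them wrong
```

In this toy case the cautious system makes no mistakes but misses half of the entities, while the broader system finds more at the cost of one error.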
Can you classify these door heights as Short / Tall?
5’2”, 5’8”, 5’8”, 5’11”, 5’11”, 6’2”, 6’6”, 6’8”, 6’9”, 6’10”
In analytics, an analyst comes up with a rule.
If door_height < 6’ then Short else Tall
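Written as Python, the analyst's rule is a single comparison. The helper `to_inches` and the function name `classify_door` are assumptions; the 6-foot cut-off and the door heights come from the slide.

```python
def to_inches(feet, inches):
    """Convert a feet-and-inches height to inches."""
    return feet * 12 + inches

def classify_door(height_in):
    """The hand-written rule from the slide: under 6 feet is Short, else Tall."""
    return "Short" if height_in < to_inches(6, 0) else "Tall"

# Door heights shown on the slide, as (feet, inches).
doors = [(5, 11), (5, 8), (5, 8), (5, 11), (6, 2),
         (6, 6), (5, 2), (6, 8), (6, 9), (6, 10)]
labels = [classify_door(to_inches(f, i)) for f, i in doors]
print(list(zip(doors, labels)))
```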
In machine learning, the computer comes up with a rule from examples.
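One simple way a computer could derive such a rule is to try every cut-off between neighbouring heights and keep the one that makes the fewest mistakes on labeled examples. The sketch below does this in Python. The function `learn_threshold` and the labels in `examples` are assumptions: the slide shows the heights but not how they were labeled.

```python
def learn_threshold(examples):
    """Pick the cut-off (in inches) that misclassifies the fewest examples.

    examples: list of (height_in_inches, label), label is "Short" or "Tall".
    """
    heights = sorted({h for h, _ in examples})
    # Candidate cut-offs are midpoints between neighbouring distinct heights.
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]
    if not candidates:
        raise ValueError("need at least two distinct heights")

    def errors(cut):
        return sum(("Short" if h < cut else "Tall") != label
                   for h, label in examples)

    return min(candidates, key=errors)

# Hypothetical labeled examples (heights in inches).
examples = [(71, "Short"), (68, "Short"), (62, "Short"),
            (74, "Tall"), (78, "Tall"), (80, "Tall"), (81, "Tall"), (82, "Tall")]
cut = learn_threshold(examples)
print(cut)
```

With these labels the learned cut-off falls between 5’11” and 6’2”, close to the analyst's 6’ rule, but it was derived from the data rather than written by hand.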
How do we unlock
the value in “text”?
The first use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Information Extraction
Identifying entities and the relations between them
How do we unlock
the value in “text”?
The second use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Text Categorization
Labeling text with one or more category labels
Use Case 2:
Organizing Text for Storage
Let’s say you have some text … … and you want to mark it as one of …
“John Chambers of Springfield, MA
reported a problem with the clutch
on his Ford Ranger purchased in
Boston, MA in 2005.”
Report / Inquiry
Use Case 2: Organizing Text
[ Text Categorization ]
Start by collecting some samples of documents
of each of your categories
Report | Inquiry
I have a problem | Where can I buy a
This complaint is about | Do you sell furniture
Use Case 2: Organizing Text
[ Text Categorization ]
Train a classifier with them.
Use Case 2: Organizing Text
[ Text Categorization ]
Start by collecting some samples of documents
of each of your categories
Politics | Sports
The United Nations | Manchester United
The United States and | Manchester and Barca
Use Case 2: Organizing Text
[ Text Categorization ]
Train a classifier with them.
Use Case 2: Organizing Text
[ Text Categorization ]
Run the classifier on a new piece of text.
The classifier will return a label.
Input: “Nations and States”
Label: Politics
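The collect, train and run steps can be sketched end to end in Python. The slides do not say which learning algorithm is used, so the sketch assumes a small naive Bayes model with add-one smoothing; the class `NaiveBayes` and its methods are illustrative names, not part of any library.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """A tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, samples):
        for text, label in samples:
            self.doc_counts[label] += 1
            for word in self.tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# The sample documents from the slides.
samples = [
    ("The United Nations", "Politics"),
    ("The United States and", "Politics"),
    ("Manchester United", "Sports"),
    ("Manchester and Barca", "Sports"),
]
clf = NaiveBayes()
clf.train(samples)
label = clf.classify("Nations and States")
print(label)
```

The word "United" appears in both categories, so the model has to rely on the other words, which is why "Nations" and "States" pull the new text toward Politics.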
Use Case 2: Organizing Text
[ Text Categorization ]
How can text analytics methods be used
to automate organization/categorization?
Rule-based methods / Machine learning methods
Use Case 2: Organizing Text
[ Text Categorization ]
But rule-based methods work for classification too.
Rule-based text categorization is often used in:
Social media sentiment classification
Use Case 2: Organizing Text
[ Text Categorization ]
How do we use rules to identify sentiment?
We use lists of negative and positive words (usually adjectives),
available in the AFINN gazetteer, and match them in the text:
“I am sad that Steve Jobs died.”
Negative: “sad”, “bad”, “evil”, “distraught”, “dead”, “died”
Positive: “thrilled”, “excited”, “amazed”, “happy”, “love”, “joy”
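A minimal Python sketch of this word-list approach follows. It uses only the twelve words shown on the slide and counts each match as plus or minus one; the function name `sentiment` and the three output labels are assumptions, and a real lexicon would be far larger and may weight words differently.

```python
import re

# Word lists from the slide; a full lexicon would be much larger.
NEGATIVE = {"sad", "bad", "evil", "distraught", "dead", "died"}
POSITIVE = {"thrilled", "excited", "amazed", "happy", "love", "joy"}

def sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I am sad that Steve Jobs died."))
```

The example sentence matches "sad" and "died", so pure word counting labels it negative, even though the writer is not criticizing anyone.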
Use Case 2: Organizing Text
[ Text Categorization ]
Can we use entity and relation extraction to do better?
“I am sad that [Steve Jobs died].”
The negative entity ‘sad’ is related to the negative event ‘Steve Jobs died’.
Analysis: This person holds a positive opinion of Steve Jobs.
Use Case 2: Organizing Text
[ Text Categorization ]
How to choose between text analytics methods
for text categorization?
Rule-based methods | Machine learning methods
Typically higher precision | Typically lower precision
Typically less flexibility | Typically more flexibility
Typically less recall | Typically higher recall + overall performance
How do we unlock
the value in “text”?
The third use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Question Answering
Generating a response to an inquiry
Use Case 3:
Answering Questions
Let’s say you get a question … and you want the answer to be one of …
“Do you ship your cars to Boston, MA?”
Yes / No
Use Case 3:
Answering Questions
First you classify the question into one of 3 types:
Yes/No questions: “Do you ship your cars to Boston, MA?”
Factoid questions: “Who is the CEO of Apple?”
Non-factoid questions: “Why is the sky blue?”
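A rule-based version of this first step could look at the opening word of the question. The Python sketch below is a rough heuristic under that assumption; the word sets and the function `question_type` are not from the talk, and the rule is easy to fool (for example, "How many …" questions are usually factoid).

```python
# Hypothetical cue words; a trained classifier would replace these lists.
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were",
               "can", "could", "will", "would", "should", "have", "has"}
FACTOID_WORDS = {"who", "what", "when", "where", "which"}
NON_FACTOID_WORDS = {"why", "how"}

def question_type(question):
    """Guess the question type from its first word."""
    words = question.split()
    if not words:
        return "unknown"
    first = words[0].lower()
    if first in AUXILIARIES:
        return "yes/no"
    if first in FACTOID_WORDS:
        return "factoid"
    if first in NON_FACTOID_WORDS:
        return "non-factoid"
    return "unknown"

for q in ["Do you ship your cars to Boston, MA?",
          "Who is the CEO of Apple?",
          "Why is the sky blue?"]:
    print(q, "->", question_type(q))
```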
Use Case 3:
Answering Questions
Look for answers in databases that you created using entity / relationship extraction
“Do you ship your cars to Boston, MA?”
“Who is the CEO of Apple?”
“Why is the sky blue?”

Product | Ships To
Cars | USA

CEO | Firm
Tim Cook | Apple
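Once the tables exist, answering the first two questions becomes a lookup. The Python sketch below stores the two tables from the slide as dictionaries. The extra table `REGION_OF` is an assumption: the slide does not say how "Boston, MA" is mapped to "USA", so that step is filled in by hand here. The function names are also illustrative.

```python
# Tables from the slide, keyed in lower case for matching.
SHIPS_TO = {"cars": {"USA"}}       # Product -> regions shipped to
CEO_OF = {"apple": "Tim Cook"}     # Firm -> CEO

# Hypothetical helper table: maps a place to the region that contains it.
REGION_OF = {"boston, ma": "USA"}

def answer_ships_to(product, place):
    """Answer a Yes/No shipping question, or None if the place is unknown."""
    region = REGION_OF.get(place.lower())
    if region is None:
        return None
    return "Yes" if region in SHIPS_TO.get(product.lower(), set()) else "No"

def answer_ceo(firm):
    """Answer a factoid question about a firm's CEO, or None if unknown."""
    return CEO_OF.get(firm.lower())

print(answer_ships_to("Cars", "Boston, MA"))
print(answer_ceo("Apple"))
```

The non-factoid question ("Why is the sky blue?") has no row to look up, which is why it is treated as a separate type.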
To watch out for:
Text Analytics Traps
1. Testing on Training Data
2. Using US Training Data for India
3. Treating all Data Sources as One
To embrace:
Text Analytics Tricks
1. UI Compensation for AI Inaccuracy
2. Raising Precision at the Cost of Recall
3. Domain Specific Rules
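One common way to raise precision at the cost of recall is to keep only the predictions a system is most confident about. The Python sketch below assumes each candidate comes with a confidence score and shows how both measures move as the acceptance threshold rises. The data in `candidates` and the function `precision_recall_at` are made up for illustration.

```python
def precision_recall_at(candidates, threshold):
    """Compute (precision, recall) when only confident candidates are accepted.

    candidates: list of (confidence, is_correct) pairs for all scored items.
    """
    accepted = [ok for conf, ok in candidates if conf >= threshold]
    total_correct = sum(ok for _, ok in candidates)
    hits = sum(accepted)
    precision = hits / len(accepted) if accepted else 0.0
    recall = hits / total_correct if total_correct else 0.0
    return precision, recall

# Hypothetical scored candidates: (confidence, whether the item is correct).
candidates = [(0.95, True), (0.90, True), (0.80, True), (0.60, False),
              (0.55, True), (0.40, False), (0.30, True)]

print(precision_recall_at(candidates, 0.5))
print(precision_recall_at(candidates, 0.85))
```

With the higher threshold every accepted item is correct, but fewer of the correct items are returned. That trade can be worthwhile when wrong answers cost more than missing ones.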
About Aiaioo Labs
AI Research Lab
1. http://aiaioo.com
2. http://aiaioo.com/publications
3. http://aiaioo.wordpress.com
THANK YOU
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 

Fun with Text - Managing Text Analytics

  • 1. Cohan Sujay Carlos CEO, Aiaioo Labs Fun with Text Managing Text Analytics
  • 2. What I am going to talk about. Text Analytics 1. Examine 3 kinds of opportunities 2. Discuss 3 text analytics problems 3. Touch upon 3 things to watch out for and 3 things to embrace.
  • 3. What if we can master “text”? What do we get from it? There are opportunities in every vertical: 1. Aerospace / Defense / Automotive – Filing of various routine documents / Technical specification standardization / Competitive intelligence and customer feedback management
  • 4. What if we can master “text”? What do we get from it? There are opportunities in every vertical: 1. Aerospace / Defense / Automotive – Filing of various routine documents / Technical specification standardization / Competitive intelligence and customer feedback management 2. Healthcare / Life sciences – Reporting / Storing relevant patents and publications / Analysis of research and competitive intelligence
  • 5. What if we can master “text”? What do we get from it? There are opportunities in every vertical: 1. Aerospace / Defense / Automotive – Filing of various routine documents / Technical specification standardization / Competitive intelligence and customer feedback management 2. Healthcare / Life sciences – Reporting / Storing relevant patents and publications / Analysis of research and competitive intelligence 3. Legal and Government – Legal and administrative filings / Case document and administrative record management / Analysis of legal and administrative documents (land records, case files)
  • 6. What if we can master “text”? What do we get from it? Do you observe a pattern? In every vertical … Output Text / Store and Transform Text / Ingest and Analyze Text
  • 7. How do we unlock the value in “text”? Output Text / Store and Transform Text / Ingest and Analyze Text Natural Language Generation Natural Language Understanding Natural Language Processing (aka Text Analytics)
  • 8. Use Case 1: Customer Service Let’s say you have some text … … and a database or spreadsheet with columns “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” … and you have to fill in the database fields from the information in the text … Reporter Location (of Reporter) Product
  • 9. Use Case 1: Land Records Let’s say you have some text … … and a database or spreadsheet with columns “Property K45L234 (lot 23-24) in Wake County of 3000 sq ft was sold to James Fischer on 3-30-1997 …” … and you have to fill in the database fields from the information in the text …
  • 10. Use Case 1: Land Records Let’s say you have some text … … and a database or spreadsheet with columns “Property K45L234 (lot 23-24) in Wake County of 3000 sq ft was sold to James Fischer on 3-30-1997 …” … and you have to fill in the database fields from the information in the text … Title Number Lot County
  • 11. Use Case 1: M&A Transactions Let’s say you have some text … … and a database or spreadsheet with columns “Acme Financials, a subsidiary of Lehman Sisters, was acquired by John Doe Corp on 5/26/2001.” … and you have to fill in the database fields from the information in the text …
  • 12. Use Case 1: M&A Transactions Let’s say you have some text … … and a database or spreadsheet with columns “Acme Financials, a subsidiary of Lehman Sisters, was acquired by John Doe Corp on 5/26/2001.” … and you have to fill in the database fields from the information in the text … Acquirer Acquired Date
  • 13. Use Case 1: Customer Service [ Information Extraction ] Let’s say you have some text … … and a database or spreadsheet with columns “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” Entities are pieces of text that could go into the fields in the database. Identifying entities and the relations between them Reporter Location (of Reporter) Product
  • 14. Use Case 1: Customer Service [ Information Extraction ] Let’s say you have some text … … and a database or spreadsheet with columns “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” Entities are pieces of text that could go into the fields in the database. Identifying entities and the relations between them Reporter Location Product John Chambers Springfield, MA Ford Ranger
  • 15. Use Case 1: Customer Service [ Information Extraction ] Relations tell you about the connections between entities. “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” Entities are pieces of text that could go into the fields in the database. Relations connect the entities that belong in a row. Identifying entities and the relations between them Reporter Location Product John Chambers Springfield, MA Ford Ranger Location of Reporter
  • 16. Use Case 1: Customer Service [ Information Extraction ] “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” Information extraction converts: unstructured information into structured information. Identifying entities and the relations between them Reporter Location Product John Chambers Springfield, MA Ford Ranger
  • 17. Use Case 1: Customer Service [ Information Extraction ] “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” Information extraction can improve efficiencies in processes where humans read text and copy fields into databases. Identifying entities and the relations between them Reporter Location Product John Chambers Springfield, MA Ford Ranger
  • 18. Use Case 1: Customer Service [ Information Extraction ] How can text analytics methods be used to automate entity and relation extraction? Rule based methods Machine learning methods Aiaioo Labs aiaioo.com
  • 19. Use Case 1: Customer Service [ Information Extraction ] Rule-based frameworks for entity and relation extraction? http://services.gate.ac.uk/annie/
  • 20. Use Case 1: Customer Service [ Information Extraction ]
  • 21. Use Case 1: Customer Service [ Information Extraction ] How does GATE/Annie identify entities and the relations? It uses lists of first names and last names of persons, and names of places, and matches them in the text: “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” First names: “Jack” “Jill” “John”. Last names: “Chambers” “Miller” “Farnsworth”. Places: “Springfield” “Boston” “Cambridge” “MA” “CA” “MD”.
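The list-matching idea on this slide can be sketched in a few lines of Python. This is only an illustration of gazetteer lookup using the word lists shown above; it is not GATE/ANNIE's actual code. The function name `find_entities` and the two matching patterns (first name followed by last name, city followed by a comma and a state code) are assumptions made for the example.

```python
import re

# Word lists copied from the slide; a real gazetteer would be far larger.
FIRST_NAMES = {"Jack", "Jill", "John"}
LAST_NAMES = {"Chambers", "Miller", "Farnsworth"}
CITIES = {"Springfield", "Boston", "Cambridge"}
STATES = {"MA", "CA", "MD"}

SENTENCE = ("John Chambers of Springfield, MA reported a problem with the "
            "clutch on his Ford Ranger purchased in Boston, MA in 2005.")


def find_entities(text):
    """Return (label, surface_text) pairs found by simple list lookup."""
    # Split into alphabetic words and single non-alphabetic symbols.
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)
    entities = []
    i = 0
    while i < len(tokens):
        # Person: a known first name followed by a known last name.
        if (tokens[i] in FIRST_NAMES and i + 1 < len(tokens)
                and tokens[i + 1] in LAST_NAMES):
            entities.append(("person", f"{tokens[i]} {tokens[i + 1]}"))
            i += 2
        # Location: a known city, a comma, then a known state code.
        elif (tokens[i] in CITIES and i + 2 < len(tokens)
                and tokens[i + 1] == "," and tokens[i + 2] in STATES):
            entities.append(("location", f"{tokens[i]}, {tokens[i + 2]}"))
            i += 3
        else:
            i += 1
    return entities


if __name__ == "__main__":
    for label, surface in find_entities(SENTENCE):
        print(label, "->", surface)
```

Note that list lookup alone finds both “Springfield, MA” and “Boston, MA”; deciding which one is the reporter's location is the relation-extraction step, which this sketch does not attempt.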
  • 22. Use Case 1: Customer Service [ Information Extraction ] Machine learning frameworks for entity and relation extraction? https://opennlp.apache.org/ Apache OpenNLP
  • 23. Use Case 1: Customer Service [ Information Extraction ] Machine learning frameworks need training data. https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html
  • 24. Use Case 1: Customer Service [ Information Extraction ] How does OpenNLP identify entities and the relations? From examples such as: “<START:reporter>John Archer<END> of <START:location>Maryland<END> reported a problem with his <START:product>Figo<END>.” “<START:reporter>Vince Chambers<END> of <START:location>Denver, CO<END> had trouble with his <START:product>Focus<END>.” It learns to recognize: “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
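To make the annotated training format above concrete, here is a small Python sketch that reads the `<START:label> … <END>` markup from one training sentence. It only parses the annotation style shown on the slide; it does not train a model and is not part of OpenNLP. The names `read_annotations` and `strip_markup` are hypothetical helpers introduced for illustration.

```python
import re

# Matches one annotated span such as <START:reporter>John Archer<END>.
# Optional whitespace inside the tags is tolerated.
SPAN = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")

EXAMPLE = ("<START:reporter>John Archer<END> of <START:location>Maryland<END> "
           "reported a problem with his <START:product>Figo<END>.")


def read_annotations(line):
    """Return the (label, text) pairs marked up in one training sentence."""
    return SPAN.findall(line)


def strip_markup(line):
    """Return the plain sentence with the annotation tags removed."""
    return SPAN.sub(lambda match: match.group(2), line)


if __name__ == "__main__":
    print(read_annotations(EXAMPLE))
    print(strip_markup(EXAMPLE))
```

A learner sees many such (label, text) pairs in context and generalizes to names it has never seen, which is what distinguishes it from the fixed word lists of the rule-based approach.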
  • 25. Use Case 1: Customer Service [ Information Extraction ] How to choose between text analytics methods for entity and relation extraction? Rule-based methods: 3 months to a reasonably performing model; typically higher precision; typically less flexibility; typically lower recall. Machine learning methods: 1+ years to a reasonably performing model; typically lower precision; typically more flexibility; typically higher recall and overall performance.
  • 26. Can you classify these door heights as Short / Tall? 5’2”, 5’8”, 5’11”, 6’2”, 6’6”, 6’8”, 6’9”, 6’10”
  • 27. In analytics, an analyst comes up with a rule. 5’2”, 5’8”, 5’11”, 6’2”, 6’6”, 6’8”, 6’9”, 6’10”. If door_height < 6’ then Short else Tall
  • 28. In machine learning, the computer comes up with a rule from examples. 5’2”, 5’8”, 5’11”, 6’2”, 6’6”, 6’8”, 6’9”, 6’10”
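The contrast between the analyst's rule and a learned rule on the door-height slides can be sketched as follows. Heights are converted to inches, and the Short/Tall labels in `EXAMPLES` are an assumption (the slides do not give labels; these follow the analyst's 6-foot rule). The learner shown here, which tries every midpoint between neighbouring heights and keeps the one with the fewest errors, is just one simple way a computer could derive a rule from examples.

```python
# Door heights from the slides, in inches, with assumed labels.
EXAMPLES = [
    (62, "Short"), (68, "Short"), (71, "Short"),   # 5'2", 5'8", 5'11"
    (74, "Tall"), (78, "Tall"), (80, "Tall"),      # 6'2", 6'6", 6'8"
    (81, "Tall"), (82, "Tall"),                    # 6'9", 6'10"
]


def analyst_rule(height_in):
    """The hand-written rule from the slide: under 6 feet is Short."""
    return "Short" if height_in < 72 else "Tall"


def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labelled examples."""
    heights = sorted(height for height, _ in examples)
    # Candidate cut-offs: midpoints between neighbouring heights.
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

    def errors(threshold):
        return sum(
            ("Short" if height < threshold else "Tall") != label
            for height, label in examples
        )

    return min(candidates, key=errors)


if __name__ == "__main__":
    print("learned threshold (inches):", learn_threshold(EXAMPLES))
```

With these labels the learned cut-off lands between 5’11” and 6’2”, close to the analyst's 6-foot rule but obtained from the examples alone.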
  • 29. How do we unlock the value in “text”? The first use case … Output Text / Store and Transform Text / Ingest and Analyze Text Information Extraction Identifying entities and the relations between them
  • 30. How do we unlock the value in “text”? The second use case … Output Text / Store and Transform Text / Ingest and Analyze Text Text Categorization Labeling text with one or more category labels
  • 31. Use Case 2: Organizing Text for Storage Let’s say you have some text … … and you want to mark it as one of … “John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.” Report Inquiry
  • 32. Use Case 2: Organizing Text [ Text Categorization ] Start by collecting some samples of documents of each of your categories. Report: “I have a problem”, “This complaint is about”. Inquiry: “Where can I buy a”, “Do you sell furniture”.
  • 33. Use Case 2: Organizing Text [ Text Categorization ] Train a classifier with them. Report: “I have a problem”, “This complaint is about”. Inquiry: “Where can I buy a”, “Do you sell furniture”.
  • 34. Use Case 2: Organizing Text [ Text Categorization ] Start by collecting some samples of documents of each of your categories. Politics: “The United Nations”, “The United States and”. Sports: “Manchester United”, “Manchester and Barca”.
  • 35. Use Case 2: Organizing Text [ Text Categorization ] Train a classifier with them. Politics: “The United Nations”, “The United States and”. Sports: “Manchester United”, “Manchester and Barca”.
  • 36. Use Case 2: Organizing Text [ Text Categorization ] Run the classifier on a new piece of text. The classifier will return a label. Input: “Nations and States”. Label returned: Politics.
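The collect-samples, train, classify steps on the preceding slides can be sketched with a tiny classifier. The slides do not name an algorithm; multinomial naive Bayes with add-one smoothing is used here purely as an assumption because it fits in a few lines. The class name `NaiveBayesTextClassifier` is hypothetical, and the training samples are the ones shown on the Politics/Sports slide.

```python
import math
from collections import Counter

# Samples from the slide: (text, category).
SAMPLES = [
    ("The United Nations", "Politics"),
    ("The United States and", "Politics"),
    ("Manchester United", "Sports"),
    ("Manchester and Barca", "Sports"),
]


class NaiveBayesTextClassifier:
    """Minimal multinomial naive Bayes over lowercase whitespace tokens."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training samples
        self.vocabulary = set()

    def train(self, samples):
        for text, label in samples:
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)

    def classify(self, text):
        # Words never seen in training carry no evidence and are skipped.
        words = [w for w in text.lower().split() if w in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    classifier = NaiveBayesTextClassifier()
    classifier.train(SAMPLES)
    print(classifier.classify("Nations and States"))
```

The word “United” appears in both categories, so it is the surrounding words (“Nations”, “States”, “Manchester”, “Barca”) that tip the decision.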
  • 37. Use Case 2: Organizing Text [ Text Categorization ] How can text analytics methods be used to automate organization/categorization? Rule based methods Machine learning methods
  • 38. Use Case 2: Organizing Text [ Text Categorization ] But rule-based methods work for classification too. Rule-based text categorization is often used in: Social media sentiment classification
  • 39. Use Case 2: Organizing Text [ Text Categorization ] How do we use rules to identify sentiment? We use lists of negative and positive words (usually adjectives), available in the AFINN gazetteer, and match them in the text: “I am sad that Steve Jobs died.” Negative: “sad” “bad” “evil” “distraught” “dead” “died”. Positive: “thrilled” “excited” “amazed” “happy” “love” “joy”.
  • 40. Use Case 2: Organizing Text [ Text Categorization ] Can we use entity and relation extraction to do better? “I am sad that [Steve Jobs died].” Analysis: The negative entity ‘sad’ is related to the negative event ‘Steve Jobs died’, so this person holds a positive opinion of Steve Jobs.
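The word-list approach to sentiment described above can be sketched as a simple count of positive and negative matches. The two lists are the ones shown on the slide; they are unweighted here, which is a simplification (AFINN itself assigns each word a numeric score). The function name `sentiment` is an assumption made for the example.

```python
import re

# Word lists copied from the slide.
NEGATIVE = {"sad", "bad", "evil", "distraught", "dead", "died"}
POSITIVE = {"thrilled", "excited", "amazed", "happy", "love", "joy"}


def sentiment(text):
    """Label text by counting positive and negative word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment("I am sad that Steve Jobs died."))
```

On the slide's sentence this returns “negative”, because both “sad” and “died” match the negative list. That is exactly the limitation the relation-based analysis addresses: the writer's opinion of Steve Jobs is positive even though every matched word is negative.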
  • 41. Use Case 2: Organizing Text [ Text Categorization ] How to choose between text analytics methods for text categorization? Rule-based methods: typically higher precision; typically less flexibility; typically lower recall. Machine learning methods: typically lower precision; typically more flexibility; typically higher recall and overall performance.
  • 42. How do we unlock the value in “text”? The first use case … Output Text / Store and Transform Text / Ingest and Analyze Text Information Extraction Identifying entities and the relations between them
  • 43. How do we unlock the value in “text”? The second use case … Output Text / Store and Transform Text / Ingest and Analyze Text Text Categorization Labeling text with one or more category labels
  • 44. How do we unlock the value in “text”? The third use case … Output Text / Store and Transform Text / Ingest and Analyze Text Question Answering Generating a response to an inquiry
  • 45. Use Case 3: Answering Questions Let’s say you get a question … … and you want the answer to be one of … “Do you ship your cars to Boston, MA?” Yes / No
  • 46. Use Case 3: Answering Questions First you classify the question into one of 3 types: Yes/No questions (“Do you ship your cars to Boston, MA?”), factoid questions (“Who is the CEO of Apple?”), and non-factoid questions (“Why is the sky blue?”).
  • 47. Use Case 3: Answering Questions Look for answers in databases that you created using entity / relationship extraction. “Do you ship your cars to Boston, MA?” Table: Product = Cars, Ships To = USA. “Who is the CEO of Apple?” Table: CEO = Tim Cook, Firm = Apple. “Why is the sky blue?”
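The two question-answering steps above (classify the question, then look up the answer in extracted tables) can be sketched as follows. The tables hold only the rows shown on the slide. Everything else is an assumption for illustration: the first-word heuristic for question type, the regular expressions, the function names, and the `STATE_COUNTRY` lookup that lets “Boston, MA” be matched against a “Ships To: USA” row.

```python
import re

# Rows from the slide's tables.
SHIPS_TO = {"cars": {"USA"}}
CEO_OF = {"Apple": "Tim Cook"}
# Assumption: a tiny lookup placing a state code in a country.
STATE_COUNTRY = {"MA": "USA"}

YES_NO_STARTERS = {"do", "does", "is", "are", "can", "will"}
FACTOID_STARTERS = {"who", "what", "when", "where", "which"}


def question_type(question):
    """Guess the question type from its first word."""
    words = question.split()
    first = words[0].lower() if words else ""
    if first in YES_NO_STARTERS:
        return "yes/no"
    if first in FACTOID_STARTERS:
        return "factoid"
    return "non-factoid"


def answer(question):
    """Return an answer from the tables, or None if none is found."""
    kind = question_type(question)
    if kind == "yes/no":
        match = re.search(r"ship your (\w+) to .*?\b([A-Z]{2})\b", question)
        if match:
            product, state = match.group(1).lower(), match.group(2)
            country = STATE_COUNTRY.get(state)
            if country is None:
                return None  # destination unknown to our lookup
            return "Yes" if country in SHIPS_TO.get(product, set()) else "No"
    elif kind == "factoid":
        match = re.search(r"CEO of (\w+)", question)
        if match:
            return CEO_OF.get(match.group(1))
    return None  # non-factoid questions are not handled by table lookup


if __name__ == "__main__":
    for q in ("Do you ship your cars to Boston, MA?",
              "Who is the CEO of Apple?",
              "Why is the sky blue?"):
        print(question_type(q), "|", answer(q))
```

Non-factoid questions such as “Why is the sky blue?” return nothing here, since a table of entities and relations has no row that explains a cause.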
  • 48. To watch out for: Text Analytics Traps 1. Testing on Training Data 2. Using US Training Data for India 3. Treating all Data Sources as One
  • 49. To embrace: Text Analytics Tricks 1. UI Compensation for AI Inaccuracy 2. Raising Precision at the Cost of Recall 3. Domain Specific Rules
  • 50. About Aiaioo Labs AI Research Lab 1. http://aiaioo.com 2. http://aiaioo.com/publications 3. http://aiaioo.wordpress.com
  • 51. THANK YOU
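The second trick, raising precision at the cost of recall, can be illustrated with a confidence threshold. The sketch below assumes an extractor that attaches a confidence score to each prediction; the scores and correctness flags in `PREDICTIONS` are made-up illustrative values, not results from any real system, and `precision_recall` is a hypothetical helper.

```python
# Hypothetical extractor output: (confidence, prediction_was_correct).
PREDICTIONS = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False),
]


def precision_recall(predictions, threshold):
    """Keep predictions at or above the threshold and score what remains."""
    accepted = [correct for confidence, correct in predictions
                if confidence >= threshold]
    true_positives = sum(accepted)
    total_correct = sum(correct for _, correct in predictions)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / total_correct if total_correct else 0.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.5, 0.85):
        precision, recall = precision_recall(PREDICTIONS, threshold)
        print(f"threshold={threshold}: precision={precision}, recall={recall}")
```

With these illustrative numbers, moving the threshold from 0.5 to 0.85 discards the wrong predictions that were being accepted but also drops some correct ones, which is the trade the slide describes.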