On how to change the utility curve of deep learning so that deep learning projects deliver an ROI no matter how accurate the machine learning system is. Presented at the Nasscom Analytics Summit 2018.
1.
Document Analysis Using Deep Learning
Use Cases of Deep Learning Applied to Document Analysis
Cohan Sujay Carlos
CEO, Aiaioo Labs
Aman Neelappa
Consultant, Aiaioo Labs
2.
What I am going to talk about
Document Analysis
Two types of document analysis tasks
How deep learning can help automate it
How to improve the utility profile of automation
… but first, let’s see what deep learning is
3.
What is deep learning?
• Deep learning refers to the use of artificial neural networks with more than one layer of neurons (interconnections).
4.
Deep Learning
Deep learning democratizes (I might say deskills) AI:
Anyone can learn it (it’s only multiplications, matrices and partial differentiation).
Anyone can teach it (it’s only multiplications, matrices and partial differentiation).
5.
Deep Learning Motivation
It has a high benefit-to-cost ratio:
The same algorithms work on images, on text and on speech.
The same core math works for sequential models and non-sequential models.
You get three (data types) × two (model types) for the price of one!
6.
Deep Learning Motivation
It requires fewer engineering steps for new models:
There is no need for hand-crafted training algorithms.
There is no need for feature engineering.
You get two things for free!
7.
So that’s deep learning!
So now we know what deep learning is.
It’s got to do with a neural network.
And one that has many layers.
Let’s see how such deep neural networks can be used for document analysis.
8.
Two Types of Document Analysis Tasks
1. Deciding / Labelling / Routing
2. Data Entry / Information Extraction
Almost all document analysis tasks fall into
one of these two categories!
9.
How Deep Learning Can Be Applied to Document Analysis in a Single Framework
1. Encoding
2. Decoding
Automation
10.
How to Make Deep Learning Automation Usable
1. Encoding
2. Decoding
Automation
Confidence
12.
Type 1: Deciding / Labelling / Routing
1. Deciding if someone should get a loan or not
2. Labelling an email as a “complaint”
3. Sending a defect report to a suitable team
What do all the above have in common?
14.
Type 1 -> Machine Learning Concept 1
Picking from a finite set of choices = Classification
(also Labelling / Categorizing / Deciding / Choosing)
15.
Type 1 -> Machine Learning Concept 1
Picking from a finite set of choices requires classifiers.
Tools that do classification are called classifiers.
16.
Classification = Deciding = Labelling
[Figure: two doors, labelled 5’11” and 5’8”]
Classify these door heights as Short or Tall:
5’8”, 5’11”, 6’2”, 6’6”, 5’2”, 6’8”, 6’9”, 6’10”
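The door question can be answered by a one-feature classifier that compares each height with a cut-off. The sketch below is minimal. The 6-foot cut-off (`THRESHOLD_INCHES`) is a hypothetical choice, because the slide leaves the threshold to the audience.

```python
from typing import List

# Hypothetical cut-off of 6 feet; the slide does not fix a threshold.
THRESHOLD_INCHES = 72


def to_inches(feet: int, inches: int) -> int:
    """Convert a height given in feet and inches to inches."""
    return feet * 12 + inches


def classify_door(height_inches: int) -> str:
    """Pick one label from the finite set {"Short", "Tall"}."""
    return "Tall" if height_inches >= THRESHOLD_INCHES else "Short"


# The door heights from the slide, in the slide's order, as (feet, inches).
doors = [(5, 8), (5, 11), (6, 2), (6, 6), (5, 2), (6, 8), (6, 9), (6, 10)]
labels: List[str] = [classify_door(to_inches(ft, inch)) for ft, inch in doors]
print(labels)
```

The input here is structured data (a single number). The following slides move to unstructured inputs, where a neural network replaces the hand-picked threshold.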
17.
Classification
• In the case of doors, the input data was a number (the door height), which is structured data.
• Let’s look at an example of classification where the input is unstructured data.
• Here is an image of a landscape. What is the colour you see here?
20.
Classification
• You said that the colour here is blue.
• What have you just done?
• You’ve categorized this area of the image into one of a set of colours.
• You’ve labelled this area as blue.
• You’ve decided this is blue.
• You’ve done classification.
• Humans operate as classifiers more often than we realize.
21.
Classifiers
• OK, now let’s see how we can build a classifier using the concepts that we have learnt so far.
• What neural networks do is take an input which is a vector of real numbers and give you an output which is a vector of real numbers.
• Can you turn this machinery into a classifier?
22.
Neural Networks – The Concept
Outputs = c ; Inputs = f ; Neurons = W
[Figure: inputs f1, f2 and a constant input 1, connected to outputs c1, c2 through weights W11, W12, W21, W22 and biases b1, b2]
Operations:
1. each neuron (interconnection) has a weight = W
2. it contributes its input value f, multiplied by its weight, to the output => f * W
3. each output is the sum of the contributions of all incoming neurons …
c = sum of neuron contributions = sum of f * W (the bias b is the weight on the constant input 1)
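For the two-input, two-output network on this slide, the three operations amount to a matrix-vector product plus a bias vector. The matrix notation is our own, not the slide's. In it, W_{ij} is the weight on the connection from input f_j to output c_i:

```latex
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
=
\begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}
\begin{pmatrix} f_1 \\ f_2 \end{pmatrix}
+
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix},
\qquad
c_i = \sum_{j} W_{ij} f_j + b_i
```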
23.
Neural Networks – How They Work
[Figure: the same two-input, two-output network]
c1 = W11 * f1 + W12 * f2 + b1
c2 = W21 * f1 + W22 * f2 + b2
24.
Neural Networks Example
[Figure: the same two-input, two-output network]
f1 = 1, f2 = 2
W11 = 3, W12 = 4, b1 = 0.5
W21 = 7, W22 = 1, b2 = 0.3
What are c1 and c2?
26.
Turning This Neural Network into a Classifier!
[Figure: the same two-input, two-output network]
f1 = 1, f2 = 2
W11 = 3, W12 = 4, b1 = 0.5
W21 = 7, W22 = 1, b2 = 0.3
c1 = 1 * 3 + 2 * 4 + 0.5 = 11.5
c2 = 1 * 7 + 2 * 1 + 0.3 = 9.3
Add a decision rule >> Choose the class with the higher score!
You’ve got your decoder!!!
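Here is the same arithmetic and decision rule as a short Python sketch. The inputs, weights and biases come from the slide. The function names and the class names "class 1" and "class 2" are hypothetical.

```python
from typing import List


def linear_layer(f: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """c_i = sum_j W[i][j] * f[j] + b[i]: each output sums its weighted inputs plus a bias."""
    return [sum(w_ij * f_j for w_ij, f_j in zip(row, f)) + b_i for row, b_i in zip(W, b)]


def decide(scores: List[float], classes: List[str]) -> str:
    """Decision rule: choose the class with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]


f = [1.0, 2.0]        # f1, f2
W = [[3.0, 4.0],      # W11, W12
     [7.0, 1.0]]      # W21, W22
b = [0.5, 0.3]        # b1, b2

c = linear_layer(f, W, b)
print(c)                                   # c1 and c2, approximately 11.5 and 9.3
print(decide(c, ["class 1", "class 2"]))   # the class with the higher score
```

The "decoder" in the slide's sense is just these two steps: one weighted sum per class, followed by picking the largest.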
27.
Turning This Neural Network into a Classifier!
Your decoder can also look like this!!! This is a multilayer (deep) neural network.
[Figure: two stacked layers. Inputs f1, f2 feed a first layer (weights W11, W12, W21, W22; biases b1, b2), whose outputs feed a second layer (weights W’11, W’12, W’21, W’22; biases b’1, b’2) that produces c1, c2]
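Below is a sketch of a two-layer version of the same decoder. The slide's figure shows no operation between the two layers. This sketch assumes a tanh nonlinearity there, because two stacked linear layers with nothing in between are equivalent to a single linear layer. All weight values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_layer(f: Vector, W: Matrix, b: Vector) -> Vector:
    """Each output is the sum of its weighted inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, f)) + b_i for row, b_i in zip(W, b)]


def two_layer_decoder(f: Vector, W: Matrix, b: Vector,
                      W_prime: Matrix, b_prime: Vector) -> Vector:
    """Layer 1 (W, b), then tanh (an assumed nonlinearity), then layer 2 (W', b')."""
    hidden = [math.tanh(x) for x in linear_layer(f, W, b)]
    return linear_layer(hidden, W_prime, b_prime)


# Hypothetical weights; W_prime and b_prime stand for the slide's W' and b'.
W = [[0.5, -0.2], [0.1, 0.4]]
b = [0.0, 0.1]
W_prime = [[1.0, -1.0], [-1.0, 1.0]]
b_prime = [0.0, 0.0]

scores = two_layer_decoder([1.0, 2.0], W, b, W_prime, b_prime)
print(scores)  # two class scores; the higher one is the predicted class
```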
28.
Now to use this classifier on documents!
[Figure: the same two-input, two-output network]
Now all you need is something that will turn a document into numbers -> ‘f’
Document 1: This is a small document
29.
Now to use this on documents, all we need is an encoder!
[Figure: the same two-input, two-output network]
Encoder = something that will turn a document into a vector of numbers (‘f’)
Document 1: This is a small document
30.
How do you encode a document?
A document is a sequence of words!
There are many problems in machine learning where you deal with sequences.
Document 1: word1 word2 word3 word4
31.
Sequential Deep Learning Models
There are lots of real-world problems where the features form long sequences (that is, they have an ordering):
a) Speech recognition
b) Anything to do with text
c) Handwriting recognition
d) DNA sequencing
e) Video analytics / processing
f) Stock price prediction
32.
Sequential Deep Learning Models
• Is there a deep learning model that can be presented with features sequentially?
[Figure: a multilayer network. Features f feed a hidden layer h through weights W, and h feeds the classes c through weights W’]
34.
Sequential Deep Learning Models
Recurrent Neural Networks (RNNs): At time t = 0
[Figure: an RNN with features f, hidden layer h and classes c; its weight matrices are labelled W, W’ and V]
At any point in time, an RNN looks almost like a regular multilayer neural network …
Almost!
35.
Sequential Deep Learning Models
Recurrent Neural Networks (RNNs): At time t = 0
[Figure: the same RNN, with its hidden layer also receiving its own previous state]
There is a difference: in addition to its inputs, it also reads its own hidden “state” … from the previous time step.
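One common way to write down what this slide describes (the simple, Elman-style RNN) is shown below. The notation is an assumption made for this sketch. f_t is the input at step t, h_t is the hidden state, U holds the recurrent weights that read the previous state, and tanh is the nonlinearity. The slide's figure labels its matrices W, W’ and V, and implementations vary in the nonlinearity they use.

```latex
\begin{aligned}
h_t &= \tanh\left( W f_t + U h_{t-1} + b \right), \qquad h_{-1} = \mathbf{0} \\
c_t &= W' h_t + b'
\end{aligned}
```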
36.
Sequential Deep Learning Models
Recurrent Neural Networks (RNNs): At time t = 1
[Figure: the RNN drawn twice, at t = 0 and at t = 1, with the hidden state from t = 0 passed into the network at t = 1]
Now when it reads the previous hidden “state” … the vector of previous hidden state values contains the values from t = 0.
37.
Sequential Deep Learning Models
Recurrent Neural Networks (RNNs): At time t = 2
[Figure: the RNN drawn twice, at t = 1 and at t = 2, with the hidden state from t = 1 passed into the network at t = 2]
Now when it reads the previous hidden “state” … the vector of previous hidden state values contains the values from t = 1.
38.
Sequential Deep Learning Models
Illustration of RNNs from the WildML blog.
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
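39.
Sequential Deep Learning Models
How do you pass in words?
Document 1: word1 word2 …
t=1  word1  0 1 0 0 1
t=2  word2  1 0 0 1 1
t=3  word3  1 1 1 1 1
t=4  word4  0 0 0 0 0
Each row is that word’s word embedding, passed in as the features f at time step t.
[Figure: the RNN (features f, hidden layer h, classes c; weights W, W’ and V) reading one word embedding per time step]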
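In code, a word embedding is a lookup from a word to its vector. The sketch below hard-codes the slide's illustrative vectors. In a trained system these vectors are learned parameters. Mapping unseen words to a zero vector is a hypothetical choice made for this example.

```python
from typing import Dict, List

# Illustrative embedding table copied from the slide.
# In practice these vectors are learned along with the rest of the network.
EMBEDDINGS: Dict[str, List[int]] = {
    "word1": [0, 1, 0, 0, 1],
    "word2": [1, 0, 0, 1, 1],
    "word3": [1, 1, 1, 1, 1],
    "word4": [0, 0, 0, 0, 0],
}
UNKNOWN: List[int] = [0] * 5  # assumption: unseen words map to a zero vector


def embed(document: str) -> List[List[int]]:
    """Turn a document into one feature vector per time step (one per word)."""
    return [EMBEDDINGS.get(word, UNKNOWN) for word in document.split()]


steps = embed("word1 word2 word3 word4")
for t, vector in enumerate(steps, start=1):
    print(f"t={t}", vector)
```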
40.
Sequential Deep Learning Models
Recurrent Neural Networks (RNNs): At time t = 1
[Figure: the RNN unrolled over t = 0 and t = 1]
At every step you pass in
a) a word embedding
b) the previous state
and the RNN’s final state becomes an encoding of the whole document!
Document is encoded in last state
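The steps above can be sketched as a plain-Python RNN encoder. At each step it combines the current word embedding with the previous state, and it returns the last state as the document encoding. The tanh nonlinearity, the all-zero initial state and every parameter value are assumptions made for this example. A real system would learn W, U and b from data.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(f_t: Vector, h_prev: Vector, W: Matrix, U: Matrix, b: Vector) -> Vector:
    """One RNN step: combine the current word embedding with the previous state."""
    return [math.tanh(a + r + b_i) for a, r, b_i in zip(matvec(W, f_t), matvec(U, h_prev), b)]


def encode(embeddings: List[Vector], W: Matrix, U: Matrix, b: Vector) -> Vector:
    """Run the RNN over the document; the final state is the document encoding."""
    h = [0.0] * len(b)  # initial state (an assumption: all zeros)
    for f_t in embeddings:
        h = rnn_step(f_t, h, W, U, b)
    return h


# Hypothetical parameters: 3-dimensional embeddings, 2-dimensional hidden state.
W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
U = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
document = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

encoding = encode(document, W, U, b)
print(encoding)  # a fixed-size vector, whatever the document length
```

Because each state depends on the previous one, the same words in a different order generally give a different encoding.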
41.
Sequential Deep Learning Models
Long Short-Term Memory (LSTMs): At time t = 1
[Figure: an LSTM unrolled over t = 0 and t = 1]
At every step you pass in
a) a word embedding
b) the previous state
and the LSTM’s final state becomes an encoding of the whole document!
Document is encoded in last state
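For reference, one widely used formulation of the LSTM cell is given below. Here σ is the logistic sigmoid and ⊙ is element-wise multiplication. The symbols are chosen for this sketch. The forget gate is written φ_t, and the memory cell (often written c_t) is written m_t, so that neither clashes with the deck's features f and class scores c. The state an LSTM passes from step to step is the pair (h_t, m_t). Libraries differ in small details such as bias terms.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i f_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
\phi_t &= \sigma\left(W_\phi f_t + U_\phi h_{t-1} + b_\phi\right) && \text{forget gate} \\
o_t &= \sigma\left(W_o f_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{m}_t &= \tanh\left(W_m f_t + U_m h_{t-1} + b_m\right) && \text{candidate memory} \\
m_t &= \phi_t \odot m_{t-1} + i_t \odot \tilde{m}_t \\
h_t &= o_t \odot \tanh\left(m_t\right)
\end{aligned}
```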
42.
Sequential Deep Learning Models
Gated Recurrent Units (GRUs): At time t = 1
[Figure: a GRU unrolled over t = 0 and t = 1]
At every step you pass in
a) a word embedding
b) the previous state
and the GRU’s final state becomes an encoding of the whole document!
Document is encoded in last state
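A common formulation of the GRU cell is given below, in the same notation as an assumption (σ is the logistic sigmoid and ⊙ is element-wise multiplication). Unlike the LSTM, a GRU carries only one state vector h_t. Conventions differ between references: some swap the roles of z_t and 1 − z_t in the last line, and some apply the reset gate after the matrix product.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z f_t + U_z h_{t-1} + b_z\right) && \text{update gate} \\
r_t &= \sigma\left(W_r f_t + U_r h_{t-1} + b_r\right) && \text{reset gate} \\
\tilde{h}_t &= \tanh\left(W_h f_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{candidate state} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```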
43.
The Deep Learning Classifier
RNN: At time t = 1
[Figure: an RNN ENCODER unrolled over t = 0 and t = 1, whose last state (f1, f2) feeds a single-layer DECODER with weights W11, W12, W21, W22]
ENCODER: At every step you pass in
a) a word embedding
b) the previous state
and the RNN’s final state becomes an encoding of the whole document! Document is encoded in last state.
DECODER: Just a neural network layer!
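Putting the pieces together, here is a sketch of the classifier on this slide. An RNN encoder's last state feeds a single linear decoder layer, followed by the higher-score decision rule. All parameter values and the class names "complaint" / "not a complaint" are hypothetical. In practice the parameters would be learned from labelled documents.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def encode(embeddings: List[Vector], W: Matrix, U: Matrix, b: Vector) -> Vector:
    """ENCODER: an RNN whose last state encodes the whole document."""
    h = [0.0] * len(b)
    for f_t in embeddings:
        h = [math.tanh(x) for x in add(matvec(W, f_t), matvec(U, h), b)]
    return h


def classify(embeddings: List[Vector], W: Matrix, U: Matrix, b: Vector,
             W_dec: Matrix, b_dec: Vector, classes: List[str]) -> str:
    """DECODER: one linear layer on the last state, then pick the higher score."""
    scores = add(matvec(W_dec, encode(embeddings, W, U, b)), b_dec)
    return classes[max(range(len(scores)), key=scores.__getitem__)]


# Hypothetical parameters (2-d embeddings, 2-d state, 2 classes).
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
W_dec = [[1.0, -1.0], [-1.0, 1.0]]
b_dec = [0.0, 0.0]
classes = ["complaint", "not a complaint"]

complaint_like = [[1.0, 0.0], [1.0, 0.0]]
other_like = [[0.0, 1.0], [0.0, 1.0]]
print(classify(complaint_like, W, U, b, W_dec, b_dec, classes))
print(classify(other_like, W, U, b, W_dec, b_dec, classes))
```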
44.
Type 2: Information Extraction
1. Extracting the loan amount in a loan document
2. Identifying the firms involved in a merger
3. Finding the name of the King of England
What do all the above have in common?
46.
Type 2: Identifying entities and the relations between them
Let’s say you have some text … and someone is typing things into a spreadsheet from the text:
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Reporter | Location | Product
John Chambers | Springfield, MA | Ford Ranger
47.
Type 2: Identifying entities and the relations between them
Relations tell you about the connections between entities.
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Relations connect the entities that belong in a row.
Reporter | Location | Product
John Chambers | Springfield, MA | Ford Ranger
Relation: Location of Reporter
48.
How do you find an entity in a document?
A document is a sequence of words!
How do you make a neural network tell you where the correct sub-sequence is?
Document 1: word1 word2 word3 word4
49.
The old way to find an entity in a document
Document 1: word1 word2 word3 word4
Tags:       1     2     2     1
Keys:
1 = not in entity
2 = in entity
So, we need to make our neural networks output 1 or 2 suitably!
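Reading entities back out of the 1/2 tags is a small decoding step: each maximal run of consecutive 2s is one entity. Below is a minimal sketch that uses the slide's keys (the function name is hypothetical). With only these two tags, two entities that sit side by side merge into one, which is a limitation of this scheme.

```python
from typing import List, Optional, Tuple

NOT_IN_ENTITY, IN_ENTITY = 1, 2  # the slide's keys


def tags_to_entities(words: List[str], tags: List[int]) -> List[Tuple[int, int, str]]:
    """Group each run of consecutive IN_ENTITY tags into (start, end, text), end exclusive."""
    entities: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags + [NOT_IN_ENTITY]):  # sentinel closes a trailing run
        if tag == IN_ENTITY and start is None:
            start = i
        elif tag != IN_ENTITY and start is not None:
            entities.append((start, i, " ".join(words[start:i])))
            start = None
    return entities


words = "word1 word2 word3 word4".split()
print(tags_to_entities(words, [1, 2, 2, 1]))  # [(1, 3, 'word2 word3')]
```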
50.
The Deep Learning Entity Extractor
RNN: At time t = 1
[Figure: an RNN ENCODER unrolled over t = 0 and t = 1, with the same single-layer DECODER applied to the state at every time step]
ENCODER: At every step you pass in
a) a word embedding
b) the previous state
and the RNN’s state at each step encodes the document up to that word.
DECODER: Just a neural network layer! Document is decoded word by word.
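The only change from the classifier is where the decoder is applied. In the sketch below, the same linear decoder layer is applied to the RNN state after every word. At each step the higher score picks tag 1 (not in entity) or tag 2 (in entity). All parameter values are hypothetical. They were chosen so that the first embedding dimension marks "entity-like" words.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def tag_words(embeddings: List[Vector], W: Matrix, U: Matrix, b: Vector,
              W_dec: Matrix, b_dec: Vector) -> List[int]:
    """Run the RNN encoder and apply the DECODER at every step (one tag per word)."""
    h = [0.0] * len(b)
    tags: List[int] = []
    for f_t in embeddings:
        h = [math.tanh(x) for x in add(matvec(W, f_t), matvec(U, h), b)]
        scores = add(matvec(W_dec, h), b_dec)
        tags.append(1 + max(range(len(scores)), key=scores.__getitem__))  # tag 1 or 2
    return tags


# Hypothetical parameters.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
W_dec = [[-1.0, 1.0], [1.0, -1.0]]  # row 0 scores tag 1, row 1 scores tag 2
b_dec = [0.0, 0.0]

document = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(tag_words(document, W, U, b, W_dec, b_dec))  # [1, 2, 2, 1]
```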
51.
It’s the same neural network!
1. For Classification – you apply the
decoder at the end of the sequence.
2. For Entity Extraction – you apply the
decoder at every point in the sequence.
Aiaioo Labs aiaioo.com
52.
What I’ve talked about
Document Analysis
Two types of document analysis tasks
How deep learning can help automate it
How to improve the utility profile of automation
53.
AI Utility Failure Modes
1. The AI team said the accuracy of the AI was 90% but when we deployed the AI, it didn’t work.
2. No ROI if AI accuracy is below human accuracy.
54.
“The AI team said the accuracy of the AI was 90% but when we deployed the AI, it didn’t work.”
Possible reasons:
a) The accuracy was measured on training data
b) The training data was curated non-randomly
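Both failure causes come down to how the evaluation data is chosen. The sketch below shows one standard remedy, assuming a list of labelled examples: shuffle with a fixed seed, hold out a random fraction, train only on the rest, and report accuracy only on the held-out part. The function names, the 20% fraction and the toy data are hypothetical. A random split guards against (a). Guarding against (b) also requires that the examples are sampled from the same kind of documents the system will see once deployed.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(examples: Sequence[T], test_fraction: float = 0.2,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and hold out a random test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predict: Callable[[str], str], labelled: List[Tuple[str, str]]) -> float:
    """Fraction of held-out examples whose predicted label matches the true label."""
    correct = sum(1 for text, label in labelled if predict(text) == label)
    return correct / len(labelled)


# Toy labelled data (hypothetical).
data = [(f"email {i}", "complaint" if i % 3 == 0 else "other") for i in range(50)]
train, test = random_split(data)
# Only `train` would be used to fit the model; accuracy is reported on `test`.
print(len(train), len(test))  # 40 10
print(accuracy(lambda text: "other", test))
```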
55.
“No utility if AI accuracy is below human accuracy.”
Because if the AI’s accuracy is below the required accuracy:
a) Humans are employed to correct the errors
b) Humans don’t know which outputs are wrong
c) So they check every single output
d) And that’s a lot of work!
Is there any way to get utility if AI accuracy is below human accuracy?
58.
“No utility if AI accuracy is below human accuracy.”
The solution = Use confidence scores
- There are ways to make deep learning systems output a confidence score reflecting the probability that an answer is correct
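One common way to get such a score from a classifier like the one built earlier is to pass its output scores through a softmax and report the largest value as the confidence. The softmax turns the scores into positive numbers that sum to 1. The sketch below reuses the worked example's scores (c1 = 11.5, c2 = 9.3), and its function names are hypothetical. Raw softmax values are not guaranteed to match the true probability of being correct. They usually need to be checked (calibrated) on held-out data before any threshold is set.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into positive values that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(scores: List[float], classes: List[str]) -> Tuple[str, float]:
    """Return the highest-scoring class and its softmax value as a confidence score."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


label, confidence = predict_with_confidence([11.5, 9.3], ["class 1", "class 2"])
print(label, round(confidence, 3))
```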
59.
Type 1: Deciding / Labelling / Routing
1. The AI returns a decision on whether someone should get a loan or not and a number between 0 and 1 reflecting its confidence in that decision
2. The AI labels an email as a “complaint” and returns a confidence score from 0 to 1
3. The AI suggests sending a defect report to a suitable team (with a confidence score from 0 to 1)
60.
Type 2: Information Extraction
1. The AI returns the loan amount and a confidence score between 0 and 1 that the loan amount is right
2. The AI identifies the firms involved in a merger with a confidence score
3. The AI finds the name of the King of England
61.
“No utility if AI accuracy is below human accuracy.”
Now that you have confidence scores, you can use the AI to provide the utility of saving work no matter what its overall accuracy is
- If the confidence of the AI in its answer is above a certain threshold, use the answer, else ask a human
- The question is no longer one of replacing humans
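The routing rule in the first bullet can be sketched as follows. The threshold of 0.9, the `ask_human` callback, the function name and the sample predictions are hypothetical. In practice the threshold would be chosen on held-out data, trading how much work is saved against how many errors slip through.

```python
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (answer, confidence between 0 and 1)


def route(prediction: Prediction, threshold: float,
          ask_human: Callable[[], str]) -> Tuple[str, str]:
    """Use the AI's answer if its confidence is above the threshold, otherwise ask a human."""
    answer, confidence = prediction
    if confidence > threshold:
        return answer, "ai"
    return ask_human(), "human"


predictions: List[Prediction] = [("complaint", 0.97), ("other", 0.55), ("complaint", 0.91)]
results = [route(p, threshold=0.9, ask_human=lambda: "checked by a human") for p in predictions]
automated = sum(1 for _, source in results if source == "ai")
print(results)
print(f"{automated} of {len(results)} items needed no human review")
```

Even if the AI's overall accuracy is low, every item it routes to "ai" is one that no human has to check.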
62.
“No utility if AI accuracy is below human accuracy.”
Now that you have confidence scores, you can use the AI to provide the utility of improving quality no matter what its overall accuracy is
- If the confidence of the AI in its answer is above a certain threshold, and the human has a different answer, alert the human to a possible error
- If the human was right, that’s valuable training data
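The quality-checking rule can be sketched the same way: flag an item only when the AI is confident and its answer differs from the human's. The `Item` record, the threshold and the sample items are hypothetical. As the slide notes, an item where the human turns out to be right after a flag is worth keeping as training data.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    item_id: str
    human_answer: str
    ai_answer: str
    ai_confidence: float  # between 0 and 1


def items_to_review(items: List[Item], threshold: float) -> List[str]:
    """IDs of items where a confident AI disagrees with the human: a possible error."""
    return [
        item.item_id
        for item in items
        if item.ai_confidence > threshold and item.ai_answer != item.human_answer
    ]


items = [
    Item("A", "complaint", "complaint", 0.95),  # agreement: nothing to flag
    Item("B", "other", "complaint", 0.93),      # confident disagreement: flag it
    Item("C", "other", "complaint", 0.60),      # low confidence: trust the human
]
print(items_to_review(items, threshold=0.9))  # ['B']
```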
64.
So remember these AI utility hacks:
1. Confidence values boost utility, so always deploy systems with the ability to return confidence scores
2. Start with a manual process and add in AI to taste: the human process generates data for the AI, and the AI progressively makes the human processes more efficient
3. Humans and AI can correct each other and improve each other’s quality. The AI can also correct human errors
Aiaioo Labs cohan@aiaioo.com
65.
About Aiaioo Labs
AI Research Lab
1. http://aiaioo.com
2. http://aiaioo.com/publications
3. http://aiaioo.wordpress.com
Teacher: We hope to make clear in the very first content slide (page 2) that “deep” learning models are just neural networks, but with more than one layer of interconnections.
Deep learning as of today (2018) is based on the math of neural networks. And we say the neural networks are “deep” if they involve more than one layer of neurons. Deep learning models are just multilayered neural networks. We’ll see later on what changed between the 1980s when their performance was considered very poor and recent years when the self-same neural networks have yielded state-of-the-art performance on almost every task that machine learning has been applied to.
It might be useful to dispel students’ fear of math at this point by letting them know that if they don’t like math, they’re in the right class. Let them know that deep learning is different from other forms of machine learning in that it is very easy to learn. In order to develop an understanding of deep learning, the only math they will need to know is – hold your breath – multiplication and division. And maybe some differentiation. But the frameworks available today do the differentiation automatically for you, so you don’t even need to know that. Multiplication is enough.
Also, let the students know that what’s super-interesting about deep learning is that there is only one bit of math to learn (one learning algorithm) and it works for all problems. If we were teaching statistical machine learning, we’d be learning an algorithm for text, another for images, yet another for sequential classification (oh wait, you’d learn 3 algorithms just for HMMs – a sequential model). With neural networks, the underlying learning algorithm is the same for any kind of problem or model. So anyone who knows programming can learn this stuff in a few hours. In fact, by the end of this class, in four hours, you’ll all be building image classifiers, and a chatbot. Ready for this?
Explain that it is easy.
Explain the benefits.
Explain the benefits.
Say this for the previous slide
So, now that you’ve heard what neurons do, can you tell me what c1 and c2 are on this slide?
Teacher: This is a problem based on the math of the previous slide. Ask the students to compute the outputs c1 and c2 given the inputs f1 and f2. Goal: student develops an understanding of what a neural network does by doing what it does.
Note: Give ample time for digesting. Return to previous slides and explain if you see puzzled looks.
Walk through the solution => c1 is equal to W11 into f1 plus W12 into f2 plus 1 into b1, which is ...
Here’s the solution.
Teacher: This slide is the answer to the problem set in the previous slide.
Note: Make them say it out loud.