Document Analysis with Deep Learning

Cohan Sujay Carlos
CEO, Aiaioo Labs
Document Analysis Using Deep Learning
Use Cases of Deep Learning Applied to Document Analysis
Aman Neelappa
Consultant, Aiaioo Labs

What I am going to talk about
Document Analysis
Two types of document analysis tasks
How deep learning can help automate it
How to improve the utility profile of automation
… but first, let’s see what deep learning is

What is deep learning?
• Deep learning refers to the use of artificial neural networks with more than one layer of neurons (interconnections).

Deep Learning
Deep learning democratizes (I might say deskills) AI:
Anyone can learn it (it’s only multiplications, matrices and partial differentiation).
Anyone can teach it (it’s only multiplications, matrices and partial differentiation).

Deep Learning Motivation
It has a high benefit-to-cost ratio:
The same algorithms work on images, on text and on speech.
The same core math works for sequential models and non-sequential models.
You get three × two (three kinds of data × two kinds of model) for the price of one!

Deep Learning Motivation
It requires fewer engineering steps for new models:
There is no need for hand-crafted training algorithms.
There is no need for feature engineering.
You get two things for free!

So that’s deep learning!
So now we know what deep learning is.
It involves a neural network, one that has many layers.
Let’s see how such deep neural networks can be used for document analysis.

Two Types of Document Analysis Tasks
1. Deciding / Labelling / Routing
2. Data Entry / Information Extraction
Almost all document analysis tasks fall into one of these two categories!

How Deep Learning Can Be Applied to Document Analysis in a Single Framework
1. Encoding
2. Decoding
Automation

How to Make Deep Learning Automation Usable
1. Encoding
2. Decoding
Automation + Confidence

Type 1: Deciding / Labelling / Routing
1. Deciding if someone should get a loan or not
2. Labelling an email as a “complaint”
3. Sending a defect report to a suitable team
What do all the above have in common?

Type 1 -> Machine Learning Concept 1
Picking from a finite set of choices = Classification
(also called Labelling / Categorizing / Deciding / Choosing)

Type 1 -> Machine Learning Concept 1
Picking from a finite set of choices requires Classifiers
Tools that do classification are called classifiers.

Classification = Deciding = Labelling
Classify these door heights as: Short or Tall?
5’8”
5’11”
6’2”
6’6”
5’2”
6’8”
6’9”
6’10”
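The door exercise needs nothing more than a single cut-off on one number. A minimal sketch in Python, assuming a hypothetical boundary of 6'0" (72 inches); the deck does not say where "Short" ends and "Tall" begins, and the function names are illustrative:

```python
# A one-feature threshold classifier for the door-height exercise.
# The 72-inch (6'0") cut-off is a hypothetical choice, not from the slides.
THRESHOLD_INCHES = 72


def to_inches(feet: int, inches: int) -> int:
    """Convert a height given in feet and inches to total inches."""
    return feet * 12 + inches


def classify_door(feet: int, inches: int) -> str:
    """Label a door 'Tall' if it reaches the threshold, otherwise 'Short'."""
    return "Tall" if to_inches(feet, inches) >= THRESHOLD_INCHES else "Short"


heights = [(5, 8), (5, 11), (6, 2), (6, 6), (5, 2), (6, 8), (6, 9), (6, 10)]
for feet, inches in heights:
    print(f"{feet}'{inches}\" -> {classify_door(feet, inches)}")
```

Whatever boundary you pick, the shape is the same: a number goes in and one label from a finite set comes out.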

Classification
• In the case of doors, the input data was a number (the door height), which is structured data.
• Let’s look at an example of classification where the input is unstructured data.
• Here is an image of a landscape. What is the colour you see here?

What is Classification?
What colour do you see here?

What is Classification?
Classification = Categorizing = Labelling = Deciding
Blue

Classification
• You said that the colour here is blue.
• What have you just done?
• You’ve categorized this area of the image into one of a set of colours.
• You’ve labelled this area as blue.
• You’ve decided this is blue.
• You’ve done classification.
• Humans operate as classifiers more often than we realize.

Classifiers
• OK, now let’s see how we can build a classifier using the concepts that we have learnt so far.
• What neural networks do is take an input, which is a vector of real numbers, and give you an output, which is also a vector of real numbers.
• Can you turn this machinery into a classifier?

Neural Networks – The Concept
Outputs = c; Inputs = f; Neurons = W
[Diagram: inputs f1, f2 connected to outputs c1, c2 through weights W11, W12, W21, W22, with biases b1, b2]
Operations:
1. Each neuron (interconnection) has a weight = W.
2. It contributes the weighted input value f to the output => f * W.
3. Each output is the sum of the contributions of all incoming neurons, plus a bias b:
c = sum of neuron contributions + b = sum of f * W + b

Neural Networks – How They Work
[Diagram: the same two-input, two-output network]
c1 = W11 * f1 + W12 * f2 + b1
c2 = W21 * f1 + W22 * f2 + b2

Neural Networks Example
f1 = 1, f2 = 2
W11 = 3, W12 = 4, b1 = 0.5
W21 = 7, W22 = 1, b2 = 0.3
What are c1 and c2?

Neural Networks Answer
f1 = 1, f2 = 2
W11 = 3, W12 = 4, b1 = 0.5
W21 = 7, W22 = 1, b2 = 0.3
c1 = 1 * 3 + 2 * 4 + 0.5 = 11.5
c2 = 1 * 7 + 2 * 1 + 0.3 = 9.3
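The same arithmetic takes a few lines of Python. A minimal sketch that stores the weights row by row, so `W[i][j]` is the weight from input f_j to output c_i (matching c1 = 1 * 3 + 2 * 4 + 0.5 on this slide); the function name `layer` is hypothetical:

```python
def layer(f, W, b):
    """Single-layer network: c_i = sum over j of W[i][j] * f[j], plus b[i]."""
    return [sum(w_ij * f_j for w_ij, f_j in zip(row, f)) + b_i
            for row, b_i in zip(W, b)]


f = [1, 2]          # inputs f1, f2
W = [[3, 4],        # W11, W12: weights into c1
     [7, 1]]        # W21, W22: weights into c2
b = [0.5, 0.3]      # biases b1, b2

c1, c2 = layer(f, W, b)
print(c1, c2)
```

This reproduces c1 = 11.5 and c2 = 9.3, up to floating-point rounding.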

Turning This Neural Network into a Classifier!
f1 = 1, f2 = 2
W11 = 3, W12 = 4, b1 = 0.5
W21 = 7, W22 = 1, b2 = 0.3
c1 = 1 * 3 + 2 * 4 + 0.5 = 11.5
c2 = 1 * 7 + 2 * 1 + 0.3 = 9.3
Add a decision rule >> Choose the class with the higher score! (Here, c1.)
You’ve got your decoder!!!
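The decision rule "choose the class with the higher score" is an argmax over the output scores. A minimal sketch, assuming the class names are simply the hypothetical strings "c1" and "c2":

```python
def decide(scores, labels):
    """Decision rule: return the label whose score is highest (argmax).

    On a tie, max() keeps the first highest score it sees.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


scores = [11.5, 9.3]                 # c1, c2 from the worked example
print(decide(scores, ["c1", "c2"]))  # c1
```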

Turning This Neural Network into a Classifier!
Your decoder can also look like this!!! This is a multilayer (deep) neural network.
[Diagram: inputs f1, f2 -> first layer (weights W11, W12, W21, W22; biases b1, b2) -> second layer (weights W’11, W’12, W’21, W’22; biases b’1, b’2) -> outputs c1, c2]
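Stacking layers just feeds one layer's outputs into the next layer as its inputs. A minimal sketch with hypothetical weights; it applies a tanh non-linearity between the layers, which is an assumption not shown on the slide (without some non-linearity, two stacked linear layers collapse into a single linear layer):

```python
import math


def layer(f, W, b):
    """One layer: out_i = sum over j of W[i][j] * f[j], plus b[i]."""
    return [sum(w * x for w, x in zip(row, f)) + bias for row, bias in zip(W, b)]


def two_layer(f, W, b, W2, b2):
    """Inputs f -> hidden layer (W, b) -> outputs c (W', b').

    The tanh on the hidden layer is an assumed non-linearity.
    """
    h = [math.tanh(v) for v in layer(f, W, b)]
    return layer(h, W2, b2)


# Hypothetical weights, chosen only for illustration.
W = [[0.5, -0.2], [0.1, 0.4]]
b = [0.0, 0.1]
W2 = [[1.0, -1.0], [-0.5, 0.8]]
b2 = [0.2, 0.0]

c = two_layer([1, 2], W, b, W2, b2)
print(c)
```

The decision rule from the previous slide applies unchanged to the final scores `c`.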

Now to use this classifier on documents!
[Diagram: the single-layer classifier with inputs f1, f2 and outputs c1, c2]
Now all you need is something that will turn a document into numbers -> ‘f’
Document 1: This is a small document

Now to use this on documents, all we need is an encoder!
Encoder = something that will turn a document into a vector of numbers – ‘f’
Document 1: This is a small document
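One of the simplest possible encoders is a bag-of-words count vector. It is a swapped-in technique for illustration, not the encoder this deck builds (that one is an RNN, introduced next). A minimal sketch, assuming a hypothetical fixed vocabulary and whitespace tokenization:

```python
def bag_of_words(document: str, vocabulary: list[str]) -> list[int]:
    """Encode a document as word counts over a fixed vocabulary.

    Words outside the vocabulary are ignored, so every document maps to a
    vector of the same length, which is what the classifier's inputs need.
    """
    counts = {word: 0 for word in vocabulary}
    for token in document.lower().split():
        if token in counts:
            counts[token] += 1
    return [counts[word] for word in vocabulary]


vocab = ["this", "is", "a", "small", "large", "document"]  # hypothetical vocabulary
print(bag_of_words("This is a small document", vocab))     # [1, 1, 1, 1, 0, 1]
```

Its weakness is that it throws away word order, which is why the following slides treat the document as a sequence.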

How do you encode a document?
A document is a sequence of words!
There are many problems in machine learning where you deal with sequences.
Document 1: word1 word2 word3 word4

Sequential Deep Learning Models
There are lots of real world problems where the features
form long sequences (that is, they have an ordering):
a) Speech recognition
b) Anything to do with text
c) Handwriting recognition
d) DNA sequencing
e) Video analytics / processing
f) Stock price prediction

• Is there a deep learning model that can be presented with features sequentially?
[Diagram: a multilayer network with features f, hidden layer h and classes c, connected by weights W and W’]

Yes. They’re called … Recurrent Neural Networks (RNNs):
[Diagram: an RNN with features f, hidden layer h and classes c, connected by weights W, W’ and V]

Recurrent Neural Networks (RNNs): At time t = 0
[Diagram: the RNN at time t = 0, drawn as a two-layer network with features f, hidden layer h and classes c]
At any point in time, an RNN looks almost like a regular multilayer neural network …
Almost!

[Diagram: the RNN with features f, hidden layer h and classes c (weights W, W’ and V)]
There is a difference: in addition to its inputs, it also reads its own hidden “state” … from the previous time step.

Recurrent Neural Networks
[Diagram: the RNN at t = 0 passing its hidden state to the RNN at t = 1]
Now, at t = 1, when it reads the previous hidden state, the vector of previous hidden state values contains the values from t = 0.

[Diagram: the RNN at t = 1 passing its hidden state to the RNN at t = 2]
Now, at t = 2, when it reads the previous hidden state, the vector of previous hidden state values contains the values from t = 1.

Illustration of RNNs from the WildML blog:
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

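How do you pass in words?
Document 1: word1 word2 …
Each word is passed in at its own time step as a word embedding:
t=1  word1  0 1 0 0 1
t=2  word2  1 0 0 1 1
t=3  word3  1 1 1 1 1
t=4  word4  0 0 0 0 0
[Diagram: the word embedding at each time step enters the RNN as its features f; the RNN has hidden layer h and classes c (weights W, W’ and V)]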
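Passing words in means looking up one vector per word and feeding the vectors in reading order. A minimal sketch that reuses the slide's example vectors as a lookup table; in practice embeddings are usually learned or pre-trained, and the names `EMBEDDINGS` and `embed` are hypothetical:

```python
# Embedding table holding the example vectors from the slide.
EMBEDDINGS = {
    "word1": [0, 1, 0, 0, 1],
    "word2": [1, 0, 0, 1, 1],
    "word3": [1, 1, 1, 1, 1],
    "word4": [0, 0, 0, 0, 0],
}


def embed(document: str) -> list[list[int]]:
    """Return one embedding per word, in reading order (t = 1, 2, ...).

    An unknown word raises KeyError; real systems map it to a shared
    'unknown word' vector instead.
    """
    return [EMBEDDINGS[word] for word in document.split()]


for t, vector in enumerate(embed("word1 word2 word3 word4"), start=1):
    print(f"t={t}", vector)
```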

Recurrent Neural Networks
[Diagram: the RNN unrolled over t = 0 and t = 1; the document is encoded in the last state]
At every step you pass in
a) a word embedding
b) the previous state
and the RNN’s final state becomes an encoding of the whole document!
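The loop above can be sketched as a plain (Elman-style) RNN encoder. This assumes the common update h_t = tanh(W_x x_t + W_h h_(t-1) + b); the names `W_x` (input weights) and `W_h` (recurrent weights) and all parameter values are hypothetical and do not correspond to the W, W’ and V labels on the slides:

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(x, h_prev, W_x, W_h, b):
    """One RNN step: h_t = tanh(W_x x_t + W_h h_(t-1) + b)."""
    a = matvec(W_x, x)
    r = matvec(W_h, h_prev)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b)]


def rnn_encode(embeddings, W_x, W_h, b):
    """Read one word embedding per step; the last state encodes the document."""
    h = [0.0] * len(b)  # the state before any word has been read
    for x in embeddings:
        h = rnn_step(x, h, W_x, W_h, b)
    return h


# Hypothetical parameters: 5-dimensional embeddings, 2-dimensional state.
W_x = [[0.1, -0.2, 0.3, 0.0, 0.1],
       [0.0, 0.2, -0.1, 0.4, -0.3]]
W_h = [[0.5, -0.1],
       [0.2, 0.3]]
b = [0.0, 0.1]

document = [[0, 1, 0, 0, 1], [1, 0, 0, 1, 1], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
print(rnn_encode(document, W_x, W_h, b))
```

Whatever the document's length, the encoding has the size of the hidden state, so a fixed-size decoder can sit on top of it.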

Long Short-Term Memory Networks (LSTMs): At time t = 1
[Diagram: an LSTM unrolled over t = 0 and t = 1; the document is encoded in the last state]
At every step you pass in
a) a word embedding
b) the previous state
and the LSTM’s final state becomes an encoding of the whole document!

Gated Recurrent Units (GRUs): At time t = 1
[Diagram: a GRU unrolled over t = 0 and t = 1; the document is encoded in the last state]
At every step you pass in
a) a word embedding
b) the previous state
and the GRU’s final state becomes an encoding of the whole document!
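A GRU replaces the plain RNN update with gates that decide how much of the old state to keep. A minimal sketch of one common formulation (Cho et al., 2014); libraries differ in details such as where the reset gate is applied, and the parameter names and values here are hypothetical:

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h, p):
    """One GRU step:
    z = sigmoid(W_z x + U_z h + b_z)          update gate
    r = sigmoid(W_r x + U_r h + b_r)          reset gate
    n = tanh(W_n x + U_n (r * h) + b_n)       candidate state
    h_new = z * h + (1 - z) * n
    """
    def pre(gate, state):
        return [a + c + d for a, c, d in
                zip(matvec(p["W_" + gate], x), matvec(p["U_" + gate], state), p["b_" + gate])]

    z = [sigmoid(a) for a in pre("z", h)]
    r = [sigmoid(a) for a in pre("r", h)]
    n = [math.tanh(a) for a in pre("n", [ri * hi for ri, hi in zip(r, h)])]
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def gru_encode(embeddings, p, hidden_size):
    """Run the GRU over the word embeddings; the last state encodes the document."""
    h = [0.0] * hidden_size
    for x in embeddings:
        h = gru_step(x, h, p)
    return h


# Hypothetical parameters: 2-dimensional inputs and a 2-dimensional state.
params = {
    "W_z": [[0.2, -0.1], [0.0, 0.3]], "U_z": [[0.1, 0.0], [0.0, 0.1]], "b_z": [0.0, 0.0],
    "W_r": [[0.1, 0.1], [-0.2, 0.2]], "U_r": [[0.0, 0.1], [0.1, 0.0]], "b_r": [0.0, 0.0],
    "W_n": [[0.5, -0.5], [0.3, 0.1]], "U_n": [[0.2, 0.0], [0.0, 0.2]], "b_n": [0.0, 0.0],
}
print(gru_encode([[1, 0], [0, 1], [1, 1]], params, hidden_size=2))
```

When z is close to 1 the old state passes through almost unchanged, which is what helps gated cells carry information across long documents.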

The Deep Learning Classifier
RNN: At time t = 1
ENCODER (an RNN): at every step you pass in a) a word embedding b) the previous state, and the final state becomes an encoding of the whole document!
DECODER: Just a neural network layer!
[Diagram: the RNN encoder unrolled over t = 0 and t = 1; the document is encoded in its last state, which feeds the inputs f1, f2 of the decoder layer]

Type 2: Information Extraction
1. Extracting the loan amount in a loan document
2. Identifying the firms involved in a merger
3. Finding the name of the King of England
What do all the above have in common?

Type 2: Identifying entities and the relations between them
Let’s say you have some text … and someone is typing things into a spreadsheet from the text:
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Reporter        Location          Product
John Chambers   Springfield, MA   Ford Ranger

Type 2: Identifying entities and the relations between them
Relations tell you about the connections between entities.
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Relations connect the entities that belong in a row.
Reporter        Location          Product
John Chambers   Springfield, MA   Ford Ranger
Relation: Location of Reporter (it links John Chambers to Springfield, MA, and not to Boston, MA, where the Ford Ranger was purchased)

How do you find an entity in a document?
A document is a sequence of words!
How do you make a neural network tell you where the correct sub-sequence is?

The old way to find an entity in a document
Tag every word: 1 2 2 1
Keys:
1 = not in entity
2 = in entity
So, we need to make our neural networks output 1 or 2 suitably!
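Once every word carries a 1 or a 2, reading the entities off is a single pass that groups runs of 2s. A minimal sketch using the slide's keys; the function name is hypothetical, and the example tags are written by hand here rather than produced by a network:

```python
def extract_entities(words, tags):
    """Group consecutive words tagged 2 ('in entity') into entity strings.

    Keys from the slide: 1 = not in entity, 2 = in entity.
    """
    entities, current = [], []
    for word, tag in zip(words, tags):
        if tag == 2:
            current.append(word)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:  # an entity that runs up to the last word
        entities.append(" ".join(current))
    return entities


words = ["John", "Chambers", "of", "Springfield,", "MA", "reported", "a", "problem"]
tags = [2, 2, 1, 2, 2, 1, 1, 1]
print(extract_entities(words, tags))  # ['John Chambers', 'Springfield, MA']
```

With only two keys, two entities that touch each other would merge into one; richer tag sets such as BIO (begin / inside / outside) exist to separate them.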

RNN: The Deep Learning Entity Extractor
ENCODER (an RNN): at every step you pass in a) a word embedding b) the previous state.
DECODER: Just a neural network layer! The document is decoded word by word.
[Diagram: the RNN encoder unrolled over t = 0 and t = 1, with the decoder layer (inputs f1, f2) applied to the state at every time step]

It’s the same neural network!
1. For Classification – you apply the decoder at the end of the sequence.
2. For Entity Extraction – you apply the decoder at every point in the sequence.
Aiaioo Labs aiaioo.com
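The only difference between the two tasks is where the decoder is applied. A minimal sketch, assuming a plain tanh RNN encoder and a single linear decoder layer with hypothetical weights; `classify` and `tag` are illustrative names, and in `tag` the class indices 0 and 1 stand for the slide's keys 1 and 2:

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_states(embeddings, W_x, W_h, b):
    """Run a plain tanh RNN and return the hidden state after every word."""
    h, states = [0.0] * len(b), []
    for x in embeddings:
        h = [math.tanh(a + r + bi) for a, r, bi in zip(matvec(W_x, x), matvec(W_h, h), b)]
        states.append(h)
    return states


def decode(h, W_out, b_out):
    """Decoder: one linear layer plus 'highest score wins'; returns a class index."""
    scores = [s + bi for s, bi in zip(matvec(W_out, h), b_out)]
    return max(range(len(scores)), key=scores.__getitem__)


def classify(embeddings, enc, dec):
    """Classification: apply the decoder once, to the last state."""
    return decode(rnn_states(embeddings, *enc)[-1], *dec)


def tag(embeddings, enc, dec):
    """Entity extraction: apply the decoder to the state at every position."""
    return [decode(h, *dec) for h in rnn_states(embeddings, *enc)]


# Hypothetical parameters: 2-dimensional embeddings and state, 2 classes.
enc = ([[1.0, -1.0], [0.5, 0.5]], [[0.2, 0.0], [0.0, 0.2]], [0.0, 0.0])
dec = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
doc = [[1, 0], [0, 1], [1, 1]]
print(classify(doc, enc, dec), tag(doc, enc, dec))
```

Because both functions share `rnn_states` and `decode`, the classifier's answer is always the same as the tagger's output at the last word; in practice the two tasks would be trained with different decoder weights.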

What I’ve talked about
Document Analysis
Two types of document analysis tasks
How deep learning can help automate it
How to improve the utility profile of automation

AI Utility Failure Modes
1. The AI team said the accuracy of the AI was 90% but when we deployed the AI, it didn’t work.
2. No ROI if AI accuracy is below human accuracy.

“The AI team said the accuracy of the AI was 90% but when we deployed the AI, it didn’t work.”
Possible reasons:
a) The accuracy was measured on training data
b) The training data was curated non-randomly
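Both reasons come down to measuring accuracy on data that does not look like what the deployed system will see. A minimal sketch of the usual remedy: draw the evaluation set at random from deployment documents and keep it disjoint from the training data. It assumes documents are hashable IDs and that a human labels the sample afterwards; the function names and data are hypothetical:

```python
import random


def evaluation_sample(deployment_docs, training_docs, k, seed=0):
    """Pick k documents uniformly at random from the documents the system will
    see in deployment, leaving out anything that was used for training."""
    seen = set(training_docs)
    candidates = [doc for doc in deployment_docs if doc not in seen]
    return random.Random(seed).sample(candidates, k)


def accuracy(predict, labelled):
    """Fraction of (document, correct label) pairs the predictor gets right."""
    return sum(predict(doc) == label for doc, label in labelled) / len(labelled)


# Hypothetical usage: sample, have a human label the sample, then score the AI.
deployment = [f"email-{i}" for i in range(100)]
training = deployment[:30]
sample = evaluation_sample(deployment, training, k=10)
print(sample)
```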

“No utility if AI accuracy is below human accuracy.”
Because if the AI’s accuracy is below the required accuracy:
a) Humans are employed to correct the errors
b) Humans don’t know which outputs are wrong
c) So they check every single output
d) And that’s a lot of work!
Is there any way to get utility if AI accuracy is below human accuracy?

The solution = Use confidence scores
- There are ways to make deep learning systems output a confidence score reflecting the probability that an answer is correct
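One common way to get such a score from a classifier is to pass its output scores through a softmax and report the winning class's share. This is a swapped-in technique for illustration; the deck does not say which method it uses, and raw softmax values often need calibration (for example, temperature scaling) before they track the real probability of being correct. A minimal sketch with hypothetical function names:

```python
import math


def softmax(scores):
    """Turn raw class scores into positive values that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(scores, labels):
    """Return the top label and its softmax value as a confidence score."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = predict_with_confidence([11.5, 9.3], ["c1", "c2"])
print(label, round(confidence, 3))
```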

Type 1: Deciding / Labelling / Routing
1. The AI returns a decision on whether someone should get a loan or not, and a number between 0 and 1 reflecting its confidence in that decision
2. The AI labels an email as a “complaint” and returns a confidence score from 0 to 1
3. The AI suggests sending a defect report to a suitable team (with a confidence score from 0 to 1)

Type 2: Information Extraction
1. The AI returns the loan amount and a confidence score between 0 and 1 that the loan amount is right
2. The AI identifies the firms involved in a merger with a confidence score
3. The AI finds the name of the King of England with a confidence score

Now that you have confidence scores, you can use the AI to provide the utility of saving work, no matter what its overall accuracy is
- If the confidence of the AI in its answer is above a certain threshold, use the answer; else ask a human
- The question is no longer one of replacing humans
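The threshold rule fits in a few lines. A minimal sketch, assuming a hypothetical threshold of 0.9 (in practice you would pick it by checking accuracy above different thresholds on held-out data); the function names are illustrative:

```python
def route(answer, confidence, threshold=0.9):
    """Use the AI's answer when it is confident enough; otherwise ask a human."""
    if confidence >= threshold:
        return ("ai", answer)
    return ("human", None)


def share_automated(confidences, threshold=0.9):
    """Fraction of items that never reach a human at this threshold."""
    return sum(c >= threshold for c in confidences) / len(confidences)


print(route("complaint", 0.95))                     # ('ai', 'complaint')
print(route("complaint", 0.60))                     # ('human', None)
print(share_automated([0.95, 0.60, 0.99, 0.50]))   # 0.5
```

Raising the threshold sends more items to humans but makes the automated share more accurate; the saved work is whatever `share_automated` reports.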

Now that you have confidence scores, you can use the AI to provide the utility of improving quality, no matter what its overall accuracy is
- If the confidence of the AI in its answer is above a certain threshold, and the human has a different answer, alert the human to a possible error
- If the human was right, that’s valuable training data
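The review rule is the mirror image of routing: the human answers every item, and the AI only speaks up when it is confident and disagrees. A minimal sketch with a hypothetical threshold of 0.9 and made-up records:

```python
def needs_second_look(ai_answer, ai_confidence, human_answer, threshold=0.9):
    """True when a confident AI disagrees with the human: a possible human error."""
    return ai_confidence >= threshold and ai_answer != human_answer


# Hypothetical records: (document id, AI answer, AI confidence, human answer)
records = [
    ("doc-1", "complaint", 0.97, "complaint"),
    ("doc-2", "complaint", 0.95, "query"),
    ("doc-3", "query", 0.55, "complaint"),
]
flagged = [doc for doc, ai, conf, human in records if needs_second_look(ai, conf, human)]
print(flagged)  # ['doc-2']
```

If the human stands by their answer on a flagged item, that item is an example the AI got confidently wrong, which is exactly the kind of training data the slide calls valuable.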

Aiaioo Labs cohan@aiaioo.com
[Chart: Utility Profile, with series Accuracy, Utility without Confidence and Utility with Confidence]

So remember these AI utility hacks:
1. Confidence values boost utility – so always deploy systems with the ability to return confidence scores
2. Start with a manual process – add in AI to taste – the human process generates data for the AI, and the AI progressively makes the human processes more efficient
3. Humans and AI can correct each other and improve each other’s quality - The AI can also correct human

About Aiaioo Labs
AI Research Lab
1. http://aiaioo.com
2. http://aiaioo.com/publications
3. http://aiaioo.wordpress.com

THANK YOU
