4. Machine learning refers to the use of
algorithms that parse data, learn from it,
and then make predictions or decisions
about something.
One of the best applications of machine
learning is computer vision: OCR, object
tracking, object recognition, etc.
MACHINE LEARNING
MACHINE LEARNING AND DEEP LEARNING
5. Deep learning is a subfield of machine
learning concerned with algorithms
inspired by the structure and function of
the brain, called artificial neural networks
(ANNs).
Compared to older ML algorithms, deep
learning performs better when a large
amount of data is available.
DEEP LEARNING
MACHINE LEARNING AND DEEP LEARNING
7. SUPERVISED VS UNSUPERVISED
Our data are not labeled. Unsupervised
algorithms use similarity measures
between samples in order to group them
into homogeneous clusters.
Most famous technique: clustering
(k-means, hierarchical, etc.); a minimal
k-means sketch follows at the end of this slide.
UNSUPERVISED LEARNING
All the data have been labeled (supervised)
by an expert. Thanks to this labeling process,
we can help the model learn the
difference between classes (even though
sometimes this does not happen).
Some techniques: NNs, SVMs, etc.
SUPERVISED LEARNING
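A minimal k-means sketch in NumPy, only to illustrate how an unsupervised
algorithm groups unlabeled samples into homogeneous clusters; the data, the
value of k and the number of iterations are arbitrary assumptions, not taken
from the slides.

import numpy as np

def kmeans(samples, k, n_iters=20, seed=0):
    # Pick k random samples as the initial cluster centres.
    rng = np.random.default_rng(seed)
    centres = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(n_iters):
        # Assign each sample to its nearest centre.
        distances = np.linalg.norm(samples[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centre to the mean of the samples assigned to it
        # (keep the old centre if a cluster ends up empty).
        centres = np.array([samples[labels == c].mean(axis=0) if np.any(labels == c)
                            else centres[c] for c in range(k)])
    return labels, centres

# Two obvious groups of 2-D points, clustered without any labels.
data = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
labels, centres = kmeans(data, k=2)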
8. CLASSIFICATION VS PREDICTION
Prediction refers to the problem of
estimating the behaviour of a phenomenon
by analysing its “previous history”,
e.g. object tracking, forecasting, etc.
PREDICTION
Given an input observation, classification is
the problem of identifying to which of a set
of categories (classes) the new observation
belongs,
e.g. traffic sign recognition, emotion
recognition, etc.
CLASSIFICATION
9. TRAINING A LOGISTIC CLASSIFIER
The logistic classifier is based on the
linear formula y = WX + b, where X represents
the input data matrix, W is the weights
matrix, b contains the bias terms and y is the
output (the logits) of the classifier.
The goal is to tune the values of W and b in
order to obtain the lowest possible loss value.
A minimal sketch of this computation follows below.
WEIGHTS AND BIAS TERMS
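A minimal NumPy sketch of the logits computation; the shapes and values
below are illustrative assumptions, not taken from the slides.

import numpy as np

# Illustrative shapes: 4 input samples with 3 features each, scores for 2 classes.
X = np.random.randn(4, 3)   # input data matrix (one sample per row)
W = np.random.randn(3, 2)   # weights matrix, tuned during training
b = np.zeros(2)             # bias terms, tuned during training

# The linear formula, written row-wise here: one score (logit) per class
# for each sample.
y = X @ W + b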
10. Softmax is a function that takes a vector
of scores (the logits) and transforms it
into probabilities.
SOFTMAX
MEASURING THE LOSS
Given an input sample, it’s possible to
estimate the distance between the output
of the classifier and the ground-truth value.
CROSS ENTROPY
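A minimal NumPy sketch of both steps (softmax on the logits, then
cross-entropy against the label); the example scores and label are
arbitrary assumptions.

import numpy as np

def softmax(logits):
    # Turn a vector of scores into probabilities that sum to 1
    # (subtracting the max keeps the exponentials numerically stable).
    e = np.exp(logits - logits.max())
    return e / e.sum()

def cross_entropy(probs, one_hot_label):
    # Distance between the classifier output and the ground-truth label.
    return -np.sum(one_hot_label * np.log(probs))

scores = np.array([2.0, 1.0, 0.1])   # logits for three classes
label = np.array([1.0, 0.0, 0.0])    # one-hot ground truth
loss = cross_entropy(softmax(scores), label)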
11. THE LOSS FUNCTION
MEASURING THE LOSS
We measure the loss of the training
process by computing the previous
cross-entropy formula over the entire
training set.
The loss depends on the W and b seen
before.
We want to minimise the average
cross-entropy, as sketched below.
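A self-contained NumPy sketch of the average cross-entropy over a batch of
samples; the shapes are illustrative assumptions.

import numpy as np

def average_cross_entropy(logits, one_hot_labels):
    # logits, one_hot_labels: arrays of shape (num_samples, num_classes).
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)                 # softmax, row by row
    per_sample = -np.sum(one_hot_labels * np.log(probs), axis=1)
    return per_sample.mean()  # the value we want to minimise w.r.t. W and b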
12. We’ll use Gradient Descent to minimise the
loss function.
GRADIENT DESCENT
MINIMISING THE LOSS
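A minimal sketch of the idea on a toy one-parameter loss (the loss, the
starting point and the learning rate are arbitrary assumptions): each step
moves the parameter a small amount against the gradient of the loss.

def loss(w):
    return (w - 3.0) ** 2     # toy loss, minimised at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)    # derivative of the loss

w = 0.0                        # arbitrary starting value
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)   # step against the gradient
# w ends up close to 3.0, the minimiser of the loss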
14. TensorFlow is an open-source library for numerical
computation and machine learning.
Its basic principle is simple: you build in Python a
graph of computations to perform and then
TensorFlow runs it efficiently using optimized C++
code.
TensorFlow supports computation across multiple
CPUs and GPUs.
How does it work?
TENSORFLOW'S GRAPHS
15. Software that uses TensorFlow is often divided into
two phases: graph building and execution
To evaluate this graph we must open a session,
run all the variable initialisers and then run the
node we are interested in.
Running a simple graph
TENSORFLOW'S GRAPHS
import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x + 2*y + 5

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
res = sess.run(f)
print(res)
16. An equivalent way to run the same graph is to use
the session as a context manager: inside the with
block it becomes the default session, so
x.initializer.run() and f.eval() work without passing
the session explicitly, and the session is closed
automatically at the end of the block.
Running a simple graph
TENSORFLOW'S GRAPHS
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x + 2*y + 5

with tf.Session() as session:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
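With more variables, initialising each one by hand gets tedious; TensorFlow
1.x also provides a single node that initialises them all (same graph as
above).

init = tf.global_variables_initializer()   # node that initialises all variables

with tf.Session() as session:
    init.run()
    result = f.eval()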
17. When you evaluate a node, TensorFlow determines
the set of nodes that it depends on and evaluates
these nodes first.
NB: TensorFlow won't reuse values computed in a
previous run: to share the intermediate results,
evaluate both nodes in a single run, as in the
second snippet ->
Node values
LIFECYCLE OF A NODE VALUE
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

# w and x are evaluated twice, once for y and once for z.
with tf.Session() as session:
    print(y.eval())
    print(z.eval())

# w and x are evaluated only once for both y and z.
with tf.Session() as session:
    y_val, z_val = session.run([y, z])
    print(y_val)
    print(z_val)
18. Placeholder nodes don't perform any computation:
they just output the data you tell them to output
at runtime.
This kind of node is useful for batched learning.
When creating a placeholder node, you can also
specify its shape: a dimension set to None means
any size.
Placeholder nodes
FEEDING DATA TO THE TRAINING
ALGORITHM
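A minimal sketch of a placeholder being fed at runtime; the node names and
shapes below are illustrative assumptions, not taken from the slides.

A = tf.placeholder(tf.float32, shape=(None, 3))   # None: any number of rows
B = A + 5

with tf.Session() as session:
    # Feed two batches of different sizes to the same placeholder.
    B_val_1 = session.run(B, feed_dict={A: [[1, 2, 3]]})
    B_val_2 = session.run(B, feed_dict={A: [[4, 5, 6], [7, 8, 9]]})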
19. TensorFlow lets you save your model at regular
intervals because the training process might last for
hours, days or even weeks.
All you need to do is call the save method of a
Saver object.
If you want to restore the model, call the
restore method instead.
Checkpoints
SAVING/RESTORING MODELS
saver = tf.train.Saver()
. . .
with tf.Session() as session:
    . . .
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            save_path = saver.save(session, "/tmp/my_model.ckpt")
    save_path = saver.save(session, "/tmp/my_model.ckpt")

with tf.Session() as session:
    saver.restore(session, "/tmp/my_model.ckpt")
    . . .
20. “ A person who never made a mistake
never tried anything new ”