Deep Learning Food Analysis


Deep Learning for Food Analysis
Petia Radeva
www.cvc.uab.es/~petia
Computer Vision at UB (CVUB), Universitat de Barcelona &
Medical Imaging Laboratory, Computer Vision Center

Index
 Motivation
 Learning and Deep learning
 Deep learning for food analysis
 Lifelogging
2
22:55AMiTANS’16, Albena, 26 of June, 2016

Metabolic diseases and health
3
 4.2 million die of chronic diseases
in Europe (diabetes or cancer)
linked to lack of physical activities
and unhealthy diet.
 Physical activities can increase
lifespan by 1.5-3.7 years.
 Obesity is a chronic disease
associated with huge economic,
social and personal costs.
 Risk factors for cancers,
cardiovascular and metabolic
disorders and leading causes of
premature mortality worldwide.

Health and medical care
 Today, 88% of U.S. healthcare dollars are
spent on medical care – access to
physicians, hospitals, procedures, drugs,
etc.
 However, medical care only accounts for
approximately 10% of a person’s health.
 Approximately half the decline in U.S.
Deaths from coronary heart disease from
1980 through 2000 may be attributable
to reductions in major risk factors
(systolic blood pressure, smoking,
physical inactivity).
4

Health and medical care
Recent data shows evidence of stagnation that may be explained by the increases in obesity and
diabetes prevalence.
Healthcare resources and dollars must now be dedicated to improving lifestyle and behavior.
5

Why food analysis?
 Today, measuring physical activities is not a problem.
 But what about food and nutrition?
 Nutritional health apps are based on food diaries
6

Two main questions?
 What we eat?
 Automatic food recognition vs. Food diaries
 And how we eat?
 Automatic eating pattern extraction – when, where, how, how
long, with whom, in which context?
 Lifelogging
7

Index
 Motivation
 Lifelogging
8

Why “Learn”?
 Machine learning consists of:
 Developing models, methods and algorithms to make computers learn i.e. take decision.
 Training from big amount of example data.
 Learning is used when:
 Humans are unable to explain their expertise (speech recognition)
 Human expertise does not exist (navigating on Mars),
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user biometrics)
 Data is cheap and abundant (data warehouses, data marts); knowledge is expensive
and scarce.
 Example in retail: Customer transactions to consumer behavior:
People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)
 Build a model that is a good and useful approximation to the data.
9

Growth of Machine Learning
 This trend is accelerating due to:
 Big data and data science today are a reality
 Improved data capture, networking, faster computers
 New sensors / IO devices / Internet of Things
 Software too complex to write by hand
 Demand for self-customization to user
 It turns out to be difficult to extract knowledge from human
expertsfailure of expert systems in the 1980’s.
 Improved machine learning algorithms
AMiTANS’16, Albena, 26 of June, 2016
10
22:55

11

Deep leearning everywhere
12

Deep learning applications
13

Formalization of learning
 Consider:
 training examples: D= {z1, z2, .., zn} with the zi being examples sampled from an unknown
process P(Z);
 a model f and a loss functional L(f,Z) that returns a real-valued scalar.
Minimize the expected value of L(f,Z) under the unknown generating process P(Z).
 Supervised Learning: each example is an (input,target) pair: Z=(X,Y)
 classification: Y is a finite integer (e.g. a symbol) corresponding to a class index, and we
often take as loss function the negative conditional log-likelihood, with the interpretation
that fi(X) estimates P(Y=i|X):
L(f,(X,Y)) = -log fi(X), where fi(X)>=0, Σi fi(X) = 1.
14

Classification/Recognition
Is this an urban or rural area?
Input: x
Output: y  {-1,+1}
From: M. Pawan Kumar
Which city is this?
Output: y  {1,2,…,C}
Binary classification Multi-class classification
15

Object Detection and segmentation
Where is the object in the image?
Output: y  {Pixels}
From: M. Pawan Kumar
What is the semantic class of each pixel?
Output: y  {1,2,…,C}|Pixels|
car
road
grass
treesky
16

A Simplified View of the Pipeline
Input
x
Features
Φ(x)
Scores
f(Φ(x),y)
Extract Features
Compute
Scores
maxy f(Φ(x),y)Prediction
y(f)
Learn f
From: M. Pawan Kumar22:55AMiTANS’16, Albena, 26 of June, 2016
17

Learning Objective
Data distribution P(x,y)
Prediction
f* = argminf EP(x,y) Error(y(f),y)
Ground Truth
Measure of prediction quality (error, loss)
Distribution is unknown
Expectation over
data distribution
18

Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
Prediction
f* = argminf EP(x,y) Error(y(f),y)
Ground Truth
Measure of prediction quality
Expectation over
data distribution
19

Learning Objective
Training data {(xi,yi), i = 1,2,…,n}
Prediction
f* = argminf Σi Error(yi(f),yi)
Ground Truth
Measure of prediction quality
Expectation over
empirical distribution
Finite samples
20

The problem of image classification
21
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson22:55AMiTANS’16, Albena, 26 of June, 2016

Dual representation of images as points/vectors
22
22:55
32x32x3 D vector
Each image of M rows by N columns by C channels ( 3 for color
images) can be considered as a vector/point in RMxNxC and
viceversa.

Linear Classier and key classification components
22:55
23
Given two classes how to learn a hyperplane to separate them?
To find the hyperplane we need to specify :
• Score function
• Loss function
• Optimization

Interpreting a linear classifier
24
32x32x3 D vector

General learning pipeline
25
Training consists of constructing the prediction model f according to a training set.

The problem of image classification
26

Parametric approach: linear classifier
27
Score function:

Loss function/optimization
28
The score function

Image classification
29

Loss function and optimisation
 Question: if you were to assign a single number to how unhappy you are
with these scores, what would you do?
22:55
30
Question : Given the score and the loss function, how to find the parameters W?

Interpreting a linear classifier
31
10x3072

Why is a CNN doing deep learning?
32
where fi=Σjwij * xj w1n
f1
f2
fm
x1
x2
xn
w11
w12

Activation functions of NN
33

Setting the number of layers and their size
34
- Neurons arranged into fully-connected layers
- Bigger = better (but might have to regularize more strongly).
- How many parameters to learn?

Why a CNN is neural network?
35

Architecture of neural networks
22:55
36
Modern CNNs: ~10 million neurons
Human visual cortex: ~5 billion neurons

Activation functions of NN
37

What is it a Convolutional Neural Network?
22:55
38

Convolutional and Max-pooling layer
22:55
39
Convolutional layer
Max-pool layer

How does the CNN work?
22:55
40

Example architecture
22:55
41
The trick is to train the weights such that when the network sees a picture of a truck, the last layer will say “truck”.

Training a CNN
42
The process of training a CNN consists of training all hyperparameters: convolutional
matrices and weights of the fully connected layers.
- Several millions pf parameters!!!

Learned convolutional filters
22:55
43

Neural network training
44
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Using the chain rule, optimize the parameters, W of the
neural network by gradient descent and backpropagation.
Optimization consists of training severalmillions of parameters!

Monitoring loss and accuracy
22:55
45
Looks linear?
Learning rate too low!
Decreases too slowly?
Learning rate too high.
Looks too noisy?
Increases the batch size.
Big gap?
- you're overfitting, increase
regularization!

Transfer learning
46

Imagenet
22:55
47

1001 benefits of CNN
 Transfer learning: Fine tunning for object recognition
 Replace and retrain the classier on top of the ConvNet
 Fine-tune the weights of the pre-trained network by continuing the backpropagation
 Feature extraction by CNN
 Object detectiion
 Object segmentation
 Image similarity and matching by CNN
22:55
48
Convolutional Neural Networks (4096 Features)AMiTANS’16, Albena, 26 of June, 2016

ConvNets are everywhere
49

50

51

52

53

Index
 Motivation
 Lifelogging
54

Automatic food analysis
55
Can we automatically recognize food?
• To detect every instance of a dish in all of its variants, shapes and positions and in a
large number of images.
The main problems that arise are:
• Complexity and variability of the data.
• Huge amounts of data to analyse.

Automatic Food Analysis
 Food detection
 Food recognition
 Food environment recognition
 Eating pattern extraction
56

Food datasets
57
Food256 - 25.600 images (100 images/class)
Classes: 256
Food101 – 101.000 images (1000 images/class)
Classes: 101
Food101+FoodCAT: 146.392 (101.000+45.392)
Classes: 131
EgocentricFood: 5038 images
Classes: 9

Food localization and recognition
58
General scheme of our food localization and recognition proposal 22:55AMiTANS’16, Albena, 26 of June, 2016

Food localization
Food
Non Food
...
w1
w2
wn
G
oogleNet
Softm
ax
G
AP
inception4eoutput
Deep
Convolution
X
FAM
Bounding
Box
G
eneration
59
Examples of localization and recognition on UECFood256 (top) and EgocentricFood (bottom). Ground
truth is shown in green and our method in blue.

Image Input
Foodness Map
Extraction
Food Detection CNN
Food Recognition CNN
Food Type
Recognition
Apple
Strawberry
Food recognition
Results: TOP-1 74.7%
TOP-5 91.6%
SoA (Bossard,2014): TOP-1 56,4%22:55AMiTANS’16, Albena, 26 of June, 2016
60

Demo
61

Food environment classification
62
Bakery
Banquet hall
Bar
Butcher shop
Cafetería
Ice cream parlor
Kitchen
Kitchenette
Market
Pantry
Picnic Area
Restaurant
Restaurant Kitchen
Restaurant Patio
Supermarket
Candy store
Coffee shop
Dinette
Dining room
Food court
Galley
Classification results:
0.92 - Food-related vs. Non-food-related
0.68 - 22 classes of Food-related categories

Index
 Motivation
 Lifelogging
63

Wearable cameras and the life-logging trend
64
Shipments of wearable computing devices worldwide by
category from 2013 to 2015 (in millions)

Life-logging data
 What we have:
22:55
65

Wealth of life-logging data
 We propose an energy-based approach for motion-based event
segmentation of life-logging sequences of low temporal
resolution
 - The segmentation is reached integrating different kind of
image features and classifiers into a graph-cut framework to
assure consistent sequence treatment.
66
Complete dataset of a day captured with SenseCam (more than 4,100 images
Choice of devise depends on:
1) where they are set: a hung up camera has
the advantage that is considered more
unobtrusive for the user, or
2) their temporal resolution: a camera with a
low fps will capture less motion information,
but we will need to process less data.
We chose a SenseCam or Narrative - cameras
hung on the neck or pinned on the dress that
capture 2-4 fps.
Or the hell of life-logging data

Visual Life-logging data
Events to be extracted from life-logging images
67
The camera captures up to 2000 images per day, around 100.000 images per month. Applying Computer Vision
algorithms we are able to extract the diary of the person:
- Activities he/she has done
- Interactions he/she has participated
- Events he/she has taken part
- Duties he/she has performed
- Environments and places he/she visited, etc.

Towards healthy habits
Towards visualizing summarized lifestyle data to ease the management of the user’s
healthy habits (sedentary lifestyles, nutritional activity, etc.).
68

Conclusions
 Healthy habits – one of the main health concern for people, society,
and governments
 Deep learning – a technology that “came to stay”
 A new technological trend with huge power
 Specially useful for food recognition and analysis
 Lifelogging – a unexplored technology that hides big potential to help
people monitor and describe their behaviour and thus improve their
lifestyle.
69

22:55
70

Deep learning applications
71

Deep Learning Food Analysis

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Deep Learning Food Analysis

Similar to Deep Learning Food Analysis (20)

Recently uploaded

Recently uploaded (20)

Deep Learning Food Analysis

Editor's Notes