This document discusses deep learning applications for food analysis. It provides an introduction to deep learning and convolutional neural networks (CNNs), explaining how CNNs can be used to automatically recognize food in images. CNNs are well-suited for food recognition tasks because they can learn hierarchical representations of images and are effective at computer vision problems. The document outlines challenges in automatic food analysis like data complexity and variability. It also mentions existing food datasets that can be used to train CNN models for tasks like food detection, recognition, and analyzing eating patterns from images.
1. Deep Learning for Food Analysis
Petia Radeva
www.cvc.uab.es/~petia
Computer Vision at UB (CVUB), Universitat de Barcelona &
Medical Imaging Laboratory, Computer Vision Center
2. Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
AMiTANS’16, Albena, 26 of June, 2016
3. Metabolic diseases and health
4.2 million people die of chronic diseases (diabetes or cancer) in Europe, linked to lack of physical activity and unhealthy diet.
Physical activity can increase lifespan by 1.5-3.7 years.
Obesity is a chronic disease associated with huge economic, social and personal costs.
These are risk factors for cancers, cardiovascular and metabolic disorders, and leading causes of premature mortality worldwide.
4. Health and medical care
Today, 88% of U.S. healthcare dollars are spent on medical care: access to physicians, hospitals, procedures, drugs, etc.
However, medical care only accounts for approximately 10% of a person’s health.
Approximately half of the decline in U.S. deaths from coronary heart disease from 1980 through 2000 may be attributable to reductions in major risk factors (systolic blood pressure, smoking, physical inactivity).
5. Health and medical care
Recent data shows evidence of stagnation that may be explained by the increases in obesity and diabetes prevalence.
Healthcare resources and dollars must now be dedicated to improving lifestyle and behavior.
6. Why food analysis?
Today, measuring physical activities is not a problem.
But what about food and nutrition?
Nutritional health apps are based on food diaries
7. Two main questions
What do we eat?
Automatic food recognition vs. food diaries
And how do we eat?
Automatic eating pattern extraction: when, where, how, how long, with whom, in which context?
Lifelogging
8. Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
9. Why “Learn”?
Machine learning consists of:
Developing models, methods and algorithms to make computers learn, i.e. take decisions.
Training from large amounts of example data.
Learning is used when:
Humans are unable to explain their expertise (speech recognition),
Human expertise does not exist (navigating on Mars),
The solution changes in time (routing on a computer network),
The solution needs to be adapted to particular cases (user biometrics),
Data is cheap and abundant (data warehouses, data marts) while knowledge is expensive and scarce.
Example in retail: Customer transactions to consumer behavior:
People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)
Build a model that is a good and useful approximation to the data.
10. Growth of Machine Learning
This trend is accelerating due to:
Big data and data science today are a reality
Improved data capture, networking, faster computers
New sensors / IO devices / Internet of Things
Software too complex to write by hand
Demand for self-customization to user
It turns out to be difficult to extract knowledge from human experts: hence the failure of expert systems in the 1980s.
Improved machine learning algorithms
14. Formalization of learning
Consider:
training examples D = {z1, z2, …, zn}, with the zi being examples sampled from an unknown process P(Z);
a model f and a loss functional L(f, Z) that returns a real-valued scalar.
Goal: minimize the expected value of L(f, Z) under the unknown generating process P(Z).
Supervised learning: each example is an (input, target) pair, Z = (X, Y).
Classification: Y is a finite integer (e.g. a symbol) corresponding to a class index, and we often take as loss function the negative conditional log-likelihood, with the interpretation that fi(X) estimates P(Y = i | X):
L(f, (X, Y)) = -log fY(X), where fi(X) ≥ 0 and Σi fi(X) = 1.
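The negative log-likelihood loss above can be sketched in a few lines (the probability vector and class indices below are illustrative):

```python
import numpy as np

# Negative conditional log-likelihood: L(f,(X,Y)) = -log f_Y(X),
# where probs holds the estimated class probabilities f_i(X).
def nll_loss(probs, y):
    return -np.log(probs[y])

# A confident correct prediction yields a small loss...
print(nll_loss(np.array([0.1, 0.8, 0.1]), 1))  # -log(0.8) ≈ 0.223
# ...while assigning low probability to the true class is penalized heavily.
print(nll_loss(np.array([0.1, 0.8, 0.1]), 0))  # -log(0.1) ≈ 2.303
```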
15. Classification/Recognition
Is this an urban or rural area?
Input: x
Output: y ∈ {-1, +1} (binary classification)
Which city is this?
Output: y ∈ {1, 2, …, C} (multi-class classification)
From: M. Pawan Kumar
16. Object Detection and segmentation
Where is the object in the image?
Output: y ⊆ Pixels
What is the semantic class of each pixel? (car, road, grass, tree, sky)
Output: y ∈ {1, 2, …, C}^|Pixels|
From: M. Pawan Kumar
17. A Simplified View of the Pipeline
Input: x
Extract features: Φ(x)
Compute scores: f(Φ(x), y)
Prediction: y(f) = argmax_y f(Φ(x), y)
Learn f
From: M. Pawan Kumar
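The pipeline above can be sketched end to end; the feature map, weight matrix and classes below are toy assumptions, not from the slides:

```python
import numpy as np

# Toy pipeline: extract features, score every class, predict by argmax.
def extract_features(x):
    # Phi(x): here simply the raw input with a bias feature appended.
    return np.append(x, 1.0)

def scores(phi, W):
    # f(Phi(x), y) for all classes y at once: one row of W per class.
    return W @ phi

def predict(x, W):
    # y(f) = argmax_y f(Phi(x), y)
    return int(np.argmax(scores(extract_features(x), W)))

W = np.array([[ 1.0, -1.0, 0.0],   # class 0 weights
              [-1.0,  1.0, 0.0]])  # class 1 weights
print(predict(np.array([2.0, 0.5]), W))   # 0
print(predict(np.array([-2.0, 0.5]), W))  # 1
```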
18. Learning Objective
Data distribution: P(x,y)
f* = argmin_f E_{P(x,y)} Error(y(f), y)
Here y(f) is the prediction, y the ground truth, and Error (the loss) a measure of prediction quality; the expectation is over the data distribution, which is unknown.
From: M. Pawan Kumar
19. Learning Objective
Training data: {(xi, yi), i = 1, 2, …, n}
f* = argmin_f E_{P(x,y)} Error(y(f), y)
The objective is still an expectation over the unknown data distribution, but only finite training samples are available.
From: M. Pawan Kumar
20. Learning Objective
Training data: {(xi, yi), i = 1, 2, …, n}
f* = argmin_f Σi Error(yi(f), yi)
The expectation over the data distribution is replaced by an expectation over the empirical distribution of the finite samples.
From: M. Pawan Kumar
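A minimal sketch of empirical risk minimization over finite samples (the candidate models and toy 1-D data are hypothetical):

```python
import numpy as np

# Empirical risk: average error over the finite training sample.
def empirical_risk(predict, data):
    return float(np.mean([predict(x) != y for x, y in data]))

# Toy 1-D training data and two hypothetical threshold classifiers.
data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
candidates = {
    "threshold@0.5": lambda x: int(x > 0.5),
    "threshold@0.8": lambda x: int(x > 0.8),
}
# Minimize the empirical risk over the candidate set.
best = min(candidates, key=lambda name: empirical_risk(candidates[name], data))
print(best)  # threshold@0.5 (zero training error)
```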
21. The problem of image classification
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
22. Dual representation of images as points/vectors
32×32×3 = 3072-D vector
Each image of M rows by N columns by C channels (C = 3 for color images) can be considered as a vector/point in R^(M·N·C), and vice versa.
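This dual representation is one line of NumPy: flattening a 32×32×3 image gives a 3072-dimensional vector, and reshaping recovers the image.

```python
import numpy as np

image = np.random.rand(32, 32, 3)        # M x N x C image
vector = image.reshape(-1)               # point in R^(M*N*C)
print(vector.shape)                      # (3072,)
restored = vector.reshape(32, 32, 3)     # ...and vice versa
print(np.array_equal(image, restored))   # True
```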
23. Linear Classifier and key classification components
Given two classes, how do we learn a hyperplane to separate them?
To find the hyperplane we need to specify:
• Score function
• Loss function
• Optimization
24. Interpreting a linear classifier
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
32×32×3 = 3072-D vector
25. General learning pipeline
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Training consists of constructing the prediction model f according to a training set.
26. The problem of image classification
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
27. Parametric approach: linear classifier
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Score function: f(x, W) = W·x + b
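A sketch of the linear score function under a hypothetical CIFAR-10-style setup (3072-dimensional flattened images, 10 classes); the random weights stand in for learned ones:

```python
import numpy as np

# Score function of a linear classifier: f(x, W) = W x + b.
def linear_scores(x, W, b):
    return W @ x + b

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 3072))  # one row of weights per class
b = np.zeros(10)
x = rng.random(3072)                         # a flattened 32x32x3 image
s = linear_scores(x, W, b)
print(s.shape)  # (10,) -- one score per class; argmax gives the prediction
```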
30. Loss function and optimisation
Question: if you were to assign a single number to how unhappy you are with these scores, what would you do?
Question: given the score and the loss function, how do we find the parameters W?
31. Interpreting a linear classifier
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
W is a 10×3072 matrix (10 classes, 3072 = 32×32×3 inputs).
32. Why is a CNN doing deep learning?
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
A fully connected layer maps inputs x1, …, xn to outputs f1, …, fm through weights wij:
fi = Σj wij · xj
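The formula above is just a matrix-vector product, as in this small sketch (weights and inputs are illustrative):

```python
import numpy as np

# A fully connected layer is a matrix-vector product:
# f_i = sum_j w_ij * x_j for all outputs i at once, i.e. f = W x.
x = np.array([1.0, 2.0, 3.0])        # inputs x1..x3
W = np.array([[0.1, 0.2, 0.3],       # weights w11, w12, w13
              [0.4, 0.5, 0.6]])      # weights w21, w22, w23
f = W @ x
print(f)  # [1.4 3.2]
```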
33. Activation functions of NN
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
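Typical activation functions can be written in a few lines (an illustrative sketch; ELU is included alongside the classics):

```python
import numpy as np

# Common activation functions, applied elementwise.
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def relu(x):    return np.maximum(0.0, x)
def elu(x, alpha=1.0):
    # ELU: ReLU-like for x > 0, but saturates to -alpha for very negative x,
    # giving closer-to-zero mean outputs and no "dead" units.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))  # [0. 0. 2.]
```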
34. Setting the number of layers and their size
- Neurons arranged into fully-connected layers
- Bigger = better (but you might have to regularize more strongly).
- How many parameters are there to learn?
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
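The parameter-count question can be answered with a short helper (the layer sizes below are hypothetical):

```python
# Each fully-connected layer with n_in inputs and n_out neurons learns
# n_in * n_out weights plus n_out biases.
def count_parameters(layer_sizes):
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A hypothetical net: 3072 inputs -> two hidden layers of 100 -> 10 outputs.
print(count_parameters([3072, 100, 100, 10]))  # 318410
```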
35. Why is a CNN a neural network?
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
36. Architecture of neural networks
Modern CNNs: ~10 million neurons
Human visual cortex: ~5 billion neurons
37. Activation functions of NN
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
38. What is a Convolutional Neural Network?
40. How does the CNN work?
41. Example architecture
The trick is to train the weights such that when the network sees a picture of a truck, the last layer will say “truck”.
42. Training a CNN
The process of training a CNN consists of learning all of its parameters: the convolutional kernels and the weights of the fully connected layers.
- Several millions of parameters!
44. Neural network training
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Using the chain rule, optimize the parameters W of the neural network by gradient descent and backpropagation.
Optimization involves training several millions of parameters!
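A didactic sketch of gradient descent with backpropagation on a tiny one-hidden-layer network; all sizes, weights and targets are made up, but real CNNs apply the same chain rule to millions of parameters:

```python
import numpy as np

# One hidden ReLU layer, squared loss, chain rule written out by hand.
W1 = np.array([[ 0.2, -0.1],
               [ 0.1,  0.3],
               [-0.2,  0.1]])       # hidden weights (3 units, 2 inputs)
W2 = np.array([[0.5, -0.3, 0.2]])   # output weights (1 output, 3 units)
x = np.array([1.0, -1.0])
y = np.array([0.5])                 # target
lr = 0.1

for step in range(200):
    # Forward pass
    h = np.maximum(0.0, W1 @ x)     # ReLU hidden activations
    y_hat = W2 @ h
    loss = 0.5 * np.sum((y_hat - y) ** 2)
    # Backward pass (chain rule, layer by layer)
    d_y = y_hat - y                 # dL/dy_hat
    dW2 = np.outer(d_y, h)          # dL/dW2
    d_h = W2.T @ d_y                # dL/dh
    d_h[h <= 0] = 0.0               # gradient blocked by inactive ReLUs
    dW1 = np.outer(d_h, x)          # dL/dW1
    # Gradient descent update
    W1 -= lr * dW1
    W2 -= lr * dW2

print(float(loss) < 1e-3)  # True: the loss has been driven near zero
```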
45. Monitoring loss and accuracy
Loss curve looks linear? Learning rate too low.
Drops sharply, then plateaus? Learning rate too high.
Looks too noisy? Increase the batch size.
Big gap between training and validation accuracy? You are overfitting: increase regularization.
48. 1001 benefits of CNN
Transfer learning: fine-tuning for object recognition
- Replace and retrain the classifier on top of the ConvNet
- Fine-tune the weights of the pre-trained network by continuing the backpropagation
Feature extraction by CNN
Object detection
Object segmentation
Image similarity and matching by CNN
Convolutional Neural Networks (4096 features)
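The "replace and retrain the classifier on top" idea can be sketched as follows; the 4096-d random vectors stand in for features extracted by a frozen pre-trained ConvNet, and all data here is synthetic:

```python
import numpy as np

# Transfer-learning sketch: the ConvNet body is frozen and used only to
# extract features; only a new softmax classifier head is trained.
rng = np.random.default_rng(0)
n, d, n_classes = 40, 4096, 3
features = rng.normal(size=(n, d))            # "extracted" fc7-style features
labels = rng.integers(0, n_classes, size=n)   # labels for the new task

W = np.zeros((n_classes, d))                  # the only trainable weights
lr = 0.1
for epoch in range(100):
    logits = features @ W.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(n), labels] -= 1.0            # softmax cross-entropy grad
    W -= lr * (probs.T @ features) / n            # update the head only

accuracy = float(np.mean((features @ W.T).argmax(axis=1) == labels))
print(accuracy >= 0.9)  # True: the new head fits the extracted features
```

Fine-tuning would additionally continue backpropagation into the (here frozen) feature extractor.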
54. Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
55. Automatic food analysis
Can we automatically recognize food?
• To detect every instance of a dish in all of its variants, shapes and positions and in a
large number of images.
The main problems that arise are:
• Complexity and variability of the data.
• Huge amounts of data to analyse.
62. Food environment classification
Bakery
Banquet hall
Bar
Butcher shop
Cafeteria
Ice cream parlor
Kitchen
Kitchenette
Market
Pantry
Picnic Area
Restaurant
Restaurant Kitchen
Restaurant Patio
Supermarket
Candy store
Coffee shop
Dinette
Dining room
Food court
Galley
Classification results:
0.92 - food-related vs. non-food-related
0.68 - 22 classes of food-related categories
63. Index
Motivation
Learning and Deep learning
Deep learning for food analysis
Lifelogging
64. Wearable cameras and the life-logging trend
Shipments of wearable computing devices worldwide by
category from 2013 to 2015 (in millions)
66. Wealth of life-logging data
We propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution.
- The segmentation is achieved by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent treatment of the sequence.
Complete dataset of a day captured with SenseCam (more than 4,100 images).
The choice of device depends on:
1) where it is worn: a camera hung on the neck has the advantage of being considered more unobtrusive for the user, or
2) its temporal resolution: a camera with a low frame rate will capture less motion information, but we will need to process less data.
We chose a SenseCam or Narrative: cameras hung on the neck or pinned to the clothing that capture 2-4 frames per minute.
Or the hell of life-logging data
67. Visual Life-logging data
Events to be extracted from life-logging images
The camera captures up to 2,000 images per day, around 100,000 images per month. Applying computer vision algorithms, we are able to extract the diary of the person:
- Activities he/she has done
- Interactions he/she has participated in
- Events he/she has taken part in
- Duties he/she has performed
- Environments and places he/she has visited, etc.
68. Towards healthy habits
Towards visualizing summarized lifestyle data to ease the management of the user’s
healthy habits (sedentary lifestyles, nutritional activity, etc.).
69. Conclusions
Healthy habits: one of the main health concerns for people, society, and governments.
Deep learning: a technology that “came to stay”,
a new technological trend with huge power,
especially useful for food recognition and analysis.
Lifelogging: an unexplored technology with big potential to help people monitor and describe their behaviour and thus improve their lifestyle.
Examples of learning applications:
Face recognition: pose, lighting, occlusion (glasses, beard), make-up, hair style.
Character recognition: different handwriting styles.
Speech recognition: temporal dependency; use of a dictionary or the syntax of the language.
Sensor fusion: combine multiple modalities, e.g. visual (lip image) and acoustic, for speech.
Medical diagnosis: from symptoms to illnesses.
Web advertising: predict whether a user clicks on an ad on the Internet.
Exponential linear units (ELU): all the benefits of ReLU, does not die, closer-to-zero mean outputs, but computation requires exp().