Can Deep Learning and Egocentric Vision
for Visual Lifelogging help us eat better?
Petia Radeva
www.cvc.uab.es/~petia
Computer Vision at UB (CVUB), Universitat de Barcelona &
Medical Imaging Laboratory, Computer Vision Center
Index
 Healthy habits
 Deep learning
 Automatic food analysis
 Egocentric vision
22:45
I Medical Imaging
What happens outside the body?
Project led by Dr. Maite Garolera of the Consorci Sanitari de Terrassa:
Goal: use episodic images to develop cognitive exercises and tools for reinforcing the
memory of people with MCI and Alzheimer's disease.
But episodic images serve for more than reinforcing memory:
they show the lifestyle of individuals!
Rememory: Life-logging for MCI treatment
Risk factors and chronic diseases
Chronic disease statistics
Obesity in Catalunya
51% of the Catalan population aged 18 to 74 is significantly overweight; 15% are obese.
Excess weight affects 62% of people with no studies or incomplete primary education vs. 36% of families with university education.
The obesity pandemic
 Risk factors for cancers, cardiovascular and
metabolic disorders and leading causes of
premature mortality worldwide.
 4.2 million people die of chronic diseases in Europe
(diabetes or cancer) linked to lack of physical
activity and an unhealthy diet.
 Physical activity can increase lifespan by
1.5-3.7 years.
Which wearables do consumers plan to buy?
• 21M Fitbit sold in 2015!
• It’s expected to double by 2018, to 81.7 million users.
The Consumer Technology Association (CTA), formerly the Consumer Electronics Association (CEA), surveyed
1,001 US internet users. Source: eMarketer.
 Today, automatically measuring physical activity is not a problem.
 But what about food and nutrition?
What are we missing in health applications?
 But what about food and nutrition?
 State of the art: Nutritional health apps are based on manual food diaries.
Sparkpeople
LoseIt!
MyFitnessPal
Cronometer Fatsecret
What are we missing in health applications?
https://techcrunch.com/2016/09/29/lose-it-launches-snap-it-to-let-users-count-calories-in-food-photos/
How many food categories are there?
Today we speak of about 200,000 basic food categories.
What about automatic food recognition?
Is it possible?
Image databases evolution
[Two bar charts: number of images per database and number of objects per database]
Database (year)              Number of images   Number of objects
AR Database (1998)           3276               126
Yale Face Database (2001)    165                15
Caltech (2003)               4620               6
101 (2004)                   9197               101
VOC2005 (2005)               1578               10
TUGraz-02 (2005)             1280               4
VOC2006 (2006)               5304               10
Caltech256 (2006)            30607              256
MIT-CSAIL (2006)             2873               125
VOC2007 (2007)               9963               21
Cifar-10 (2009)              60000              10
Cifar-100 (2009)             60000              100
Imagenet (2011)              1400000            1000
SunDB (2012)                 15000              900
Places (2014)                2500000            205
Food101 (2014)               101000             101
Places2 (2016)               10000000           476
ImageNet & Deep learning
Imagenet
Food datasets
Food256: 25,600 images (100 images/class)
Classes: 256
Food101: 101,000 images (1000 images/class)
Classes: 101
Food101+FoodCAT: 146,392 images (101,000 + 45,392)
Classes: 231
EgocentricFood: 5038 images
Classes: 9
Food DB: 150,000 images, 231 categories
ImageNet: 1,400,000 images, 1000 categories
Future Food DB: ????? images, 200,000 categories
One thing is for sure:
if there is a solution,
it will most probably need
deep learning!
Index
 Healthy habits and food analysis
 Deep learning
 Automatic food analysis
 Egocentric vision
Deep learning everywhere
White House wants the nation to get ready for AI
October, 2016
http://readwrite.com/2016/10/16/white-house-offers-artificial-intelligence-plan-cl1/
The learning pipeline
Input X → Feature extraction → Score function f(x, W) → Predicted label y(f) → Good enough?
The training process
Input X + ground truth → Feature extraction → Score function f(x, W) → Learn f: $\arg\min_f \sum_i \mathrm{Error}(y_i(f), y_i)$
The learning process
$\arg\min_f \sum_i \mathrm{Error}(y_i(f), y_i)$
- The sum over i approximates the expectation over the data distribution.
- $y_i(f)$ is the prediction and $y_i$ is the ground truth.
- Error is the measure of prediction quality (error, loss).
Training data: $\{(x_i, y_i),\ i = 1, 2, \ldots, n\}$
Loss function: the negative conditional log-likelihood, with the interpretation that $f_i(x)$ estimates
$P(Y = i \mid X = x)$:
$L(f(x), y) = -\log f_y(x)$, where $f_i(x) \ge 0$ and $\sum_i f_i(x) = 1$.
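The negative log-likelihood loss above can be sketched in a few lines. This is only an illustration: it assumes the class probabilities come from a softmax over raw scores (the slide does not say how they are normalized), and the three scores are made-up values.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that are >= 0 and sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(probs, label):
    """Negative log-likelihood of the true class: L = -log f_y(x)."""
    return -math.log(probs[label])

probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
loss_correct = nll_loss(probs, 0)  # true class has the highest score
loss_wrong = nll_loss(probs, 2)    # true class has the lowest score
```

The loss is small when the true class receives a high probability and grows as that probability approaches zero.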
The problem of image classification
32x32x3-D vector
Each image of M rows by N columns by C channels (3 for color images) can be
considered as a vector/point in $\mathbb{R}^{M \times N \times C}$ and vice versa.
Dual representation of images as points/vectors
$\mathbb{R}^{32 \times 32 \times 3}$
Linear classification
Given two classes, how do we learn a hyperplane that separates them?
$\mathbb{R}^{32 \times 32 \times 3}$
To find the hyperplane that separates dogs from cats, we need to define:
• The score function
• The loss function
• And the optimization process.
Linear classification
How to project data into the feature space:
$f(x) = W x + b$
If x is a 32x32x3 image, then $x \in \mathbb{R}^{3072}$.
The matrix W is 3x3072.
The bias vector b is 3-dimensional.
Dimensions: f(x) (3x1) = W (3x3072) · x (3072x1) + b (3x1)
Linear classification
How to project data into the feature space:
$f(x) = W x + b$
If we have 3 classes, f(x) will give 3 scores.
Dimensions: f(x) (3x1) = W (3x3072) · x (3072x1) + b (3x1)
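To make the dimensions of the linear score function concrete, here is a small sketch in plain Python. The image and the weights are random placeholders (assumptions for illustration only); what matters is that a 32x32x3 image flattens to 3072 values and a 3x3072 matrix W yields 3 scores.

```python
import random

def flatten(image):
    """Flatten an M x N x C nested list into a vector of length M*N*C."""
    return [value for row in image for pixel in row for value in pixel]

def linear_scores(W, x, b):
    """Compute f(x) = W x + b: one score per class (one row of W per class)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

random.seed(0)
# Hypothetical 32x32x3 image with random pixel values in [0, 1).
image = [[[random.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]
x = flatten(image)                                                      # 3072 values
W = [[random.gauss(0.0, 0.01) for _ in range(3072)] for _ in range(3)]  # 3 x 3072
b = [0.0, 0.0, 0.0]                                                     # 3-dimensional bias
scores = linear_scores(W, x, b)                                         # 3 class scores
```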
Image classification
Adapted from: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Loss function and optimisation
 Question: if you were to assign a single number to how unhappy you are
with these scores, what would you do?
Question: Given the score and the loss function, how do we find the parameters W?
Input $x_i$ → Score function $f(x_i, W)$ → Loss function $L(f(x_i), y_i)$, computed against the ground truth $y_i$
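One common answer to the question of how to find W is gradient descent on the loss. The sketch below assumes a softmax linear classifier with the negative log-likelihood loss and uses a made-up toy dataset of 2-D points; the learning rate and number of steps are arbitrary choices, not values from the talk.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(W, b, x):
    """Score function f(x, W) = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]

def mean_loss(W, b, data):
    """Mean negative log-likelihood over the training set."""
    return sum(-math.log(softmax(scores_for(W, b, x))[y]) for x, y in data) / len(data)

def gradient_step(W, b, data, lr):
    """One step of batch gradient descent on the mean loss."""
    n_classes, n_features = len(W), len(W[0])
    grad_W = [[0.0] * n_features for _ in range(n_classes)]
    grad_b = [0.0] * n_classes
    for x, y in data:
        probs = softmax(scores_for(W, b, x))
        for k in range(n_classes):
            delta = probs[k] - (1.0 if k == y else 0.0)  # dL/d(score_k)
            grad_b[k] += delta / len(data)
            for j in range(n_features):
                grad_W[k][j] += delta * x[j] / len(data)
    new_W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grad_W)]
    new_b = [v - lr * g for v, g in zip(b, grad_b)]
    return new_W, new_b

# Hypothetical toy data: 2-D points, two classes.
data = [([1.0, 2.0], 0), ([2.0, 1.5], 0), ([-1.0, -2.0], 1), ([-2.0, -0.5], 1)]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
initial_loss = mean_loss(W, b, data)
for _ in range(100):
    W, b = gradient_step(W, b, data, lr=0.1)
final_loss = mean_loss(W, b, data)
```

With all-zero parameters both classes get probability 0.5, so the initial loss equals log 2; each step then moves W in the direction that lowers the loss.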
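How does a CNN do deep learning?
Image → First layer → Second layer → … → Output layer
First layer: $y = Wx$, i.e. $y_1 = \sum_i W_{1i} x_i$, …, $y_{10} = \sum_i W_{10,i} x_i$ (weights $W_{11}, W_{12}, W_{13}, \ldots, W_{1n}$)
Second layer: $y = W(Wx)$
Next layer: $y = W(W(Wx))$, …
Fully connected layers: $y_1 = \sum_i W_{1i} x_i$, …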
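A short worked example of stacking layers, with made-up weight matrices W1 and W2. It illustrates a standard fact: two purely linear layers compose into a single linear map, so depth only adds expressive power once a nonlinearity is placed between layers (ReLU is used here as an assumed example).

```python
def matvec(W, x):
    """Matrix-vector product: y_k = sum_i W[k][i] * x[i]."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def relu(v):
    return [max(0.0, t) for t in v]

W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 1.0]]   # first layer: 2 inputs -> 3 units
W2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]      # second layer: 3 units -> 2 outputs
x = [1.0, 2.0]

stacked = matvec(W2, matvec(W1, x))           # y = W2 (W1 x)
collapsed = matvec(matmul(W2, W1), x)         # y = (W2 W1) x, a single linear layer
with_relu = matvec(W2, relu(matvec(W1, x)))   # y = W2 relu(W1 x)
```

Here `stacked` and `collapsed` are identical, while `with_relu` differs because the negative activation of the first unit is clipped to zero.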
Why is a CNN a neural network?
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Modern CNNs – 10M neurons
Human CNNs – 5B neurons.
Activation functions of NN
From: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Why is it convolutional?
Adapted from: Fei-Fei Li & Andrej Karpathy & Justin Johnson
What is new in the Convolutional Neural Network?
Convolutional and Max-pooling layer
Convolutional layer
Max-pool layer
Spatial info No spatial info
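A minimal sketch of what a convolutional layer followed by a max-pooling layer computes, in plain Python. The 5x5 image and the 2x2 edge-detecting kernel are made-up examples; real CNNs learn their kernels and apply many of them in parallel.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what CNN libraries call convolution), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    return [[max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# Hypothetical 5x5 image containing a vertical bright stripe.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]                      # responds to dark-to-bright vertical edges

feature_map = conv2d(image, kernel)     # 4x4 map: positive at the left edge of the stripe
pooled = max_pool(feature_map)          # 2x2 map after 2x2 max pooling
```

The convolution output still records where the edge is; pooling keeps the strongest response in each block and discards its exact position inside that block.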
Example architecture
The trick is to train the weights such that when the network sees a picture of a truck, the last layer will say
“truck”.
Slide credit: Fei-Fei Li
Training a CNN
Training a CNN consists of learning all its parameters: the convolutional
kernels (matrices) and the weights of the fully connected layers.
- Several million parameters!
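To see where millions of parameters come from, the following back-of-the-envelope count uses two hypothetical layers (the sizes are assumptions, not the architecture from the talk): a 3x3 convolution from 64 to 128 channels, and a fully connected layer from 4096 to 4096 units.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights of all kernels plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def fc_params(n_inputs, n_outputs):
    """Weight matrix plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs

conv_count = conv_params(3, 64, 128)   # hypothetical convolutional layer
fc_count = fc_params(4096, 4096)       # hypothetical fully connected layer
```

Under these assumptions the convolutional layer has 73,856 parameters while the single fully connected layer has 16,781,312, which is why fully connected layers tend to dominate the parameter count.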
1001 benefits of CNN
 Transfer learning: fine-tuning for object recognition
 Replace and retrain the classifier on top of the ConvNet
 Fine-tune the weights of the pre-trained network by continuing the backpropagation
 Feature extraction by CNN
 Object detection
 Object segmentation
 Image similarity and matching by CNN
Convolutional Neural Networks (4096 Features)
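Image similarity and matching with CNN features can be sketched as a nearest-neighbour search over feature vectors. The example below assumes cosine similarity as the measure and uses made-up 4-dimensional vectors in place of real 4096-dimensional CNN features; the dish names are illustrative only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, gallery):
    """Return the name of the gallery item whose features are closest to the query."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))

# Hypothetical feature vectors (stand-ins for features extracted by a CNN).
gallery = {
    "paella": [0.9, 0.1, 0.0, 0.2],
    "salad": [0.1, 0.8, 0.3, 0.0],
}
query = [0.8, 0.2, 0.1, 0.1]
best_match = most_similar(query, gallery)
```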
Index
 Healthy habits and food analysis
 Deep learning
 Automatic food analysis
 Egocentric vision
Automatic food analysis
Can we automatically recognize food?
• To detect and classify every instance of a dish in all of its variants, shapes and
positions and in a large number of images.
The main problems that arise are:
• Complexity and variability of the data.
• Huge amounts of data to analyse.
Automatic Food Analysis
 Food detection
 Food recognition
 Food environment recognition
 Eating pattern extraction
Food localization
Pipeline: GoogleNet (inception4e output) → Deep Convolution → GAP → Softmax (Food / Non Food) with weights w1, w2, ..., wn → FAM → Bounding Box Generation
Examples of localization and recognition on UECFood256 (top) and EgocentricFood (bottom). Ground
truth is shown in green and our method in blue.
Marc Bolaños, Petia Radeva: Simultaneous Food Localization and Recognition, ICPR’16, Cancun, Mexico,
arXiv:1604.07953, 2016.
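The localization pipeline (feature maps, GAP weights w1..wn, FAM, bounding box) can be illustrated with a toy sketch. It assumes the activation map is the weighted sum of the last convolutional feature maps and that the box encloses all cells above a fraction of the map's maximum; both the threshold rule and the numbers are assumptions for illustration, not the procedure from the cited paper.

```python
def activation_map(feature_maps, weights):
    """Weighted sum of feature maps: FAM[i][j] = sum_k w_k * F_k[i][j]."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[i][j] for wk, fm in zip(weights, feature_maps))
             for j in range(w)] for i in range(h)]

def bounding_box(fam, fraction=0.5):
    """Box (top, left, bottom, right) around cells >= fraction * max(FAM)."""
    threshold = fraction * max(max(row) for row in fam)
    cells = [(i, j) for i, row in enumerate(fam) for j, v in enumerate(row)
             if v >= threshold]
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    return min(rows), min(cols), max(rows), max(cols)

# Two hypothetical 4x4 feature maps and their (made-up) class weights.
feature_maps = [
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 2, 0], [0, 0, 2, 0], [1, 0, 0, 0]],
]
weights = [1.0, 0.5]
fam = activation_map(feature_maps, weights)
box = bounding_box(fam)   # encloses the strongly activated 2x2 region
```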
Food recognition
Pipeline: Image Input → Foodness Map Extraction (Food Detection CNN) → Food Type Recognition (Food Recognition CNN) → e.g. Apple, Strawberry
Results: TOP-1 74.7%
TOP-5 91.6%
SoA (Bossard, 2014): TOP-1 56.4%
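TOP-1 and TOP-5 figures like the ones above are top-k accuracies: a prediction counts as correct if the true class is among the k highest-scoring classes. A minimal sketch with made-up scores for 3 classes (so top-2 stands in for top-5):

```python
def top_k_accuracy(score_lists, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(score_lists, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical classifier scores for 4 images and their true labels.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.4, 0.35, 0.25]]
labels = [1, 1, 2, 2]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can only grow with k, which is why TOP-5 is always at least as high as TOP-1.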
Demo
Herruzo, P., Bolaños, M. and Radeva, P. (2016). “Can a CNN Recognize Catalan Diet?”. In Proceedings of the 8th Intl Conf. for
Promoting the Application of Mathematics in Technical and Natural Sciences (AMiTaNS).
Food environment classification
Bakery
Banquet hall
Bar
Butcher shop
Cafeteria
Ice cream parlor
Kitchen
Kitchenette
Market
Pantry
Picnic Area
Restaurant
Restaurant Kitchen
Restaurant Patio
Supermarket
Candy store
Coffee shop
Dinette
Dining room
Food court
Galley
Classification results:
0.92 - Food-related vs. Non-food-related
0.68 - 22 classes of Food-related categories
Towards automatic image description
Bolaños, M., Peris, Á., Casacuberta, F., & Radeva, P. “VIBIKNet: Visual Bidirectional Kernelized Network for the VQA
Challenge” VQA Challenge, CVPR '16.
Two main questions
 What do we eat?
 Automatic food recognition vs. food
diaries
 And how do we eat?
 Automatic eating pattern extraction –
when, where, how, how long, with
whom, in which context?
Index
 Healthy habits and food analysis
 Deep learning
 Automatic food analysis
 Egocentric vision
Wearable cameras and the life-logging trend
Shipments of wearable computing devices worldwide by
category from 2013 to 2015 (in millions)
Life-logging data
 What we have:
Wealth of life-logging data
Or the hell of life-logging data
 We propose an energy-based approach for motion-based event
segmentation of life-logging sequences of low temporal
resolution.
 The segmentation is achieved by integrating different kinds of
image features and classifiers into a graph-cut framework to
ensure consistent treatment of the sequence.
Complete dataset of a day captured with SenseCam (more than 4,100 images).
The choice of device depends on:
1) where it is worn: a camera hung around the neck has
the advantage of being considered less
obtrusive by the user, or
2) its temporal resolution: a camera with a
low frame rate captures less motion information,
but there is less data to process.
We chose a SenseCam or a Narrative: cameras
hung around the neck or pinned to clothing that
capture 2-4 frames per minute.
Visual Life-logging data
Events to be extracted from life-logging images
- Activities he/she has done
- Interactions he/she has participated in
- Events he/she has taken part in
- Duties he/she has performed
- Environments and places he/she has visited, etc.
Dimiccoli, M., Bolaños, M., Talavera, E., Aghaei, M., Nikolov, S., and Radeva, P. (2015). “SRClustering: Semantic Regularized Clustering for
Egocentric Photo Streams Segmentation”. In Computer Vision and Image Understanding Journal (CVIU) (In press). Preprint:
http://arxiv.org/abs/1512.07143
Egocentric vision progress
Bolaños, M., Dimiccoli, M. & Radeva, P. (2015). “Towards Storytelling from Visual Lifelogging: An Overview”.
In Transactions on Human-Machine Systems Journal (THMS) (In press). Preprint: http://arxiv.org/abs/1507.06120
Towards healthy habits
Towards visualizing summarized lifestyle data to ease the management of the user’s
healthy habits (sedentary lifestyles, nutritional activity, etc.).
M. Aghaei, M. Dimiccoli, P. Radeva. Extended Bag-of-Tracklets for Multi-Face Tracking in Egocentric Photo Streams. Computer Vision and Image
Understanding, Volume 149, 146-156, 2016. Special Issue on Assistive Computer Vision and Robotics, Elsevier, 2016. doi: 10.1016/j.cviu.2016.02.013
Conclusions
 Healthy habits – one of the main health concerns for people, society, and
governments
 Deep learning – a technology that is here to stay
 A new technological trend that is directly affecting our environment
 Food analysis and recognition – a new challenge with huge potential for applications
 We need food databases of millions of images and thousands of categories
 A wide set of problems for food analysis – recognition, segmentation, habits
characterization, image and video description, etc.
 Egocentric vision and lifelogging – a recent trend in Computer Vision and a largely
unexplored technology with great potential to help people monitor and describe
their behaviour and thus improve their lifestyle.
THANK YOU!


Editor's Notes

  1. 51% of the Catalan population aged 18 to 74 suffers from significant excess weight (15% are obese); this affects 62% of those who have no studies or did not complete primary education, and 36% of families with university education.
  2. “Deep learning: In recent years, some of the most impressive advancements in machine learning have been in the subfield of deep learning, also known as deep network learning. Deep learning uses structures loosely inspired by the human brain, consisting of a set of units (or “neurons”). Each unit combines a set of input values to produce an output value, which in turn is passed on to other neurons downstream. …”
  3. Exponential linear units (ELU): all the benefits of ReLU, does not die, closer to zero-mean outputs, but computation requires exp().
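The ELU note can be made concrete with a short comparison against ReLU. This sketch uses the standard ELU definition with alpha = 1.0 as an assumed default; the input values are arbitrary.

```python
import math

def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

inputs = [-2.0, -0.5, 0.0, 1.0, 3.0]
relu_out = [relu(v) for v in inputs]   # negative inputs are all mapped to 0
elu_out = [elu(v) for v in inputs]     # negative inputs keep a small negative output
```

Because ELU stays non-zero for negative inputs, its gradient does not vanish there, at the cost of one exp() call per negative activation.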