Deep learning image classification aplicado al mundo de la moda (Deep learning image classification applied to the world of fashion)

824 views

Published on

Slides from the talk given at Codemotion 2016
http://2016.codemotion.es/agenda.html#5732408326356992/86464003

Published in: Software


  1. Deep Learning Image Classification. Robert Figiel, Co-Founder & CTO; Javier Abadía, Lead Developer.
  2. WHAT DO WE DO AT STYLESAGE? Collect Data: web-crawling of 100M+ e-commerce products daily. Analyze Products: analysis of text, machine learning, image recognition. Visualize Insights: for fashion brands & retailers.
  3. CHALLENGE: CLASSIFY PRODUCTS FROM IMAGES. Category: Dress.
  4. SOLUTION: CONVOLUTIONAL NEURAL NETWORKS (CNN). Input (Image Data) → BLACK BOX (for now): Convolutional Neural Network → Output (Probability Vector): Dress: 94.8%, Skirt: 4.1%, Jacket: 1.2%, Pant: 0.1%, Socks: 0.01%, ...
  5. TRADITIONAL COMPUTING: input → algorithm → output.
  6. MACHINE LEARNING: training input + output → algorithm → model; then new input → model → new output.
  7. MACHINE LEARNING - CLASSIFICATION. Features → Classes. Supervised Learning.
  8. MACHINE LEARNING - CLASSIFICATION. Supervised Learning: Decision Trees, Bayesian Algorithms, Regression, Clustering, Neural Networks, ... http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
  9. LETTER RECOGNITION. 28x28 pixels, gray levels → 784 input features.
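The flattening step on this slide (a 28x28 grayscale image becomes 784 input features) is a one-liner in NumPy; the all-zeros image below is just a stand-in for real pixel data:

```python
import numpy as np

# A 28x28 grayscale letter image (all zeros as a stand-in for real pixels).
img = np.zeros((28, 28))

# Flatten it into the 784-element feature vector the classifier consumes.
features = img.flatten()
```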
  10. LOGISTIC CLASSIFIER. WX + b = Y: weights W (784 x 35) applied to the 784 input features, plus a bias (35), give 35 class scores; probabilities P = softmax(Y).
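The score-to-probability step on this slide (P = softmax(Y)) can be sketched in NumPy; the score values below are made up for illustration:

```python
import numpy as np

def softmax(y):
    # Subtract the max score for numerical stability before exponentiating.
    e = np.exp(y - np.max(y))
    return e / e.sum()

# Hypothetical scores Y = WX + b for five classes.
scores = np.array([3.0, 1.0, 0.5, 0.0, -1.0])
probs = softmax(scores)  # sums to 1; the largest score gets the largest probability
```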
  11. GRADIENT DESCENT
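Gradient descent itself fits in a few lines. A minimal sketch, minimizing a toy 1-D function f(w) = (w - 3)^2 rather than a real classifier loss:

```python
# Minimal gradient descent on f(w) = (w - 3)**2, whose minimum is at w = 3.
w = 0.0   # initial guess
lr = 0.1  # learning rate (step size)
for _ in range(100):
    grad = 2 * (w - 3)  # analytic derivative f'(w)
    w -= lr * grad      # step in the direction that decreases f
```

The same loop, with the gradient computed over a batch of training examples, is what the scikit-learn and TensorFlow snippets on the next slides run under the hood.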
  12. CODE USING python/scikit-learn

    """ based on http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html """
    import numpy as np
    from sklearn import linear_model, metrics

    N = 50000
    X = np.array([x.flatten() for x in data['train_dataset'][:N]])
    Y = data['train_labels'][:N]

    solver = 'sag'
    C = 0.001

    # train
    logreg = linear_model.LogisticRegression(C=C, solver=solver)
    logreg.fit(X, Y)

    # test
    VX = np.array([x.flatten() for x in data['test_dataset']])
    predicted_labels = logreg.predict(VX)
    print("%.3f" % metrics.accuracy_score(predicted_labels, data['test_labels']))
  13. CODE WITH TensorFlow

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Input data placeholder
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

        # Variables
        weights = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
        biases = tf.Variable(tf.zeros([num_labels]))

        # Training computation
        logits = tf.matmul(tf_train_dataset, weights) + biases
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  14. LINEAR METHODS ARE LIMITED TO LINEAR RELATIONSHIPS. One layer: X ✕ W1 + b1 = Y, then s(Y). Two layers: s(X ✕ W1 + b1) ✕ W2 + b2, where s is the activation function (ReLU).
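The two-layer formula on this slide (a linear layer, an element-wise ReLU, then a second linear layer) can be sketched in NumPy; the batch size, hidden width, and random weights below are illustrative assumptions:

```python
import numpy as np

def relu(y):
    # Element-wise max(0, y): the non-linearity between the two linear layers.
    return np.maximum(0.0, y)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))            # a batch of 4 flattened 28x28 images
W1 = 0.01 * rng.normal(size=(784, 128))  # first linear layer (hidden width 128)
b1 = np.zeros(128)
W2 = 0.01 * rng.normal(size=(128, 35))   # second linear layer (35 classes)
b2 = np.zeros(35)

hidden = relu(X @ W1 + b1)  # s(X*W1 + b1)
logits = hidden @ W2 + b2   # s(...)*W2 + b2
```

Without the ReLU the two matrix products would collapse into one, so the model would still be linear; the activation is what buys the extra expressive power.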
  15. CODE WITH TensorFlow

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Input data placeholder
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

        # Variables
        weights1 = tf.Variable(tf.truncated_normal([image_size * image_size, n_hidden_moves]))
        biases1 = tf.Variable(tf.zeros([n_hidden_moves]))
        weights2 = tf.Variable(tf.truncated_normal([n_hidden_moves, num_labels]))
        biases2 = tf.Variable(tf.zeros([num_labels]))

        # Training model
        logits1 = tf.matmul(tf_train_dataset, weights1) + biases1
        relu_output = tf.nn.relu(logits1)
        logits2 = tf.matmul(relu_output, weights2) + biases2
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=logits2, labels=tf_train_labels))

        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  16. NN VS CNN. Neural Network (ANY numeric input) vs. Convolutional Neural Network (IMAGE input).
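What makes a network "convolutional" is that its early layers slide small filters over the image instead of connecting every pixel to every unit. A minimal NumPy sketch of one such filter, using a hand-written vertical-edge detector rather than a learned one:

```python
import numpy as np

def conv2d(image, kernel):
    # 'Valid' 2-D convolution: slide the kernel over every position of the image.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((28, 28))
image[:, 14:] = 1.0  # bright right half: a vertical edge down the middle

kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])  # responds strongly to vertical edges

response = conv2d(image, kernel)  # peaks near the edge, zero in flat regions
```

In a real CNN the kernel values are learned by gradient descent, and the framework (Caffe, TensorFlow, ...) runs this operation as optimized matrix arithmetic rather than Python loops.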
  17. DEEP LEARNING: MANY LAYERS FOR HIGHER ACCURACY. Example: AlexNet (2012), 8 layers. Example: GoogLeNet architecture (2014), 22 layers.
  18. ME ABURRO ("I'm getting bored")
  19. CHOOSING A MODEL – OPEN SOURCE OPTIONS. AlexNet (2012): 8 layers, 16.4% error rate on ImageNet. GoogLeNet (2014): 22 layers, 6.66% error rate on ImageNet. Google Inception v3 (2015): 48 layers, 3.46% error rate. Microsoft ResNet (2015): 152 layers, 3.57% error rate. Yearly competition on the ImageNet dataset with 1M images across 1000 object classes; models available open source. Many models are open source: no need to re-invent the wheel.
  20. FRAMEWORK – OPEN SOURCE OPTIONS. Caffe: developed by UC Berkeley, very efficient algorithms; implemented GoogLeNet, ResNet; large community. TensorFlow: released in 2015 by Google; ready-to-use implementations of GoogLeNet, Inception v3; TensorBoard for visualizing training progress. Also Torch, Theano, Keras, ... Many Python frameworks are available, all with many examples, good documentation and pre-implemented models. Choose a Python framework that fits your needs.
  21. IMPLEMENTING A CNN: MODEL – TRAIN – PREDICT. Select / develop a MODEL → TRAIN/TEST the model with known images → PREDICT on new images. Feedback loop: predictions become additional training data.
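The MODEL - TRAIN - PREDICT loop on this slide can be sketched with scikit-learn; the random features and labels below are stand-ins for real product images and categories:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20))    # known images, as feature vectors
y_train = rng.integers(0, 5, size=200)  # known category labels (5 classes)

model = linear_model.LogisticRegression(max_iter=500)  # 1. select a MODEL
model.fit(X_train, y_train)                            # 2. TRAIN on known images

X_new = rng.normal(size=(3, 20))
predictions = model.predict(X_new)                     # 3. PREDICT on new images
# Reviewed predictions can then be fed back in as additional training data.
```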
  22. INFRASTRUCTURE – GPUS. Underlying CNN computations are mainly matrix multiplications → GPUs (Graphics Processing Units) are 30-50x faster than CPUs (1 CPU: 2 sec vs. 1 GPU: 50 ms). Use GPU-based servers for faster training and predictions.
  23. THANK YOU – WE ARE RECRUITING! www.stylesage.co/careers javier@stylesage.co
  24. GRACIAS! ("Thank you!")
