Deep learning image classification aplicado al mundo de la moda (Deep learning image classification applied to the world of fashion)

824 views

Published on

Slides from the talk given at Codemotion 2016
http://2016.codemotion.es/agenda.html#5732408326356992/86464003

Published in: Software


  1. Deep Learning Image Classification. Robert Figiel, Co-Founder & CTO; Javier Abadía, Lead Developer.
  2. WHAT DO WE DO AT STYLESAGE? Collect Data: web-crawling of 100M+ e-commerce products daily. Analyze Products: analysis of text, machine learning, image recognition. Visualize Insights: for fashion brands & retailers.
  3. CHALLENGE: CLASSIFY PRODUCTS FROM IMAGES. Category: Dress.
  4. SOLUTION: CONVOLUTIONAL NEURAL NETWORKS (CNN). Input (Image Data) → BLACK BOX (for now): Convolutional Neural Network → Output (Probability Vector): Dress: 94.8%, Skirt: 4.1%, Jacket: 1.2%, Pant: 0.1%, Socks: 0.01%, ...
  5. TRADITIONAL COMPUTING: input → algorithm → output.
  6. MACHINE LEARNING: training input + output → algorithm → model; then new input → model → new output.
  7. MACHINE LEARNING - CLASSIFICATION. Features → Classes. Supervised Learning.
  8. MACHINE LEARNING - CLASSIFICATION. Supervised Learning: Decision Trees, Bayesian Algorithms, Regression, Clustering, Neural Networks, ... http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
  9. LETTER RECOGNITION. 28x28 pixels, gray levels → 784 input features.
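The flattening step on this slide (a 28x28 grayscale image becomes 784 input features) is a one-liner in NumPy; the all-zeros image below is just a stand-in for real pixel data:

```python
import numpy as np

# A 28x28 grayscale letter image (all zeros as a stand-in for real pixels).
img = np.zeros((28, 28))

# Flatten it into the 784-element feature vector the classifier consumes.
features = img.flatten()
```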
  10. LOGISTIC CLASSIFIER. WX + b = Y: weights W (784 x 35) applied to the 784 input features, plus a bias (35), give 35 class scores; probabilities P = softmax(Y).
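The score-to-probability step on this slide (P = softmax(Y)) can be sketched in NumPy; the score values below are made up for illustration:

```python
import numpy as np

def softmax(y):
    # Subtract the max score for numerical stability before exponentiating.
    e = np.exp(y - np.max(y))
    return e / e.sum()

# Hypothetical scores Y = WX + b for five classes.
scores = np.array([3.0, 1.0, 0.5, 0.0, -1.0])
probs = softmax(scores)  # sums to 1; the largest score gets the largest probability
```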
  11. GRADIENT DESCENT
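Gradient descent itself fits in a few lines. A minimal sketch, minimizing a toy 1-D function f(w) = (w - 3)^2 rather than a real classifier loss:

```python
# Minimal gradient descent on f(w) = (w - 3)**2, whose minimum is at w = 3.
w = 0.0   # initial guess
lr = 0.1  # learning rate (step size)
for _ in range(100):
    grad = 2 * (w - 3)  # analytic derivative f'(w)
    w -= lr * grad      # step in the direction that decreases f
```

The same loop, with the gradient computed over a batch of training examples, is what the scikit-learn and TensorFlow snippets on the next slides run under the hood.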
  12. CODE USING python/scikit-learn

    """ based on http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html """
    import numpy as np
    from sklearn import linear_model, metrics

    N = 50000
    X = np.array([x.flatten() for x in data['train_dataset'][:N]])
    Y = data['train_labels'][:N]

    solver = 'sag'
    C = 0.001

    # train
    logreg = linear_model.LogisticRegression(C=C, solver=solver)
    logreg.fit(X, Y)

    # test
    VX = np.array([x.flatten() for x in data['test_dataset']])
    predicted_labels = logreg.predict(VX)
    print("%.3f" % metrics.accuracy_score(predicted_labels, data['test_labels']))
  13. CODE WITH TensorFlow

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Input data placeholder
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

        # Variables
        weights = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
        biases = tf.Variable(tf.zeros([num_labels]))

        # Training computation
        logits = tf.matmul(tf_train_dataset, weights) + biases
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  14. LINEAR METHODS ARE LIMITED TO LINEAR RELATIONSHIPS. One layer: X ✕ W1 + b1 = Y, then s(Y). Two layers: s(X ✕ W1 + b1) ✕ W2 + b2, where s is the activation function (ReLU).
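The two-layer formula on this slide (a linear layer, an element-wise ReLU, then a second linear layer) can be sketched in NumPy; the batch size, hidden width, and random weights below are illustrative assumptions:

```python
import numpy as np

def relu(y):
    # Element-wise max(0, y): the non-linearity between the two linear layers.
    return np.maximum(0.0, y)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))            # a batch of 4 flattened 28x28 images
W1 = 0.01 * rng.normal(size=(784, 128))  # first linear layer (hidden width 128)
b1 = np.zeros(128)
W2 = 0.01 * rng.normal(size=(128, 35))   # second linear layer (35 classes)
b2 = np.zeros(35)

hidden = relu(X @ W1 + b1)  # s(X*W1 + b1)
logits = hidden @ W2 + b2   # s(...)*W2 + b2
```

Without the ReLU the two matrix products would collapse into one, so the model would still be linear; the activation is what buys the extra expressive power.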
  15. CODE WITH TensorFlow

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Input data placeholder
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

        # Variables
        weights1 = tf.Variable(tf.truncated_normal([image_size * image_size, n_hidden_moves]))
        biases1 = tf.Variable(tf.zeros([n_hidden_moves]))
        weights2 = tf.Variable(tf.truncated_normal([n_hidden_moves, num_labels]))
        biases2 = tf.Variable(tf.zeros([num_labels]))

        # Training model
        logits1 = tf.matmul(tf_train_dataset, weights1) + biases1
        relu_output = tf.nn.relu(logits1)
        logits2 = tf.matmul(relu_output, weights2) + biases2
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=logits2, labels=tf_train_labels))

        # Optimizer
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  16. NN VS CNN. Neural Network (ANY numeric input) vs. Convolutional Neural Network (IMAGE input).
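What makes a network "convolutional" is that its early layers slide small filters over the image instead of connecting every pixel to every unit. A minimal NumPy sketch of one such filter, using a hand-written vertical-edge detector rather than a learned one:

```python
import numpy as np

def conv2d(image, kernel):
    # 'Valid' 2-D convolution: slide the kernel over every position of the image.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((28, 28))
image[:, 14:] = 1.0  # bright right half: a vertical edge down the middle

kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])  # responds strongly to vertical edges

response = conv2d(image, kernel)  # peaks near the edge, zero in flat regions
```

In a real CNN the kernel values are learned by gradient descent, and the framework (Caffe, TensorFlow, ...) runs this operation as optimized matrix arithmetic rather than Python loops.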
  17. DEEP LEARNING: MANY LAYERS FOR HIGHER ACCURACY. Example: AlexNet (2012), 8 layers. Example: GoogLeNet architecture (2014), 22 layers.
  18. ME ABURRO ("I'm getting bored")
  19. CHOOSING A MODEL – OPEN SOURCE OPTIONS. AlexNet (2012): 8 layers, 16.4% error rate on ImageNet. GoogLeNet (2014): 22 layers, 6.66% error rate on ImageNet. Google Inception v3 (2015): 48 layers, 3.46% error rate. Microsoft ResNet (2015): 152 layers, 3.57% error rate. Yearly competition on the ImageNet dataset with 1M images across 1000 object classes; models available open source. Many models are open source: no need to re-invent the wheel.
  20. FRAMEWORK – OPEN SOURCE OPTIONS. Caffe: developed by UC Berkeley, very efficient algorithms; implemented GoogLeNet, ResNet; large community. TensorFlow: released in 2015 by Google; ready-to-use implementations of GoogLeNet, Inception v3; TensorBoard for visualizing training progress. Also Torch, Theano, Keras, ... Many Python frameworks are available, all with many examples, good documentation and pre-implemented models. Choose a Python framework that fits your needs.
  21. IMPLEMENTING A CNN: MODEL – TRAIN – PREDICT. Select / develop a MODEL → TRAIN/TEST the model with known images → PREDICT on new images. Feedback loop: predictions become additional training data.
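The MODEL - TRAIN - PREDICT loop on this slide can be sketched with scikit-learn; the random features and labels below are stand-ins for real product images and categories:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20))    # known images, as feature vectors
y_train = rng.integers(0, 5, size=200)  # known category labels (5 classes)

model = linear_model.LogisticRegression(max_iter=500)  # 1. select a MODEL
model.fit(X_train, y_train)                            # 2. TRAIN on known images

X_new = rng.normal(size=(3, 20))
predictions = model.predict(X_new)                     # 3. PREDICT on new images
# Reviewed predictions can then be fed back in as additional training data.
```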
  22. INFRASTRUCTURE – GPUS. Underlying CNN computations are mainly matrix multiplications → GPUs (Graphics Processing Units) are 30-50x faster than CPUs (1 CPU: 2 sec vs. 1 GPU: 50 ms). Use GPU-based servers for faster training and predictions.
  23. THANK YOU – WE ARE RECRUITING! www.stylesage.co/careers javier@stylesage.co
  24. GRACIAS! ("Thank you!")
