Creation and Optimization of a Logo Recognition System
Haozhi Qi, Owen Richfield, Xiaohui Zeng, Michael Zhao
Academic Mentor: Dr. Albert Ku
Industrial Mentor: Mr. Sun Lin
August 6, 2015
Problem Description
Problem: What if there was an app that could provide a smartphone user with information about a company just by recognizing that company’s logo in an image?
Goal: Create this app.
Outline
Model Introduction
Bag of Features Model
Convolutional Neural Network
Model Testing and Results
Application Demonstration
Conclusions and Future Work
Bag of Features Model
Feature Extraction
Feature Extraction and Description: SURF
Interest point detection
Rotation- and scale-invariant features
Interest point description
A good representation of the image
SURF: Interest point detection
Use the determinant of the Hessian to detect blob-like structures
Use box filters to approximate the second-order derivatives of the Gaussian filter
Take advantage of the integral image
Apply scale-space analysis to choose the appropriate scale for each interest point
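To make the box-filter and integral-image ideas concrete, here is a minimal NumPy sketch (illustrative, not the project's implementation): the integral image turns any box-filter sum into four array lookups, and the blob response is the approximate determinant of the Hessian built from the box-filtered second derivatives Dxx, Dyy, Dxy.

    import numpy as np

    def integral_image(img):
        # S[y, x] = sum of img[:y, :x]; a zero row and column are prepended
        # so that box sums need no boundary checks.
        return np.pad(img.astype(np.float64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

    def box_sum(S, y0, x0, y1, x1):
        # Sum of img[y0:y1, x0:x1] from only four lookups, whatever the box size.
        return S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]

    def hessian_response(Dxx, Dyy, Dxy):
        # Approximate determinant of the Hessian; the 0.9 factor compensates
        # for replacing Gaussian second derivatives with box filters.
        return Dxx * Dyy - (0.9 * Dxy) ** 2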
SURF: Interest point description
Calculate the dominant orientation based on Haar wavelet responses
Build a descriptor from a 4x4 grid of subregions (4 values per subregion, 64 dimensions in total)
BOW Training
Feature Vector Clustering
Basics of K-means
Clustering method in N-dimensional space
Algorithmic steps:
With a given set of data, choose k cluster centers
Calculate the distance between each data point and each cluster center
Assign each point to the cluster at minimum distance
Recalculate the cluster centers:
v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j
where v_i is the new center of the ith cluster, c_i is the number of data points in the ith cluster, and x_j is the jth data point in the ith cluster.
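A minimal NumPy sketch of one k-means iteration, matching the steps above (illustrative only, not the clustering code actually used):

    import numpy as np

    def kmeans_step(X, centers):
        # X: (n, d) data points; centers: (k, d) current cluster centers.
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (the v_i formula above); empty clusters keep their old center.
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(len(centers))
        ])
        return labels, new_centers

Iterating kmeans_step until the centers stop moving gives the clustering used at each level of the vocabulary tree described next.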
K-means Clustering
Hierarchical K-means
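Hierarchical k-means applies ordinary k-means recursively: split the descriptors into B clusters, then split each cluster again, down to L levels, so the leaves form the visual vocabulary. A minimal sketch using scikit-learn's KMeans as an assumed stand-in (the project's own clustering code may differ):

    from sklearn.cluster import KMeans   # assumed stand-in for the clustering step

    def build_vocab_tree(descriptors, branch=4, levels=3):
        # descriptors: (n, 64) NumPy array of SURF descriptors.
        # Recursively split the descriptors; the leaves are the visual words.
        if levels == 0 or len(descriptors) < branch:
            return {'centers': None, 'children': []}          # leaf node
        km = KMeans(n_clusters=branch, n_init=10).fit(descriptors)
        children = [build_vocab_tree(descriptors[km.labels_ == i],
                                     branch, levels - 1)
                    for i in range(branch)]
        return {'centers': km.cluster_centers_, 'children': children}

With branching factor B and L levels this produces up to B^L leaves, which is exactly the vocabulary-size formula that appears in the parameter tuning section below.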
Bag of Words and Hierarchical K-means
[Figure: each SURF feature vector is pushed down the hierarchical k-means vocabulary tree, choosing the nearest cluster at every level, until it reaches a leaf node, i.e. a visual word.]
[Figure: the image is then summarized as a histogram of matches per visual word, e.g. word 1: 3, word 2: 8, word 3: 2, word 4: 5, word 5: 1.]
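The quantization step behind this histogram can be sketched as follows; for clarity this version assumes a flat vocabulary stored as an array of word centers (with the vocabulary tree, each descriptor instead descends the tree by picking the nearest child at every level):

    import numpy as np

    def bow_histogram(descriptors, words):
        # descriptors: (n, d) SURF descriptors of one image; words: (V, d) word centers.
        dists = np.linalg.norm(descriptors[:, None, :] - words[None, :, :], axis=2)
        nearest_word = dists.argmin(axis=1)        # one visual word per descriptor
        # e.g. array([3, 8, 2, 5, 1]) for a 5-word vocabulary, as in the figure
        return np.bincount(nearest_word, minlength=len(words))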
Bag of Features Model
Inverted File Index
word 1: image 1, image 3, image 5, ...
word 2: image 4, image 9, image 16, ...
word 3: image 4, image 12, image 13, ...
word 4: image 1, image 5, image 7, ...
word 5: image 2, image 3, image 9, ...
word 6: image 7, image 12, image 17, ...
...
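A minimal sketch of how such an inverted index can be built and queried in Python (names and structure are illustrative, not the project's actual code):

    from collections import defaultdict, Counter

    def build_inverted_index(image_histograms):
        # image_histograms: {image_id: bag-of-words histogram (sequence of counts)}
        index = defaultdict(list)
        for image_id, hist in image_histograms.items():
            for word_id, count in enumerate(hist):
                if count > 0:
                    index[word_id].append(image_id)
        return index

    def search(index, query_hist, top_n=15):
        # Only the lists of words present in the query image are touched.
        votes = Counter()
        for word_id, count in enumerate(query_hist):
            if count > 0:
                votes.update(index.get(word_id, []))
        return [image_id for image_id, _ in votes.most_common(top_n)]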
Classification: Inverted File Index
Benefit: retrieval via the inverted file is faster than
searching every image
Drawback: lack of spatial accuracy
Need additional verification to re-rank the retrieved images
Bag of Features Model
Re-ranking of Returned Images
Match descriptors of the query image to descriptors of the images in the returned list.
Simple algorithm:
Match each descriptor in the query image to its nearest-neighbor descriptor in the list image.
Compare the L2 distance of this pair to the distances between the query descriptor and every other descriptor in the list image.
If the nearest-neighbor distance is significantly smaller, count it as a “match”.
Sum up the number of “matches” for each list image and divide by the total number of features.
The returned list is then re-ranked based on this “match ratio” and returned to the user.
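A NumPy sketch of the match-ratio computation described above (illustrative only; the 0.7 ratio threshold is an assumed value for "significantly smaller"):

    import numpy as np

    def match_ratio(query_desc, list_desc, ratio=0.7):
        # query_desc: (nq, d), list_desc: (nl, d) descriptor arrays, with nl >= 2.
        dists = np.linalg.norm(query_desc[:, None, :] - list_desc[None, :, :], axis=2)
        dists.sort(axis=1)                    # ascending distances per query descriptor
        nearest, second = dists[:, 0], dists[:, 1]
        matches = np.sum(nearest < ratio * second)   # the "significantly smaller" test
        return matches / len(query_desc)             # the value used for re-ranking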
Convolutional Neural
Networks (CNNs)
Neural Networks
Figure: Neural network from http://www.texample.net/media/tikz/examples/PNG/neural-network.png
Convolutional Neural Networks
Convolutional neural networks are neural networks with an additional biological inspiration. Each layer is of one of two basic types: convolution or pooling.
Convolution is the process of convolving an image with a kernel. The idea comes from image processing, where it has been used for tasks such as edge detection. Here, we want to learn kernels specific to the data.
Pooling refers to providing a statistical summary of the outputs of several nearby “neurons”, e.g. by taking an average or a maximum.
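A small NumPy sketch of the two layer types (purely illustrative: real CNN layers handle many channels, padding, strides, and learned kernels):

    import numpy as np

    def conv2d(image, kernel):
        # "Valid" convolution of a single-channel image with one kernel
        # (implemented as cross-correlation, as deep learning libraries do).
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
        return out

    def max_pool(fmap, size=2):
        # Non-overlapping pooling: each size x size block is summarized by its max.
        h = fmap.shape[0] - fmap.shape[0] % size
        w = fmap.shape[1] - fmap.shape[1] % size
        blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
        return blocks.max(axis=(1, 3))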
Figure: Description of convolution process from http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg.
Implementation and Architecture
For implementation of CNNs, we used Caffe [?]. We only had
around 16,000 images, so we used two pre-trained models to
do fine-tuning:
AlexNet [?], the winner of the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) 2012.
GoogLeNet [?], the winner of the ILSVRC 2014.
Both of these are provided in Caffe’s Model Zoo, with a file that
stores the weights of these models after training on ImageNet.
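A hedged sketch of how fine-tuning and inference with Caffe typically look; the file names are placeholders and the exact preprocessing depends on the deploy prototxt, so this is an illustration rather than the team's exact setup.

    # Fine-tuning is normally launched from the command line, starting from
    # the pre-trained ImageNet weights (all paths here are placeholders):
    #   caffe train -solver solver.prototxt -weights bvlc_alexnet.caffemodel
    # Classifying an image with the fine-tuned model via the Python interface:
    import numpy as np
    import caffe

    caffe.set_mode_gpu()
    net = caffe.Net('deploy.prototxt',       # network definition (placeholder)
                    'finetuned.caffemodel',  # fine-tuned weights (placeholder)
                    caffe.TEST)
    # 'data' and 'prob' are the blob names used by the reference AlexNet and
    # GoogLeNet deploy files; a real pipeline would preprocess the image here.
    net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
    probs = net.forward()['prob'][0]
    top5 = probs.argsort()[::-1][:5]         # indices of the 5 most likely brands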
AlexNet
Figure: Image of AlexNet architecture (from [?]). This also illustrates how the original network was split to train on two GPUs.
GoogLeNet
Figure: Image of GoogLeNet architecture (from [?]). It is deeper than AlexNet, yet has 12x fewer parameters.
Filter/Layer Visualization
Let’s do some filter/layer visualization!
143.89.75.120/filayer.html
Model Testing
Dataset Construction
We gathered a data set of logo images for 167 brands using the Bing Search API (on average, 100 images per brand), searching for queries like “<brand>”, “<brand> building”, and “<brand> <product>”. One problem we faced was that many of the downloaded images were mislabeled or irrelevant. We filtered the dataset using two methods:
compute the proportion of matching SIFT descriptors between the downloaded image and a reference image for that brand, and discard the image if it doesn’t meet some threshold (see the sketch below)
import ManualLabor
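A sketch of the first filtering method using OpenCV's SIFT and a brute-force matcher (assumptions: SIFT_create may live in cv2.xfeatures2d on older builds, and the 0.7 ratio and 10% keep-threshold are illustrative values, not the ones actually used):

    import cv2   # assumption: an OpenCV build with SIFT available

    sift = cv2.SIFT_create()        # cv2.xfeatures2d.SIFT_create() on older builds
    matcher = cv2.BFMatcher()

    def keep_image(candidate_gray, reference_gray, min_ratio=0.10):
        # Proportion of candidate descriptors that match the brand's reference logo.
        _, cand_desc = sift.detectAndCompute(candidate_gray, None)
        _, ref_desc = sift.detectAndCompute(reference_gray, None)
        if cand_desc is None or ref_desc is None or len(ref_desc) < 2:
            return False
        pairs = matcher.knnMatch(cand_desc, ref_desc, k=2)
        good = [p for p in pairs
                if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
        return len(good) / len(cand_desc) >= min_ratio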
Testing the original pipeline
parameter tuning
cross validation
Parameter Tuning
BOW structure: how to choose the vocabulary size
words = B^L, where B is the branching factor and L is the number of levels
Too large: lack of generalization, overfitting
Too small: lack of discrimination, mismatches
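For example, with B = 14 and L = 5 the vocabulary has 14^5 = 537,824 words, and with B = 15 it has 15^5 = 759,375; this is where the optimal range of 500,000 to 800,000 words reported in the testing results comes from.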
Parameter Tuning
Vocabulary size
How to choose the number of images returned by the inverted file index search:
accuracy
the computation time of re-ranking
How to choose the number of images shown on the client side:
accuracy
mobile application: the size of the screen
Parameter Tuning
Vocabulary size
The number of images returned by searching
The number of images shown
Re-ranking: how to determine the weight factor w in the weighted scoring function
score = w ∗ I + (1 − w) ∗ F
I: number of inliers
F: frequency of the brand among the returned images
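A tiny illustrative sketch of this weighted re-ranking (the inlier count I comes from descriptor matching and F from the frequency of the brand in the returned list; the numbers below are made up):

    def rerank_score(inliers, brand_frequency, w=0.5):
        # score = w * I + (1 - w) * F; w = 0.5 is a placeholder, the actual
        # value is chosen by cross validation.
        return w * inliers + (1 - w) * brand_frequency

    # Made-up example: re-rank two candidate images by their combined score.
    candidates = [('img_a', 12, 0.4), ('img_b', 7, 0.8)]   # (image id, I, F)
    ranked = sorted(candidates, key=lambda c: rerank_score(c[1], c[2]), reverse=True)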
Parameters for Evaluation
vocabulary size (number of branches, number of levels)
the number of images returned by searching
the number of images shown
weight factor w in the weighted scoring function
calculation of accuracy: one correct return gives accuracy = 1
Cross Validation
Application: model selection and model assessment
Procedure:
Cross Validation
Randomly divide the data into K equal-sized parts.
Leave out part k, fit the model to the other K − 1 parts (combined), and then obtain predictions for the left-out kth part.
This is done in turn for each part k = 1, 2, ..., K, and the results are combined.
We choose K = 5.
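A sketch of the 5-fold split using scikit-learn's KFold (an assumed convenience; any equivalent splitting code works, and the model fitting is only indicated in a comment):

    import numpy as np
    from sklearn.model_selection import KFold   # assumed convenience for the split

    X = np.arange(100).reshape(50, 2)            # stand-in for the image data
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # Fit on the K-1 training parts, evaluate on the held-out part, e.g.
        #   model.fit(X[train_idx]); scores.append(accuracy(model, X[test_idx]))
        scores.append(len(test_idx) / len(X))    # placeholder score so the sketch runs
    print(np.mean(scores), np.std(scores))       # mean accuracy and fold-to-fold std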
Testing Result
Testing Result
Test on vocabulary size:
optimal number of words: 500,000 to 800,000
number of branches = 14 or 15
number of levels = 5
Testing Result
With the other parameters fixed, test on:
weight factor
number of returned images
number of images shown on the client side
Testing Result
Optimal parameter setting:
number of images shown = 6
number of returned images = 15, saving about 0.3 s
Testing Summary
Optimal parameter setting:
number of words: 500,000 to 800,000
number of images returned: 15
number of images shown: 6
The stability of the system was also tested:
the standard deviation over 5-fold cross validation ranges from 0.005 to 0.007
Evaluation of Deep Learning framework
Cross-validation for AlexNet (Top-5 Accuracy)
[Figure: top-5 accuracy (roughly 0.87 to 0.95) plotted against training iteration (1,000 to 196,000).]
Cross-validation example, top-5 accuracy per fold: 94.63%, 94.02%, 93.80%, 94.02%, 93.90%, 93.59%, 94.11%, 93.44%, 94.54%, 93.80%
Evaluation of Deep Learning framework
Cross-validation for AlexNet
Final accuracy (AlexNet):
Top-1 Accuracy: 93.33%
Top-5 Accuracy: 96.73%
Evaluation of Deep Learning framework
Cross-validation for GoogLeNet (Top-5 Accuracy)
Evaluation of Deep Learning framework
Cross-validation for AlexNet and GoogLeNet
Final accuracy (GoogLeNet):
Top-1 Accuracy: 94.05%
Top-5 Accuracy: 97.39%
Evaluation of Deep Learning framework
Final Comparison
                                           GoogLeNet   AlexNet   Visual Bag of Words
Accuracy (Top-5)                           97.39%      96.73%    87.6%
Preprocess                                 8.47 ms     7.5 ms    6 ms
Classification                             17.7 ms     6.94 ms   -
SURF feature extraction                    -           -         24 ms
Total time (incl. some system operations)  129 ms      170 ms    281 ms
Demonstration
Future Development
There is still room to improve the system:
Enlarge the data set (currently 167 classes and about 16,000 images).
Test different deep learning frameworks.
Combine local hand-crafted features and global deep-learned features to achieve better accuracy.
We would like to thank
Mr. Sun Lin and Lenovo-Hong Kong.
Professor Shingyu Leung, Dr. Ku Yin Bon and Hong Kong
University of Science and Technology.
Professor Susanna Serna and the Institute for Pure and
Applied Mathematics.
The National Science Foundation for program funding -
Grant DMS #0931852.