SlideShare ist ein Scribd-Unternehmen logo
1 von 6
Downloaden Sie, um offline zu lesen
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072
Comparative Study of Object Detection Algorithms
Nikhil Yadav1 , Utkarsh Binay2
1Student,Computer Engineering Department, Rizvi College of Engineering, Maharashtra, India
2Student, Computer Science Department, Mukesh Patel School of Technology And Management,
Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This paper aims to find the best possible
combination of speed and accuracy while comparing
different object detection algorithms that use convolutional
neural networks to perform object detection. Models which
are usually computationally expensive, achieve the best
accuracy, but cannot be deployed in a simple setting with
limited resources, whereas, faster models fail to achieve
similar results compared to their bigger and more memory
intensive counterparts. We discuss two ends of a spectrum
here comparing three different models ,i.e Single Shot
Detector (SSD) [1], Faster R-CNN (Region - based
Convolutional
Neural Networks)[3], R-FCN(Region based-Fully
Convolutional Networks)[2], where on one end we get a
model which can be deployed on a mobile device because of
its speed and another model which is at the cutting edge of
performance when it comes to accuracy. These models are
trained and their performance metrics tested on the COCO
( common objects in context) dataset.
Key Words: COCO dataset, Bounding boxes, Single Shot
Detector, Inception Resnet, Resnet-101, Mobile Net, VGG-
16(Visual Geometry Group),Faster R-CNN, R-FCN, Hyper
parameter tuning, mAP( mean average precision)
1.INTRODUCTION
State of the art computer vision systems have involved a
range of object detection models that use convolutional
neural networks in their working. The deployment of
these models like YOLO[9], SSD[1], R-FCN[2],R-CNN, etc
depends on the kind of usage we want. Google photos has
deployed models like SSD Mobile Net which is known for
its speed and isn’t memory intensive, here performance
isn’t the most important factor ,but memory efficiency is.
In case of self driving cars though, the requirement for an
extremely accurate model is a priority, as these real time
systems are performing tasks which can lead to a life and
death situation in their surrounding. We need to evaluate
these models using several evaluation metrics like mAP,
memory requirement ,testing time. We have only
considered models which are quite similar in their
architecture on a high level overview. The models that are
considered are Faster R-CNN[3], R-FCN[2], SSD[1] , these
models have used sliding window style predictions and
have only used a single CNN network. These models are
responsible for the object detection part, whereas the
feature extraction part is carried out by the different
image classification models , the state of the art ones
which have participated in the image net competitions. We
refer to the object detection models as meta architectures
and they are coupled with different feature extractors for
eg( VGG, Resnet, Mobile Net[10]), to evaluate the various
combinations we get through them. We would first
describe the different meta architectures and then the
different feature extractors, and then compare their
combinations by testing them on the objects present in the
COCO dataset. In this paper we have tried to perform this
experiment in controlled conditions with same hardware
conditions for comparison and run these tests multiple
times so that they prove to be statistically consistent. We
have used the deep learning library Tensorflow[5] for our
implementation. Previous experiments by others have
been performed on caffe, and tried on other standard
datasets like PASCAL VOC(Visual Object Classes). A few
have fine tuned the detectors for their specific objects and
noted the test times for different models. The main reason
for using tensor flow[5] was its portability and ease of use.
Fig-1: An example of an object detector in use
2. META ARCHITECTURE
Modern meta architectures all use CNN for object
detection , let us have a look at the history of a few meta
architectures that we are discussing, and have an in depth
look at their inner working.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 586
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072
2.1 R-CNN
The R-CNN model was one of the first models to use
convolutional neural networks for object detection. The
goal of R-CNN is to take in an image, and correctly identify
where the main objects (via a bounding box) in the image.
But how do we find out where these bounding boxes are?
R-CNN does what we might intuitively do as well -propose
a bunch of boxes in the image and see if any of them
actually correspond to an object. R-CNN creates these
bounding boxes, or region proposals using a process called
Selective Search. Selective search, looks at the image
through windows of different sizes, and for each size tries
to group together adjacent pixels by texture, color, or
intensity to identify objects. Once the proposals are
created, R-CNN warps the region to a standard square size
and passes it through to a feature extractor or image
classifier , which is a CNN. On the final layer of the CNN, R-
CNN adds a Support Vector Machine (SVM) that simply
classifies whether this is an object, and if yes, which object
is it.
2.2 Fast R-CNN
R-CNN works really well, but is really quite slow for a few
simple reasons It requires a forward pass of the CNN for
every single region proposal for every single image. It has
to train three different models separately - the CNN to
generate image features, the classifier that predicts the
class, and the regression model to tighten the bounding
boxes. This makes the pipeline extremely hard to train.
Both these problems were solved in Fast R-CNN[4] by the
creator of R-CNN himself. For the forward pass of the CNN,
Girshick realized that for each image, a lot of proposed
regions for the image invariably overlapped causing us to
run the same CNN computation again and again . His
insight was simple — Why not run the CNN just once per
image and then find a way to share that computation
across the proposals? .This is exactly what Fast R-CNN
does using a technique known as RoI Pool (Region of
Interest Pooling). At its core, RoI Pool shares the forward
pass of a CNN for an image across its subregions. In the
image above, notice how the CNN features for each region
are obtained by selecting a corresponding region from the
CNN’s feature map. Then, the features in each region are
pooled (usually using max pooling). So all it takes us is one
pass of the original image .The second insight of Fast R-
CNN is to jointly train the CNN, classifier, and bounding
box regressor in a single model. Where earlier we had
different models to extract image features (CNN), classify
(SVM), and tighten bounding boxes (regressor),Fast R-
CNN[4] instead used a single network to compute all three.
Fast R-CNN replaced the SVM classifier with a softmax
layer on top of the CNN to output a classification. It also
added a linear regression layer parallel to the softmax
layer to output bounding box coordinates. In this way, all
the outputs needed came from one single network.
2.3 Faster R-CNN
There was still one bottleneck with Fast R-CNN which had
to be sorted, that was the region proposer. The very first
step to detecting the locations of objects is generating a
bunch of potential bounding boxes or regions of interest to
test. In Fast R-CNN, these proposals were created using
Selective Search, a fairly slow process that was found to be
the bottleneck of the overall process. Faster R-CNN[3]
found a way a to make the step of region proposal almost
cost free. The insight of Faster R-CNN was that region
proposals depended on features of the image that were
already calculated with the forward pass of the CNN (first
step of classification). So why not reuse those same CNN
results for region proposals instead of running a separate
selective search algorithm? Indeed, this is just what the
Faster R-CNN team achieved. In this model, a single CNN is
used to both carry out region proposals and classification.
This way,only one CNN needs to be trained and we get
region proposals almost for free. How the Regions are
Generated? Let’s take a moment to see how Faster R-CNN
generates these region proposals from CNN features.
Faster R-CNN adds a Fully Convolutional Network on top
of the features of the CNN creating what’s known as the
Region Proposal Network.
Fig-2 : The Region Proposal Network slides a
window over the features of the CNN. At each
window location, the network outputs a score and
a bounding box per anchor (hence 4k box coordinates
where k is the number of anchors)
2.4 R-FCN
With the increase in performance, Faster R-CNN was an
order of magnitude swifter than its earlier counterpart
fast R-CNN .But there was issue of applying the region-
specific component had to be applied several times in an
image, this issue was resolved in R-FCN (Region based
Fully Convolutional networks) where the computation
required per image was reduced drastically, where instead
of cropping features from the same layer where the crops
are predicted, the crops are taken from last layer of
features prior to predictions. The algorithm works faster
than Faster R-CNN while achieving comparable accuracy
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 587
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072
scores when run using Resnet101 as the feature extractor,
on the hindsight, it also respects translational invariance,
as it is a position sensitive cropping mechanism.
2.5 SSD(Single Shot Detector)
SSD works by converting discrete output spaces for
bounding boxes into sets of default boxes for different
aspect ratios and for every feature map location. During
predictions, the model generates the scores in each default
box for every object detected and scales the default box to
fit the object shape. SSD is simpler than other networks as
it performs all computations in a single network. SSD
combines the predictions from numerous feature maps
having different resolutions to handle objects of various
sizes. It doesn’t involve regional proposal generating or
feature resampling like previous networks did, which
makes it easy to train and integrate into systems where
detection is required.
3. EXPERIMENTAL SETUP
3.1 Feature Extractors
We will be discussing a few feature extractors to get the
gist of the entire architecture, where meta architectures
are combined with feature extractors like VGG, Resnet101
etc. The way it works is that we first apply convolutional
neural networks via feature extractors to extract all the
high level information from the images. The feature
extractors must be wisely selected as types of layers, no of
parameters directly affect the speed and memory
performance of the object detectors. We will compare four
different feature extractors to combine with our meta
architectures and test their performances. Out of the four
chosen feature extractors , apart from Mobile Net, every
other feature extractor has an open source Tensor flow
implementation, which one can use via GitHub. We
compare VGG-16, Resnet101, Inception Resnet(v2)[6]
,which combines the optimization quality of ResNet with
computational efficiency of inception modules and Mobile
Net which achieved accuracies similar to VGG-16 in Image
net competition with 1/30 of its computational cost and
model size. As the name suggests, Mobile Net because of
its low computational cost has been at the forefront of
various vision applications in mobile devices. Its building
blocks are depth wise separable convolutions which
factorize a standard convolution into a depth wise
convolution and a 1x1 convolution, effectively reducing
both computational cost and number of parameters. In R-
FCN and R-CNN , where we must choose the layer that
must be used for predicting region proposals, we have
used ‘Conv5’ layer in VGG-16, and ‘Conv_4_x’ layers in
ResNet-101, for other feature extractors we choose similar
layers. In SSD, following previous methodologies, we have
also selected the topmost convolutional feature map and a
higher resolution feature map at a lower level, then adding
a sequence of convolutional layers with spatial resolution
decaying by a factor of 2 with each additional layer used
for prediction. We have used batch normalization in all
layers of the single shot detector.
3.2 Number Of Proposals
The no of region proposals that can be sent across to the
box classifier at test time can be varied. The default
number is 300 but we have tried a range of numbers .It
was noted that there was a trade-off between computation
cost and recall, so there was a balance to be maintained.
We have tried no of boxes ranging from 10 to 300 in our
exploration.
3.3 Loss Function Configuration
In our experiments we use Argmax matching throughout
with thresholds set as suggested in the original paper for
each meta-architecture. Also, using the ratios described in
the original papers of each meta architecture , we have
fixed the ratios of positive anchors and negative anchors.
3.4 Training and Hyperparameter Tuning
For Faster R-CNN and R-FCN, we have used Stochastic
gradient descent with momentum with batch size, as the
image sizes used were different, so they were being
trained independently, for SSD we have used 32 as the
batch size as they were resized to a fixed shape. The
learning rates were tuned manually for each feature
extractor. The networks were trained on the COCO dataset
,where we held 7000 images for validation. The model was
evaluated in the official COCO API where the mAP is
measured.
3.5 Hardware
We ran the experiment on a Nvidia Titan X GPU card ,on a
32 GB RAM device with Intel Xeon E5-1650 v2 processor.
The images were resized and the dimensions were k x k,
where k is either 300 or 600 . The timings were averaged
for 450 images.
3.6 Model Details
Here, we mention the intricate details of the different
combination of meta architectures with the feature
extractors.
3.6.1 Faster R-CNN
Tensorflow’s “crop_and_resize” is used instead of standard
ROI pooling, also batch normalization is used in all
convolutional layers. Optimizer used is SGD with
momentum set to 0.9. The learning rate was set as per the
feature extractor.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 588
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072
VGG-16: The initial learning rate is set to 5e-4,we extract
features from the conv5 layers, we resize feature maps to
14x14 and maxpool layers of 7x7 is used .
Resnet-101: The initial learning rate is set to 3e-4, we
extract features from the final layer of conv4 block, stride
size is of 16 pixels.
Inception Resnet: The stride size is 8 in atrous mode, and
16 otherwise. The features are extracted from the
Mixed_6a layers including the residual layers. Features
maps are 17x17 size and learning rate is 1e-3.
MobileNet: The initial learning rate is 3e-3 and stride size
is set to 16 pixels. We extract features from Conv2D_11
layers and features maps of size 14x14 are used.
3.6.2 R-FCN
The parameters set are identical to those of R-CNN with
use of crop_and_resize, SGD with momentum 0.9, use of
batch normalization,etc. R-FCN was trained using Resnet,
Inception Resnet and MobileNet[10] feature extractors.
Resnet 101: Features are extracted from block3 layer, the
stride size is set to 16. The learning rate is initially set to
3e-4 with a decaying factor reducing it 10 times after
every million steps, and the images are resized to 21x21.
Inception Resnet: The initial learning rate is set to 7e-4,
the strides are set to 16, the features are extracted from
Mixed_6a layer.
MobileNet: The initial learning rate is set to 2e-4, the
features are extracted from Conv2d_11, the stride is set to
16 pixels, and the batch size is set to 128. The activation
function of ReLU is used.
3.6.3 SSD
Batch normalization is used in all layers and the weights
are initialized with a standard deviation of 0.03. The
convolutional feature maps were added and all of them
used for prediction, with the convolutional layers being
added with a spatial resolution, decaying by a factor of 2.
VGG-16: L2 normalization was used in the conv4_3 layer,
it was used along with fc7 layers and appended with other
layers with depth 512, and 4 other layers with depth
256.We use an initial learning rate of 0.003 .
Resnet- 101: The feature map is used from the last layer
of Conv4 block, the strides used is 16. Additional layers are
appended , who have spatial resolutions of
512,256,256,256,256 respectively. An initial learning rate
of 3e-4 is used.
Inception Resnet: The initial learning rate is set to 0.005,
with a decaying factor of 0.8 after every 700k steps. The
activation function ReLU is used . Mixed_6a and
Conv2d_7b is used by appending with additional
convolutional layers of depth 512,256,128 respectively.
MobileNet: The initial learning rate is set to 0.004 and we
have used conv_11 and conv_13 layers with four
additional layers with decaying spatial resolution with
depths of 512,256,256 ,128.The activation function ReLU
is used .
4. RESULTS
In this section we compare the training and testing times,
with more focus on the testing times of the different model
combinations and what different we obtain with different
hyperparameter tuning. The GPU times that have been
clocked are mentioned. Since the feature extractors used
are pretrained on the Imagenet dataset, so one can
intuitively think that a good performance on Imagenet
dataset must correlate with a good mAP score on the
COCO dataset.
Table -1:The comparison of different feature extractor
model’s accuracy on imagenet and their max and min mAP
scores on COCO dataset
Model
Name
Max mAP
score(COCO
)
Min mAP
score(COCO
)
Imagenet
Accuracy
VGG-16 23 19.8 0.71
Resnet-101 29 22.9 0.764
Inception
Resnet
30 20 0.804
MobileNet 18.8 14 0.712
Let us get into more specifics of the process and discuss
the training and testing times of different combinations,
and later on their mAP scores as well. The fastest model
combination was SSD MobileNet trained on a low
resolution image of 300, The slowest combination was
Faster R-CNN Inception Resnet trained on a resolution of
600. The least accurate model was R-FCN MobileNet,
whereas the most accurate model was Faster R-CNN
Inception Resnet.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 589
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072
Table -2: Some interesting frontiers of the models on the
test sets , in terms on their mAP score on the COCO
dataset.
Model Mini validation
mAP
Test dev mAP
SSD MobileNet(faste
st)
19.3 19
(100 proposals)
Faster
R-CNN Resnet-
101 ( good balance)
31 30.9
(300 proposals)
Faster R-CNN
Resnet-101( good
balance)
Model
33.2 33
(Most Accurate)
Faster R-CNN
Inception
Resnet
34.7 34.2
(least Accurate)
R-FCN
MobileNet
13.8 13.4
4.1 Analysis
It is noted that usually SSD and R-FCN models are faster
than their counterparts, also Faster R-CNN leads to much
slower times, taking more than 100ms per image and
giving the most accurate models of them all. Although, R-
FCN being the most accurate, they can be tuned to be
faster by changing the no of regions proposed. It is also
found that the intuition of feature extractors accuracy
correlating with the mAP scores on the COCO dataset
seems to be true only for Faster R-CNN and R-FCN as SSD
models as less reliant on the classification accuracy of
their feature extractors. The number of region proposals
for Faster R-CNN and R-FCN can be reduced without
affecting the mAP score by much, in turn saving lots of
computation. We found that , in Faster R-CNN Inception
Resnet, the mAP score with 300 proposals was 34.2, and it
gave a mAP score of 28.7 with just 10 proposals, although
the sweet spot is when we used 50 proposals as we are
able to achieve more than 93% accuracy of those trained
with 300 proposals by saving 3 times the running time. As
far as memory utilization is considered, Faster R-CNN
inception Resnet consumes the most memory whereas
SSD MobileNet consumes the least, unsurprisingly, it is
correlating with the speed of these models. The models
performed much better on bigger images with more
resolutions. In fact, SSD performed better than Faster R-
CNN and R-FCN on bigger images with light feature
extractors.
Table -3:Testing time on GPU with mAP scores of all the
combinations , the R-FCN and Faster R-CNN rows use 300
proposals in the above table.
Model Combination mAP
score
GPU
time
SSD MobileNet 19 40
SSD VGG-16 20.5 130
SSD Resnet-101 26.2 175
SSD Inception Resnet 20.3 80
Faster R-CNN MobileNet 19 118
Faster R-CNN VGG-16 24.9 250
Faster R-CNN Resnet-101 33 396
Faster R-CNN Inception Resnet 34.2 860
R-FCN MobileNet 13.4 75
R-FCN Resnet-101 30.5 386
R-FCN Inception Resnet 30.7 388
5. CONCLUSION
We have put forward a comparative study of state of the
art object detectors which use convolutional neural
networks. We have addressed their issues and
performance on a common hardware and also tested
different combinations of them, all on the COCO dataset.
We discovered that SSD performs much better with light
weight feature extractors on bigger images, competing
with the most accurate of models. We also found that
fewer proposals increase speed without compromising
much on the mAP scores.We wish to try different
combinations and find better results and sweet spots
which can be applied to specific use cases.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 590
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072
REFERENCES
[1]Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu,
C.Y. and Berg, A.C., 2016,October. Ssd: Single shot multibox
detector. In European conference on computer vision (pp.
21-37). Springer, Cham.
[2]Dai, J., Li, Y., He, K. and Sun, J., 2016. R-fcn:Object
detection via region-based fully convolutional networks.
In Advances in neural information processing systems (pp.
379-387).
[3]Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster R-
CNN: Towards real-time object detection with region
proposal networks. In Advances in neural information
processing systems (pp. 91-99)
[4]Girshick, R., 2015. Fast r-cnn. In Proceedings of the
IEEE international conference on computer vision (pp.
1440-1448).
[5]Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z.,
Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M. and
Ghemawat, S., 2016. Tensorflow: Large-scale machine
learning on heterogeneous distributed systems. arXiv
preprint arXiv:1603.04467.
[6]N.Silberman and S.Guadarrama .Tf-slim: A high level
library to define complex models in tensorflow.
https://research.googleblog.com/2016/08/tf- slim-high-
level-library-to-define.html,2016. [Online;accessed-
November-2016]
[7]T. Y. Lin and P. Dollar. Ms coco api.
https://github.com/pdollar/coco, 2016 .
[8]Ren, S., He, K., Girshick, R., Zhang, X. and Sun, J., 2017.
Object detection networks on convolutional feature maps.
IEEE transactions on pattern analysis and machine
intelligence,39(7), pp.1476-1481.
[9]J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You
only look once: Unified, real-time object detection.arXiv
preprint arXiv:1506.02640, 2015.
[10]A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W.
Wang,T. Weyand, M. Andreetto, and H. Adam. Mobilenets:
Efficient convolutional neural networks for mobile vision
applications. arXiv preprint arXiv:1704.04861,2017
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 591

Weitere ähnliche Inhalte

Was ist angesagt?

Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataYu Huang
 
Fisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingFisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingYu Huang
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIYu Huang
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors IIIYu Huang
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingYu Huang
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep LearningYu Huang
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIYu Huang
 
Camera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning IIICamera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning IIIYu Huang
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningYu Huang
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in VisionSangmin Woo
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)Yu Huang
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving IYu Huang
 
Pedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VPedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VYu Huang
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving IIYu Huang
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IIYu Huang
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IVYu Huang
 

Was ist angesagt? (20)

Deep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal DataDeep Learning’s Application in Radar Signal Data
Deep Learning’s Application in Radar Signal Data
 
Fisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingFisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous Driving
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data II
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors III
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep Learning
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving II
 
Camera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning IIICamera-based road Lane detection by deep learning III
Camera-based road Lane detection by deep learning III
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
key.net
key.netkey.net
key.net
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving I
 
Pedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VPedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving V
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 

Ähnlich wie Comparative Study of Object Detection Algorithms

IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET Journal
 
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...dbpublications
 
Object Single Frame Using YOLO Model
Object Single Frame Using YOLO ModelObject Single Frame Using YOLO Model
Object Single Frame Using YOLO ModelIRJET Journal
 
Automatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMAutomatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMIRJET Journal
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
IRJET- Capsearch - An Image Caption Generation Based Search
IRJET- Capsearch - An Image Caption Generation Based SearchIRJET- Capsearch - An Image Caption Generation Based Search
IRJET- Capsearch - An Image Caption Generation Based SearchIRJET Journal
 
Real time Traffic Signs Recognition using Deep Learning
Real time Traffic Signs Recognition using Deep LearningReal time Traffic Signs Recognition using Deep Learning
Real time Traffic Signs Recognition using Deep LearningIRJET Journal
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsIRJET Journal
 
Object Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptxObject Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptxusmanyaseen16
 
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...IRJET Journal
 
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMDEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMIRJET Journal
 
Recognition of Handwritten Mathematical Equations
Recognition of  Handwritten Mathematical EquationsRecognition of  Handwritten Mathematical Equations
Recognition of Handwritten Mathematical EquationsIRJET Journal
 
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...IRJET Journal
 
Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning IJECEIAES
 
Backbone search for object detection for applications in intrusion warning sy...
Backbone search for object detection for applications in intrusion warning sy...Backbone search for object detection for applications in intrusion warning sy...
Backbone search for object detection for applications in intrusion warning sy...IAESIJAI
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...CSCJournals
 
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET Journal
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION cscpconf
 
Text Detection and Recognition in Natural Images
Text Detection and Recognition in Natural ImagesText Detection and Recognition in Natural Images
Text Detection and Recognition in Natural ImagesIRJET Journal
 

Ähnlich wie Comparative Study of Object Detection Algorithms (20)

IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
Object Single Frame Using YOLO Model
Object Single Frame Using YOLO ModelObject Single Frame Using YOLO Model
Object Single Frame Using YOLO Model
 
Automatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMAutomatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVM
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
IRJET- Capsearch - An Image Caption Generation Based Search
IRJET- Capsearch - An Image Caption Generation Based SearchIRJET- Capsearch - An Image Caption Generation Based Search
IRJET- Capsearch - An Image Caption Generation Based Search
 
Real time Traffic Signs Recognition using Deep Learning
Real time Traffic Signs Recognition using Deep LearningReal time Traffic Signs Recognition using Deep Learning
Real time Traffic Signs Recognition using Deep Learning
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction Algorithms
 
Object Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptxObject Detection is a very powerful field.pptx
Object Detection is a very powerful field.pptx
 
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
 
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTMDEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
DEEP LEARNING BASED IMAGE CAPTIONING IN REGIONAL LANGUAGE USING CNN AND LSTM
 
Recognition of Handwritten Mathematical Equations
Recognition of  Handwritten Mathematical EquationsRecognition of  Handwritten Mathematical Equations
Recognition of Handwritten Mathematical Equations
 
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...
A REVIEW ON IMPROVING TRAFFIC-SIGN DETECTION USING YOLO ALGORITHM FOR OBJECT ...
 
Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning
 
Backbone search for object detection for applications in intrusion warning sy...
Backbone search for object detection for applications in intrusion warning sy...Backbone search for object detection for applications in intrusion warning sy...
Backbone search for object detection for applications in intrusion warning sy...
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
 
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
 
Text Detection and Recognition in Natural Images
Text Detection and Recognition in Natural ImagesText Detection and Recognition in Natural Images
Text Detection and Recognition in Natural Images
 

Mehr von IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

Mehr von IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Kürzlich hochgeladen

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxMuhammadAsimMuhammad6
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 

Kürzlich hochgeladen (20)

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 

Comparative Study of Object Detection Algorithms

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072 Comparative Study of Object Detection Algorithms Nikhil Yadav1 , Utkarsh Binay2 1Student,Computer Engineering Department, Rizvi College of Engineering, Maharashtra, India 2Student, Computer Science Department, Mukesh Patel School of Technology And Management, Maharashtra, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - This paper aims to find the best possible combination of speed and accuracy while comparing different object detection algorithms that use convolutional neural networks to perform object detection. Models which are usually computationally expensive, achieve the best accuracy, but cannot be deployed in a simple setting with limited resources, whereas, faster models fail to achieve similar results compared to their bigger and more memory intensive counterparts. We discuss two ends of a spectrum here comparing three different models ,i.e Single Shot Detector (SSD) [1], Faster R-CNN (Region - based Convolutional Neural Networks)[3], R-FCN(Region based-Fully Convolutional Networks)[2], where on one end we get a model which can be deployed on a mobile device because of its speed and another model which is at the cutting edge of performance when it comes to accuracy. These models are trained and their performance metrics tested on the COCO ( common objects in context) dataset. Key Words: COCO dataset, Bounding boxes, Single Shot Detector, Inception Resnet, Resnet-101, Mobile Net, VGG- 16(Visual Geometry Group),Faster R-CNN, R-FCN, Hyper parameter tuning, mAP( mean average precision) 1.INTRODUCTION State of the art computer vision systems have involved a range of object detection models that use convolutional neural networks in their working. The deployment of these models like YOLO[9], SSD[1], R-FCN[2],R-CNN, etc depends on the kind of usage we want. Google photos has deployed models like SSD Mobile Net which is known for its speed and isn’t memory intensive, here performance isn’t the most important factor ,but memory efficiency is. In case of self driving cars though, the requirement for an extremely accurate model is a priority, as these real time systems are performing tasks which can lead to a life and death situation in their surrounding. We need to evaluate these models using several evaluation metrics like mAP, memory requirement ,testing time. We have only considered models which are quite similar in their architecture on a high level overview. The models that are considered are Faster R-CNN[3], R-FCN[2], SSD[1] , these models have used sliding window style predictions and have only used a single CNN network. These models are responsible for the object detection part, whereas the feature extraction part is carried out by the different image classification models , the state of the art ones which have participated in the image net competitions. We refer to the object detection models as meta architectures and they are coupled with different feature extractors for eg( VGG, Resnet, Mobile Net[10]), to evaluate the various combinations we get through them. We would first describe the different meta architectures and then the different feature extractors, and then compare their combinations by testing them on the objects present in the COCO dataset. In this paper we have tried to perform this experiment in controlled conditions with same hardware conditions for comparison and run these tests multiple times so that they prove to be statistically consistent. We have used the deep learning library Tensorflow[5] for our implementation. Previous experiments by others have been performed on caffe, and tried on other standard datasets like PASCAL VOC(Visual Object Classes). A few have fine tuned the detectors for their specific objects and noted the test times for different models. The main reason for using tensor flow[5] was its portability and ease of use. Fig-1: An example of an object detector in use 2. META ARCHITECTURE Modern meta architectures all use CNN for object detection , let us have a look at the history of a few meta architectures that we are discussing, and have an in depth look at their inner working. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 586
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072 2.1 R-CNN The R-CNN model was one of the first models to use convolutional neural networks for object detection. The goal of R-CNN is to take in an image, and correctly identify where the main objects (via a bounding box) in the image. But how do we find out where these bounding boxes are? R-CNN does what we might intuitively do as well -propose a bunch of boxes in the image and see if any of them actually correspond to an object. R-CNN creates these bounding boxes, or region proposals using a process called Selective Search. Selective search, looks at the image through windows of different sizes, and for each size tries to group together adjacent pixels by texture, color, or intensity to identify objects. Once the proposals are created, R-CNN warps the region to a standard square size and passes it through to a feature extractor or image classifier , which is a CNN. On the final layer of the CNN, R- CNN adds a Support Vector Machine (SVM) that simply classifies whether this is an object, and if yes, which object is it. 2.2 Fast R-CNN R-CNN works really well, but is really quite slow for a few simple reasons It requires a forward pass of the CNN for every single region proposal for every single image. It has to train three different models separately - the CNN to generate image features, the classifier that predicts the class, and the regression model to tighten the bounding boxes. This makes the pipeline extremely hard to train. Both these problems were solved in Fast R-CNN[4] by the creator of R-CNN himself. For the forward pass of the CNN, Girshick realized that for each image, a lot of proposed regions for the image invariably overlapped causing us to run the same CNN computation again and again . His insight was simple — Why not run the CNN just once per image and then find a way to share that computation across the proposals? .This is exactly what Fast R-CNN does using a technique known as RoI Pool (Region of Interest Pooling). At its core, RoI Pool shares the forward pass of a CNN for an image across its subregions. In the image above, notice how the CNN features for each region are obtained by selecting a corresponding region from the CNN’s feature map. Then, the features in each region are pooled (usually using max pooling). So all it takes us is one pass of the original image .The second insight of Fast R- CNN is to jointly train the CNN, classifier, and bounding box regressor in a single model. Where earlier we had different models to extract image features (CNN), classify (SVM), and tighten bounding boxes (regressor),Fast R- CNN[4] instead used a single network to compute all three. Fast R-CNN replaced the SVM classifier with a softmax layer on top of the CNN to output a classification. It also added a linear regression layer parallel to the softmax layer to output bounding box coordinates. In this way, all the outputs needed came from one single network. 2.3 Faster R-CNN There was still one bottleneck with Fast R-CNN which had to be sorted, that was the region proposer. The very first step to detecting the locations of objects is generating a bunch of potential bounding boxes or regions of interest to test. In Fast R-CNN, these proposals were created using Selective Search, a fairly slow process that was found to be the bottleneck of the overall process. Faster R-CNN[3] found a way a to make the step of region proposal almost cost free. The insight of Faster R-CNN was that region proposals depended on features of the image that were already calculated with the forward pass of the CNN (first step of classification). So why not reuse those same CNN results for region proposals instead of running a separate selective search algorithm? Indeed, this is just what the Faster R-CNN team achieved. In this model, a single CNN is used to both carry out region proposals and classification. This way,only one CNN needs to be trained and we get region proposals almost for free. How the Regions are Generated? Let’s take a moment to see how Faster R-CNN generates these region proposals from CNN features. Faster R-CNN adds a Fully Convolutional Network on top of the features of the CNN creating what’s known as the Region Proposal Network. Fig-2 : The Region Proposal Network slides a window over the features of the CNN. At each window location, the network outputs a score and a bounding box per anchor (hence 4k box coordinates where k is the number of anchors) 2.4 R-FCN With the increase in performance, Faster R-CNN was an order of magnitude swifter than its earlier counterpart fast R-CNN .But there was issue of applying the region- specific component had to be applied several times in an image, this issue was resolved in R-FCN (Region based Fully Convolutional networks) where the computation required per image was reduced drastically, where instead of cropping features from the same layer where the crops are predicted, the crops are taken from last layer of features prior to predictions. The algorithm works faster than Faster R-CNN while achieving comparable accuracy © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 587
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072 scores when run using Resnet101 as the feature extractor, on the hindsight, it also respects translational invariance, as it is a position sensitive cropping mechanism. 2.5 SSD(Single Shot Detector) SSD works by converting discrete output spaces for bounding boxes into sets of default boxes for different aspect ratios and for every feature map location. During predictions, the model generates the scores in each default box for every object detected and scales the default box to fit the object shape. SSD is simpler than other networks as it performs all computations in a single network. SSD combines the predictions from numerous feature maps having different resolutions to handle objects of various sizes. It doesn’t involve regional proposal generating or feature resampling like previous networks did, which makes it easy to train and integrate into systems where detection is required. 3. EXPERIMENTAL SETUP 3.1 Feature Extractors We will be discussing a few feature extractors to get the gist of the entire architecture, where meta architectures are combined with feature extractors like VGG, Resnet101 etc. The way it works is that we first apply convolutional neural networks via feature extractors to extract all the high level information from the images. The feature extractors must be wisely selected as types of layers, no of parameters directly affect the speed and memory performance of the object detectors. We will compare four different feature extractors to combine with our meta architectures and test their performances. Out of the four chosen feature extractors , apart from Mobile Net, every other feature extractor has an open source Tensor flow implementation, which one can use via GitHub. We compare VGG-16, Resnet101, Inception Resnet(v2)[6] ,which combines the optimization quality of ResNet with computational efficiency of inception modules and Mobile Net which achieved accuracies similar to VGG-16 in Image net competition with 1/30 of its computational cost and model size. As the name suggests, Mobile Net because of its low computational cost has been at the forefront of various vision applications in mobile devices. Its building blocks are depth wise separable convolutions which factorize a standard convolution into a depth wise convolution and a 1x1 convolution, effectively reducing both computational cost and number of parameters. In R- FCN and R-CNN , where we must choose the layer that must be used for predicting region proposals, we have used ‘Conv5’ layer in VGG-16, and ‘Conv_4_x’ layers in ResNet-101, for other feature extractors we choose similar layers. In SSD, following previous methodologies, we have also selected the topmost convolutional feature map and a higher resolution feature map at a lower level, then adding a sequence of convolutional layers with spatial resolution decaying by a factor of 2 with each additional layer used for prediction. We have used batch normalization in all layers of the single shot detector. 3.2 Number Of Proposals The no of region proposals that can be sent across to the box classifier at test time can be varied. The default number is 300 but we have tried a range of numbers .It was noted that there was a trade-off between computation cost and recall, so there was a balance to be maintained. We have tried no of boxes ranging from 10 to 300 in our exploration. 3.3 Loss Function Configuration In our experiments we use Argmax matching throughout with thresholds set as suggested in the original paper for each meta-architecture. Also, using the ratios described in the original papers of each meta architecture , we have fixed the ratios of positive anchors and negative anchors. 3.4 Training and Hyperparameter Tuning For Faster R-CNN and R-FCN, we have used Stochastic gradient descent with momentum with batch size, as the image sizes used were different, so they were being trained independently, for SSD we have used 32 as the batch size as they were resized to a fixed shape. The learning rates were tuned manually for each feature extractor. The networks were trained on the COCO dataset ,where we held 7000 images for validation. The model was evaluated in the official COCO API where the mAP is measured. 3.5 Hardware We ran the experiment on a Nvidia Titan X GPU card ,on a 32 GB RAM device with Intel Xeon E5-1650 v2 processor. The images were resized and the dimensions were k x k, where k is either 300 or 600 . The timings were averaged for 450 images. 3.6 Model Details Here, we mention the intricate details of the different combination of meta architectures with the feature extractors. 3.6.1 Faster R-CNN Tensorflow’s “crop_and_resize” is used instead of standard ROI pooling, also batch normalization is used in all convolutional layers. Optimizer used is SGD with momentum set to 0.9. The learning rate was set as per the feature extractor. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 588
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072 VGG-16: The initial learning rate is set to 5e-4,we extract features from the conv5 layers, we resize feature maps to 14x14 and maxpool layers of 7x7 is used . Resnet-101: The initial learning rate is set to 3e-4, we extract features from the final layer of conv4 block, stride size is of 16 pixels. Inception Resnet: The stride size is 8 in atrous mode, and 16 otherwise. The features are extracted from the Mixed_6a layers including the residual layers. Features maps are 17x17 size and learning rate is 1e-3. MobileNet: The initial learning rate is 3e-3 and stride size is set to 16 pixels. We extract features from Conv2D_11 layers and features maps of size 14x14 are used. 3.6.2 R-FCN The parameters set are identical to those of R-CNN with use of crop_and_resize, SGD with momentum 0.9, use of batch normalization,etc. R-FCN was trained using Resnet, Inception Resnet and MobileNet[10] feature extractors. Resnet 101: Features are extracted from block3 layer, the stride size is set to 16. The learning rate is initially set to 3e-4 with a decaying factor reducing it 10 times after every million steps, and the images are resized to 21x21. Inception Resnet: The initial learning rate is set to 7e-4, the strides are set to 16, the features are extracted from Mixed_6a layer. MobileNet: The initial learning rate is set to 2e-4, the features are extracted from Conv2d_11, the stride is set to 16 pixels, and the batch size is set to 128. The activation function of ReLU is used. 3.6.3 SSD Batch normalization is used in all layers and the weights are initialized with a standard deviation of 0.03. The convolutional feature maps were added and all of them used for prediction, with the convolutional layers being added with a spatial resolution, decaying by a factor of 2. VGG-16: L2 normalization was used in the conv4_3 layer, it was used along with fc7 layers and appended with other layers with depth 512, and 4 other layers with depth 256.We use an initial learning rate of 0.003 . Resnet- 101: The feature map is used from the last layer of Conv4 block, the strides used is 16. Additional layers are appended , who have spatial resolutions of 512,256,256,256,256 respectively. An initial learning rate of 3e-4 is used. Inception Resnet: The initial learning rate is set to 0.005, with a decaying factor of 0.8 after every 700k steps. The activation function ReLU is used . Mixed_6a and Conv2d_7b is used by appending with additional convolutional layers of depth 512,256,128 respectively. MobileNet: The initial learning rate is set to 0.004 and we have used conv_11 and conv_13 layers with four additional layers with decaying spatial resolution with depths of 512,256,256 ,128.The activation function ReLU is used . 4. RESULTS In this section we compare the training and testing times, with more focus on the testing times of the different model combinations and what different we obtain with different hyperparameter tuning. The GPU times that have been clocked are mentioned. Since the feature extractors used are pretrained on the Imagenet dataset, so one can intuitively think that a good performance on Imagenet dataset must correlate with a good mAP score on the COCO dataset. Table -1:The comparison of different feature extractor model’s accuracy on imagenet and their max and min mAP scores on COCO dataset Model Name Max mAP score(COCO ) Min mAP score(COCO ) Imagenet Accuracy VGG-16 23 19.8 0.71 Resnet-101 29 22.9 0.764 Inception Resnet 30 20 0.804 MobileNet 18.8 14 0.712 Let us get into more specifics of the process and discuss the training and testing times of different combinations, and later on their mAP scores as well. The fastest model combination was SSD MobileNet trained on a low resolution image of 300, The slowest combination was Faster R-CNN Inception Resnet trained on a resolution of 600. The least accurate model was R-FCN MobileNet, whereas the most accurate model was Faster R-CNN Inception Resnet. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 589
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072 Table -2: Some interesting frontiers of the models on the test sets , in terms on their mAP score on the COCO dataset. Model Mini validation mAP Test dev mAP SSD MobileNet(faste st) 19.3 19 (100 proposals) Faster R-CNN Resnet- 101 ( good balance) 31 30.9 (300 proposals) Faster R-CNN Resnet-101( good balance) Model 33.2 33 (Most Accurate) Faster R-CNN Inception Resnet 34.7 34.2 (least Accurate) R-FCN MobileNet 13.8 13.4 4.1 Analysis It is noted that usually SSD and R-FCN models are faster than their counterparts, also Faster R-CNN leads to much slower times, taking more than 100ms per image and giving the most accurate models of them all. Although, R- FCN being the most accurate, they can be tuned to be faster by changing the no of regions proposed. It is also found that the intuition of feature extractors accuracy correlating with the mAP scores on the COCO dataset seems to be true only for Faster R-CNN and R-FCN as SSD models as less reliant on the classification accuracy of their feature extractors. The number of region proposals for Faster R-CNN and R-FCN can be reduced without affecting the mAP score by much, in turn saving lots of computation. We found that , in Faster R-CNN Inception Resnet, the mAP score with 300 proposals was 34.2, and it gave a mAP score of 28.7 with just 10 proposals, although the sweet spot is when we used 50 proposals as we are able to achieve more than 93% accuracy of those trained with 300 proposals by saving 3 times the running time. As far as memory utilization is considered, Faster R-CNN inception Resnet consumes the most memory whereas SSD MobileNet consumes the least, unsurprisingly, it is correlating with the speed of these models. The models performed much better on bigger images with more resolutions. In fact, SSD performed better than Faster R- CNN and R-FCN on bigger images with light feature extractors. Table -3:Testing time on GPU with mAP scores of all the combinations , the R-FCN and Faster R-CNN rows use 300 proposals in the above table. Model Combination mAP score GPU time SSD MobileNet 19 40 SSD VGG-16 20.5 130 SSD Resnet-101 26.2 175 SSD Inception Resnet 20.3 80 Faster R-CNN MobileNet 19 118 Faster R-CNN VGG-16 24.9 250 Faster R-CNN Resnet-101 33 396 Faster R-CNN Inception Resnet 34.2 860 R-FCN MobileNet 13.4 75 R-FCN Resnet-101 30.5 386 R-FCN Inception Resnet 30.7 388 5. CONCLUSION We have put forward a comparative study of state of the art object detectors which use convolutional neural networks. We have addressed their issues and performance on a common hardware and also tested different combinations of them, all on the COCO dataset. We discovered that SSD performs much better with light weight feature extractors on bigger images, competing with the most accurate of models. We also found that fewer proposals increase speed without compromising much on the mAP scores.We wish to try different combinations and find better results and sweet spots which can be applied to specific use cases. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 590
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 11 | Nov -2017 www.irjet.net p-ISSN: 2395-0072 REFERENCES [1]Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y. and Berg, A.C., 2016,October. Ssd: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham. [2]Dai, J., Li, Y., He, K. and Sun, J., 2016. R-fcn:Object detection via region-based fully convolutional networks. In Advances in neural information processing systems (pp. 379-387). [3]Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster R- CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99) [4]Girshick, R., 2015. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448). [5]Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M. and Ghemawat, S., 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. [6]N.Silberman and S.Guadarrama .Tf-slim: A high level library to define complex models in tensorflow. https://research.googleblog.com/2016/08/tf- slim-high- level-library-to-define.html,2016. [Online;accessed- November-2016] [7]T. Y. Lin and P. Dollar. Ms coco api. https://github.com/pdollar/coco, 2016 . [8]Ren, S., He, K., Girshick, R., Zhang, X. and Sun, J., 2017. Object detection networks on convolutional feature maps. IEEE transactions on pattern analysis and machine intelligence,39(7), pp.1476-1481. [9]J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection.arXiv preprint arXiv:1506.02640, 2015. [10]A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang,T. Weyand, M. Andreetto, and H. Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861,2017 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 591