SlideShare a Scribd company logo
1 of 25
DAVE: A Unified Framework for Fast
Vehicle Detection and Annotation
YI ZHOU1(B), LI LIU1, LING SHAO1, AND MATT MELLOR2
NORTHUMBRIA UNIVERSITY, NEWCASTLE UPON TYNE NE1 8ST, UK
Introduction and Related Work
Automatic analysis of urban traffic activities is an urgent need due to essential traffic
management and increased vehicle violations
In these research areas, efficient and accurate vehicle detection and attributes annotation is the
most important component of traffic surveillance systems.
Vehicle detection is a fundamental objective of traffic surveillance
Introduction and Related Work
Traditional vehicle detection methods can be categorized into frame-based and motion-based
approaches
For motion-based approaches
◦ frames subtraction [3]
◦ adaptive background modeling [4]
◦ optical flow [5, 6] are often utilized
Introduction and Related Work
To achieve higher detection performance
◦ recently, the deformable part-based model (DPM) [7] employs a star-structured architecture consisting
of root and parts filters with associated deformation models for object detection
◦ DPM can successfully handle deformable object detection even when the target is partially occluded.
◦ However, it leads to heavy computational costs due to the use of the sliding window procedure for
appearance features extraction
Introduction and Related Work
a Region-based CNN (RCNN) [13]
◦ combines object proposals, CNN learned features and an SVM classifier for accurate object detection
◦ To further increase the detection speed and accuracy
◦ Fast RCNN [14] adopts a region of interest (ROI) pooling layer and the multi-task loss to estimate object classes while predicting
bounding-box positions
◦ proposal methods such as Selective Search [15] and Edgeboxes [16] can be introduced in RCNN and Fast RCNN to improve the
efficiency compared to the traditional sliding window fashion
All these deep models target general object detection. In our task, we aim for real-time
detection of a special object type, vehicle.
◦ vehicle attributes annotation
◦ travel direction (i.e., pose)
◦ inherent color, type
◦ other more fine-grained information with respect to the headlight, grille and wheel
Introduction and Related Work
Lin et al. [18] presents an auto-masking neural network for vehicle detection and pose
estimation
In [19], an approach by vector matching of template is introduced for vehicle color recognition
However, independent analysis of different attributes makes visual information not well
explored and the process inefficient, and little work has been done for annotating these vehicle
attributes simultaneously
Therefore, we believe multi-task learning [21, 22] can be helpful since such joint learning
schemes can implicitly learn the common features shared by these correlated tasks
Introduction and Related Work
contributions
We unify multiple vehicle-related tasks into one deep vehicle annotation framework DAVE which
can effectively and efficiently annotate each vehicle’s bounding-box, pose, color and type
simultaneously.
Two CNNs proposed in our method, i.e., the Fast Vehicle Proposal Network (FVPN) and the
vehicle Attributes Learning Network (ALN), are optimized in a joint manner by bridging two
CNNs with latent data-driven knowledge guidance. In this way, the deeper ALN can benefit the
performance of the shallow FVPN.
We introduce a new Urban Traffic Surveillance (UTS) vehicle dataset consisting of six 1920 ×
1080 (FHD) resolution videos with varying illumination conditions and viewpoints.
Detection and Annotation for Vehicles
(DAVE)
DAVE
Fast Vehicle Proposal Network (FVPN)
Searching the whole image to locate potential vehicle positions in a sliding window fashion is
prohibitive for real-time applications
Particularly for one specific object, we expect very fast and accurate proposal performance can
be achieved
fast vehicle proposal network (FVPN)
◦ a shallow fully convolutional network
◦ RPN과 사실상 동일(Net구조가 얕고 feature sharing을 하지 않을 뿐)
Attributes Learning Network (ALN)
Modeling vehicle pose estimation, color recognition and type classification separately is less
accurate and inefficient
The network architecture of the ALN is mainly inspired by the GoogLeNet [26] model
Specifically, we design the ALN by adding 4 fully connected layers to extend the GoogLeNet into
a multi- attribute learning model.
The ALN is a multi-task network optimized with four softmax loss layers for vehicle annotation
tasks
Deep Nets Training
Training Dataset and Data Augmentation.
◦ We adopt the large-scale CompCars dataset [28] with more than 100,000 web-nature data as the
positive training samples which are annotated with tight bounding-boxes and rich vehicle attributes
such as pose, type, make and model.
◦ In detail, the web-nature part of the CompCars dataset provides
◦ five viewpoints as front, rear, side, frontside and rearside
◦ twelve vehicle types as MPV, SUV, sedan, hatchback, minibus, pickup, fastback, estate, hardtop-
convertible, sports, crossover and convertible.
◦ since color is another important attribute of a vehicle, we additionally annotate colors on more than
10,000 images with five common vehicle colors as black, white, silver, red and blue to train our final
model
Deep Nets Training
Deep Nets Training
For data augmentation
◦ we first triple the training data with increased and decreased image intensities for making our DAVE
more robust under different lighting conditions.
◦ In addition, image downsampling up to 20 % of the original size and image blurring are introduced to
enable that detected vehicles with small sizes can be even annotated as precisely as possible.
Deep Nets Training
Jointly Training with Latent Knowledge Guidance
◦ The input resolution of the ALN is 224 × 224 for fine-grained vehicle attributes learning,
◦ while it is decreased to 60×60 for the FVPN to fit smaller scales of the test image pyramid for efficiency
in the inference phase.
◦ The pre-trained GoogLeNet model for 1000-class ImageNet classification is used to initialize all the
convolutional layers in the ALN, while the FVPN is trained from scratch
Deep Nets Training
Two-Stage Deep Nets Inference
◦ Vehicle proposals inferred from the FVPN are taken as inputs into the ALN
◦ Although verifying each proposal and annotation of attributes are at the same stage, we assume that
verification has a higher priority.
◦ For instance, in the inference phase, if a proposal is predicted as a positive vehicle, it will then be
annotated with a bounding-box and inferred pose, color and type. However, a proposal predicted as the
background will be neglected in spite of its inferred attributes
◦ Finally, we perform non-maximum suppression as in RCNN [13] to eliminate duplicate detections. The
full inference scheme is demonstrated in Fig. 4
Deep Nets Training
Experiments and Results
In this section, we evaluate our DAVE for detection and annotation of pose, color and type for
each detected vehicle.
Experiments are mainly divided as two parts:
◦ vehicle detection
◦ Attributes learning
DAVE is implemented based on the deep learning framework Caffe [29] and run on a
workstation configured with an NVIDIA TITAN X GPU.
Experiments and Results
Experiments and Results
Experiments and Results
Experiments and Results
Experiments and Results
Experiments and Results
Conclusion
In this paper, we developed a unified framework for fast vehicle detection and annotation: DAVE
◦ which consists of two convolutional neural networks FVPN and ALN
◦ The proposal and attributes learning networks predict bounding- boxes for vehicles and infer their
attributes: pose, color and type, simultaneously
◦ Extensive experiments show that our method outperforms state-of-the-art frame- works on vehicle
detection and is also effective for vehicle attributes annotation
In our on-going work,
◦ we are integrating more vehicle attributes such as make and model into the ALN with high accuracy, and
exploiting these attributes to investigate vehicle re-identification tasks.

More Related Content

What's hot

Automatic road environment classification 20121002
Automatic road environment classification 20121002Automatic road environment classification 20121002
Automatic road environment classification 20121002
es712
 

What's hot (20)

Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep Learning
 
Lane Detection
Lane DetectionLane Detection
Lane Detection
 
Traffic sign detection via graph based ranking and segmentation
Traffic sign detection via graph based ranking and segmentationTraffic sign detection via graph based ranking and segmentation
Traffic sign detection via graph based ranking and segmentation
 
Lane detection by use of canny edge
Lane detection by use of canny edgeLane detection by use of canny edge
Lane detection by use of canny edge
 
Driving Behavior for ADAS and Autonomous Driving V
Driving Behavior for ADAS and Autonomous Driving VDriving Behavior for ADAS and Autonomous Driving V
Driving Behavior for ADAS and Autonomous Driving V
 
Driving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIDriving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VII
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
YUV, Y CB CR and Subsampling
YUV, Y CB CR and SubsamplingYUV, Y CB CR and Subsampling
YUV, Y CB CR and Subsampling
 
Improved Alpha-Tested Magnification for Vector Textures and Special Effects
Improved Alpha-Tested Magnification for Vector Textures and Special EffectsImproved Alpha-Tested Magnification for Vector Textures and Special Effects
Improved Alpha-Tested Magnification for Vector Textures and Special Effects
 
[Paper research] GOSELO: for Robot navigation using Reactive neural networks
[Paper research] GOSELO: for Robot navigation using Reactive neural networks[Paper research] GOSELO: for Robot navigation using Reactive neural networks
[Paper research] GOSELO: for Robot navigation using Reactive neural networks
 
Image processing on matlab presentation
Image processing on matlab presentationImage processing on matlab presentation
Image processing on matlab presentation
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer
 
Automatic road environment classification 20121002
Automatic road environment classification 20121002Automatic road environment classification 20121002
Automatic road environment classification 20121002
 
Al4103216222
Al4103216222Al4103216222
Al4103216222
 
Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors III
 
Advanced Lane Finding
Advanced Lane FindingAdvanced Lane Finding
Advanced Lane Finding
 
Survey on Local Color Image Descriptors
Survey on Local Color Image DescriptorsSurvey on Local Color Image Descriptors
Survey on Local Color Image Descriptors
 
Driving Behavior for ADAS and Autonomous Driving III
Driving Behavior for ADAS and Autonomous Driving IIIDriving Behavior for ADAS and Autonomous Driving III
Driving Behavior for ADAS and Autonomous Driving III
 
Driving Behavior for ADAS and Autonomous Driving VIII
Driving Behavior for ADAS and Autonomous Driving VIIIDriving Behavior for ADAS and Autonomous Driving VIII
Driving Behavior for ADAS and Autonomous Driving VIII
 
Fisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving IIFisheye Omnidirectional View in Autonomous Driving II
Fisheye Omnidirectional View in Autonomous Driving II
 

Similar to Dave

deep-reinforcement-learning-framework.pdf
deep-reinforcement-learning-framework.pdfdeep-reinforcement-learning-framework.pdf
deep-reinforcement-learning-framework.pdf
Yugank Aman
 
A hierarchical RCNN for vehicle and vehicle license plate detection and recog...
A hierarchical RCNN for vehicle and vehicle license plate detection and recog...A hierarchical RCNN for vehicle and vehicle license plate detection and recog...
A hierarchical RCNN for vehicle and vehicle license plate detection and recog...
IJECEIAES
 

Similar to Dave (20)

deep-reinforcement-learning-framework.pdf
deep-reinforcement-learning-framework.pdfdeep-reinforcement-learning-framework.pdf
deep-reinforcement-learning-framework.pdf
 
ObjectDetection.pptx
ObjectDetection.pptxObjectDetection.pptx
ObjectDetection.pptx
 
Automatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMAutomatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVM
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
self driving car nvidia whitepaper
self driving car nvidia whitepaperself driving car nvidia whitepaper
self driving car nvidia whitepaper
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep Learning
 
An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processing
 
Identifying Parking Spots from Surveillance Cameras using CNN
Identifying Parking Spots from Surveillance Cameras using CNNIdentifying Parking Spots from Surveillance Cameras using CNN
Identifying Parking Spots from Surveillance Cameras using CNN
 
VEHICLE DETECTION USING YOLO V3 FOR COUNTING THE VEHICLES AND TRAFFIC ANALYSIS
VEHICLE DETECTION USING YOLO V3 FOR COUNTING THE VEHICLES AND TRAFFIC ANALYSISVEHICLE DETECTION USING YOLO V3 FOR COUNTING THE VEHICLES AND TRAFFIC ANALYSIS
VEHICLE DETECTION USING YOLO V3 FOR COUNTING THE VEHICLES AND TRAFFIC ANALYSIS
 
CAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGCAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNING
 
A hierarchical RCNN for vehicle and vehicle license plate detection and recog...
A hierarchical RCNN for vehicle and vehicle license plate detection and recog...A hierarchical RCNN for vehicle and vehicle license plate detection and recog...
A hierarchical RCNN for vehicle and vehicle license plate detection and recog...
 
Self Driving Car
Self Driving CarSelf Driving Car
Self Driving Car
 
Malaysian car plate localization using region-based convolutional neural network
Malaysian car plate localization using region-based convolutional neural networkMalaysian car plate localization using region-based convolutional neural network
Malaysian car plate localization using region-based convolutional neural network
 
Vision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
Vision-Based Motorcycle Crash Detection and Reporting Using Deep LearningVision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
Vision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
 
Accident vehicle types classification: a comparative study between different...
Accident vehicle types classification: a comparative study  between different...Accident vehicle types classification: a comparative study  between different...
Accident vehicle types classification: a comparative study between different...
 
Predicting Steering Angle for Self Driving Vehicles
Predicting Steering Angle for Self Driving VehiclesPredicting Steering Angle for Self Driving Vehicles
Predicting Steering Angle for Self Driving Vehicles
 
Self-Driving Car to Drive Autonomously using Image Processing and Deep Learning
Self-Driving Car to Drive Autonomously using Image Processing and Deep LearningSelf-Driving Car to Drive Autonomously using Image Processing and Deep Learning
Self-Driving Car to Drive Autonomously using Image Processing and Deep Learning
 
Object Detection for Autonomous Cars using AI/ML
Object Detection for Autonomous Cars using AI/MLObject Detection for Autonomous Cars using AI/ML
Object Detection for Autonomous Cars using AI/ML
 
IRJET- Self Driving RC Car using Behavioral Cloning
IRJET-  	  Self Driving RC Car using Behavioral CloningIRJET-  	  Self Driving RC Car using Behavioral Cloning
IRJET- Self Driving RC Car using Behavioral Cloning
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 

Recently uploaded

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Recently uploaded (20)

SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 

Dave

  • 1. DAVE: A Unified Framework for Fast Vehicle Detection and Annotation YI ZHOU1(B), LI LIU1, LING SHAO1, AND MATT MELLOR2 NORTHUMBRIA UNIVERSITY, NEWCASTLE UPON TYNE NE1 8ST, UK
  • 2. Introduction and Related Work Automatic analysis of urban traffic activities is an urgent need due to essential traffic management and increased vehicle violations In these research areas, efficient and accurate vehicle detection and attributes annotation is the most important component of traffic surveillance systems. Vehicle detection is a fundamental objective of traffic surveillance
  • 3. Introduction and Related Work Traditional vehicle detection methods can be categorized into frame-based and motion-based approaches For motion-based approaches ◦ frames subtraction [3] ◦ adaptive background modeling [4] ◦ optical flow [5, 6] are often utilized
  • 4. Introduction and Related Work To achieve higher detection performance ◦ recently, the deformable part-based model (DPM) [7] employs a star-structured architecture consisting of root and parts filters with associated deformation models for object detection ◦ DPM can successfully handle deformable object detection even when the target is partially occluded. ◦ However, it leads to heavy computational costs due to the use of the sliding window procedure for appearance features extraction
  • 5. Introduction and Related Work a Region-based CNN (RCNN) [13] ◦ combines object proposals, CNN learned features and an SVM classifier for accurate object detection ◦ To further increase the detection speed and accuracy ◦ Fast RCNN [14] adopts a region of interest (ROI) pooling layer and the multi-task loss to estimate object classes while predicting bounding-box positions ◦ proposal methods such as Selective Search [15] and Edgeboxes [16] can be introduced in RCNN and Fast RCNN to improve the efficiency compared to the traditional sliding window fashion All these deep models target general object detection. In our task, we aim for real-time detection of a special object type, vehicle. ◦ vehicle attributes annotation ◦ travel direction (i.e., pose) ◦ inherent color, type ◦ other more fine-grained information with respect to the headlight, grille and wheel
  • 6. Introduction and Related Work Lin et al. [18] presents an auto-masking neural network for vehicle detection and pose estimation In [19], an approach by vector matching of template is introduced for vehicle color recognition However, independent analysis of different attributes makes visual information not well explored and the process inefficient, and little work has been done for annotating these vehicle attributes simultaneously Therefore, we believe multi-task learning [21, 22] can be helpful since such joint learning schemes can implicitly learn the common features shared by these correlated tasks
  • 8. contributions We unify multiple vehicle-related tasks into one deep vehicle annotation framework DAVE which can effectively and efficiently annotate each vehicle’s bounding-box, pose, color and type simultaneously. Two CNNs proposed in our method, i.e., the Fast Vehicle Proposal Network (FVPN) and the vehicle Attributes Learning Network (ALN), are optimized in a joint manner by bridging two CNNs with latent data-driven knowledge guidance. In this way, the deeper ALN can benefit the performance of the shallow FVPN. We introduce a new Urban Traffic Surveillance (UTS) vehicle dataset consisting of six 1920 × 1080 (FHD) resolution videos with varying illumination conditions and viewpoints.
  • 9. Detection and Annotation for Vehicles (DAVE) DAVE
  • 10. Fast Vehicle Proposal Network (FVPN) Searching the whole image to locate potential vehicle positions in a sliding window fashion is prohibitive for real-time applications Particularly for one specific object, we expect very fast and accurate proposal performance can be achieved fast vehicle proposal network (FVPN) ◦ a shallow fully convolutional network ◦ RPN과 사실상 동일(Net구조가 얕고 feature sharing을 하지 않을 뿐)
  • 11. Attributes Learning Network (ALN) Modeling vehicle pose estimation, color recognition and type classification separately is less accurate and inefficient The network architecture of the ALN is mainly inspired by the GoogLeNet [26] model Specifically, we design the ALN by adding 4 fully connected layers to extend the GoogLeNet into a multi- attribute learning model. The ALN is a multi-task network optimized with four softmax loss layers for vehicle annotation tasks
  • 12. Deep Nets Training Training Dataset and Data Augmentation. ◦ We adopt the large-scale CompCars dataset [28] with more than 100,000 web-nature data as the positive training samples which are annotated with tight bounding-boxes and rich vehicle attributes such as pose, type, make and model. ◦ In detail, the web-nature part of the CompCars dataset provides ◦ five viewpoints as front, rear, side, frontside and rearside ◦ twelve vehicle types as MPV, SUV, sedan, hatchback, minibus, pickup, fastback, estate, hardtop- convertible, sports, crossover and convertible. ◦ since color is another important attribute of a vehicle, we additionally annotate colors on more than 10,000 images with five common vehicle colors as black, white, silver, red and blue to train our final model
  • 14. Deep Nets Training For data augmentation ◦ we first triple the training data with increased and decreased image intensities for making our DAVE more robust under different lighting conditions. ◦ In addition, image downsampling up to 20 % of the original size and image blurring are introduced to enable that detected vehicles with small sizes can be even annotated as precisely as possible.
  • 15. Deep Nets Training Jointly Training with Latent Knowledge Guidance ◦ The input resolution of the ALN is 224 × 224 for fine-grained vehicle attributes learning, ◦ while it is decreased to 60×60 for the FVPN to fit smaller scales of the test image pyramid for efficiency in the inference phase. ◦ The pre-trained GoogLeNet model for 1000-class ImageNet classification is used to initialize all the convolutional layers in the ALN, while the FVPN is trained from scratch
  • 16. Deep Nets Training Two-Stage Deep Nets Inference ◦ Vehicle proposals inferred from the FVPN are taken as inputs into the ALN ◦ Although verifying each proposal and annotation of attributes are at the same stage, we assume that verification has a higher priority. ◦ For instance, in the inference phase, if a proposal is predicted as a positive vehicle, it will then be annotated with a bounding-box and inferred pose, color and type. However, a proposal predicted as the background will be neglected in spite of its inferred attributes ◦ Finally, we perform non-maximum suppression as in RCNN [13] to eliminate duplicate detections. The full inference scheme is demonstrated in Fig. 4
  • 18. Experiments and Results In this section, we evaluate our DAVE for detection and annotation of pose, color and type for each detected vehicle. Experiments are mainly divided as two parts: ◦ vehicle detection ◦ Attributes learning DAVE is implemented based on the deep learning framework Caffe [29] and run on a workstation configured with an NVIDIA TITAN X GPU.
  • 25. Conclusion In this paper, we developed a unified framework for fast vehicle detection and annotation: DAVE ◦ which consists of two convolutional neural networks FVPN and ALN ◦ The proposal and attributes learning networks predict bounding- boxes for vehicles and infer their attributes: pose, color and type, simultaneously ◦ Extensive experiments show that our method outperforms state-of-the-art frame- works on vehicle detection and is also effective for vehicle attributes annotation In our on-going work, ◦ we are integrating more vehicle attributes such as make and model into the ALN with high accuracy, and exploiting these attributes to investigate vehicle re-identification tasks.