YOLO
龙少毅
Contents
Background knowledge
YOLO series timeline
• 2015: YOLOv1, "You Only Look Once: Unified, Real-Time Object Detection"
• 2016: YOLOv2, "YOLO9000: Better, Faster, Stronger"
• 2018: YOLOv3, "YOLOv3: An Incremental Improvement"
• 2020.04: YOLOv4, Alexey Bochkovskiy, "YOLOv4: Optimal Speed and Accuracy of Object Detection"
• 2020.06: YOLOv5, Ultralytics
YOLOv1
1) YOLO series object detection pipeline
YOLOv1 detection pipeline
Network architecture diagram
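Training
• Resize the input to 448*448
• Data augmentation
• Random scaling and translation of up to 20% of the original image size
• Random adjustment of exposure and saturation in HSV color space, by up to a factor of 1.5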
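Loss function
Object indicator: whether the j-th bbox in the i-th grid cell contains an object (1 if it does, 0 if it does not)
Criterion: the bbox with the highest IOU against the object's ground truth is responsible for predicting that object
No-object indicator: whether the j-th bbox in the i-th grid cell contains an object (1 if it does not, 0 if it does)
4 box values: x, y, w, h
1 confidence C:
20 class probabilities pi: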
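For reference, the two indicators and the predicted quantities listed above combine into the sum-of-squares loss given in the YOLOv1 paper. The symbols are the paper's notation rather than anything shown on this slide: S = 7 is the grid size, B = 2 the number of boxes per cell, hats mark ground-truth targets, and the paper sets the weights to \lambda_{coord} = 5 and \lambda_{noobj} = 0.5.

```latex
\begin{aligned}
L ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
       \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
   &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
       \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
   &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 \\
   &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2 \\
   &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
```

The first two terms penalize box position and size, the third and fourth penalize confidence for boxes with and without an object, and the last penalizes class probabilities in cells that contain an object.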
After NMS
Before NMS
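A minimal sketch of greedy non-maximum suppression (NMS), the step that turns the "before" picture into the "after" picture. The corner box format [x1, y1, x2, y2], the 0.5 threshold, and the function names are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def iou(box, boxes):
    """IOU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = np.argsort(scores)[::-1]      # candidates sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # drop boxes overlapping the kept one
    return keep

# Two heavily overlapping boxes and one separate box (made-up values).
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)
```

In practice NMS is usually run per class, so that boxes of different classes do not suppress each other.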
YOLOv2 (YOLO9000)
• Batch normalization
• High Resolution Classifier
• Convolutional With Anchor Boxes
• Dimension Clusters
• Direct location prediction
• New network:Darknet-19
• Fine-Grained Features
• Multi-scale training
Batch normalization
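• Batch Normalization speeds up model convergence and also has some regularizing effect, reducing overfitting. In YOLOv2 a Batch Normalization layer is added after every convolutional layer, and dropout is no longer used. With Batch Normalization, YOLOv2's mAP improves by 2.4%.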
High Resolution Classifier
• Pretrain the model backbone on the ImageNet classification dataset (224*224)
• Fine-tune on ImageNet at 448*448 (10 epochs)
Convolutional With Anchor Boxes
• The fully connected layers are removed
• The detection input is shrunk to 416*416
• YOLOv1 can predict only 7*7*2 = 98 boxes
• YOLOv2 can predict 13*13*num_anchors boxes
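The box counts above can be checked with a few lines of arithmetic. This sketch assumes a total downsampling factor of 32 and num_anchors = 5, the values used in the YOLOv2 paper; neither number appears on this slide.

```python
def grid_size(input_size, stride=32):
    """Side length of the output feature map for a square input."""
    return input_size // stride

# YOLOv1: 7*7 grid, 2 boxes per cell.
yolov1_boxes = 7 * 7 * 2

# YOLOv2: 416*416 input gives a 13*13 grid (odd, so there is a single center cell),
# whereas a 448*448 input would give an even 14*14 grid.
num_anchors = 5  # assumption: the anchor count chosen in the YOLOv2 paper
s = grid_size(416)
yolov2_boxes = s * s * num_anchors

print(yolov1_boxes, s, grid_size(448), yolov2_boxes)
```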
Dimension Clusters
• Anchor boxes
• Clustering results on the VOC and COCO datasets:
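A minimal sketch of how anchor shapes can be obtained by clustering, assuming k-means over box (width, height) pairs with the distance 1 - IOU used in the YOLOv2 paper. The toy box sizes, the mean-based centroid update, and the function names are illustrative assumptions, not the VOC or COCO results.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IOU between boxes and centroids given only (w, h), as if aligned at one corner."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    area_wh = wh[:, 0] * wh[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_wh[:, None] + area_c[None, :] - inter)

def kmeans_anchors(wh, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchor shapes using distance d = 1 - IOU."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # Smallest 1 - IOU distance is the same as largest IOU.
        assign = np.argmax(wh_iou(wh, centroids), axis=1)
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# Toy data: three small boxes and three large boxes (made-up sizes).
wh = np.array([[10, 10], [12, 9], [9, 11], [100, 50], [110, 55], [95, 48]], dtype=float)
anchors = kmeans_anchors(wh, k=2)
avg_iou = wh_iou(wh, anchors).max(axis=1).mean()
print(anchors, avg_iou)
```

Using IOU instead of Euclidean distance keeps large boxes from dominating the clustering error.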
Direct location prediction
• YOLOv1 directly predicts the box center and size: x, y, w, h
• YOLOv2 predicts offsets between the prior box and the target box: tx, ty, tw, th
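A minimal sketch of how the offsets tx, ty, tw, th are turned back into a box, following the decoding equations in the YOLOv2 paper: a sigmoid keeps the center inside its grid cell, and the prior's width and height are scaled exponentially. Here cx, cy are the cell's offset from the top-left corner and pw, ph are the prior box's width and height; the function name and example values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode YOLOv2 offsets into a box, in units of grid cells."""
    bx = sigmoid(tx) + cx      # center x stays within cell cx
    by = sigmoid(ty) + cy      # center y stays within cell cy
    bw = pw * math.exp(tw)     # width is a scaled version of the prior width
    bh = ph * math.exp(th)     # height is a scaled version of the prior height
    return bx, by, bw, bh

# Zero offsets in cell (1, 1) with a 3 x 2 prior: the center lands mid-cell
# and the box keeps the prior's size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cx=1, cy=1, pw=3.0, ph=2.0)
print(box)
```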
Darknet-19
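Fine-Grained Features
• A 13*13 feature map is sufficient for detecting large objects but cannot accurately detect small ones
• A passthrough layer brings in a finer-grained feature map
• 26*26*512 -> 13*13*2048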
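A minimal sketch of the passthrough rearrangement, assuming sampling at every other row and column followed by channel concatenation. The channel ordering here is an illustrative choice and may differ from the ordering used by Darknet's own layer.

```python
import numpy as np

def passthrough(x, stride=2):
    """Rearrange (H, W, C) into (H/stride, W/stride, C*stride*stride)."""
    # Each slice takes every stride-th row and column from a different starting offset.
    parts = [x[i::stride, j::stride, :] for i in range(stride) for j in range(stride)]
    return np.concatenate(parts, axis=-1)

features = np.zeros((26, 26, 512), dtype=np.float32)
out = passthrough(features)
print(out.shape)

# Small check that no values are lost: a 4x4x1 map becomes 2x2x4.
tiny = np.arange(16).reshape(4, 4, 1)
tiny_out = passthrough(tiny)
print(tiny_out[0, 0])
```

No information is discarded: spatial resolution is traded for channels so that the map can be concatenated with the 13*13 features.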
Multi-scale Training
• Three stages of training
YOLOv3
Loss function
YOLOv4
yolov3 yolov4
YOLOv5
To be continued in the next installment
References
• YOLOv1-v4 papers
• https://github.com/ultralytics/yolov5
• https://blog.csdn.net/wfei101/article/details/79398563
• https://zhuanlan.zhihu.com/p/35325884
• https://github.com/bubbliiiing/yolov4-pytorch
• https://zhuanlan.zhihu.com/p/143747206
• https://zhuanlan.zhihu.com/p/172121380
• https://www.jiangdabai.com/

Editor's Notes

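  1. YOLOv1 through YOLOv3 share the same author. After the original author announced he would not continue the YOLO series, Alexey, from Russia, took up the banner.
  2. 1) Resize the input image to 448*448; 2) use a convolutional neural network to predict boxes that may contain objects; 3) filter out redundant boxes with NMS to obtain the detection result.
  3. YOLOv1 divides the input into a 7*7 grid. Each cell predicts two bounding boxes, and the predicted parameters of each box consist of 4 bounding-box values and one confidence score.
  4. The final output is 7*7*30: the input image is divided into 7*7 regions, and each region predicts 2 object boxes, each with 5 parameters (4 position values and 1 confidence score), plus 20 class probabilities (VOC dataset). 30 = 2*5 + 20.
  5. Inspired by Faster R-CNN, prior boxes (anchor boxes) are introduced, and boxes are predicted as offsets relative to the prior boxes.
  6. Why 416*416 is used: it makes the width and height of the resulting convolutional feature map odd, which yields a single center cell. Without anchor boxes, the model's recall is 81% and its mAP is 69.5%; with anchor boxes, recall is 88% and mAP is 69.2%.
  7. A sigmoid function is used for normalization. cx and cy give the position of the grid cell that contains the center point; in the figure, cx = cy = 1. pw and ph are the width and height of the prior box.
  8. The earlier 26*26*512 feature map is sampled at every other row and every other column, which gives 4 new feature maps of size 13*13*512 each. These are concatenated into a 13*13*2048 feature map, which is then joined to the later layer. This amounts to feature fusion and helps with detecting small objects.
  9. At test time, YOLOv2 can take images of different sizes as input; results on the VOC 2007 dataset are shown in the figure below. At lower resolutions YOLOv2's mAP is slightly lower but it runs faster; at high resolution the mAP is higher but the speed drops slightly. mAP: the mean of AP over all classes. AP:
  10. Dropout: random deactivation to prevent overfitting. Each time, half of the neurons are randomly removed. It is widely used in fully connected networks but works poorly in convolutional networks.
  11. Why dropout works poorly: convolutional layers are insensitive to randomly dropped points. Convolutional neural networks commonly chain three operations (convolution, activation, pooling), and pooling itself acts on neighboring units, so even after random dropping the same information can still be learned from neighboring positions. Dropout is therefore not very effective in convolutional neural networks.
  12. The DropBlock authors were inspired by the Cutout data augmentation.
  13. Note: the max pooling here uses padding and a stride of 1. For example, with a 13×13 input feature map, a 5×5 pooling kernel, and padding=2, the pooled feature map is still 13×13.
  14. YOLOv5 comes in four models: Yolov5s, Yolov5m, Yolov5l, Yolov5x.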
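The size-preserving max pooling described in note 13 can be sketched as follows. With stride 1, the output side is (13 + 2*2 - 5)/1 + 1 = 13. This is a plain NumPy illustration on a single-channel map, with a hypothetical function name; it is not the code of any particular YOLOv4 implementation.

```python
import numpy as np

def max_pool_same(x, kernel=5):
    """Stride-1 max pooling with padding kernel // 2, so the output keeps the input size."""
    pad = kernel // 2
    # Pad with -inf so that padded positions never win the max.
    padded = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + kernel, j:j + kernel].max()
    return out

feature_map = np.random.default_rng(0).random((13, 13))  # made-up activations
pooled = max_pool_same(feature_map, kernel=5)
print(pooled.shape)
```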