Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuous Improvement~ [DeNA TechCon 2020 Live Stream]

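This talk presents our efforts to lower the cost of map creation and maintenance by analyzing dashcam video with AI.
In addition to the core computer vision technology, we also introduce engineering practices unique to DeNA that support efficient development.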

Published in: Technology

  1. Can We Make Maps from Videos? ~From AI Algorithm to Engineering for Continuous Improvement~
     Kazuyuki Miyazawa, Kosuke Kuzuoka
  2. Agenda
     1. Background
     2. Computer Vision Technologies for Video-Based Map Creation/Maintenance
     3. Engineering for Continuous Improvement
     4. Wrap Up
  3. Who am I?
     Kazuyuki Miyazawa (@kzykmyzw)
     Work Experience
     • April 2019 - Present: AI Research Engineer @DeNA Co., Ltd.
     • April 2010 - March 2019: Research Scientist @Mitsubishi Electric Corp.
     Education
     • PhD in Information Science @Tohoku Univ.
  4. Background
     • Maps are an essential ingredient for every mobility service
     • Ever higher map quality is in demand to enable advanced services (e.g., autonomous vehicles)
     [Timeline figure: -1980s, 1980s-20XXs, 20XXs-]
  5. Problems for Current Map Creation/Maintenance
     • Manual processes are labor-intensive and time-consuming
     • Using a special measurement system (e.g., a mobile mapping system) is costly and difficult to scale to the high coverage that various types of mobility services need
     https://www.infradoctor.jp/details/detail20190313.pdf
     https://www.google.com/streetview/explore/
  6. What Can DeNA Do About It?
     • Dashcams are becoming popular and can capture a lot of useful information for maps
     • Current AI shows amazing performance in image/video analysis
     • We are developing low-cost and rapid map creation (or maintenance) technology using dashcam videos collected via cloud servers
     [Chart: Dashcam sales volume (Japan) (million units), 2014-2018, vertical axis 0-160]
     Source: GfK Japan, “2018 Dashcam Sales Trends,” 2019 https://www.gfk.com/fileadmin/user_upload/dyna_content/JP/20190328_drivinngrecorders.pdf
  7. What Do We Need to Do?
     [Figure labels: Image, Map]
     Want to place the newly found object on the map
     ©️OpenStreetMap contributors
     https://en.wikipedia.org/wiki/Geographic_coordinate_system
  8. What Do We Need to Do?
     [Figure labels: Image, Map, axes x, y, z, point (x, y, z)]
     Need to know the 3D position of the object!
     ©️OpenStreetMap contributors
     https://en.wikipedia.org/wiki/Geographic_coordinate_system
  9. How Do We Know the 3D Position from a 2D Image?
     [Figure labels: ?, ?, ?]
     From a single 2D image, we cannot determine the 3D position of the object
  10. How Do We Know the 3D Position from 2D Images?
     If we have two (or more) views, we can determine the 3D object position as the intersection of camera rays
  11-12. Dashcam Video = Multi-View Images
     [Figure labels: time: t1, time: t2, time: t3]
     • Dashcam video can be seen as a set of multi-view images because the vehicle moves while capturing
     • The camera pose for each frame is necessary to calculate the 3D object position
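The idea of intersecting camera rays from two views can be sketched in a few lines. This is a minimal illustration, not the speakers' implementation: it assumes each view is already reduced to a camera center and a ray direction in a shared coordinate frame, and it uses the midpoint of the shortest segment between the two rays, because noisy rays rarely intersect exactly. The function name `triangulate_midpoint` and the sample values are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _point_on_ray(origin: Vec3, direction: Vec3, s: float) -> Vec3:
    return (origin[0] + s * direction[0],
            origin[1] + s * direction[1],
            origin[2] + s * direction[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point from two rays (camera center c, direction d).

    Finds the closest point on each ray to the other ray and returns
    the midpoint between them.
    """
    w = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information.
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = _point_on_ray(c1, d1, s)
    p2 = _point_on_ray(c2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Hypothetical example: two camera positions 2 m apart looking at one object.
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                             (2.0, 0.0, 0.0), (-1.0, 0.0, 5.0))
print(point)
```

The parallel-ray check reflects why vehicle motion matters: two frames taken from almost the same position give almost parallel rays and an unreliable depth.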
  13. Camera Pose Estimation from Video
     • SfM (Structure from Motion) or Visual SLAM (Simultaneous Localization And Mapping) is used as a core technology
     • Estimate the camera poses by tracking salient points in the video
  14. Coordinate Conversion
     [Figure labels: Image, Map, GNSS, (x, y, z), (lat, lon, alt)]
     Convert the estimated object position to the geospatial coordinate system using the GNSS signal received by the dashcam
     ©️OpenStreetMap contributors
     https://en.wikipedia.org/wiki/Geographic_coordinate_system
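One simple way to picture the conversion to (lat, lon, alt) is shown below. The slides do not say how the conversion is done, so this is only a sketch under strong assumptions: the estimated position has already been aligned to a local east-north-up frame in meters around a GNSS reference point, the offsets are small (tens of meters), and the Earth is treated as a sphere. The function `enu_to_geodetic` and the sample coordinates are hypothetical.

```python
import math
from typing import Tuple

# WGS84 semi-major axis, used here as a spherical Earth radius (approximation).
EARTH_RADIUS_M = 6378137.0


def enu_to_geodetic(east_m: float, north_m: float, up_m: float,
                    ref_lat_deg: float, ref_lon_deg: float,
                    ref_alt_m: float) -> Tuple[float, float, float]:
    """Convert a small local east/north/up offset to (lat, lon, alt).

    Assumes a spherical Earth and offsets that are small compared with
    the Earth radius; not suitable for high-precision surveying.
    """
    lat = ref_lat_deg + math.degrees(north_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with the cosine of the latitude.
    lon_radius = EARTH_RADIUS_M * math.cos(math.radians(ref_lat_deg))
    lon = ref_lon_deg + math.degrees(east_m / lon_radius)
    alt = ref_alt_m + up_m
    return lat, lon, alt


# Hypothetical example: an object 12 m east, 30 m north and 4 m above
# a GNSS fix (the reference coordinates are made up for illustration).
print(enu_to_geodetic(12.0, 30.0, 4.0, 35.0, 139.0, 40.0))
```

A production system would more likely use an ellipsoidal model or a geodesy library; the spherical shortcut only keeps the example self-contained.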
  15. Dataset Creation for Accuracy Evaluation
     • Built our own dataset of dashcam videos and corresponding highly accurate 3D data as ground truth for evaluation purposes
     • Manually annotated various objects (e.g., traffic signs, lanes)
     [Figure labels: Videos from Dashcams, 3D Point Clouds from LiDAR]
  16. Sample Results
     [Figure panels: Dashcam Video, Estimated Position; legend: Estimated camera positions, Estimated object position, Ground-truth object position]
     Error: 0.20m
  17. Sample Results
     [Figure panels: Dashcam Video, Estimated Position; legend: Estimated camera positions, Estimated object position, Ground-truth object position]
     Error: 1.2m
  18. Results Summary
     [Histogram: Error [m] (0 to 2.5) vs. Frequency]
     Average Error: 0.74m
     Average error of object position estimation is below 1m!
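The per-object error and the average reported above can be computed as plain Euclidean distances between estimated and ground-truth positions. The sketch below assumes both are given as (x, y, z) tuples in meters in the same frame; the function name and the coordinates are made up for illustration and are not the talk's data.

```python
import math
from statistics import mean
from typing import List, Sequence


def position_errors(estimated: Sequence[Sequence[float]],
                    ground_truth: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance in meters for each estimated/ground-truth pair."""
    if len(estimated) != len(ground_truth):
        raise ValueError("need one ground-truth position per estimate")
    return [math.dist(e, g) for e, g in zip(estimated, ground_truth)]


# Hypothetical positions, chosen only to illustrate the calculation.
errors = position_errors(
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.2, 0.0, 0.0), (1.0, 1.0, 2.2)],
)
print(errors, mean(errors))
```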
  19. Wait, How Do You Find Objects in Images?
     [Figure labels: Image, Map]
     Want to place the newly found object on the map
     ©️OpenStreetMap contributors
     https://en.wikipedia.org/wiki/Geographic_coordinate_system
  20. Of Course, Deep Learning!
     • R-FCN: Object Detection via Region-based Fully Convolutional Networks https://arxiv.org/pdf/1605.06409v2.pdf
     • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields https://arxiv.org/pdf/1812.08008.pdf
     • Panoptic Segmentation https://arxiv.org/pdf/1801.00868.pdf
  21. Traffic Light/Sign Detection using CNN
     • Use Faster R-CNN to detect traffic lights/signs in each frame of dashcam videos
     • Faster R-CNN, proposed in 2015, is one of the most successful object detection methods
     • Its main drawback is speed, but that is acceptable for off-line applications
     [Diagram labels: CNN; Region Proposals; Classification (Traffic light, Stop, Speed limit, No right turn, …); Regression (Position)]
  22. Traffic Signal/Sign Detection Result
     https://youtu.be/7iZmOIN0wwI
  23. Q. Is It Easy to Achieve This?
  24. Q. Is It Easy to Achieve This? A. NO!
     [Cycle diagram labels: Data Preparation, Model Training, Parameter Tuning, Model Verification, Deploy, Monitoring, Data Analysis, Model Development]
     Need to iterate again and again
  25. Q. Is It Easy to Achieve This? A. NO!
     [Cycle diagram labels: Data Preparation, Model Training, Parameter Tuning, Model Verification, Deploy, Monitoring, Data Analysis, Model Development]
     Rapid iteration is the key
  26. Who am I?
     Profile
     • Kosuke Kuzuoka (23)
     • Love Tesla, Elon Musk and cats
     Experience
     • February 2020 - Present: Software Engineer, ML @Mercari, Inc.
     • June 2018 - February 2020: AI Research Engineer @DeNA Co., Ltd.
     • March 2017 - June 2018: R&D Manager @Photoruction, inc.
  27. Brief Intro to Object Detection
     • An active research area in the computer vision community
     • The task is to detect objects (like cats) in an image
     • Modern algorithms rely heavily on deep learning
     • Training a model takes hours
     Photo by Paul Hanaoka on Unsplash
  28. A cat is detected as a cat, hence it’s a true positive.
     Other objects are wrongly detected as cats, hence they are false positives.
     Photo by Paul Hanaoka on Unsplash
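The true-positive/false-positive distinction above is usually decided by how much a detected box overlaps a ground-truth box (intersection over union, IoU). The sketch below shows one common convention, not necessarily the one used in the talk: boxes are (x1, y1, x2, y2), detections are matched greedily in descending score order, and a match needs IoU of at least 0.5. All names and values are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(detections: List[Tuple[Box, float]],
                   ground_truths: List[Box],
                   iou_threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedy matching; returns (true positives, false positives, false negatives)."""
    unmatched = list(range(len(ground_truths)))
    tp = fp = 0
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best_idx, best_iou = None, 0.0
        for idx in unmatched:
            overlap = iou(box, ground_truths[idx])
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None and best_iou >= iou_threshold:
            tp += 1
            unmatched.remove(best_idx)  # each ground truth matches at most once
        else:
            fp += 1
    return tp, fp, len(unmatched)


# Hypothetical image: one real cat, three "cat" detections.
cats = [(0.0, 0.0, 10.0, 10.0)]
dets = [((1.0, 1.0, 10.0, 10.0), 0.9),
        ((50.0, 50.0, 60.0, 60.0), 0.8),
        ((100.0, 100.0, 110.0, 110.0), 0.7)]
print(count_tp_fp_fn(dets, cats))
```

Ground-truth objects left unmatched at the end are the false negatives.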
  29-33. Problems in Development Processes
     1. Train, validate and test models (computationally expensive)
     2. Evaluate, visualize and analyze models (time-consuming)
     3. Adjust hyperparameters, then go back to 1
     Not essential, yet very important...
  34. Some of the Problems Are:
     • Error-prone process (misspelled commands, etc.)
     • Going back and forth between EC2 instances…
     • Inefficient steps, like drawing boxes and uploading to a third-party app for visualization
     • Researchers are unable to focus on essential work (developing models, etc.)
  35-36. Solutions!
     • Work harder and harder...
     • Automating tasks via a workflow engine
     • A flexible internal tool to evaluate, visualize and analyze models
     But I’m busy with AI dev...
  37. What We Wanted...
     • A system that automatically evaluates, visualizes and analyzes models and datasets
     • A tool that lets researchers focus on essential work (parameter tuning, etc.)
     • User-friendly web app
  38. Yet, We Want It to Be:
     • Easy to develop
     • Easy to collaborate
     • Good performance
     • AI engineer friendly (Python…)
  39. Going Serverless!
  40. • Easy to deploy and maintain
     • Collaborations made easy
     • Cost effective, yet performant
     • You can use Python
     Image source: https://serverless.com/
  41-45. Serverless Computing
     • No need to manage servers, cloud providers do it for you!
     • Consists of small deployable units of functions
     • Scales as your app grows
     • No idle fee, pay as you go
     Image source: https://aws.amazon.com/
  46. Introducing Kaiseki-Kun
  47-50. Kaiseki-Kun Architecture
     1. Prediction JSON arrives from a GPU instance
     2. Evaluation begins and the results are stored
     3. Users can see results & run evaluations
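To make the "prediction JSON in, evaluation out" flow concrete, here is a minimal sketch of a serverless function in Python. The slides do not describe Kaiseki-Kun's actual triggers, storage or JSON schema, so everything specific here is an assumption: the `handler(event, context)` signature follows the AWS Lambda Python convention, the event is assumed to carry the JSON as a string in `event["body"]`, the `predictions`/`label`/`score` fields are hypothetical, and storing the results is left out.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def summarize_predictions(predictions: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Count detections and average the confidence score per label."""
    scores_by_label = defaultdict(list)
    for pred in predictions:
        scores_by_label[pred["label"]].append(float(pred["score"]))
    return {
        label: {"count": len(scores), "mean_score": sum(scores) / len(scores)}
        for label, scores in scores_by_label.items()
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda-style entry point: parse the prediction JSON and return a summary."""
    try:
        payload = json.loads(event["body"])
        summary = summarize_predictions(payload["predictions"])
    except (KeyError, TypeError, ValueError):
        # Malformed or missing input: report a client error instead of crashing.
        return {"statusCode": 400,
                "body": json.dumps({"error": "invalid prediction JSON"})}
    # A real tool would persist the summary here (storage is omitted in this sketch).
    return {"statusCode": 200, "body": json.dumps({"summary": summary})}


# Hypothetical local invocation with a made-up payload.
example_event = {"body": json.dumps({"predictions": [
    {"label": "stop", "score": 0.8},
    {"label": "traffic_light", "score": 0.9},
]})}
print(handler(example_event, None))
```

Because the function keeps no state between calls, the platform can run as many copies as there are incoming prediction files, which is the scaling property the slides highlight.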
  51-52. Kaiseki-Kun Tech Stack
     • Backend: 100% serverless app
     • Frontend: React app
  53. (no text)
  54. Hmm, there is an FN (false negative) in the red box. What if we adjust the threshold?
  55. Ta-da! Perhaps the model wasn’t confident enough?
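The effect of adjusting the threshold can be illustrated with a tiny calculation. The sketch assumes that, for each ground-truth object, we already know the confidence score of its best-matching detection (or `None` if the model produced no matching box); the function name and the scores are hypothetical.

```python
from typing import List, Optional


def recall_at_threshold(best_scores: List[Optional[float]], threshold: float) -> float:
    """Fraction of ground-truth objects whose matching detection survives the threshold."""
    if not best_scores:
        raise ValueError("need at least one ground-truth object")
    found = sum(1 for s in best_scores if s is not None and s >= threshold)
    return found / len(best_scores)


# Hypothetical scores: one confident hit, one low-confidence hit, one miss.
scores = [0.9, 0.35, None]
for threshold in (0.5, 0.3):
    print(threshold, recall_at_threshold(scores, threshold))
```

With these made-up scores, lowering the threshold from 0.5 to 0.3 recovers the low-confidence object, while the object with no matching box stays a false negative at any threshold. Lowering the threshold typically also admits more false positives, which is why being able to re-run the evaluation quickly matters.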
  56. The model is missing lots of small objects. We need more data!
  57. Evaluation with a different config is as easy as pushing a button
  58. More Functionalities on the Way...
     • Model version control
     • Dataset analysis and version control
     • Automating training and testing
  59. Summing It Up
     • Speed is important. You don’t want to spend too much time on an internal tool
     • Collaboration should be easy. Every engineer should be able to contribute
     • With little effort, researchers can focus on more essential work
  60. Wrap Up
     AI Technologies for Map Creation/Maintenance
     • Dashcam videos contain a lot of useful information for maps
     • Developed computer vision technology to estimate objects’ positions
     • Experimental evaluation shows the estimation error is less than 1m
     Engineering for Continuous Improvement
     • A rapid development cycle is important
     • Serverless architecture is a cost-effective choice to develop and maintain support tools for continuous improvement of AI
