SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Intelligence Machine Vision Lab
Strictly Confidential
2019 CVPR paper overview
SUALAB
Ho Seong Lee
2Type A-3
Contents
• CVPR 2019 Statistics
• 20 paper-one page summary
3Type A-3
CVPR 2019 Statistics
• What is CVPR?
• Conference on Computer Vision and Pattern Recognition (CVPR)
• CVPR was first held in 1983 and has been held annually
• CVPR 2019: June 16th – June 20th in Long Beach, CA
4Type A-3
CVPR 2019 Statistics
• CVPR 2019 statistics
• The total number of papers is increasing every year and this year has increased significantly!
• We can visualize main topic using title of paper and simple python script!
• https://github.com/hoya012/CVPR-Paper-Statistics
28.4% 30%
29.9%
29.6%
25.1%
5Type A-3
CVPR 2019 Statistics
2019 CVPR paper statistics
6Type A-3
CVPR 2018 Statistics
2018 CVPR paper statistics
Compared to 2018 Statistics..
7Type A-3
CVPR 2018 vs CVPR 2019 Statistics
• Most of the top keywords were maintained
• Image, detection, 3d, object, video, segmentation, adversarial, recognition, visual …
• “graph”, “cloud”, “representation” are about twice as frequent
• graph : 15 → 45
• representation: 25 → 48
• cloud: 16 → 35
8Type A-3
Before beginning..
• It does not mean that it is not an interesting article because it is not in the list.
• Since I mainly studied Computer Vision, most papers that I will discuss today are
Computer Vision papers..
• Topics not covered today
• Natural Language Processing
• Reinforcement Learning
• Robotics
• Etc..?
9Type A-3
1. Learning to Synthesize Motion Blur (oral)
• Synthesizing a motion blurred image from a pair of unblurred sequential images
• Motion blur is important in cinematography, and artful photo
• Generate a large-scale synthetic training dataset of motion blurred images
Recommended reference: “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation”, 2018 CVPR
10Type A-3
2. Semantic Image Synthesis with Spatially-Adaptive Normalization (oral)
• Synthesizing photorealistic images given an input semantic layout
• Spatially-adaptive normalization can keep semantic information
• This model allows user control over both semantic and style as synthesizing images
Demo Code: https://github.com/NVlabs/SPADE
11Type A-3
3. SiCloPe: Silhouette-Based Clothed People (Oral)
• Reconstruct a complete and textured 3D model of a person from a single image
• Use 2D Silhouettes and 3D joints of a body pose to reconstruct 3D mesh
• An effective two-stage 3D shape reconstruction pipeline
• Predicting multi-view 2D silhouettes from single input segmentation
• Deep visual hull based mesh reconstruction technique
Recommended reference: “BodyNet: Volumetric Inference of 3D Human Body Shapes”, 2018 ECCV
12Type A-3
4. Im2Pencil: Controllable Pencil Illustration from Photographs
• Propose controllable photo-to-pencil translation method
• Modeling pencil outline(rough, clean), pencil shading(4 types)
• Create training data pairs from online websites(e.g., Pinterest) and use image filtering techniques
Demo Code: https://github.com/Yijunmaverick/Im2Pencil
13Type A-3
5. End-to-End Time-Lapse Video Synthesis from a Single Outdoor Image
• End-to-end solution to synthesize a time-lapse video from single image
• Use time-lapse videos and image sequences during training
• Use only single image during inference
Input image(single)
14Type A-3
6. StoryGAN: A Sequential Conditional GAN for Story Visualization
• Propose a new task called Story Visualization using GAN
• Sequential conditional GAN based StoryGAN
• Story Encoder – stochastic mapping from story to an low-dimensional embedding vector
• Context Encoder – capture contextual information during sequential image generation
• Two Discriminator – Image Discriminator & Story Discriminator
Context Encoder
15Type A-3
7. Image Super-Resolution by Neural Texture Transfer (oral)
• Improve “RefSR” even when irrelevant reference images are provided
• Traditional Single Image Super-Resolution is extremely challenging (ill-posed problem)
• Reference-based(RefSR) utilizes rich texture from HR references .. but.. only similar Ref images
• Adaptively transferring the texture from Ref Images according to their texture similarity
Recommended reference: “CrossNet: An end-to-end reference-based super resolution network using cross-scale warping”, 2018 ECCV
Similar
Different
16Type A-3
8. DVC: An End-to-end Deep Video Compression Framework (oral)
• Propose the first end-to-end video compression deep model
• Conventional video compression use predictive coding architecture and encode corresponding
motion information and residual information
• Taking advantage of both classical compression and neural network
• Use learning based optical flow estimation
17Type A-3
9. Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search (oral)
• Defense adversarial attack using Big-data and image manifold
• Assume that adversarial attack move the image away from the image manifold
• A successful defense mechanism should aim to project the images back on the image manifold
• For tens of billions of images, search a nearest-neighbor images (K=50) and use them
• Also propose two novel attack methods to break nearest neighbor defenses
18Type A-3
10. Bag of Tricks for Image Classification with Convolutional Neural Networks
• Examine a collection of some refinements and empirically evaluate their impact
• Improve ResNet-50’s accuracy from 75.3% to 79.29% on ImageNet with some refinements
• Efficient Training
• FP32 with BS=256 → FP16 with BS=1024 with some techniques
• Training Refinements:
• Cosine Learning Rate Decay / Label Smoothing / Knowledge Distillation / Mixup Training
• Transfer from classification to Object Detection, Semantic Segmentation
Linear scaling LR
LR warmup
Zero γ initialization in BN
No bias decay
Result of Efficient Training
Result of Training refinements
ResNet tweaks
19Type A-3
11. Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
• Automatically learn the group structure in training stage with end-to-end manner
• Outperform standard group convolution
• Propose an efficient strategy for index re-ordering
20Type A-3
12. ScratchDet:Exploring to Train Single-Shot Object Detectors from Scratch (oral)
• Explore to train object detectors from scratch robustly
• Almost SOTA detectors are fine-tuned from pretrained CNN (e.g., ImageNet)
• The classification and detection have different degrees of sensitivity to translation
• The architecture is limited by the classification network(backbone) → inconvenience!
• Find that one of the overlooked points is BatchNorm!
Recommended reference: “DSOD: Learning Deeply Supervised Object Detectors from Scratch”, 2017 ICCV
21Type A-3
13. Precise Detection in Densely Packed Scenes
• Propose precise detection in densely packed scenes
• In real-world, there are many applications of object detection (ex, detection and count # of object)
• In densely packed scenes, SOTA detector can’t detect accurately
(1) layer for estimating the Jaccard index (2) a novel EM merging unit (3) release SKU-110K dataset
22Type A-3
14. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images
• Present a large-scale dataset and establish a baseline for security inspection X-ray
• Total 1,059,231 X-ray images in which 6 classes of 8,929 prohibited items
• Propose an approach named class-balanced hierarchical refinement(CHR) and class-balanced loss
function
23Type A-3
15. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regressio
• Address the weaknesses of IoU and introduce generalized version(GIoU)
• Intersection over Union(IoU) is the most popular evaluation metric used in object detection
• But, there is a gap between optimizing distance losses and maximizing IoU
• Introducing generalized IoU as both a new loss and a new metric
24Type A-3
16. Bounding Box Regression with Uncertainty for Accurate Object Detection
• Propose novel bounding box regression loss with uncertainty
• Most of datasets have ambiguities and labeling noise of bounding box coordinate
• Network can learns to predict localization variance for each coordinate
25Type A-3
17. UPSNet: A Unified Panoptic Segmentation Network (oral)
• Propose a unified panoptic segmentation network(UPSNet)
• Semantic segmentation + Instance segmentation = panoptic segmentation
• Semantic Head + Instance Head + Panoptic head → end-to-end manner
Recommended reference: “Panoptic Segmentation”, 2018 arXiv
Countable objects → things
Uncountable objects → stuff
Deformable Conv Mask R-CNN Parameter-free
26Type A-3
18. SFNet: Learning Object-aware Semantic Correspondence (Oral)
• Propose SFNet for semantic correspondence problem
• Propose to use images annotated with binary foreground masks and synthetic geometric
deformations during training
• Manually selecting point correspondences is so expensive!!
• Outperform SOTA on standard benchmarks by a significant margin
27Type A-3
19. Fast Interactive Object Annotation with Curve-GC
• Propose end-to-end fast interactive object annotation tool (Curve-GCN)
• Predict all vertices simultaneously using a Graph Convolutional Network, (→ Polygon-RNN X)
• Human annotator can correct any wrong point and only the neighboring points are affected
Recommended reference: “Efficient interactive annotation of segmentation datasets with polygon-rnn++ ”, 2018 CVPR
Code: https://github.com/fidler-lab/curve-gcn
Correction!
28Type A-3
20. FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
• Propose image-level WSSS method using stochastic inference (dropout)
• Localization maps(CAM) only focus on the small parts of objects → Problem
• FickleNet allows a single network to generate multiple CAM from a single image
• Does not require any additional training steps and only adds a simple layer
Full Stochastic
Both Training
and Inference
29Type A-3
Related Post..
• In my personal blog, there are similar works
• SIGGRAPH 2018
• NeurIPS 2018
• ICLR 2019
https://hoya012.github.io/
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

初めてのグラフカット
初めてのグラフカット初めてのグラフカット
初めてのグラフカット
Tsubasa Hirakawa
 

Was ist angesagt? (20)

[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation[DL輪読会]Adversarial Feature Matching for Text Generation
[DL輪読会]Adversarial Feature Matching for Text Generation
 
モデル高速化百選
モデル高速化百選モデル高速化百選
モデル高速化百選
 
モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019モデルアーキテクチャ観点からの高速化2019
モデルアーキテクチャ観点からの高速化2019
 
初めてのグラフカット
初めてのグラフカット初めてのグラフカット
初めてのグラフカット
 
ConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティスConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティス
 
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unk...
 
SSII2020SS: グラフデータでも深層学習 〜 Graph Neural Networks 入門 〜
SSII2020SS: グラフデータでも深層学習 〜 Graph Neural Networks 入門 〜SSII2020SS: グラフデータでも深層学習 〜 Graph Neural Networks 入門 〜
SSII2020SS: グラフデータでも深層学習 〜 Graph Neural Networks 入門 〜
 
機械学習モデルのハイパパラメータ最適化
機械学習モデルのハイパパラメータ最適化機械学習モデルのハイパパラメータ最適化
機械学習モデルのハイパパラメータ最適化
 
SSII2020 [O3-01] Extreme 3D センシング
SSII2020 [O3-01]  Extreme 3D センシングSSII2020 [O3-01]  Extreme 3D センシング
SSII2020 [O3-01] Extreme 3D センシング
 
ICCV 2019 論文紹介 (26 papers)
ICCV 2019 論文紹介 (26 papers)ICCV 2019 論文紹介 (26 papers)
ICCV 2019 論文紹介 (26 papers)
 
ResNetの仕組み
ResNetの仕組みResNetの仕組み
ResNetの仕組み
 
【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...
【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...
【DL輪読会】An Image is Worth One Word: Personalizing Text-to-Image Generation usi...
 
Anomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめたAnomaly detection 系の論文を一言でまとめた
Anomaly detection 系の論文を一言でまとめた
 
Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)Domain Adaptation 発展と動向まとめ(サーベイ資料)
Domain Adaptation 発展と動向まとめ(サーベイ資料)
 
[DL輪読会]GQNと関連研究,世界モデルとの関係について
[DL輪読会]GQNと関連研究,世界モデルとの関係について[DL輪読会]GQNと関連研究,世界モデルとの関係について
[DL輪読会]GQNと関連研究,世界モデルとの関係について
 
SSII2021 [SS1] Transformer x Computer Visionの 実活用可能性と展望 〜 TransformerのCompute...
SSII2021 [SS1] Transformer x Computer Visionの 実活用可能性と展望 〜 TransformerのCompute...SSII2021 [SS1] Transformer x Computer Visionの 実活用可能性と展望 〜 TransformerのCompute...
SSII2021 [SS1] Transformer x Computer Visionの 実活用可能性と展望 〜 TransformerのCompute...
 
最近のSingle Shot系の物体検出のアーキテクチャまとめ
最近のSingle Shot系の物体検出のアーキテクチャまとめ最近のSingle Shot系の物体検出のアーキテクチャまとめ
最近のSingle Shot系の物体検出のアーキテクチャまとめ
 
High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文
High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文
High-impact Papers in Computer Vision: 歴史を変えた/トレンドを創る論文
 
SSII2022 [OS3-02] Federated Learningの基礎と応用
SSII2022 [OS3-02] Federated Learningの基礎と応用SSII2022 [OS3-02] Federated Learningの基礎と応用
SSII2022 [OS3-02] Federated Learningの基礎と応用
 
[DL輪読会]MetaFormer is Actually What You Need for Vision
[DL輪読会]MetaFormer is Actually What You Need for Vision[DL輪読会]MetaFormer is Actually What You Need for Vision
[DL輪読会]MetaFormer is Actually What You Need for Vision
 

Ähnlich wie 2019 cvpr paper_overview

TechnicalBackgroundOverview
TechnicalBackgroundOverviewTechnicalBackgroundOverview
TechnicalBackgroundOverview
Motaz El-Saban
 
Neural Networks for Machine Learning and Deep Learning
Neural Networks for Machine Learning and Deep LearningNeural Networks for Machine Learning and Deep Learning
Neural Networks for Machine Learning and Deep Learning
comifa7406
 

Ähnlich wie 2019 cvpr paper_overview (20)

TechnicalBackgroundOverview
TechnicalBackgroundOverviewTechnicalBackgroundOverview
TechnicalBackgroundOverview
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
slide-171212080528.pptx
slide-171212080528.pptxslide-171212080528.pptx
slide-171212080528.pptx
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learning
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
 
Multimedia Mining
Multimedia Mining Multimedia Mining
Multimedia Mining
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
Neural Networks for Machine Learning and Deep Learning
Neural Networks for Machine Learning and Deep LearningNeural Networks for Machine Learning and Deep Learning
Neural Networks for Machine Learning and Deep Learning
 
Moving object detection in complex scene
Moving object detection in complex sceneMoving object detection in complex scene
Moving object detection in complex scene
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
Deep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUSDeep learning fundamental and Research project on IBM POWER9 system from NUS
Deep learning fundamental and Research project on IBM POWER9 system from NUS
 
PointNet
PointNetPointNet
PointNet
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
 
Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 

Mehr von LEE HOSEONG

Mehr von LEE HOSEONG (20)

Unsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillationUnsupervised anomaly detection using style distillation
Unsupervised anomaly detection using style distillation
 
do adversarially robust image net models transfer better
do adversarially robust image net models transfer betterdo adversarially robust image net models transfer better
do adversarially robust image net models transfer better
 
CNN Architecture A to Z
CNN Architecture A to ZCNN Architecture A to Z
CNN Architecture A to Z
 
carrier of_tricks_for_image_classification
carrier of_tricks_for_image_classificationcarrier of_tricks_for_image_classification
carrier of_tricks_for_image_classification
 
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen..."The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
 
Mixed Precision Training Review
Mixed Precision Training ReviewMixed Precision Training Review
Mixed Precision Training Review
 
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly DetectionMVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
MVTec AD: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
FixMatch:simplifying semi supervised learning with consistency and confidence
FixMatch:simplifying semi supervised learning with consistency and confidenceFixMatch:simplifying semi supervised learning with consistency and confidence
FixMatch:simplifying semi supervised learning with consistency and confidence
 
"Revisiting self supervised visual representation learning" Paper Review
"Revisiting self supervised visual representation learning" Paper Review"Revisiting self supervised visual representation learning" Paper Review
"Revisiting self supervised visual representation learning" Paper Review
 
Unsupervised visual representation learning overview: Toward Self-Supervision
Unsupervised visual representation learning overview: Toward Self-SupervisionUnsupervised visual representation learning overview: Toward Self-Supervision
Unsupervised visual representation learning overview: Toward Self-Supervision
 
Human uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 ReviewHuman uncertainty makes classification more robust, ICCV 2019 Review
Human uncertainty makes classification more robust, ICCV 2019 Review
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
 
2019 ICLR Best Paper Review
2019 ICLR Best Paper Review2019 ICLR Best Paper Review
2019 ICLR Best Paper Review
 
"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review"Google Vizier: A Service for Black-Box Optimization" Paper Review
"Google Vizier: A Service for Black-Box Optimization" Paper Review
 
"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review"Searching for Activation Functions" Paper Review
"Searching for Activation Functions" Paper Review
 
"Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re..."Learning transferable architectures for scalable image recognition" Paper Re...
"Learning transferable architectures for scalable image recognition" Paper Re...
 
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
"Learning From Noisy Large-Scale Datasets With Minimal Supervision" Paper Review
 
"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review"Dataset and metrics for predicting local visible differences" Paper Review
"Dataset and metrics for predicting local visible differences" Paper Review
 
"From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ..."From image level to pixel-level labeling with convolutional networks" Paper ...
"From image level to pixel-level labeling with convolutional networks" Paper ...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

2019 cvpr paper_overview

  • 1. Intelligence Machine Vision Lab Strictly Confidential 2019 CVPR paper overview SUALAB Ho Seong Lee
  • 2. 2Type A-3 Contents • CVPR 2019 Statistics • 20 paper-one page summary
  • 3. 3Type A-3 CVPR 2019 Statistics • What is CVPR? • Conference on Computer Vision and Pattern Recognition (CVPR) • CVPR was first held in 1983 and has been held annually • CVPR 2019: June 16th – June 20th in Long Beach, CA
  • 4. 4Type A-3 CVPR 2019 Statistics • CVPR 2019 statistics • The total number of papers is increasing every year and this year has increased significantly! • We can visualize main topic using title of paper and simple python script! • https://github.com/hoya012/CVPR-Paper-Statistics 28.4% 30% 29.9% 29.6% 25.1%
  • 5. 5Type A-3 CVPR 2019 Statistics 2019 CVPR paper statistics
  • 6. 6Type A-3 CVPR 2018 Statistics 2018 CVPR paper statistics Compared to 2018 Statistics..
  • 7. 7Type A-3 CVPR 2018 vs CVPR 2019 Statistics • Most of the top keywords were maintained • Image, detection, 3d, object, video, segmentation, adversarial, recognition, visual … • “graph”, “cloud”, “representation” are about twice as frequent • graph : 15 → 45 • representation: 25 → 48 • cloud: 16 → 35
  • 8. 8Type A-3 Before beginning.. • It does not mean that it is not an interesting article because it is not in the list. • Since I mainly studied Computer Vision, most papers that I will discuss today are Computer Vision papers.. • Topics not covered today • Natural Language Processing • Reinforcement Learning • Robotics • Etc..?
  • 9. 9Type A-3 1. Learning to Synthesize Motion Blur (oral) • Synthesizing a motion blurred image from a pair of unblurred sequential images • Motion blur is important in cinematography, and artful photo • Generate a large-scale synthetic training dataset of motion blurred images Recommended reference: “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation”, 2018 CVPR
  • 10. 10Type A-3 2. Semantic Image Synthesis with Spatially-Adaptive Normalization (oral) • Synthesizing photorealistic images given an input semantic layout • Spatially-adaptive normalization can keep semantic information • This model allows user control over both semantic and style as synthesizing images Demo Code: https://github.com/NVlabs/SPADE
  • 11. 11Type A-3 3. SiCloPe: Silhouette-Based Clothed People (Oral) • Reconstruct a complete and textured 3D model of a person from a single image • Use 2D Silhouettes and 3D joints of a body pose to reconstruct 3D mesh • An effective two-stage 3D shape reconstruction pipeline • Predicting multi-view 2D silhouettes from single input segmentation • Deep visual hull based mesh reconstruction technique Recommended reference: “BodyNet: Volumetric Inference of 3D Human Body Shapes”, 2018 ECCV
  • 12. 12Type A-3 4. Im2Pencil: Controllable Pencil Illustration from Photographs • Propose controllable photo-to-pencil translation method • Modeling pencil outline(rough, clean), pencil shading(4 types) • Create training data pairs from online websites(e.g., Pinterest) and use image filtering techniques Demo Code: https://github.com/Yijunmaverick/Im2Pencil
  • 13. 13Type A-3 5. End-to-End Time-Lapse Video Synthesis from a Single Outdoor Image • End-to-end solution to synthesize a time-lapse video from single image • Use time-lapse videos and image sequences during training • Use only single image during inference Input image(single)
  • 14. 14Type A-3 6. StoryGAN: A Sequential Conditional GAN for Story Visualization • Propose a new task called Story Visualization using GAN • Sequential conditional GAN based StoryGAN • Story Encoder – stochastic mapping from story to an low-dimensional embedding vector • Context Encoder – capture contextual information during sequential image generation • Two Discriminator – Image Discriminator & Story Discriminator Context Encoder
  • 15. 15Type A-3 7. Image Super-Resolution by Neural Texture Transfer (oral) • Improve “RefSR” even when irrelevant reference images are provided • Traditional Single Image Super-Resolution is extremely challenging (ill-posed problem) • Reference-based(RefSR) utilizes rich texture from HR references .. but.. only similar Ref images • Adaptively transferring the texture from Ref Images according to their texture similarity Recommended reference: “CrossNet: An end-to-end reference-based super resolution network using cross-scale warping”, 2018 ECCV Similar Different
  • 16. 16Type A-3 8. DVC: An End-to-end Deep Video Compression Framework (oral) • Propose the first end-to-end video compression deep model • Conventional video compression use predictive coding architecture and encode corresponding motion information and residual information • Taking advantage of both classical compression and neural network • Use learning based optical flow estimation
  • 17. 17Type A-3 9. Defense Against Adversarial Images using Web-Scale Nearest-Neighbor Search (oral) • Defense adversarial attack using Big-data and image manifold • Assume that adversarial attack move the image away from the image manifold • A successful defense mechanism should aim to project the images back on the image manifold • For tens of billions of images, search a nearest-neighbor images (K=50) and use them • Also propose two novel attack methods to break nearest neighbor defenses
  • 18. 18Type A-3 10. Bag of Tricks for Image Classification with Convolutional Neural Networks • Examine a collection of some refinements and empirically evaluate their impact • Improve ResNet-50’s accuracy from 75.3% to 79.29% on ImageNet with some refinements • Efficient Training • FP32 with BS=256 → FP16 with BS=1024 with some techniques • Training Refinements: • Cosine Learning Rate Decay / Label Smoothing / Knowledge Distillation / Mixup Training • Transfer from classification to Object Detection, Semantic Segmentation Linear scaling LR LR warmup Zero γ initialization in BN No bias decay Result of Efficient Training Result of Training refinements ResNet tweaks
  • 19. 19Type A-3 11. Fully Learnable Group Convolution for Acceleration of Deep Neural Networks • Automatically learn the group structure in training stage with end-to-end manner • Outperform standard group convolution • Propose an efficient strategy for index re-ordering
  • 20. 20Type A-3 12. ScratchDet:Exploring to Train Single-Shot Object Detectors from Scratch (oral) • Explore to train object detectors from scratch robustly • Almost SOTA detectors are fine-tuned from pretrained CNN (e.g., ImageNet) • The classification and detection have different degrees of sensitivity to translation • The architecture is limited by the classification network(backbone) → inconvenience! • Find that one of the overlooked points is BatchNorm! Recommended reference: “DSOD: Learning Deeply Supervised Object Detectors from Scratch”, 2017 ICCV
  • 21. 21Type A-3 13. Precise Detection in Densely Packed Scenes • Propose precise detection in densely packed scenes • In real-world, there are many applications of object detection (ex, detection and count # of object) • In densely packed scenes, SOTA detector can’t detect accurately (1) layer for estimating the Jaccard index (2) a novel EM merging unit (3) release SKU-110K dataset
  • 22. 22Type A-3 14. SIXray: A Large-scale Security Inspection X-ray Benchmark for Prohibited Item Discovery in Overlapping Images • Present a large-scale dataset and establish a baseline for security inspection X-ray • Total 1,059,231 X-ray images in which 6 classes of 8,929 prohibited items • Propose an approach named class-balanced hierarchical refinement(CHR) and class-balanced loss function
  • 23. 23Type A-3 15. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regressio • Address the weaknesses of IoU and introduce generalized version(GIoU) • Intersection over Union(IoU) is the most popular evaluation metric used in object detection • But, there is a gap between optimizing distance losses and maximizing IoU • Introducing generalized IoU as both a new loss and a new metric
  • 24. 24Type A-3 16. Bounding Box Regression with Uncertainty for Accurate Object Detection • Propose novel bounding box regression loss with uncertainty • Most of datasets have ambiguities and labeling noise of bounding box coordinate • Network can learns to predict localization variance for each coordinate
  • 25. 25Type A-3 17. UPSNet: A Unified Panoptic Segmentation Network (oral) • Propose a unified panoptic segmentation network(UPSNet) • Semantic segmentation + Instance segmentation = panoptic segmentation • Semantic Head + Instance Head + Panoptic head → end-to-end manner Recommended reference: “Panoptic Segmentation”, 2018 arXiv Countable objects → things Uncountable objects → stuff Deformable Conv Mask R-CNN Parameter-free
  • 26. 26Type A-3 18. SFNet: Learning Object-aware Semantic Correspondence (Oral) • Propose SFNet for semantic correspondence problem • Propose to use images annotated with binary foreground masks and synthetic geometric deformations during training • Manually selecting point correspondences is so expensive!! • Outperform SOTA on standard benchmarks by a significant margin
  • 27. 27Type A-3 19. Fast Interactive Object Annotation with Curve-GC • Propose end-to-end fast interactive object annotation tool (Curve-GCN) • Predict all vertices simultaneously using a Graph Convolutional Network, (→ Polygon-RNN X) • Human annotator can correct any wrong point and only the neighboring points are affected Recommended reference: “Efficient interactive annotation of segmentation datasets with polygon-rnn++ ”, 2018 CVPR Code: https://github.com/fidler-lab/curve-gcn Correction!
  • 28. 28Type A-3 20. FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference • Propose image-level WSSS method using stochastic inference (dropout) • Localization maps(CAM) only focus on the small parts of objects → Problem • FickleNet allows a single network to generate multiple CAM from a single image • Does not require any additional training steps and only adds a simple layer Full Stochastic Both Training and Inference
  • 29. 29Type A-3 Related Post.. • In my personal blog, there are similar works • SIGGRAPH 2018 • NeurIPS 2018 • ICLR 2019 https://hoya012.github.io/